Difference between revisions of "Pattern matching"

From MorphOS Library

m (Added some commas, replaced "wildchar" with wildcard character, fixed some spelling issues, rephrased a few small parts)
Line 1: Line 1:
 
==Introduction==
 
==Introduction==
  
String pattern matching is comparing a text string with a pattern. The pattern uses specific notation to specify matching criteria. A wery simple and well known form of pattern matching are MS-DOS wildchars: "?" and "*". The most common usage of pattern matching in computers is matching files for shell commands. Of course MorphOS uses pattern matching in its shell, but it is also used in [[Requesters#File_Requesters|file requesters]]. Many applications use pattern matching to filter information presented, for example [[Snoopium]] and [[MediaLogger]] match names of logged processes against pattern specified by user.
+
String pattern matching is comparing a text string with a pattern. The pattern uses specific notation to specify matching criteria. A very simple and well known form of pattern matching are MS-DOS wildcard characters "?" and "*". The most common usage of pattern matching in computers is matching files for shell commands. Of course, MorphOS uses pattern matching in its shell, but it is also used in [[Requesters#File_Requesters|file requesters]]. Many applications use pattern matching to filter information presented. For example, [[Snoopium]] and [[MediaLogger]] match names of logged processes against user-specified patterns.
  
MorphOS uses its own pattern notation, inherited from Amiga OS. While not as powerful as Unix regular expressions, it is easier to read and understand. On the other hand MorphOS pattern matching is more flexible than wildchars. Let's look at some example:
+
MorphOS uses its own pattern notation, inherited from Amiga OS. While not as powerful as Unix regular expressions, it is easier to read and understand. On the other hand, MorphOS pattern matching is more flexible than wildcard characters. Here is an example:
  
 
<tt>delete "the #? - #?come#?.(mp3|aiff|wav)"</tt>
 
<tt>delete "the #? - #?come#?.(mp3|aiff|wav)"</tt>
  
This command will delete all files with "mp3", "aiff" or "wav" extension, starting with "the" as a single word and having "come" string (maybe as a part of word) anywhere after a hyphen surrounded by spaces. While this example may look complicated, it shows the power of MorphOS pattern matching. It will be clean and understandable after reading explanation of MorphOS pattern notation. For now let's state that the whole pattern has been doublequoted only because it has spaces inside.
+
This command will delete all files with "mp3", "aiff" or "wav" extension, starting with "the" as a single word and having the string "come" (as a word or part of a word) anywhere after a hyphen surrounded by spaces. While this example may look complicated on first sight, it shows the power of MorphOS pattern matching. It will be clean and understandable after reading the following description of MorphOS pattern notation. For now, let's state that the whole pattern has been double-quoted only because it has spaces inside.
  
 
==Notation==
 
==Notation==
Line 13: Line 13:
 
* '''"?"''' (question mark)  - matches one and exactly one character. For example "????" matches all 4-letter strings. "t??" matches all three-letter strings starting with "t".
 
* '''"?"''' (question mark)  - matches one and exactly one character. For example "????" matches all 4-letter strings. "t??" matches all three-letter strings starting with "t".
  
* '''"#"''' (hash) - this is repetition operator. Matches expression standing on the right side, repeated zero or more times. For example "#a" matches any number of letters "a", but also matches an empty string. To match at least one "a", the pattern should be denoted as "a#a". A very common pattern "#?" matches any string, so<tt> delete #? </tt>is an equivalent to unix<tt> rm *</tt>.
+
* '''"#"''' (hash) - this is a repetition operator. Matches expression standing on the right side, repeated zero or more times. For example, "#a" matches any number of letters "a", but also matches an empty string. To match at least one "a", the pattern should be denoted as "a#a". A very common pattern "#?" matches any string, so<tt> delete #? </tt>is an equivalent to unix<tt> rm *</tt>.
  
* '''"()"''' (parentheses) - used for changing priority for other operators, as in math. For example "a#?a" matches any string starting and ending with "a", but "a#(?a)" matches any string in which every odd character is "a", and every even character is anything. More examples will be given below.
+
* '''"()"''' (parentheses) - used for changing priority for other operators, as in math. For example "a#?a" matches any string starting and ending with "a", but "a#(?a)" matches any string in which every odd character is "a", and every even character is anything. More examples can be found below.
  
* '''"|"''' (vertical bar) - means alternative. For example "(a|b)#?" matches any string starting with "a" or "b". "#?(cat|dog)" matches any string ending with "cat" or with "dog". A typical example is matching a set of file extenstions like in the example in [[#Introduction|introduction]]. Another more simple example: "#?.(txt|doc|rtf)" matches names ended with any of three document extensions.
+
* '''"|"''' (vertical bar) - means alternative. For example, "(a|b)#?" matches any string starting with "a" or "b". "#?(cat|dog)" matches any string ending with "cat" or with "dog". A typical example is matching a set of file extenstions like in the example in [[#Introduction|introduction]]. An even simpler example is this: "#?.(txt|doc|rtf)" matches names ended with any of three document extensions.
  
* '''"~"''' (tilde) - means negation, may be read "all except of" expression standing on the right. For example "~a#?" matches all strings not starting with "a". Similarly "~(foo)#?" matches all strings not starting with "foo". Note the usage of parentheses, without them "~foo#? will match all strings not starting with "f", but having "o" as the second and the third character.
+
* '''"~"''' (tilde) - means negation, may be read as an "all except of" expression standing on the right. For example "~a#?" matches all strings not starting with "a". Similarly "~(foo)#?" matches all strings not starting with "foo". Note the usage of parentheses, without them "~foo#? will match all strings not starting with "f", but having "o" as the second and the third character.
  
 
==Examples==
 
==Examples==

Revision as of 11:24, 15 December 2009

Introduction

String pattern matching means comparing a text string with a pattern. The pattern uses a specific notation to specify matching criteria. A very simple and well-known form of pattern matching is the pair of MS-DOS wildcard characters "?" and "*". The most common use of pattern matching in computers is selecting files for shell commands. MorphOS uses pattern matching in its shell, of course, but also in file requesters. Many applications use pattern matching to filter the information they present. For example, Snoopium and MediaLogger match the names of logged processes against user-specified patterns.

MorphOS uses its own pattern notation, inherited from AmigaOS. While not as powerful as Unix regular expressions, it is easier to read and understand. At the same time, MorphOS pattern matching is more flexible than wildcard characters. Here is an example:

delete "the #? - #?come#?.(mp3|aiff|wav)"

This command deletes all files with an "mp3", "aiff" or "wav" extension whose names start with "the" as a separate word and contain the string "come" (as a word or as part of a word) anywhere after a hyphen surrounded by spaces. While this example may look complicated at first sight, it shows the power of MorphOS pattern matching, and it will become clear after reading the following description of the pattern notation. For now, note that the whole pattern is double-quoted only because it contains spaces.
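To see what the alternative in parentheses buys, the same selection can be approximated with plain wildcards, which need one pattern per extension. The sketch below uses Python's fnmatch module as a stand-in for MS-DOS-style wildcards. The file names are invented for illustration, and the comparison is case-sensitive, which is an assumption of this sketch and not a statement about MorphOS.

```python
from fnmatch import fnmatchcase

# Hypothetical file names, invented for this illustration.
names = [
    "the beatles - come together.mp3",
    "the doors - the end.wav",
    "the band - welcome.aiff",
    "them - come here.mp3",
    "the who - become.txt",
]

# Plain wildcards have no alternative operator, so each extension
# needs its own pattern.
wildcards = [f"the * - *come*.{ext}" for ext in ("mp3", "aiff", "wav")]

# A name is selected when at least one of the three patterns matches it.
selected = [n for n in names if any(fnmatchcase(n, w) for w in wildcards)]

for name in selected:
    print(name)
```

The single MorphOS pattern expresses in one string what takes three wildcard patterns here.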

Notation

  • "?" (question mark) - matches exactly one character. For example, "????" matches all four-character strings, and "t??" matches all three-character strings starting with "t".
  • "#" (hash) - the repetition operator. It matches the expression to its right, repeated zero or more times. For example, "#a" matches any number of letters "a", including none, so it also matches an empty string. To match at least one "a", write the pattern as "a#a". The very common pattern "#?" matches any string, so delete #? is equivalent to Unix rm *.
  • "()" (parentheses) - used for changing the priority of other operators, as in math. For example, "a#?a" matches any string of at least two characters starting and ending with "a", but "a#(?a)" matches "a" followed by zero or more pairs of any character and "a", that is, any string of odd length in which every odd character is "a" and every even character is anything. More examples can be found below.
  • "|" (vertical bar) - means alternative. For example, "(a|b)#?" matches any string starting with "a" or "b", and "#?(cat|dog)" matches any string ending with "cat" or with "dog". A typical use is matching a set of file extensions, as in the example in the introduction. An even simpler example: "#?.(txt|doc|rtf)" matches names ending with any of three document extensions.
  • "~" (tilde) - means negation and may be read as "everything except" the expression to its right. For example, "~a#?" matches all strings not starting with "a". Similarly, "~(foo)#?" matches all strings not starting with "foo". Note the use of parentheses: without them, "~foo#?" will match all strings not starting with "f", but having "o" as the second and the third character.
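The operators above can be tried out with a small translator into Python regular expressions. This is only a sketch for experimenting with the notation, not the MorphOS implementation. The names translate and matches are hypothetical, matching is case-sensitive by assumption, and the "~" operator is deliberately left out because it has no direct regular-expression counterpart.

```python
import re

def translate(pattern: str) -> str:
    """Turn a pattern using ?, #, () and | into a Python regex string."""
    pos = 0

    def parse_alt() -> str:
        # alternative := sequence ('|' sequence)*
        nonlocal pos
        branches = [parse_seq()]
        while pos < len(pattern) and pattern[pos] == "|":
            pos += 1
            branches.append(parse_seq())
        return "|".join(branches)

    def parse_seq() -> str:
        # sequence := item*, ending at '|', ')' or the end of the pattern
        parts = []
        while pos < len(pattern) and pattern[pos] not in "|)":
            parts.append(parse_item())
        return "".join(parts)

    def parse_item() -> str:
        nonlocal pos
        if pos >= len(pattern) or pattern[pos] in "|)":
            raise ValueError("'#' has nothing to repeat")
        ch = pattern[pos]
        pos += 1
        if ch == "#":
            # Prefix repetition becomes a postfix '*' on the next item.
            return "(?:" + parse_item() + ")*"
        if ch == "(":
            inner = parse_alt()
            if pos >= len(pattern) or pattern[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return "(?:" + inner + ")"
        if ch == "?":
            return "."
        if ch == "~":
            raise NotImplementedError("negation is not covered by this sketch")
        return re.escape(ch)

    regex = parse_alt()
    if pos != len(pattern):
        raise ValueError("unbalanced ')'")
    return regex

def matches(pattern: str, text: str) -> bool:
    """Return True when the whole text matches the pattern."""
    return re.fullmatch(translate(pattern), text, re.DOTALL) is not None

print(matches("a#(?a)", "axaya"))                   # odd length, "a" at odd positions
print(matches("#?.(txt|doc|rtf)", "notes.txt"))     # one of three extensions
```

The recursive structure mirrors the notation: "#" applies to exactly one following item, which is either a single character, a "?" or a parenthesized group.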

Examples

???#? matches all strings having at least three characters.

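#(0|1|2|3|4|5|6|7|8|9) matches only strings consisting entirely of decimal digits. Because "#" allows zero repetitions, it also matches an empty string.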
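For readers more familiar with regular expressions, the two examples above can be checked against hand-written equivalents. The regular expressions below are an assumption of this sketch (a manual translation, not output of any MorphOS function), and the helper names are hypothetical.

```python
import re

# Manual translations of the two example patterns.
AT_LEAST_THREE = re.compile(r"...(?:.)*", re.DOTALL)        # ???#?
ONLY_DIGITS = re.compile(r"(?:0|1|2|3|4|5|6|7|8|9)*")       # #(0|1|2|3|4|5|6|7|8|9)

def at_least_three(text: str) -> bool:
    """True when the whole text has three or more characters."""
    return AT_LEAST_THREE.fullmatch(text) is not None

def only_digits(text: str) -> bool:
    """True when the whole text consists of decimal digits (possibly none)."""
    return ONLY_DIGITS.fullmatch(text) is not None

for sample in ("ab", "abc", "abcdef"):
    print(repr(sample), at_least_three(sample))

for sample in ("2009", "", "15 December"):
    print(repr(sample), only_digits(sample))
```

Prefixing the second pattern with one more digit alternative, in the same way "a#a" requires at least one "a", would exclude the empty string.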