Pattern matching

From MorphOS Library

Revision as of 11:06, 15 December 2009 by Krashan

Introduction

String pattern matching means comparing a text string against a pattern. The pattern uses a specific notation to express matching criteria. A very simple and well-known form of pattern matching is the MS-DOS wildcards "?" and "*". The most common use of pattern matching on computers is selecting files for shell commands. MorphOS uses pattern matching in its shell, and also in file requesters. Many applications use pattern matching to filter the information they present; for example, Snoopium and MediaLogger match the names of logged processes against a pattern specified by the user.

MorphOS uses its own pattern notation, inherited from AmigaOS. While it is not as powerful as Unix regular expressions, it is easier to read and understand. On the other hand, MorphOS patterns are more flexible than simple wildcards. Let's look at an example:

delete "the #? - #?come#?.(mp3|aiff|wav)"

This command deletes all files with an "mp3", "aiff" or "wav" extension whose names start with the word "the" (followed by a space) and contain the string "come" (possibly as part of a longer word) somewhere after a hyphen surrounded by spaces. While this example may look complicated, it shows the power of MorphOS pattern matching, and it will become clear after reading the explanation of the notation below. For now, note that the whole pattern is enclosed in double quotes only because it contains spaces.
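To make the description above concrete, here is the same pattern hand-translated into a Python regular expression and applied to a few file names. This is a minimal sketch for illustration only: the file names are hypothetical, the translation is done by hand ("#?" becomes ".*", the literal "." is escaped, and "(mp3|aiff|wav)" stays a group), and matching here is case-sensitive.

```python
import re

# Hand-translated regex equivalent of the MorphOS pattern
#   the #? - #?come#?.(mp3|aiff|wav)
# "#?" -> ".*", literal "." -> "\.", alternation group kept as is.
# This sketch matches case-sensitively.
PATTERN = re.compile(r"the .* - .*come.*\.(mp3|aiff|wav)")

# Hypothetical file names, used only for illustration.
names = [
    "the doors - come together.mp3",   # matches
    "the band - welcome home.wav",     # matches: "come" inside "welcome"
    "thedoors - come together.mp3",    # no space after "the"
    "the doors - come together.ogg",   # extension not in the list
    "the doors-come together.mp3",     # hyphen not surrounded by spaces
]

# fullmatch() requires the whole name to match, like a shell pattern does.
to_delete = [n for n in names if PATTERN.fullmatch(n)]
print(to_delete)
```

Only the first two names would be selected by the delete command.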

Notation

  • "?" (question mark) - matches exactly one character. For example, "????" matches all four-letter strings, and "t??" matches all three-letter strings starting with "t".
  • "#" (hash) - the repetition operator. It matches the expression on its right repeated zero or more times. For example, "#a" matches any number of letters "a", including the empty string. To match at least one "a", use "a#a". The very common pattern "#?" matches any string, so delete #? is equivalent to the Unix rm *.
  • "()" (parentheses) - change the precedence of other operators, as in arithmetic. For example, "a#?a" matches any string of at least two characters that starts and ends with "a", but "a#(?a)" matches odd-length strings in which every odd-numbered character (first, third, fifth and so on) is "a" and every even-numbered character is arbitrary. More examples are given below.
  • "|" (vertical bar) - alternation. For example, "(a|b)#?" matches any string starting with "a" or "b", and "#?(cat|dog)" matches any string ending with "cat" or "dog". A typical use is matching a set of file extensions, as in the example in the introduction. A simpler example: "#?.(txt|doc|rtf)" matches names ending with any of three document extensions.
  • "~" (tilde) - negation; it can be read as "everything except" the expression on its right. For example, "~a#?" matches all strings not starting with "a". Similarly, "~(foo)#?" matches all strings not starting with "foo". Note the use of parentheses: without them, "~foo#?" matches all strings not starting with "f" but having "o" as the second and third characters.
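The operators "?", "#", "()" and "|" can be tried out with a small translator that turns a MorphOS pattern into a Python regular expression. This is a minimal sketch under stated assumptions, not the system's own matcher: the helper names pattern_to_regex and matches are hypothetical, matching is case-sensitive, and "~" is deliberately left out because its exact semantics are best checked against the system documentation. Each operator maps to a regex construct: "?" to ".", "#X" to "(?:X)*", parentheses to a non-capturing group and "|" to regex alternation; every other character is escaped and matched literally.

```python
import re


def pattern_to_regex(pat: str) -> str:
    """Hypothetical helper: translate ?, #, () and | into a Python regex."""
    pos = 0

    def parse_alt() -> str:
        # alternative := sequence ("|" sequence)*
        nonlocal pos
        branches = [parse_seq()]
        while pos < len(pat) and pat[pos] == "|":
            pos += 1
            branches.append(parse_seq())
        return "|".join(branches)

    def parse_seq() -> str:
        # A sequence ends at "|", ")" or the end of the pattern.
        out = []
        while pos < len(pat) and pat[pos] not in "|)":
            out.append(parse_item())
        return "".join(out)

    def parse_item() -> str:
        nonlocal pos
        ch = pat[pos]
        pos += 1
        if ch == "?":                       # exactly one character
            return "."
        if ch == "#":                       # zero or more of the next item
            if pos >= len(pat) or pat[pos] in "|)":
                raise ValueError("'#' needs an operand")
            return "(?:" + parse_item() + ")*"
        if ch == "(":                       # grouping
            inner = parse_alt()
            if pos >= len(pat) or pat[pos] != ")":
                raise ValueError("unbalanced parenthesis")
            pos += 1
            return "(?:" + inner + ")"
        if ch == "~":
            raise NotImplementedError("'~' is not handled in this sketch")
        return re.escape(ch)                # any other character is literal

    regex = parse_alt()
    if pos != len(pat):                     # stray ")" left over
        raise ValueError("unbalanced parenthesis")
    return regex


def matches(pat: str, text: str) -> bool:
    """The whole string must match, as with shell patterns."""
    return re.fullmatch(pattern_to_regex(pat), text, re.DOTALL) is not None


examples = [("t??", "tea"), ("a#(?a)", "axaya"), ("a#(?a)", "ab"),
            ("#?.(txt|doc|rtf)", "notes.txt")]
for pat, text in examples:
    print(pat, text, matches(pat, text))
```

Translating to an existing regex engine keeps the sketch short; a real implementation would more likely compile the pattern into its own internal form.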