Xxaxx's Xperimints #2
Regular Expressions

"Regular expressions" look like Greek or some other totally incomprehensible language. I'm told they are actually reasonably simple once you get the hang of them. Well, I can tell you that in the beginning they look like a cross between gibberish and magical incantations. Hopefully this tutorial will shed some light on the subject.

Regular expressions are used for "matching": testing whether a piece of text contains a given pattern, or is exactly equal to it.

Any character that isn't a special character mentioned below matches itself. This includes all letters and numbers, and some punctuation.

For example:

Newbie -- matches the string "Newbie". It's important to realize that this is case-sensitive; this expression won't match "newbie".
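Here is a minimal sketch of that literal match. The tutorial doesn't name a particular program, so Python's standard re module is assumed here, and the sample strings and variable names are made up for illustration.

```python
import re

# re.search looks for the pattern anywhere in the text and returns a
# match object, or None when nothing matches.
found_exact = re.search(r"Newbie", "Hello, Newbie!")
found_lower = re.search(r"Newbie", "hello, newbie!")

# re.fullmatch only succeeds when the WHOLE text equals the pattern.
whole_exact = re.fullmatch(r"Newbie", "Newbie")
whole_longer = re.fullmatch(r"Newbie", "Hello, Newbie!")

print(found_exact is not None)   # True: the text contains "Newbie"
print(found_lower is not None)   # False: matching is case-sensitive
print(whole_exact is not None)   # True
print(whole_longer is not None)  # False: extra text around the match
```

re.search covers the "contains something" case and re.fullmatch covers the "is equal to something" case.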

One of the first special characters is the dot '.'. A dot matches any single character except a newline. So, the expression

.ewbie -- will match the strings "Newbie", "Rewbie", "Sewbie", etc. The program doesn't care whether the result makes sense. It just looks for a match.
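A quick way to try the dot, again assuming Python's re module (the word list is invented for the test):

```python
import re

dot_pattern = re.compile(r".ewbie")

# The last entry starts with a real newline character followed by "ewbie".
words = ["Newbie", "Rewbie", "Sewbie", "ewbie", "\newbie"]

# fullmatch requires the whole word to fit the pattern.
dot_results = [bool(dot_pattern.fullmatch(w)) for w in words]
print(dot_results)  # [True, True, True, False, False]
```

"ewbie" fails because the dot must consume exactly one character, and the newline version fails because the dot refuses to match a newline.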

The next special character is the star '*'. Any character followed by a star matches that character repeated 0 or more times. Thus,

N*ewbie -- matches "Newbie", "NNewbie", or "NNNNNewbie", or "ewbie".

One special form called dot-star ".*" matches any run of zero or more characters, whatever they are (other than a newline). This is a totally useful expression and you will find it as part of many expressions.
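The star and dot-star can be sketched like this, assuming Python's re module. The pattern New.*bie and the sample sentence are my own illustrations, not part of the original examples.

```python
import re

star_pattern = re.compile(r"N*ewbie")
star_inputs = ["ewbie", "Newbie", "NNNNNewbie", "Mewbie"]
star_results = [bool(star_pattern.fullmatch(s)) for s in star_inputs]
print(star_results)  # [True, True, True, False]

# ".*" may match nothing at all ...
empty_gap = re.fullmatch(r"New.*bie", "Newbie")
print(empty_gap is not None)  # True

# ... or a long stretch of arbitrary characters. By default it is
# "greedy": it grabs as much text as it can while still matching.
greedy = re.search(r"New.*bie", "New and improved Newbie")
print(greedy.group())  # New and improved Newbie
```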

Related to "*" are the "+" and "?" special characters.

The plus "+" will match one or more of the preceding character.

N+ewbie -- matches "Newbie", "NNewbie", or "NNNNNewbie". "ewbie" is not matched.

The question mark "?" will match zero or one of the preceding character -- not multiple.

N?ewbie -- matches "Newbie" or "ewbie". "NNewbie" as a whole is not matched (although a search would still find the "Newbie" inside it).

o* -- zero or more o's (i.e., "" or "o" or "oo" or "ooo" or "oooo" or ...)
o+ -- one or more o's (i.e., "o" or "oo" or "ooo" or "oooo" or ...)
o? -- zero or one o (i.e., "" or "o")
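The plus and question-mark examples above can be checked side by side. This is a small sketch assuming Python's re module; the candidate list is hypothetical.

```python
import re

candidates = ["ewbie", "Newbie", "NNewbie"]

# "+" needs at least one N; "?" allows at most one N.
plus_results = [bool(re.fullmatch(r"N+ewbie", s)) for s in candidates]
optional_results = [bool(re.fullmatch(r"N?ewbie", s)) for s in candidates]

print(plus_results)      # [False, True, True]
print(optional_results)  # [True, True, False]

# A search, unlike a whole-string match, still finds "Newbie" inside "NNewbie".
inner = re.search(r"N?ewbie", "NNewbie")
print(inner.group())  # Newbie
```

Whether "NNewbie" counts as a match therefore depends on whether the program insists on matching the whole string or is happy with a piece of it.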

You can use parentheses to group an expression so that a repetition character (*, + or ?) applies to the whole group. So, the expression

N(ew)+bie -- matches "Newbie", "Newewbie", "Newewewbie", etc.
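To see what the parentheses change, compare the grouped pattern with the same pattern without them. This is a sketch assuming Python's re module; the ungrouped pattern New+bie is added only for contrast.

```python
import re

samples = ["Newbie", "Newewbie", "Newewewbie", "Newwbie", "Nbie"]

# With the group, "+" repeats the whole "ew".
grouped_results = [bool(re.fullmatch(r"N(ew)+bie", s)) for s in samples]
# Without it, "+" repeats only the "w".
ungrouped_results = [bool(re.fullmatch(r"New+bie", s)) for s in samples]

print(grouped_results)    # [True, True, True, False, False]
print(ungrouped_results)  # [True, False, False, True, False]
```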

If one character in a pattern could be one of several, you can use a character class. This is defined using the [ and ]. For example:

N[aeiou]wbie -- matches "Nawbie", "Newbie", "Niwbie", "Nowbie", "Nuwbie".

A special case of the [ and ] character class definition is created by using '^' as the first character of a class. This negates the class: it matches any single character that is NOT listed.

For example:

N[^aeiou]wbie -- matches "Nbwbie", "Ncwbie", "Ndwbie", etc., as long as the second character is NOT a, e, i, o or u.
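Both kinds of class can be tried on the same words. This is a sketch assuming Python's re module; the test words, including the digit and upper-case ones, are my own additions.

```python
import re

vowel_class = re.compile(r"N[aeiou]wbie")
negated_class = re.compile(r"N[^aeiou]wbie")

test_words = ["Newbie", "Nowbie", "Nbwbie", "N7wbie", "NEwbie"]

vowel_hits = [bool(vowel_class.fullmatch(w)) for w in test_words]
negated_hits = [bool(negated_class.fullmatch(w)) for w in test_words]

print(vowel_hits)    # [True, True, False, False, False]
print(negated_hits)  # [False, False, True, True, True]
```

Note that the negated class accepts anything not listed, so a digit or an upper-case "E" counts as "not a vowel" here.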

You can combine a class definition with the repetition characters to get something like:

[aeiou]+ -- matches any series of one or more vowel characters, such as "aeeiaouaaeuui".

[^aeiou]+ -- matches any series of one or more non-vowel characters, such as "jjdklmnw".
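One way to watch these two patterns at work is to pull every run out of a piece of text. This sketch assumes Python's re module, and the sample text is invented.

```python
import re

sample_text = "queueing rhythm"

# findall returns every non-overlapping match, left to right.
vowel_runs = re.findall(r"[aeiou]+", sample_text)
other_runs = re.findall(r"[^aeiou]+", sample_text)

print(vowel_runs)  # ['ueuei']
print(other_runs)  # ['q', 'ng rhythm']
```

The space ends up inside a "non-vowel" run, because the negated class matches everything that isn't one of the five listed letters.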

There is one more extremely important special character -- the '|' (vertical-bar) character. It is used to match either of two expressions. For example:

Newbie|Oldbie -- will match "Newbie" or "Oldbie".
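A short sketch of the vertical bar, assuming Python's re module. The sentence and the shortened patterns are illustrations of mine.

```python
import re

# findall collects every place where either alternative matches.
either_hits = re.findall(r"Newbie|Oldbie", "Newbie meets Oldbie, not Midbie")
print(either_hits)  # ['Newbie', 'Oldbie']

# The bar splits the WHOLE expression, so "New|Oldbie" means
# "New" or "Oldbie", not "Newbie" or "Oldbie".
unintended = re.fullmatch(r"New|Oldbie", "Newbie")
print(unintended is not None)  # False

# Parentheses limit the bar to just the part that varies.
intended = re.fullmatch(r"(New|Old)bie", "Newbie")
print(intended is not None)  # True
```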

When not used inside a class definition [], the "^" indicates the beginning of a line. (I guess they didn't figure this was confusing enough.)

The '$' indicates the end of a line.

For example:

^Newbie -- matches Newbie at the beginning of a line.

Newbie$ -- matches Newbie at the end of a line.

^Newbie.*Newbie$ -- matches a line with Newbie at the beginning and at the end.
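The three anchored patterns can be compared on a few sample lines. This is a sketch assuming Python's re module; the sample lines are hypothetical.

```python
import re

lines = ["Newbie here", "here Newbie", "Newbie meets Newbie", "Newbie"]

starts = [bool(re.search(r"^Newbie", s)) for s in lines]
ends = [bool(re.search(r"Newbie$", s)) for s in lines]
both = [bool(re.search(r"^Newbie.*Newbie$", s)) for s in lines]

print(starts)  # [True, False, True, True]
print(ends)    # [False, True, True, True]
print(both)    # [False, False, True, False]

# In Python, "^" means "start of the whole text" unless the
# re.MULTILINE flag asks for "start of every line".
two_lines = "Newbie one\nNewbie two"
default_hits = re.findall(r"^Newbie", two_lines)
multiline_hits = re.findall(r"^Newbie", two_lines, re.MULTILINE)
print(len(default_hits), len(multiline_hits))  # 1 2
```

A line containing only "Newbie" fails the last pattern, because that pattern needs two separate copies of the word. Programs that already read their input one line at a time don't need anything like the MULTILINE flag; that detail is specific to how Python treats a multi-line string.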