Regex.h
Classes
- Regex -- Regular expression class
Interface
- Public Members
- Regex()
- Regex(const String &exp, Bool fast = False, Int sz = 40, const Char *translation = 0)
- Regex(const Regex &that)
- virtual ~Regex()
- Regex &operator=(const Regex &that)
- Regex &operator=(const String &strng)
- static String fromPattern(const String &pattern)
- static String fromSQLPattern(const String &pattern)
- static String fromString(const String &strng)
- const String &regexp() const
- const Char *transtable() const
- virtual String::size_type match(const Char *s, String::size_type len, String::size_type pos=0) const
- virtual String::size_type search(const Char *s, String::size_type len, Int &matchlen, Int pos=0) const
- virtual String::size_type find(const Char *s, String::size_type len, Int &matchlen, String::size_type pos=0) const
- Int match_info(Int& start, Int& length, Int nth = 0) const
- Bool OK() const
- friend ostream &operator<<(ostream &ios, const Regex &exp)
- Protected Members
- void create(const String&, Int, Int, const Char*)
- void dealloc()
Review Status
- Reviewed By:
- Friso Olnon
- Date Reviewed:
- 1995/03/20
- Programs:
- Tests:
Synopsis
This class provides regular expression functionality, such as
matching and searching in strings, comparison of expressions, and
input/output. It is built on the regular expression functions in the
GNU library (see files cregex.h and cregex.cc).
cregex.cc supports many syntaxes. Regex supports
only one syntax, the extended regular expression with { and not \{
as a special character. The special characters are:
- ^
- matches the beginning of a line.
- $
- matches the end of a line.
- .
- matches any character
- *
- zero or more times the previous subexpression.
- +
- one or more times the previous subexpression.
- ?
- zero or one time the previous subexpression.
- {n,m}
- interval operator to specify how many times a subexpression
can match. See man page of egrep or regexp for more detail.
- []
- matches any character inside the brackets; e.g. [abc].
A hyphen can be used for a character range; e.g. [a-z].
A ^ right after the opening bracket indicates "not";
e.g. [^abc] means any character but a, b, and c.
If ^ is not the first character, it is a literal caret.
If - is the last character, it is a literal hyphen.
If ] is the first character, it is a literal closing bracket.
Special character classes are
[:alpha:], [:upper:], [:lower:], [:digit:], [:alnum:], [:xdigit:],
[:space:], [:print:], [:punct:], [:graph:], and [:cntrl:].
The brackets are part of the name; e.g.
[^[:upper:]] is equal to [^A-Z].
Note that [:upper:] is more portable, because A-Z fails
for the EBCDIC character set.
- ( )
- grouping to change the normal operator precedence.
- |
- or operator. Matches left side or right side.
Special characters have to be escaped with a backslash to use them
literally. Inside square brackets, however, they must not be escaped.
See the man page of egrep or regexp for more information about
regular expressions.
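The syntax above can be tried out with any POSIX extended regular expression engine. The sketch below uses std::regex with the extended grammar purely as a stand-in; it is not the Regex class, and the helper names matchesExtended and exampleSyntax are made up for illustration. Small differences between the two engines are possible.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper (not part of Regex): true if the whole of `text`
// matches `pattern`, read as a POSIX extended regular expression.
bool matchesExtended(const std::string& pattern, const std::string& text) {
    const std::regex rx(pattern, std::regex::extended);
    return std::regex_match(text, rx);
}

// A few of the constructs listed above.
void exampleSyntax() {
    assert(matchesExtended("[[:upper:]][[:lower:]]+", "Regex"));  // character classes
    assert(matchesExtended("a{2,3}", "aaa"));                      // interval operator
    assert(matchesExtended("St(Man|Col)", "StCol"));               // grouping and |
    assert(!matchesExtended("[^abc]", "a"));                       // negated bracket
    assert(matchesExtended("tRegex\\.cc", "tRegex.cc"));           // escaped .
}
```

Note the doubled backslash in the last pattern: one backslash escapes the dot for the regular expression, the other escapes the backslash in the C++ string literal.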
Several global Regex objects are predefined for common functionality.
- RXwhite
- one or more whitespace characters
- RXint
- integer number (also negative)
- RXdouble
- double number (with e or E as exponent)
- RXalpha
- one or more alphabetic characters (lowercase and/or uppercase)
- RXlowercase
- lowercase alphabetic
- RXuppercase
- uppercase alphabetic
- RXalphanum
- one or more alphabetic/numeric characters (lowercase and/or uppercase)
- RXidentifier
- identifier name (an alphabetic character or underscore first, then
zero or more alphanumeric characters and/or underscores)
The static member function fromPattern converts a shell-like
pattern to a String which can be used to create a Regex from it.
A pattern has the following special characters:
- *
- Zero or more arbitrary characters.
- ?
- One arbitrary character
- []
- The same as [] in a regular expression (see above).
In addition to ^ a ! can be used to indicate "not".
- {,}
- A brace expression which is like brace expansion in some shells.
It is similar to the | construct in a regular expression.
E.g. {abc,defg} means abc or defg.
Brace expressions can be nested and can contain other
special characters.
E.g. St{Man*.{h,cc},Col?*.{h,cc,l,y}}
A literal comma or brace in a brace expression can be given by
escaping it with a backslash.
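The conversion rules above can be sketched as a small standalone function. This is an illustrative reimplementation under assumptions, not the code behind fromPattern: the name shellPatternToRegex, the set of characters treated as special, and the handling of malformed input (for example an unterminated bracket) are all choices made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: convert a shell-like pattern into an extended
// regular expression string, following the rules described above.
std::string shellPatternToRegex(const std::string& pattern) {
    const std::string special = ".+^$()|{}[]*?\\";  // assumed set of special characters
    const std::size_t n = pattern.size();
    std::string out;
    int braceDepth = 0;
    std::size_t i = 0;
    while (i < n) {
        const char c = pattern[i++];
        if (c == '\\' && i < n) {
            // Escaped character: keep it as a literal.
            const char next = pattern[i++];
            if (special.find(next) != std::string::npos) out += '\\';
            out += next;
        } else if (c == '*') {
            out += ".*";
        } else if (c == '?') {
            out += '.';
        } else if (c == '[') {
            out += '[';
            if (i < n && (pattern[i] == '!' || pattern[i] == '^')) { out += '^'; ++i; }
            if (i < n && pattern[i] == ']') { out += ']'; ++i; }  // leading ] is literal
            while (i < n && pattern[i] != ']') {
                if (pattern.compare(i, 2, "[:") == 0) {
                    // Copy a character class such as [:alpha:] as a whole.
                    const std::size_t end = pattern.find(":]", i + 2);
                    if (end != std::string::npos) {
                        out.append(pattern, i, end + 2 - i);
                        i = end + 2;
                        continue;
                    }
                }
                out += pattern[i++];
            }
            if (i < n) { out += ']'; ++i; }
        } else if (c == '{') {
            out += "((";
            ++braceDepth;
        } else if (c == ',' && braceDepth > 0) {
            out += ")|(";
        } else if (c == '}' && braceDepth > 0) {
            out += "))";
            --braceDepth;
        } else {
            if (special.find(c) != std::string::npos) out += '\\';
            out += c;
        }
    }
    return out;
}

void exampleFromPattern() {
    assert(shellPatternToRegex("St*.{h,cc}") == "St.*\\.((h)|(cc))");
}
```

With these rules the nested pattern St{Man*.{h,cc},Col?*.{h,cc,l,y}} becomes St((Man.*\.((h)|(cc)))|(Col..*\.((h)|(cc)|(l)|(y)))).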
The static member function fromSQLPattern converts an SQL-like
pattern to a String which can be used to create a Regex from it.
A pattern has the following special characters:
- %
- Zero or more arbitrary characters.
- _
- One arbitrary character
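A minimal sketch of such a conversion is given below. It is not the implementation of fromSQLPattern; the name sqlPatternToRegex and the set of characters that get escaped are assumptions, and no SQL escape syntax is handled.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: convert an SQL-like pattern into an extended
// regular expression string (% -> .*, _ -> ., other specials escaped).
std::string sqlPatternToRegex(const std::string& pattern) {
    const std::string special = ".^$*+?()[]{}|\\";  // assumed set of special characters
    std::string out;
    for (const char c : pattern) {
        if (c == '%') {
            out += ".*";
        } else if (c == '_') {
            out += '.';
        } else {
            if (special.find(c) != std::string::npos) out += '\\';
            out += c;
        }
    }
    return out;
}

void exampleFromSQLPattern() {
    assert(sqlPatternToRegex("tab_%.ms") == "tab..*\\.ms");
}
```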
The static member function fromString converts a normal
string to a regular expression. This function escapes characters in
the string which are special in a regular expression. In this way a
normal string can be passed to a function taking a regular expression.
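The escaping step can be sketched as follows. This is an illustration only, not the code of fromString; the name escapeForRegex and the exact set of characters treated as special are assumptions.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: put a backslash in front of every character that is
// special in an extended regular expression.
std::string escapeForRegex(const std::string& text) {
    const std::string special = ".^$*+?()[]{}|\\";  // assumed set of special characters
    std::string out;
    for (const char c : text) {
        if (special.find(c) != std::string::npos) out += '\\';
        out += c;
    }
    return out;
}

void exampleFromString() {
    assert(escapeForRegex("tRegex.cc") == "tRegex\\.cc");
}
```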
Example
Regex RXwhite("[ \n\t\r\v\f]+", 1);
(blank, newline, tab, return, vertical tab, formfeed)
Regex RXint("-?[0-9]+", 1);
Regex RXdouble("-?(([0-9]+\\.[0-9]*)|([0-9]+)|(\\.[0-9]+))([eE][+-]?[0-9]+)?", 1, 200);
Regex RXalpha("[A-Za-z]+", 1);
Regex RXlowercase("[a-z]+", 1);
Regex RXuppercase("[A-Z]+", 1);
Regex RXalphanum("[0-9A-Za-z]+", 1);
Regex RXidentifier("[A-Za-z_][A-Za-z0-9_]*", 1);
In RXdouble the . is escaped via a backslash to get it literally.
The second backslash is needed to escape the backslash in C++.
Regex rx1 (Regex::fromPattern ("St*.{h,cc}"));
results in regexp "St.*\.((h)|(cc))"
Regex rx2 (Regex::fromString ("tRegex.cc"));
results in regexp "tRegex\.cc"
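To see what the RXdouble expression above accepts, the same pattern string can be given to another extended regular expression engine. The sketch below uses std::regex as a stand-in for Regex; the helper name looksLikeDouble is made up, and the two engines may differ in corner cases.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true if the whole of `text` matches the RXdouble
// pattern, compiled here with std::regex instead of Regex.
bool looksLikeDouble(const std::string& text) {
    static const std::regex rx(
        "-?(([0-9]+\\.[0-9]*)|([0-9]+)|(\\.[0-9]+))([eE][+-]?[0-9]+)?",
        std::regex::extended);
    return std::regex_match(text, rx);
}

void exampleDouble() {
    assert(looksLikeDouble("-1.5e10"));
    assert(looksLikeDouble(".5"));
    assert(looksLikeDouble("42"));
    assert(!looksLikeDouble("abc"));
    assert(!looksLikeDouble("1e"));  // an exponent needs at least one digit
}
```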
To Do
- Let sgi ifdef go
- Decide on documentation of GNU stuff (cregex.h, cregex.cc)
Member Description
Regex()
Default constructor uses a zero-length regular expression.
Thrown Exceptions
Regex(const String &exp, Bool fast = False, Int sz = 40, const Char *translation = 0)
Construct a regular expression.
Optionally a fast map can be created, a buffer size can be given
and a translation table (of 256 chars) can be applied.
The translation table can, for instance, be used to map
lowercase characters to uppercase.
See cregex.cc (the extended regular expression matching and search
library) for detailed information.
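A translation table that maps lowercase to uppercase could be built as sketched below. The helper name makeUppercaseTable is hypothetical, and the sketch assumes the table is read as 256 consecutive chars indexed by character code; see cregex.cc for how the table is actually used.

```cpp
#include <array>
#include <cassert>
#include <cctype>

// Hypothetical helper: build a 256-entry table in which every lowercase
// letter maps to its uppercase form and every other character maps to itself.
// std::toupper follows the current C locale.
std::array<char, 256> makeUppercaseTable() {
    std::array<char, 256> table{};
    for (int i = 0; i < 256; ++i) {
        table[static_cast<std::size_t>(i)] = static_cast<char>(std::toupper(i));
    }
    return table;
}

void exampleTranslation() {
    const std::array<char, 256> table = makeUppercaseTable();
    assert(table[static_cast<unsigned char>('a')] == 'A');
    assert(table[static_cast<unsigned char>('Z')] == 'Z');
    assert(table[static_cast<unsigned char>('1')] == '1');
}
```

A pointer such as table.data() has the const Char* shape that the translation argument expects; whether the table must outlive the Regex object is not stated here and should be checked in cregex.cc.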
Thrown Exceptions
Regex(const Regex &that)
Copy constructor (copy semantics).
Thrown Exceptions
Regex &operator=(const Regex &that)
Regex &operator=(const String &strng)
Assignment (copy semantics).
Thrown Exceptions
static String fromPattern(const String &pattern)
Convert a shell-like pattern to a regular expression.
This is useful for people who are more familiar with patterns
than with regular expressions.
static String fromSQLPattern(const String &pattern)
Convert an SQL-like pattern to a regular expression.
This is useful for TaQL, which mimics SQL.
static String fromString(const String &strng)
Convert a normal string to a regular expression.
This consists of escaping the special characters.
This is useful when one wants to provide a normal string
(which may contain special characters) to a function working
on regular expressions.
const String &regexp() const
Get the regular expression string.
const Char *transtable() const
Get the translation table (can be a zero pointer).
virtual String::size_type match(const Char *s, String::size_type len, String::size_type pos=0) const
Test if the regular expression matches (part of) string s.
The return value gives the length of the matching string part,
or String::npos if there is no match or an error.
The string has len characters and the test starts at
position pos. The string may contain null characters.
A negative pos is allowed to match at the end.
Use the appropriate String functions
to test if a string matches a regular expression.
Regex::match is pretty low-level.
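The return convention described above (length of the matching part, or npos) can be illustrated with std::regex as a stand-in engine. The function matchAt below is hypothetical and only models non-negative positions; it is not Regex::match.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical sketch: test whether `rx` matches the buffer s[0..len) starting
// exactly at `pos`. Returns the length of the matching part, or npos if there
// is no match. The buffer may contain null characters.
std::string::size_type matchAt(const std::regex& rx, const char* s,
                               std::string::size_type len,
                               std::string::size_type pos = 0) {
    assert(s != nullptr);
    if (pos > len) return std::string::npos;
    std::cmatch m;
    // match_continuous only accepts a match that begins at the first character.
    const bool found = std::regex_search(s + pos, s + len, m, rx,
                                         std::regex_constants::match_continuous);
    return found ? static_cast<std::string::size_type>(m.length(0))
                 : std::string::npos;
}
```

For the pattern [0-9]+ and the buffer 123abc this gives 3 at position 0, 2 at position 1, and npos at position 3.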
virtual String::size_type search(const Char *s, String::size_type len, Int &matchlen, Int pos=0) const
virtual String::size_type find(const Char *s, String::size_type len, Int &matchlen, String::size_type pos=0) const
Test if the regular expression occurs in string s.
The return value gives the position of the first substring
matching the regular expression. The length of that substring
is returned in matchlen.
The string has len characters and the test starts at
position pos. The string may contain null characters.
The search will do a reverse search if the pos given is less than 0.
Use the appropriate String functions
to test if a regular expression occurs in a string.
Regex::search is pretty low-level.
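The forward search convention (position returned, length passed back in matchlen) can be sketched in the same way with std::regex as a stand-in. The function searchFrom is hypothetical; it is not Regex::search, and the reverse search for negative positions is not modelled.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical sketch: find the first substring of s[0..len) at or after `pos`
// that matches `rx`. Returns its position and stores its length in `matchlen`;
// returns npos (leaving `matchlen` untouched) if nothing matches.
std::string::size_type searchFrom(const std::regex& rx, const char* s,
                                  std::string::size_type len, int& matchlen,
                                  std::string::size_type pos = 0) {
    assert(s != nullptr);
    if (pos > len) return std::string::npos;
    std::cmatch m;
    if (!std::regex_search(s + pos, s + len, m, rx)) return std::string::npos;
    matchlen = static_cast<int>(m.length(0));
    // m.position(0) is relative to s + pos, so shift it back to the buffer start.
    return pos + static_cast<std::string::size_type>(m.position(0));
}
```

For the pattern [0-9]+ and the buffer ab123cd45, a search from position 0 returns 2 with matchlen 3, and a search from position 5 returns 7 with matchlen 2.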
Int match_info(Int& start, Int& length, Int nth = 0) const
Return some internal cregex info.
Bool OK() const
Does it contain a valid Regex?
friend ostream &operator<<(ostream &ios, const Regex &exp)
Write as ASCII.
void create(const String&, Int, Int, const Char*)
internal reg.exp. stuff
Compile the regular expression
Thrown Exceptions
void dealloc()
Deallocate the storage allocated by create.