casa::Regex Class Reference
[Utilities]

#include <Regex.h>


Detailed Description

Regular expression class.

Intended use:

Part of API

Review Status

Reviewed By:
Friso Olnon
Date Reviewed:
1995/03/20
Test programs:
tRegex

Synopsis

This class provides regular expression functionality, such as matching and searching in strings, comparison of expressions, and input/output. It is built on the regular expression functions in the GNU library (see files cregex.h and cregex.cc).
cregex.cc supports many syntaxes. Regex supports only one syntax, the extended regular expression with { and not \{ as a special character. The special characters are:

^
matches the beginning of a line.
$
matches the end of a line.
.
matches any character.
*
zero or more times the previous subexpression.
+
one or more times the previous subexpression.
?
zero or one time the previous subexpression.
{n,m}
interval operator specifying how many times a subexpression can match. See the man page of egrep or regexp for more detail.
[]
matches any character inside the brackets; e.g. [abc]. A hyphen can be used for a character range; e.g. [a-z].
A ^ right after the opening bracket indicates "not"; e.g. [^abc] means any character but a, b, and c. If ^ is not the first character, it is a literal caret. If - is the last character, it is a literal hyphen. If ] is the first character, it is a literal closing bracket.
Special character classes are [:alpha:], [:upper:], [:lower:], [:digit:], [:alnum:], [:xdigit:], [:space:], [:print:], [:punct:], [:graph:], and [:cntrl:]. The brackets are part of the name; e.g. [^[:upper:]] is equal to [^A-Z]. Note that [:upper:] is more portable, because A-Z fails for the EBCDIC character set.
( )
grouping to change the normal operator precedence.
|
or operator. Matches left side or right side.
Special characters have to be escaped with a backslash to use them literally. Inside square brackets, however, they must not be escaped. See the man page of egrep or regexp for more information about regular expressions.
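The syntax described above is close to the POSIX extended syntax that the C++ standard library offers as std::regex::extended. The following sketch therefore uses std::regex as a stand-in for casa::Regex to try out a few of the special characters. The helper name fullMatch is hypothetical, and small differences between the two engines are possible.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if the whole of 'text' matches 'pattern'.
// The pattern is compiled with the POSIX extended syntax of the C++
// standard library, used here only as a stand-in for casa::Regex.
bool fullMatch(const std::string& pattern, const std::string& text) {
    const std::regex rx(pattern, std::regex::extended);
    return std::regex_match(text, rx);
}
```

For instance, fullMatch("ab{2,3}c", "abbc") is expected to hold while fullMatch("ab{2,3}c", "abc") is not, and fullMatch("a\\.b", "axb") fails because the escaped dot is literal.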

Several global Regex objects are predefined for common functionality.

RXwhite
one or more whitespace characters
RXint
integer number (also negative)
RXdouble
double number (with e or E as exponent)
RXalpha
one or more alphabetic characters (lowercase and/or uppercase)
RXlowercase
lowercase alphabetic
RXuppercase
uppercase alphabetic
RXalphanum
one or more alphabetic/numeric characters (lowercase and/or uppercase)
RXidentifier
identifier name (first alphabetic or underscore, then zero or more alphanumeric and/or underscores)
The static member function fromPattern converts a shell-like pattern to a String which can be used to create a Regex from it. A pattern has the following special characters:
*
Zero or more arbitrary characters.
?
One arbitrary character.
[]
The same as [] in a regular expression (see above). In addition to ^ a ! can be used to indicate "not".
{,}
A brace expression which is like brace expansion in some shells. It is similar to the | construct in a regular expression.
E.g. {abc,defg} means abc or defg. Brace expressions can be nested and can contain other special characters.
E.g. St{Man*.{h,cc},Col?*.{h,cc,l,y}}
A literal comma or brace in a brace expression can be given by escaping it with a backslash.
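The pattern rules above can be sketched as a small converter. This is a simplified illustration in standard C++, not the casa implementation: the function names shellPatternToRegex and appendLiteral are hypothetical, and character classes such as [:alpha:] inside brackets are not handled.

```cpp
#include <cstddef>
#include <string>

// Append 'c' to 'out', escaping it if it is special in a regular expression.
static void appendLiteral(std::string& out, char c) {
    static const std::string special = "^$.*+?{}[]()|\\";
    if (special.find(c) != std::string::npos) out += '\\';
    out += c;
}

// Hypothetical, simplified conversion of a shell-like pattern to a
// regular expression string (assumption: not the casa source).
std::string shellPatternToRegex(const std::string& pattern) {
    std::string out;
    int braceDepth = 0;
    const std::size_t n = pattern.size();
    for (std::size_t i = 0; i < n; ++i) {
        const char c = pattern[i];
        if (c == '\\' && i + 1 < n) {
            appendLiteral(out, pattern[++i]);   // escaped character is literal
        } else if (c == '*') {
            out += ".*";                        // zero or more arbitrary characters
        } else if (c == '?') {
            out += '.';                         // one arbitrary character
        } else if (c == '[') {
            // Copy the bracket expression, mapping a leading ! to ^.
            std::size_t j = i + 1;
            out += '[';
            if (j < n && (pattern[j] == '!' || pattern[j] == '^')) { out += '^'; ++j; }
            if (j < n && pattern[j] == ']') { out += ']'; ++j; }   // literal ]
            while (j < n && pattern[j] != ']') out += pattern[j++];
            out += ']';
            i = j;
        } else if (c == '{') {
            out += "((";                        // start of a brace expression
            ++braceDepth;
        } else if (c == ',' && braceDepth > 0) {
            out += ")|(";                       // next alternative
        } else if (c == '}' && braceDepth > 0) {
            out += "))";                        // end of a brace expression
            --braceDepth;
        } else {
            appendLiteral(out, c);
        }
    }
    return out;
}
```

With this sketch, the pattern St*.{h,cc} becomes St.*\.((h)|(cc)), and nested brace expressions are handled through the depth counter.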
The static member function fromSQLPattern converts an SQL-like pattern to a String which can be used to create a Regex from it. A pattern has the following special characters:
%
Zero or more arbitrary characters.
_
One arbitrary character.
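A minimal sketch of such an SQL-pattern conversion in standard C++ is given here. The function name sqlPatternToRegex is hypothetical, and escaping all other regular-expression special characters is an assumption about what a reasonable conversion does, not a statement about the casa source.

```cpp
#include <string>

// Hypothetical sketch: convert an SQL-like pattern to a regular expression.
// '%' becomes ".*", '_' becomes ".", and any other character that is
// special in a regular expression is escaped (assumed behaviour).
std::string sqlPatternToRegex(const std::string& pattern) {
    static const std::string special = "^$.*+?{}[]()|\\";
    std::string out;
    for (char c : pattern) {
        if (c == '%') {
            out += ".*";
        } else if (c == '_') {
            out += '.';
        } else {
            if (special.find(c) != std::string::npos) out += '\\';
            out += c;
        }
    }
    return out;
}
```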
The static member function fromString converts a normal string to a regular expression. This function escapes characters in the string which are special in a regular expression. In this way a normal string can be passed to a function taking a regular expression.
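The escaping step can be sketched in a few lines of standard C++. The function name escapeForRegex and the exact set of escaped characters are assumptions made for illustration; the casa implementation may differ.

```cpp
#include <string>

// Hypothetical sketch: escape every character that is special in an
// extended regular expression so that the string matches literally.
std::string escapeForRegex(const std::string& text) {
    static const std::string special = "^$.*+?{}[]()|\\";
    std::string out;
    for (char c : text) {
        if (special.find(c) != std::string::npos) out += '\\';
        out += c;
    }
    return out;
}
```

Applied to tRegex.cc, this sketch yields tRegex\.cc.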

Example

    Regex RXwhite("[ \n\t\r\v\f]+", 1);
           (blank, newline, tab, return, vertical tab, formfeed)
    Regex RXint("-?[0-9]+", 1);
    Regex RXdouble("-?(([0-9]+\\.[0-9]*)|([0-9]+)|(\\.[0-9]+))([eE][+-]?[0-9]+)?", 1, 200);
    Regex RXalpha("[A-Za-z]+", 1);
    Regex RXlowercase("[a-z]+", 1);
    Regex RXuppercase("[A-Z]+", 1);
    Regex RXalphanum("[0-9A-Za-z]+", 1);
    Regex RXidentifier("[A-Za-z_][A-Za-z0-9_]*", 1);
In RXdouble the . is escaped via a backslash to get it literally. The second backslash is needed to escape the backslash in C++.
    Regex rx1 (Regex::fromPattern ("St*.{h,cc}"));
               results in regexp "St.*\.((h)|(cc))"
    Regex rx2 (Regex::fromString ("tRegex.cc"));
               results in regexp "tRegex\.cc"
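The double-backslash rule for RXdouble can be checked with standard C++ alone. The sketch below writes the same pattern once with doubled backslashes and once as a C++11 raw string literal, then compiles it with std::regex in extended mode as a stand-in for casa::Regex. The names kDoubleEscaped, kDoubleRaw and looksLikeDouble are hypothetical.

```cpp
#include <regex>
#include <string>

// The RXdouble pattern with doubled backslashes (as in the example above)
// and as a raw string literal; both spell the same characters.
const std::string kDoubleEscaped =
    "-?(([0-9]+\\.[0-9]*)|([0-9]+)|(\\.[0-9]+))([eE][+-]?[0-9]+)?";
const std::string kDoubleRaw =
    R"(-?(([0-9]+\.[0-9]*)|([0-9]+)|(\.[0-9]+))([eE][+-]?[0-9]+)?)";

// Hypothetical helper: true if the whole of 'text' matches the pattern.
bool looksLikeDouble(const std::string& text) {
    static const std::regex rx(kDoubleRaw, std::regex::extended);
    return std::regex_match(text, rx);
}
```

A raw string literal avoids the second backslash, which can make longer expressions easier to read.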

To Do

Definition at line 189 of file Regex.h.

Public Member Functions

 Regex ()
 Default constructor uses a zero-length regular expression.
 Regex (const String &exp, Bool fast=False, Int sz=40, const Char *translation=0)
 Construct a regular expression.
 Regex (const Regex &that)
 Copy constructor (copy semantics).
virtual ~Regex ()
const String & regexp () const
 Get the regular expression string.
const Char * transtable () const
 Get the translation table (can be a zero pointer).
virtual String::size_type match (const Char *s, String::size_type len, String::size_type pos=0) const
 Test if the regular expression matches (part of) string s.
Int match_info (Int &start, Int &length, Int nth=0) const
 Return some internal cregex info.
Bool OK () const
 Does it contain a valid Regex?
Regex & operator= (const Regex &that)
 Assignment (copy semantics).
Regex & operator= (const String &strng)
virtual String::size_type search (const Char *s, String::size_type len, Int &matchlen, Int pos=0) const
 Test if the regular expression occurs in string s.
virtual String::size_type find (const Char *s, String::size_type len, Int &matchlen, String::size_type pos=0) const
 Search string s of length len, starting at position pos.

Static Public Member Functions

static String fromPattern (const String &pattern)
 Convert a shell-like pattern to a regular expression.
static String fromSQLPattern (const String &pattern)
 Convert an SQL-like pattern to a regular expression.
static String fromString (const String &strng)
 Convert a normal string to a regular expression.

Protected Member Functions

void create (const String &, Int, Int, const Char *)
 Compile the regular expression.
void dealloc ()
 Deallocate the stuff allocated by create.

Protected Attributes

String * str
Int fastval
Int bufsz
Char * trans
re_pattern_buffer * buf
re_registers * reg

Friends

ostream & operator<< (ostream &ios, const Regex &exp)
 Write as ASCII.


Constructor & Destructor Documentation

casa::Regex::Regex (  ) 

Default constructor uses a zero-length regular expression.

Thrown Exceptions

casa::Regex::Regex ( const String & exp,
Bool  fast = False,
Int  sz = 40,
const Char * translation = 0 
)

Construct a regular expression.

Optionally a fast map can be created, a buffer size can be given and a translation table (of 256 chars) can be applied. The translation table can, for instance, be used to map lowercase characters to uppercase. See cregex.cc (the extended regular expression matching and search library) for detailed information.
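As an illustration of the translation table mentioned above, the following sketch builds a 256-entry table that maps lowercase letters to uppercase. It assumes that Char is a plain char and that the table would be handed to the constructor as its translation argument (for example via table.data()); the function name makeUppercaseTable is hypothetical.

```cpp
#include <array>
#include <cctype>

// Build a 256-entry table in which every lowercase letter maps to its
// uppercase counterpart and every other value maps to itself
// (in the default "C" locale).
std::array<char, 256> makeUppercaseTable() {
    std::array<char, 256> table{};
    for (int i = 0; i < 256; ++i) {
        table[i] = static_cast<char>(std::toupper(i));
    }
    return table;
}
```

The table has to outlive any object that keeps a pointer to it; whether Regex copies the table is not stated here.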

Thrown Exceptions

casa::Regex::Regex ( const Regex & that  ) 

Copy constructor (copy semantics).

Thrown Exceptions

virtual casa::Regex::~Regex (  )  [virtual]


Member Function Documentation

Regex& casa::Regex::operator= ( const Regex & that  ) 

Assignment (copy semantics).

Thrown Exceptions

Regex& casa::Regex::operator= ( const String & strng  ) 

static String casa::Regex::fromPattern ( const String & pattern  )  [static]

Convert a shell-like pattern to a regular expression.

This is useful for people who are more familiar with patterns than with regular expressions.

static String casa::Regex::fromSQLPattern ( const String & pattern  )  [static]

Convert an SQL-like pattern to a regular expression.

This is useful for TaQL, which mimics SQL.

static String casa::Regex::fromString ( const String & strng  )  [static]

Convert a normal string to a regular expression.

This consists of escaping the special characters. This is useful when one wants to provide a normal string (which may contain special characters) to a function working on regular expressions.

const String& casa::Regex::regexp (  )  const

Get the regular expression string.

const Char* casa::Regex::transtable (  )  const

Get the translation table (can be a zero pointer).

virtual String::size_type casa::Regex::match ( const Char * s,
String::size_type  len,
String::size_type  pos = 0 
) const [virtual]

Test if the regular expression matches (part of) string s.

The return value gives the length of the matching part of the string, or String::npos if there is no match or an error occurs. The string has len characters and the test starts at position pos. The string may contain null characters. A negative pos is allowed, in order to match at the end of the string.

Tip: Use the appropriate String functions to test if a string matches a regular expression. Regex::match is pretty low-level.

Reimplemented from casa::RegexBase.
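The return convention of match (length of the match at pos, or npos) can be mimicked with the standard library. The sketch below uses std::regex with the match_continuous flag as a stand-in. The function name matchAt is hypothetical, and the assumption that the match must begin exactly at pos follows the description above rather than the casa source.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical stand-in for the documented behaviour of Regex::match:
// return the length of the match that starts exactly at 'pos', or
// std::string::npos if the expression does not match there.
std::string::size_type matchAt(const std::regex& rx, const std::string& s,
                               std::string::size_type pos = 0) {
    if (pos > s.size()) return std::string::npos;
    std::smatch m;
    const auto first = s.begin() + static_cast<std::ptrdiff_t>(pos);
    if (std::regex_search(first, s.end(), m, rx,
                          std::regex_constants::match_continuous)) {
        return static_cast<std::string::size_type>(m.length(0));
    }
    return std::string::npos;
}
```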

virtual String::size_type casa::Regex::search ( const Char * s,
String::size_type  len,
Int & matchlen,
Int  pos = 0 
) const [virtual]

Test if the regular expression occurs in string s.

The return value gives the position of the first substring matching the regular expression. The length of that substring is returned in matchlen. The string has len characters and the test starts at position pos. The string may contain null characters. A reverse search is done if the given pos is less than 0.

Tip: Use the appropriate String functions to test if a regular expression occurs in a string. Regex::search is pretty low-level.

Reimplemented from casa::RegexBase.
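The return convention of search (position of the first match, with its length passed back through matchlen) can be sketched with std::regex as a stand-in. The function name searchFrom is hypothetical, and the reverse search for a negative pos is not covered by this sketch.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical stand-in for the documented behaviour of Regex::search:
// return the position of the first match at or after 'pos' and store its
// length in 'matchlen'; return std::string::npos if nothing matches.
std::string::size_type searchFrom(const std::regex& rx, const std::string& s,
                                  int& matchlen,
                                  std::string::size_type pos = 0) {
    matchlen = 0;
    if (pos > s.size()) return std::string::npos;
    std::smatch m;
    const auto first = s.begin() + static_cast<std::ptrdiff_t>(pos);
    if (std::regex_search(first, s.end(), m, rx)) {
        matchlen = static_cast<int>(m.length(0));
        // m.position(0) is relative to 'first', so add 'pos' back.
        return pos + static_cast<std::string::size_type>(m.position(0));
    }
    return std::string::npos;
}
```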

virtual String::size_type casa::Regex::find ( const Char * s,
String::size_type  len,
Int & matchlen,
String::size_type  pos = 0 
) const [virtual]

Search string s of length len, starting at position pos.

Returns the position of the first character of the substring found (or String::npos if not found). The matched length is returned in matchlen.

Implements casa::RegexBase.

Int casa::Regex::match_info ( Int & start,
Int & length,
Int  nth = 0 
) const

Return some internal cregex info.

Bool casa::Regex::OK (  )  const

Does it contain a valid Regex?

void casa::Regex::create ( const String & ,
Int  ,
Int  ,
const Char * 
) [protected]

Compile the regular expression.

Thrown Exceptions

void casa::Regex::dealloc (  )  [protected]

Deallocate the stuff allocated by create.


Friends And Related Function Documentation

ostream& operator<< ( ostream &  ios,
const Regex & exp 
) [friend]

Write as ASCII.


Member Data Documentation

String* casa::Regex::str [protected]

Definition at line 296 of file Regex.h.

Int casa::Regex::fastval [protected]

Definition at line 297 of file Regex.h.

Int casa::Regex::bufsz [protected]

Definition at line 298 of file Regex.h.

Char* casa::Regex::trans [protected]

Definition at line 299 of file Regex.h.

re_pattern_buffer* casa::Regex::buf [protected]

Definition at line 300 of file Regex.h.

re_registers* casa::Regex::reg [protected]

Definition at line 301 of file Regex.h.


The documentation for this class was generated from the following file:
Generated on Mon Sep 1 22:44:30 2008 for NRAOCASA by  doxygen 1.5.1