UCREGEXP Class Reference

Unicode Regular Expression parser.

#include <mi32/ucregexp.h>

Public Member Functions

 UCREGEXP ()
 ~UCREGEXP ()
int Compile (const MIUNICODE *expr, bool bIgnoreCase)
bool Exec (const MIUNICODE *text, UINT32 *pMatchStart=0, UINT32 *pMatchEnd=0, UINT32 textlen=~0, int flags=0) const

Detailed Description

Unicode Regular Expression parser.

Assumptions:

A discussion of regular expressions is beyond the scope of this document.

Definitions:

Operators:

^     Matches the beginning of the string.
$     Matches the end of the string.
.     Matches any character.
*     Matches zero or more of the last subexpression.
+     Matches one or more of the last subexpression.
?     Matches zero or one of the last subexpression.
()    Subexpression grouping.

Notes:

The "." operator normally does not match separators, but a flag is available for the Exec() method that will allow this operator to match a separator (URE_DOT_MATCHES_SEPARATORS).
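The operators listed above behave like their counterparts in most regular expression dialects. As a rough illustration that does not need the SDK, the sketch below uses the C++ standard library's std::regex (ECMAScript grammar) as a stand-in for UCREGEXP. The helper name Matches is hypothetical, and the separator handling of "." described in the note is specific to UCREGEXP and is not reproduced here.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `pattern` matches anywhere in `text`.
// std::regex is only a stand-in here; it is not the UCREGEXP engine.
bool Matches(const std::string& pattern, const std::string& text) {
    return std::regex_search(text, std::regex(pattern));
}
```

For example, Matches("^ab*c$", "ac") is true because "*" accepts zero occurrences of "b", while Matches("^ab+c$", "ac") is false because "+" requires at least one.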

Literals and Constants:

c          Literal UCS2 character.
\x....     Hexadecimal number of up to 4 digits.
\X....     Hexadecimal number of up to 4 digits.
\u....     Hexadecimal number of up to 4 digits.
\U....     Hexadecimal number of up to 4 digits.
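To make the "up to 4 digits" rule concrete, here is a minimal sketch of how such an escape could be decoded into a 16-bit value. It is not the parser used by UCREGEXP: the function name DecodeHexEscape, the use of std::u16string, and the requirement of at least one digit are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper, not part of UCREGEXP: decodes an escape such as
// "\x41" or "\u00e9" starting at s[pos]. On success it stores the value in
// `out`, advances `pos` past the escape, and returns true.
bool DecodeHexEscape(const std::u16string& s, std::size_t& pos, std::uint16_t& out) {
    if (pos + 1 >= s.size() || s[pos] != u'\\') return false;
    const char16_t kind = s[pos + 1];
    if (kind != u'x' && kind != u'X' && kind != u'u' && kind != u'U') return false;

    std::size_t i = pos + 2;
    std::uint16_t value = 0;
    int digits = 0;
    while (i < s.size() && digits < 4) {           // read at most 4 hex digits
        const char16_t c = s[i];
        int d;
        if (c >= u'0' && c <= u'9') d = c - u'0';
        else if (c >= u'a' && c <= u'f') d = c - u'a' + 10;
        else if (c >= u'A' && c <= u'F') d = c - u'A' + 10;
        else break;                                // first non-hex character ends the number
        value = static_cast<std::uint16_t>(value * 16 + d);
        ++i;
        ++digits;
    }
    if (digits == 0) return false;                 // assumption: at least one digit is required
    out = value;
    pos = i;
    return true;
}
```

Because reading stops after four digits, any fifth hexadecimal digit in the input is left for the caller to treat as an ordinary literal character.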

Character classes:

[...]             Character class.
[^...]            Negated character class.
\pN1,N2,...,Nn    Character properties class.
\PN1,N2,...,Nn    Negated character properties class.

POSIX character classes recognized:

Notes:

Character property classes are \p or \P followed by a comma-separated list of integers between 1 and 21. Each integer refers to one of the following character properties:

N     Character Property
--    ------------------------
1     NONSPACING
2     COMBINING
3     NUMDIGIT
4     NUMOTHER
5     SPACESEP
6     LINESEP
7     PARASEP
8     CNTRL
9     PUA
10    UPPER
11    LOWER
12    TITLE
13    MODIFIER
14    OTHERLETTER
15    DASHPUNCT
16    OPENPUNCT
17    CLOSEPUNCT
18    OTHERPUNCT
19    MATHSYM
20    CURRENCYSYM
21    OTHERSYM
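As an illustration of the list syntax, the sketch below parses the "N1,N2,...,Nn" part that follows \p or \P and collects the property numbers into a bitmask. It is a sketch under stated assumptions rather than the library's own code: the name ParsePropertyList and the bitmask representation (bit N - 1 for property N) are hypothetical, and the helper expects the list on its own rather than embedded in a longer expression.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper, not part of UCREGEXP: parses a property list such as
// "10,11,12" and returns a bitmask with bit (N - 1) set for each property
// number N. Returns 0 if the list is empty or malformed.
std::uint32_t ParsePropertyList(const std::string& list) {
    std::uint32_t mask = 0;
    std::size_t i = 0;
    while (true) {
        unsigned n = 0;
        std::size_t digits = 0;
        while (i < list.size() && list[i] >= '0' && list[i] <= '9' && digits < 2) {
            n = n * 10 + static_cast<unsigned>(list[i] - '0');
            ++i;
            ++digits;
        }
        if (digits == 0 || n < 1 || n > 21) return 0;  // only 1..21 are defined
        mask |= std::uint32_t{1} << (n - 1);
        if (i == list.size()) return mask;             // consumed the whole list
        if (list[i] != ',') return 0;                  // unexpected character
        ++i;                                           // skip the comma
    }
}
```

With this representation, the list "10,11,12" (UPPER, LOWER, TITLE) yields the mask 0xE00, and a negated class (\P) could be handled by testing that none of the masked properties apply.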


Constructor & Destructor Documentation

UCREGEXP::UCREGEXP (  ) 
UCREGEXP::~UCREGEXP (  ) 

Member Function Documentation

int UCREGEXP::Compile ( const MIUNICODE *expr,
bool  bIgnoreCase 
)

Compile a Regular Expression.

A discussion of regular expressions is beyond the scope of this document. For more information, see the regex(7) manual page available on most Unix-like systems (on Linux, for example, run "man 7 regex").

Example: The following code builds a regular expression that matches one or more digits, followed by either a "-" or a "/", followed by one or more digits.

      strtouc(expr, "[[:digit:]]+[-/][[:digit:]]+");
      regexp.Compile(expr, false);   // second argument is bIgnoreCase
Returns:
0 if the expression compiled, a negative value if not. Note that the return codes are not MicroImages error codes.
bool UCREGEXP::Exec ( const MIUNICODE *text,
UINT32 *pMatchStart = 0,
UINT32 *pMatchEnd = 0,
UINT32  textlen = ~0,
int  flags = 0 
) const

Execute a regular expression.

Returns true if a match was found, false if not. If a match was found, *pMatchStart and *pMatchEnd are set to the offsets into the text where the matching text starts and ends.

Parameters:
text Text to search
pMatchStart Receives the offset into text of the start of the matching text (pass NULL if not needed)
pMatchEnd Receives the offset into text of the end of the matching text (pass NULL if not needed)
textlen Length of text; the default is to use ucstrlen(text)
flags Option flags: URE_IGNORE_NONSPACING, URE_DOT_MATCHES_SEPARATORS
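
The compile-then-search flow and the start/end offsets can be tried out without the SDK. The sketch below uses std::regex as a stand-in for the Compile() and Exec() pair, with the digit pattern from the Compile() example. The function name FindDatePart is hypothetical, and reporting the end as one past the last matched character is an assumption: this page does not say whether *pMatchEnd is inclusive or exclusive.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Stand-in for the Compile()/Exec() pair using std::regex, not UCREGEXP.
// Returns true on a match and reports the start offset and the offset one
// past the last matched character (an assumed convention for the end).
bool FindDatePart(const std::string& text, std::size_t& start, std::size_t& end) {
    // Compiled once, like a UCREGEXP object that is compiled and then reused.
    static const std::regex re("[[:digit:]]+[-/][[:digit:]]+");
    std::smatch m;
    if (!std::regex_search(text, m, re)) return false;
    start = static_cast<std::size_t>(m.position(0));
    end = start + static_cast<std::size_t>(m.length(0));
    return true;
}
```

For the text "Due 10/07 now" this reports start 4 and end 9, which brackets the substring "10/07".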

Generated on Sun Oct 7 21:33:58 2012 for TNTsdk 2012 by  doxygen 1.6.1