Unicode Regular Expression parser.
#include <mi32/ucregexp.h>
Public Member Functions:
| | UCREGEXP () |
| | ~UCREGEXP () |
| int | Compile (const MIUNICODE *expr, bool bIgnoreCase) |
| bool | Exec (const MIUNICODE *text, UINT32 *pMatchStart=0, UINT32 *pMatchEnd=0, UINT32 textlen=~0, int flags=0) const |
Assumptions:
A discussion of regular expressions is beyond the scope of this document.
Definitions:
Operators:
^ | Matches the beginning of the string. |
$ | Matches the end of the string. |
. | Matches any character. |
* | Matches zero or more of the last subexpression. |
+ | Matches one or more of the last subexpression. |
? | Matches zero or one of the last subexpression. |
() | Groups a subexpression. |
Notes:
By default, the "." operator does not match separators. Pass the URE_DOT_MATCHES_SEPARATORS flag to the Exec() method to let this operator match a separator as well.
Literals and Constants:
c | literal UCS2 character. |
\x.... | hexadecimal number of up to 4 digits. |
\X.... | hexadecimal number of up to 4 digits. |
\u.... | hexadecimal number of up to 4 digits. |
\U.... | hexadecimal number of up to 4 digits. |
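The escape forms above can be illustrated with a small decoder. This is a minimal sketch, not UCREGEXP's own parser: the helper name DecodeHexEscape is hypothetical, std::u16string stands in for MIUNICODE text, and stopping after the fourth hex digit is an assumption drawn from the "up to 4 digits" wording.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper, not part of UCREGEXP: decodes one escape of the form
// \x...., \X...., \u.... or \U.... (1 to 4 hex digits) starting at s[pos].
// On success it stores the code unit in `out`, advances `pos` past the
// escape, and returns true. On failure it leaves `pos` and `out` untouched.
bool DecodeHexEscape(const std::u16string& s, std::size_t& pos, std::uint16_t& out) {
    assert(pos <= s.size());
    if (pos + 1 >= s.size() || s[pos] != u'\\') return false;
    const char16_t kind = s[pos + 1];
    if (kind != u'x' && kind != u'X' && kind != u'u' && kind != u'U') return false;

    std::size_t i = pos + 2;
    std::uint16_t value = 0;
    int digits = 0;
    // ASSUMPTION: at most 4 hex digits are consumed; a 5th digit is left
    // in the input as an ordinary literal character.
    while (i < s.size() && digits < 4) {
        const char16_t c = s[i];
        int d;
        if (c >= u'0' && c <= u'9') d = c - u'0';
        else if (c >= u'a' && c <= u'f') d = c - u'a' + 10;
        else if (c >= u'A' && c <= u'F') d = c - u'A' + 10;
        else break;
        value = static_cast<std::uint16_t>(value * 16 + d);
        ++i;
        ++digits;
    }
    if (digits == 0) return false;  // "\x" with no digits is not an escape here
    out = value;
    pos = i;
    return true;
}
```

Under these assumptions, decoding the text \x41Z yields the code unit 0x41 and leaves the position at the Z.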
Character classes:
[...] | Character class. |
[^...] | Negated character class. |
\pN1,N2,...,Nn | Character properties class. |
\PN1,N2,...,Nn | Negated character properties class. |
POSIX character classes recognized:
Notes:
Character property classes are written as \p or \P followed by a comma-separated list of integers between 1 and 21. Each integer refers to one of the following character properties:
N | Character Property |
-- | ------------------------ |
1 | NONSPACING |
2 | COMBINING |
3 | NUMDIGIT |
4 | NUMOTHER |
5 | SPACESEP |
6 | LINESEP |
7 | PARASEP |
8 | CNTRL |
9 | PUA |
10 | UPPER |
11 | LOWER |
12 | TITLE |
13 | MODIFIER |
14 | OTHERLETTER |
15 | DASHPUNCT |
16 | OPENPUNCT |
17 | CLOSEPUNCT |
18 | OTHERPUNCT |
19 | MATHSYM |
20 | CURRENCYSYM |
21 | OTHERSYM |
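To make the numbering concrete, the list that follows \p or \P can be turned into a bitmask with one bit per property. This is only a sketch under stated assumptions: ParsePropertyList is a hypothetical helper, the "bit N-1 for property N" layout is chosen for illustration, and nothing here reflects how UCREGEXP stores properties internally.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper, not part of UCREGEXP: parses the "N1,N2,...,Nn" list
// that follows \p or \P. Property N (1..21, as in the table above) sets
// bit N-1 of `mask`. Returns false for an empty list, a malformed list, or
// a number outside 1..21, and leaves `mask` untouched in that case.
bool ParsePropertyList(const std::string& list, std::uint32_t& mask) {
    std::uint32_t result = 0;
    std::size_t i = 0;
    while (true) {
        // Read one decimal integer of at most two digits.
        int n = 0;
        int digits = 0;
        while (i < list.size() && list[i] >= '0' && list[i] <= '9' && digits < 2) {
            n = n * 10 + (list[i] - '0');
            ++i;
            ++digits;
        }
        if (digits == 0 || n < 1 || n > 21) return false;
        result |= (std::uint32_t{1} << (n - 1));

        if (i == list.size()) break;         // clean end of list
        if (list[i] != ',') return false;    // anything but a comma is an error
        ++i;                                 // skip the comma, expect another number
    }
    mask = result;
    return true;
}
```

With this layout, the list 10,11 (UPPER and LOWER) produces the mask 0x600, and a list containing 22 is rejected.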
UCREGEXP::UCREGEXP ()
UCREGEXP::~UCREGEXP ()
int UCREGEXP::Compile (const MIUNICODE *expr, bool bIgnoreCase)
Compile a Regular Expression.
A discussion of regular expressions is beyond the scope of this document. For more information, see the regex(7) manual page available on most Unix-like systems (on Linux, for example, type "man 7 regex").
Example: The following code compiles a regular expression that matches one or more digits, followed by either "-" or "/", followed by one or more digits.
strtouc(expr, "[[:digit:]]+[-/][[:digit:]]+");
regexp.Compile(expr, false);
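The behaviour of this pattern can be tried without the mi32 library. The sketch below swaps in std::regex from the C++ standard library, which accepts the same pattern text; it is a stand-in, not UCREGEXP, and the function name FindNumberPair is hypothetical. Details such as Unicode digit handling may differ between the two engines.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Stand-in built on std::regex (NOT UCREGEXP): searches `text` for one or
// more digits, then "-" or "/", then one or more digits. On a match it
// reports the start offset and length of the matched text.
bool FindNumberPair(const std::string& text, std::size_t& start, std::size_t& length) {
    // Same pattern text as in the Compile() example above.
    static const std::regex re("[[:digit:]]+[-/][[:digit:]]+");
    std::smatch m;
    if (!std::regex_search(text, m, re)) return false;
    start = static_cast<std::size_t>(m.position(0));
    length = static_cast<std::size_t>(m.length(0));
    return true;
}
```

For the input "order 12/345 shipped" this reports a match of length 6 starting at offset 6; an input with no digits reports no match.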
bool UCREGEXP::Exec (const MIUNICODE *text, UINT32 *pMatchStart = 0, UINT32 *pMatchEnd = 0, UINT32 textlen = ~0, int flags = 0) const
Execute a regular expression.
Returns true if a match was found, false otherwise. If a match was found, *pMatchStart and *pMatchEnd are set to the offsets into the text where the matching text starts and ends.
Parameters:
| text | Text to search |
| pMatchStart | Offset into text of the start of the matching text (pass NULL if not needed) |
| pMatchEnd | Offset into text of the end of the matching text (pass NULL if not needed) |
| textlen | Length of text; the default is to use ucstrlen(text) |
| flags | URE_IGNORE_NONSPACING, URE_DOT_MATCHES_SEPARATORS |
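The calling convention of Exec() (optional output pointers and a sentinel default for textlen) can be sketched with a self-contained stand-in. FindDigitRun below is hypothetical and is not a regular expression engine: it only finds the first run of ASCII digits. It also assumes that the end offset is exclusive and that the outputs stay untouched when nothing matches; the documentation above does not say either way for UCREGEXP.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in that mirrors only the parameter conventions of
// Exec(): null output pointers mean "not needed", and textlen == ~0u means
// "measure the NUL-terminated text". It finds the first run of ASCII digits.
bool FindDigitRun(const char16_t* text,
                  std::uint32_t* pMatchStart = nullptr,
                  std::uint32_t* pMatchEnd = nullptr,
                  std::uint32_t textlen = ~0u) {
    assert(text != nullptr);
    if (textlen == ~0u) {                // sentinel: compute the length ourselves
        textlen = 0;
        while (text[textlen] != 0) ++textlen;
    }
    std::uint32_t i = 0;
    while (i < textlen && !(text[i] >= u'0' && text[i] <= u'9')) ++i;
    if (i == textlen) return false;      // ASSUMPTION: outputs untouched on no match
    std::uint32_t j = i;
    while (j < textlen && text[j] >= u'0' && text[j] <= u'9') ++j;
    if (pMatchStart) *pMatchStart = i;   // written only when the caller asked for it
    if (pMatchEnd) *pMatchEnd = j;       // ASSUMPTION: end offset is exclusive
    return true;
}
```

A caller that only wants a yes/no answer passes just the text; a caller that wants the offsets passes the addresses of two variables, and may pass an explicit length to search only a prefix of the text.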