annotate doc/regex.texi @ 17274:69f030e5cec4

doc: avoid small caps * doc/parse-datetime.texi, doc/regex.texi: Don't use small caps; they're more trouble than they're worth. Suggested by Karl Berry in <http://bugs.gnu.org/13360>.
author Paul Eggert <eggert@cs.ucla.edu>
date Sat, 05 Jan 2013 17:23:52 -0800
parents a712776b11ce
children
rev   line source
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
1 @node Overview
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2 @chapter Overview
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
3
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
4 A @dfn{regular expression} (or @dfn{regexp}, or @dfn{pattern}) is a text
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
5 string that describes some (mathematical) set of strings. A regexp
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
6 @var{r} @dfn{matches} a string @var{s} if @var{s} is in the set of
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
7 strings described by @var{r}.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
8
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
9 Using the Regex library, you can:
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
10
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
11 @itemize @bullet
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
12
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
13 @item
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
14 see if a string matches a specified pattern as a whole, and
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
15
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
16 @item
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
17 search within a string for a substring matching a specified pattern.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
18
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
19 @end itemize
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
20
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
21 Some regular expressions match only one string, i.e., the set they
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
22 describe has only one member. For example, the regular expression
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
23 @samp{foo} matches the string @samp{foo} and no others. Other regular
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
24 expressions match more than one string, i.e., the set they describe has
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
25 more than one member. For example, the regular expression @samp{f*}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
26 matches the set of strings made up of any number (including zero) of
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
27 @samp{f}s. As you can see, some characters in regular expressions match
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
28 themselves (such as @samp{f}) and some don't (such as @samp{*}); the
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
29 ones that don't match themselves instead let you specify patterns that
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
30 describe many different strings.
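The set view described above can be checked directly. The following is a minimal sketch, not part of the original manual: it assumes a POSIX-conforming @file{regex.h} (for example, the one in the GNU C Library) and uses the POSIX functions @code{regcomp}, @code{regexec} and @code{regfree}. The helper name @code{matches_whole} is hypothetical. It reports whether a whole string is a member of the set a basic regular expression describes.

```c
#include <regex.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: return true if the whole of S is in the set of
   strings described by the POSIX basic regular expression PATTERN.
   It relies on POSIX leftmost-longest matching: if S as a whole
   matches, the reported match must start at 0 and end at strlen (S).  */
static bool
matches_whole (const char *pattern, const char *s)
{
  regex_t re;
  regmatch_t m;

  if (regcomp (&re, pattern, 0) != 0)
    return false;               /* treat an invalid pattern as "no match" */

  bool ok = (regexec (&re, s, 1, &m, 0) == 0
             && m.rm_so == 0
             && (size_t) m.rm_eo == strlen (s));
  regfree (&re);
  return ok;
}
```

With this helper, @samp{foo} accepts only the string @samp{foo}, whereas @samp{f*} accepts the empty string, @samp{f}, @samp{fff}, and so on, but not @samp{fffx}.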
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
31
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
32 To either match or search for a regular expression with the Regex
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
33 library functions, you must first compile it with a Regex pattern
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
34 compiling function. A @dfn{compiled pattern} is a regular expression
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
35 converted to the internal format used by the library functions. Once
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
36 you've compiled a pattern, you can use it for matching or searching any
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
37 number of times.
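As an illustration of compiling once and reusing the compiled pattern, here is a short sketch under stated assumptions: it uses the POSIX functions @code{regcomp}, @code{regexec} and @code{regfree} from @file{regex.h}, and the function name @code{count_matching_lines} is hypothetical. The pattern is compiled a single time and then used to search each string in turn.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: count how many of the N strings in LINES contain
   a substring matching the POSIX basic regular expression PATTERN.
   Return -1 if PATTERN cannot be compiled.  */
static int
count_matching_lines (const char *pattern, const char *const *lines, size_t n)
{
  regex_t re;
  int hits = 0;

  /* Compile once.  REG_NOSUB says we only need a yes/no answer.  */
  if (regcomp (&re, pattern, REG_NOSUB) != 0)
    return -1;

  /* Search any number of times with the same compiled pattern.  */
  for (size_t i = 0; i < n; i++)
    if (regexec (&re, lines[i], 0, NULL, 0) == 0)
      hits++;

  regfree (&re);
  return hits;
}
```

The cost of compilation is paid once, however many strings are searched afterwards.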
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
38
13553
8fc3314fe460 Document not_eol and remove mention of regex.c.
Reuben Thomas <rrt@sc3d.org>
parents: 13549
diff changeset
39 To use the Regex library, include @file{regex.h}.
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
40 @pindex regex.h
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
41 Regex provides three groups of functions with which you can operate on
17274
69f030e5cec4 doc: avoid small caps
Paul Eggert <eggert@cs.ucla.edu>
parents: 16358
diff changeset
42 regular expressions. One group---the GNU group---is more
13647
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
43 powerful but not completely compatible with the other two, namely the
17274
69f030e5cec4 doc: avoid small caps
Paul Eggert <eggert@cs.ucla.edu>
parents: 16358
diff changeset
44 POSIX and Berkeley Unix groups; its interface was designed
69f030e5cec4 doc: avoid small caps
Paul Eggert <eggert@cs.ucla.edu>
parents: 16358
diff changeset
45 specifically for GNU.
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
46
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
47 We wrote this chapter with programmers in mind, not users of
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
48 programs---such as Emacs---that use Regex. We describe the Regex
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
49 library in its entirety, not how to write regular expressions that a
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
50 particular program understands.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
51
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
52
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
53 @node Regular Expression Syntax
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
54 @chapter Regular Expression Syntax
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
55
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
56 @cindex regular expressions, syntax of
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
57 @cindex syntax of regular expressions
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
58
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
59 @dfn{Characters} are things you can type. @dfn{Operators} are things in
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
60 a regular expression that match one or more characters. You compose
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
61 regular expressions from operators, which in turn you specify using one
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
62 or more characters.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
63
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
64 Most characters represent what we call the match-self operator, i.e.,
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
65 they match themselves; we call these characters @dfn{ordinary}. Other
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
66 characters represent either all or parts of fancier operators; e.g.,
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
67 @samp{.} represents what we call the match-any-character operator
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
68 (which, no surprise, matches (almost) any character); we call these
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
69 characters @dfn{special}. Two different things determine what
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
70 characters represent what operators:
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
71
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
72 @enumerate
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
73 @item
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
74 the regular expression syntax your program has told the Regex library to
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
75 recognize, and
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
76
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
77 @item
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
78 the context of the character in the regular expression.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
79 @end enumerate
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
80
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
81 In the following sections, we describe these things in more detail.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
82
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
83 @menu
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
84 * Syntax Bits::
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
85 * Predefined Syntaxes::
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
86 * Collating Elements vs. Characters::
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
87 * The Backslash Character::
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
88 @end menu
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
89
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
90
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
91 @node Syntax Bits
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
92 @section Syntax Bits
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
93
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
94 @cindex syntax bits
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
95
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
96 In any particular syntax for regular expressions, some characters are
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
97 always special, others are sometimes special, and others are never
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
98 special. The particular syntax that Regex recognizes for a given
13647
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
99 regular expression depends on the current syntax (as set by
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
100 @code{re_set_syntax}) when the pattern buffer of that regular expression
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
101 was compiled.
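The following sketch shows that the syntax in effect at compile time is what counts. It is an illustration only and makes several assumptions: the GNU C Library's @file{regex.h} with @code{_GNU_SOURCE} defined, the GNU functions @code{re_set_syntax}, @code{re_compile_pattern} and @code{re_match}, and the predefined syntaxes @code{RE_SYNTAX_POSIX_BASIC} and @code{RE_SYNTAX_POSIX_EXTENDED} as callers would pass them. The helper name @code{match_len_with_syntax} is hypothetical.

```c
#define _GNU_SOURCE
#include <regex.h>
#include <string.h>

/* Hypothetical helper: compile PATTERN while SYNTAX is the current
   syntax, then match it at the start of S.  Return the length of the
   match, -1 if there is no match, or -2 on a compile or internal error.
   The current syntax is global state, so this is not thread-safe.  */
static int
match_len_with_syntax (reg_syntax_t syntax, const char *pattern,
                       const char *s)
{
  struct re_pattern_buffer buf;
  memset (&buf, 0, sizeof buf);   /* let the library allocate everything */

  reg_syntax_t saved = re_set_syntax (syntax);
  const char *err = re_compile_pattern (pattern, strlen (pattern), &buf);
  re_set_syntax (saved);          /* restore the caller's syntax */
  if (err != NULL)
    return -2;

  regoff_t n = re_match (&buf, s, (regoff_t) strlen (s), 0, NULL);
  regfree (&buf);
  return (int) n;
}
```

Under these assumptions, the pattern @samp{a+} compiled with @code{RE_SYNTAX_POSIX_EXTENDED} matches all of @samp{aaa}, while the same pattern compiled with @code{RE_SYNTAX_POSIX_BASIC} matches only the literal two-character string @samp{a+}.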
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
102
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
103 You get a pattern buffer by compiling a regular expression. @xref{GNU
13647
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
104 Pattern Buffers}, for more information on pattern buffers. @xref{GNU
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
105 Regular Expression Compiling}, and @ref{BSD Regular Expression
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
106 Compiling}, for more information on compiling.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
107
13647
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
108 Regex considers the current syntax to be a collection of bits; we refer
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
109 to these bits as @dfn{syntax bits}. In most cases, they affect what
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
110 characters represent what operators. We describe the meanings of the
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
111 operators to which we refer in @ref{Common Operators}, @ref{GNU
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
112 Operators}, and @ref{GNU Emacs Operators}.
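Because a syntax is just a collection of bits, you can inspect and combine syntaxes with ordinary bitwise operators. The sketch below assumes the GNU C Library's @file{regex.h} with @code{_GNU_SOURCE} defined; the helper names @code{syntax_has}, @code{syntax_with} and @code{syntax_without} are hypothetical and are not part of the library.

```c
#define _GNU_SOURCE
#include <regex.h>
#include <stdbool.h>

/* Hypothetical helper: true if any of the bits in BIT are set in SYNTAX.  */
static bool
syntax_has (reg_syntax_t syntax, reg_syntax_t bit)
{
  return (syntax & bit) != 0;
}

/* Hypothetical helper: SYNTAX with the bits in BIT turned on.  */
static reg_syntax_t
syntax_with (reg_syntax_t syntax, reg_syntax_t bit)
{
  return syntax | bit;
}

/* Hypothetical helper: SYNTAX with the bits in BIT turned off.  */
static reg_syntax_t
syntax_without (reg_syntax_t syntax, reg_syntax_t bit)
{
  return syntax & ~bit;
}
```

For instance, @code{syntax_with (RE_SYNTAX_POSIX_EXTENDED, RE_ICASE)} yields a value you could pass to @code{re_set_syntax} to request case-insensitive extended syntax for patterns compiled afterwards.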
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
113
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
114 For reference, here is the complete list of syntax bits, in alphabetical
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
115 order:
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
116
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
117 @table @code
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
118
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
119 @cnindex RE_BACKSLASH_ESCAPE_IN_LISTS
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
120 @item RE_BACKSLASH_ESCAPE_IN_LISTS
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
121 If this bit is set, then @samp{\} inside a list (@pxref{List Operators})
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
122 quotes (makes ordinary, if it's special) the following character; if
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
123 this bit isn't set, then @samp{\} is an ordinary character inside lists.
16236
8d0c35a0ae1d doc: fix minor quoting issues, mostly with `
Paul Eggert <eggert@cs.ucla.edu>
parents: 15563
diff changeset
124 (@xref{The Backslash Character}, for what @samp{\} does outside of lists.)
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
125
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
126 @cnindex RE_BK_PLUS_QM
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
127 @item RE_BK_PLUS_QM
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
128 If this bit is set, then @samp{\+} represents the match-one-or-more
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
129 operator and @samp{\?} represents the match-zero-or-one operator; if
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
130 this bit isn't set, then @samp{+} represents the match-one-or-more
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
131 operator and @samp{?} represents the match-zero-or-one operator. This
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
132 bit is irrelevant if @code{RE_LIMITED_OPS} is set.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
133
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
134 @cnindex RE_CHAR_CLASSES
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
135 @item RE_CHAR_CLASSES
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
136 If this bit is set, then you can use character classes in lists; if this
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
137 bit isn't set, then you can't.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
138
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
139 @cnindex RE_CONTEXT_INDEP_ANCHORS
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
140 @item RE_CONTEXT_INDEP_ANCHORS
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
141 If this bit is set, then @samp{^} and @samp{$} are special anywhere outside
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
142 a list; if this bit isn't set, then these characters are special only in
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
143 certain contexts. @xref{Match-beginning-of-line Operator}, and
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
144 @ref{Match-end-of-line Operator}.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
145
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
146 @cnindex RE_CONTEXT_INDEP_OPS
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
147 @item RE_CONTEXT_INDEP_OPS
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
148 If this bit is set, then certain characters are special anywhere outside
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
149 a list; if this bit isn't set, then those characters are special only in
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
150 some contexts and are ordinary elsewhere. Specifically, if this bit
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
151 isn't set then @samp{*}, and (if the syntax bit @code{RE_LIMITED_OPS}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
152 isn't set) @samp{+} and @samp{?} (or @samp{\+} and @samp{\?}, depending
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
153 on the syntax bit @code{RE_BK_PLUS_QM}) represent repetition operators
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
154 only if they're not first in a regular expression or just after an
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
155 open-group or alternation operator. The same holds for @samp{@{} (or
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
156 @samp{\@{}, depending on the syntax bit @code{RE_NO_BK_BRACES}) if
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
157 it is the beginning of a valid interval and the syntax bit
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
158 @code{RE_INTERVALS} is set.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
159
13647
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
160 @cnindex RE_CONTEXT_INVALID_DUP
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
161 @item RE_CONTEXT_INVALID_DUP
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
162 If this bit is set, then an open-interval operator cannot occur at the
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
163 start of a regular expression, or immediately after an alternation,
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
164 open-group or close-interval operator.
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
165
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
166 @cnindex RE_CONTEXT_INVALID_OPS
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
167 @item RE_CONTEXT_INVALID_OPS
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
168 If this bit is set, then repetition and alternation operators can't be
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
169 in certain positions within a regular expression. Specifically, the
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
170 regular expression is invalid if it has:
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
171
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
172 @itemize @bullet
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
173
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
174 @item
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
175 a repetition operator first in the regular expression or just after a
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
176 match-beginning-of-line, open-group, or alternation operator; or
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
177
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
178 @item
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
179 an alternation operator first or last in the regular expression, just
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
180 before a match-end-of-line operator, or just after an alternation or
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
181 open-group operator.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
182
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
183 @end itemize
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
184
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
185 If this bit isn't set, then you can put the characters representing the
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
186 repetition and alternation operators anywhere in a regular expression.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
187 Whether or not they will in fact be operators in certain positions
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
188 depends on other syntax bits.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
189
13647
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
190 @cnindex RE_DEBUG
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
191 @item RE_DEBUG
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
192 If this bit is set, and the Regex library was compiled with
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
193 @code{-DDEBUG}, then internal debugging is turned on; if unset, then
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
194 it is turned off.
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
195
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
196 @cnindex RE_DOT_NEWLINE
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
197 @item RE_DOT_NEWLINE
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
198 If this bit is set, then the match-any-character operator matches
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
199 a newline; if this bit isn't set, then it doesn't.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
200
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
201 @cnindex RE_DOT_NOT_NULL
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
202 @item RE_DOT_NOT_NULL
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
203 If this bit is set, then the match-any-character operator doesn't match
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
204 a null character; if this bit isn't set, then it does.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
205
13647
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
206 @cnindex RE_HAT_LISTS_NOT_NEWLINE
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
207 @item RE_HAT_LISTS_NOT_NEWLINE
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
208 If this bit is set, nonmatching lists @samp{[^...]} do not match
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
209 newline; if not set, they do.
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
210
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
211 @cnindex RE_ICASE
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
212 @item RE_ICASE
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
213 If this bit is set, then ignore case when matching; otherwise, case is
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
214 significant.
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
215
@cnindex RE_INTERVALS
@item RE_INTERVALS
If this bit is set, then Regex recognizes interval operators; if this bit
isn't set, then it doesn't.

@cnindex RE_INVALID_INTERVAL_ORD
@item RE_INVALID_INTERVAL_ORD
If this bit is set, a syntactically invalid interval is treated as a
string of ordinary characters. For example, the extended regular
expression @samp{a@{1} is treated as @samp{a\@{1}.

@cnindex RE_LIMITED_OPS
@item RE_LIMITED_OPS
If this bit is set, then Regex doesn't recognize the match-one-or-more,
match-zero-or-one or alternation operators; if this bit isn't set, then
it does.

@cnindex RE_NEWLINE_ALT
@item RE_NEWLINE_ALT
If this bit is set, then newline represents the alternation operator; if
this bit isn't set, then newline is ordinary.

@cnindex RE_NO_BK_BRACES
@item RE_NO_BK_BRACES
If this bit is set, then @samp{@{} represents the open-interval operator
and @samp{@}} represents the close-interval operator; if this bit isn't
set, then @samp{\@{} represents the open-interval operator and
@samp{\@}} represents the close-interval operator. This bit is relevant
only if @code{RE_INTERVALS} is set.

@cnindex RE_NO_BK_PARENS
@item RE_NO_BK_PARENS
If this bit is set, then @samp{(} represents the open-group operator and
@samp{)} represents the close-group operator; if this bit isn't set, then
@samp{\(} represents the open-group operator and @samp{\)} represents
the close-group operator.

@cnindex RE_NO_BK_REFS
@item RE_NO_BK_REFS
If this bit is set, then Regex doesn't recognize @samp{\}@var{digit} as
the back reference operator; if this bit isn't set, then it does.

@cnindex RE_NO_BK_VBAR
@item RE_NO_BK_VBAR
If this bit is set, then @samp{|} represents the alternation operator;
if this bit isn't set, then @samp{\|} represents the alternation
operator. This bit is irrelevant if @code{RE_LIMITED_OPS} is set.

@cnindex RE_NO_EMPTY_RANGES
@item RE_NO_EMPTY_RANGES
If this bit is set, then a regular expression with a range whose ending
point collates lower than its starting point is invalid; if this bit
isn't set, then Regex considers such a range to be empty.

@cnindex RE_NO_GNU_OPS
@item RE_NO_GNU_OPS
If this bit is set, GNU regex operators are not recognized; otherwise,
they are.

@cnindex RE_NO_POSIX_BACKTRACKING
@item RE_NO_POSIX_BACKTRACKING
If this bit is set, succeed as soon as we match the whole pattern,
without further backtracking. This means that a match may not be
the leftmost longest (@pxref{What Gets Matched?}).

@cnindex RE_NO_SUB
@item RE_NO_SUB
If this bit is set, then @code{no_sub} will be set to one during
@code{re_compile_pattern}. This causes matching and searching routines
not to record substring match information.

@cnindex RE_UNMATCHED_RIGHT_PAREN_ORD
@item RE_UNMATCHED_RIGHT_PAREN_ORD
If this bit is set and the regular expression has no matching open-group
operator, then Regex considers what would otherwise be a close-group
operator (based on how @code{RE_NO_BK_PARENS} is set) to match @samp{)}.

@end table
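
The effect of individual syntax bits can be observed by compiling the
same pattern under different bit combinations. The following is a minimal
sketch, assuming the GNU C Library's @file{regex.h} with
@code{_GNU_SOURCE} defined. The helper @code{search_with_syntax} is a
hypothetical name introduced only for this illustration; it is not part
of Regex.

```c
#define _GNU_SOURCE   /* assumption: glibc, which hides RE_* bits otherwise */
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper, not part of Regex: compile PATTERN under SYNTAX
   and search TEXT.  Returns the offset at which the first match starts,
   -1 if nothing matches, or -2 if compilation or matching fails.  */
long
search_with_syntax (reg_syntax_t syntax, const char *pattern,
                    const char *text)
{
  struct re_pattern_buffer buf;
  reg_syntax_t saved;
  const char *err;
  regoff_t len, pos;

  assert (pattern != NULL && text != NULL);

  /* A zeroed pattern buffer lets Regex allocate everything itself.  */
  memset (&buf, 0, sizeof buf);

  /* re_compile_pattern reads the global syntax, so set it only for the
     duration of the compilation and then restore the previous value.  */
  saved = re_set_syntax (syntax);
  err = re_compile_pattern (pattern, strlen (pattern), &buf);
  re_set_syntax (saved);
  if (err != NULL)
    {
      regfree (&buf);
      return -2;
    }

  len = (regoff_t) strlen (text);
  pos = re_search (&buf, text, len, 0, len, NULL);
  regfree (&buf);
  return pos;
}
```

With this helper, adding @code{RE_ICASE} to
@code{RE_SYNTAX_POSIX_EXTENDED} lets @samp{hello} match inside
@samp{say HELLO}, and @samp{a@{2@}} is an interval only when both
@code{RE_INTERVALS} and @code{RE_NO_BK_BRACES} are set; with no bits set,
the braces are ordinary characters. Because the syntax is global state,
the sketch restores the previous value right after compiling.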

@node Predefined Syntaxes
@section Predefined Syntaxes

If you're programming with Regex, you can set a pattern buffer's
(@pxref{GNU Pattern Buffers})
syntax either to an arbitrary combination of syntax bits
(@pxref{Syntax Bits}) or else to one of the configurations defined by Regex.
These configurations define the syntaxes used by certain
programs---GNU Emacs,
@cindex Emacs
POSIX Awk,
@cindex POSIX Awk
traditional Awk,
@cindex Awk
Grep,
@cindex Grep
@cindex Egrep
Egrep---in addition to syntaxes for POSIX basic and extended
regular expressions.

The predefined syntaxes---taken directly from @file{regex.h}---are:

@smallexample
#define RE_SYNTAX_EMACS 0

#define RE_SYNTAX_AWK \
  (RE_BACKSLASH_ESCAPE_IN_LISTS | RE_DOT_NOT_NULL \
   | RE_NO_BK_PARENS | RE_NO_BK_REFS \
   | RE_NO_BK_VBAR | RE_NO_EMPTY_RANGES \
   | RE_UNMATCHED_RIGHT_PAREN_ORD)

#define RE_SYNTAX_POSIX_AWK \
  (RE_SYNTAX_POSIX_EXTENDED | RE_BACKSLASH_ESCAPE_IN_LISTS)

#define RE_SYNTAX_GREP \
  (RE_BK_PLUS_QM | RE_CHAR_CLASSES \
   | RE_HAT_LISTS_NOT_NEWLINE | RE_INTERVALS \
   | RE_NEWLINE_ALT)

#define RE_SYNTAX_EGREP \
  (RE_CHAR_CLASSES | RE_CONTEXT_INDEP_ANCHORS \
   | RE_CONTEXT_INDEP_OPS | RE_HAT_LISTS_NOT_NEWLINE \
   | RE_NEWLINE_ALT | RE_NO_BK_PARENS \
   | RE_NO_BK_VBAR)

#define RE_SYNTAX_POSIX_EGREP \
  (RE_SYNTAX_EGREP | RE_INTERVALS | RE_NO_BK_BRACES)

/* P1003.2/D11.2, section 4.20.7.1, lines 5078ff. */
#define RE_SYNTAX_ED RE_SYNTAX_POSIX_BASIC

#define RE_SYNTAX_SED RE_SYNTAX_POSIX_BASIC

/* Syntax bits common to both basic and extended POSIX regex syntax. */
#define _RE_SYNTAX_POSIX_COMMON \
  (RE_CHAR_CLASSES | RE_DOT_NEWLINE | RE_DOT_NOT_NULL \
   | RE_INTERVALS | RE_NO_EMPTY_RANGES)

#define RE_SYNTAX_POSIX_BASIC \
  (_RE_SYNTAX_POSIX_COMMON | RE_BK_PLUS_QM)

/* Differs from ..._POSIX_BASIC only in that RE_BK_PLUS_QM becomes
   RE_LIMITED_OPS, i.e., \? \+ \| are not recognized. Actually, this
   isn't minimal, since other operators, such as \`, aren't disabled. */
#define RE_SYNTAX_POSIX_MINIMAL_BASIC \
  (_RE_SYNTAX_POSIX_COMMON | RE_LIMITED_OPS)

#define RE_SYNTAX_POSIX_EXTENDED \
  (_RE_SYNTAX_POSIX_COMMON | RE_CONTEXT_INDEP_ANCHORS \
   | RE_CONTEXT_INDEP_OPS | RE_NO_BK_BRACES \
   | RE_NO_BK_PARENS | RE_NO_BK_VBAR \
   | RE_UNMATCHED_RIGHT_PAREN_ORD)

/* Differs from ..._POSIX_EXTENDED in that RE_CONTEXT_INVALID_OPS
   replaces RE_CONTEXT_INDEP_OPS and RE_NO_BK_REFS is added. */
#define RE_SYNTAX_POSIX_MINIMAL_EXTENDED \
  (_RE_SYNTAX_POSIX_COMMON | RE_CONTEXT_INDEP_ANCHORS \
   | RE_CONTEXT_INVALID_OPS | RE_NO_BK_BRACES \
   | RE_NO_BK_PARENS | RE_NO_BK_REFS \
   | RE_NO_BK_VBAR | RE_UNMATCHED_RIGHT_PAREN_ORD)
@end smallexample
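
Since each predefined syntax is just a combination of syntax bits, a
program can test a configuration for a particular bit with the
@samp{&} operator, and the same pattern can mean different things under
different configurations. The sketch below assumes the GNU C Library's
@file{regex.h} with @code{_GNU_SOURCE} defined; the definitions in an
installed header may differ in detail from the listing above. The helper
@code{count_groups} is a hypothetical name used only for this
illustration.

```c
#define _GNU_SOURCE   /* assumption: glibc, which hides RE_* bits otherwise */
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper, not part of Regex: compile PATTERN under SYNTAX
   and return the number of groups Regex found in it (the pattern
   buffer's re_nsub field), or -1 if the pattern does not compile.  */
long
count_groups (reg_syntax_t syntax, const char *pattern)
{
  struct re_pattern_buffer buf;
  reg_syntax_t saved;
  const char *err;
  long groups;

  assert (pattern != NULL);

  /* A zeroed pattern buffer lets Regex allocate everything itself.  */
  memset (&buf, 0, sizeof buf);

  /* Select the syntax only while compiling, then restore the old one.  */
  saved = re_set_syntax (syntax);
  err = re_compile_pattern (pattern, strlen (pattern), &buf);
  re_set_syntax (saved);

  groups = (err == NULL) ? (long) buf.re_nsub : -1;
  regfree (&buf);
  return groups;
}
```

For example, @code{RE_SYNTAX_POSIX_EXTENDED} includes
@code{RE_NO_BK_PARENS}, so @samp{(a)(b)} contains two groups under it.
Under @code{RE_SYNTAX_POSIX_BASIC}, which lacks that bit, the same
pattern contains none, and the groups must be written as
@samp{\(a\)\(b\)}.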

@node Collating Elements vs. Characters
@section Collating Elements vs.@: Characters

POSIX generalizes the notion of a character to that of a
collating element. It defines a @dfn{collating element} to be ``a
sequence of one or more bytes defined in the current collating sequence
as a unit of collation.''

This generalizes the notion of a character in
two ways. First, a single character can map into two or more collating
elements. For example, the German
@tex
``\ss''
@end tex
@ifinfo
``es-zet''
@end ifinfo
collates as the collating element @samp{s} followed by another collating
element @samp{s}. Second, two or more characters can map into one
collating element. For example, the Spanish @samp{ll} collates after
@samp{l} and before @samp{m}.

Since POSIX's ``collating element'' preserves the essential idea of
a ``character,'' we use the latter, more familiar, term in this document.

@node The Backslash Character
@section The Backslash Character

@cindex \
The @samp{\} character has one of four different meanings, depending on
the context in which you use it and what syntax bits are set
(@pxref{Syntax Bits}). It can: 1) stand for itself, 2) quote the next
character, 3) introduce an operator, or 4) do nothing.
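
The first two meanings can be told apart inside a list by toggling
@code{RE_BACKSLASH_ESCAPE_IN_LISTS}. The following is a minimal sketch,
assuming the GNU C Library's @file{regex.h} with @code{_GNU_SOURCE}
defined. The helper @code{backslash_search} is a hypothetical name
introduced only for this illustration, and the choice of
@code{RE_SYNTAX_POSIX_EXTENDED} as the base syntax is arbitrary.

```c
#define _GNU_SOURCE   /* assumption: glibc, which hides RE_* bits otherwise */
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper, not part of Regex: search TEXT for PATTERN using
   POSIX extended syntax, adding RE_BACKSLASH_ESCAPE_IN_LISTS when
   ESCAPE_IN_LISTS is nonzero.  Returns the offset of the first match,
   -1 if nothing matches, or -2 if compilation or matching fails.  */
long
backslash_search (int escape_in_lists, const char *pattern,
                  const char *text)
{
  struct re_pattern_buffer buf;
  reg_syntax_t syntax = RE_SYNTAX_POSIX_EXTENDED;
  reg_syntax_t saved;
  const char *err;
  regoff_t len, pos;

  assert (pattern != NULL && text != NULL);

  if (escape_in_lists)
    syntax |= RE_BACKSLASH_ESCAPE_IN_LISTS;

  /* A zeroed pattern buffer lets Regex allocate everything itself.  */
  memset (&buf, 0, sizeof buf);

  /* Select the syntax only while compiling, then restore the old one.  */
  saved = re_set_syntax (syntax);
  err = re_compile_pattern (pattern, strlen (pattern), &buf);
  re_set_syntax (saved);
  if (err != NULL)
    {
      regfree (&buf);
      return -2;
    }

  len = (regoff_t) strlen (text);
  pos = re_search (&buf, text, len, 0, len, NULL);
  regfree (&buf);
  return pos;
}
```

Remember that each backslash must be doubled in a C string literal. With
the bit clear, the pattern @samp{[\]} is a complete list that matches a
single backslash. With the bit set, the backslash in @samp{[\]]} quotes
the first @samp{]}, so the list matches @samp{]} and no longer matches a
backslash.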

@enumerate
@item
It stands for itself inside a list
(@pxref{List Operators}) if the syntax bit
@code{RE_BACKSLASH_ESCAPE_IN_LISTS} is not set. For example, @samp{[\]}
would match @samp{\}.

@item
It quotes (makes ordinary, if it's special) the next character when you
use it either:

@itemize @bullet
@item
outside a list,@footnote{Sometimes
you don't have to explicitly quote special characters to make
them ordinary. For instance, most characters lose any special meaning
inside a list (@pxref{List Operators}). In addition, if the syntax bits
@code{RE_CONTEXT_INVALID_OPS} and @code{RE_CONTEXT_INDEP_OPS}
aren't set, then (for historical reasons) the matcher considers special
characters ordinary if they are in contexts where the operations they
represent make no sense; for example, the match-zero-or-more
operator (represented by @samp{*}) matches itself in the regular
expression @samp{*foo} because there is no preceding expression on which
it can operate. It is poor practice, however, to depend on this
behavior; if you want a special character to be ordinary outside a list,
it's better to always quote it, regardless.} or

@item
inside a list and the syntax bit @code{RE_BACKSLASH_ESCAPE_IN_LISTS} is set.

@end itemize

@item
It introduces an operator when followed by certain ordinary
characters---sometimes only when certain syntax bits are set. See the
cases @code{RE_BK_PLUS_QM}, @code{RE_NO_BK_BRACES}, @code{RE_NO_BK_VBAR},
@code{RE_NO_BK_PARENS}, @code{RE_NO_BK_REFS} in @ref{Syntax Bits}. Also:

@itemize @bullet
@item
@samp{\b} represents the match-word-boundary operator
(@pxref{Match-word-boundary Operator}).

@item
@samp{\B} represents the match-within-word operator
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
457 (@pxref{Match-within-word Operator}).
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
458
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
459 @item
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
460 @samp{\<} represents the match-beginning-of-word operator @*
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
461 (@pxref{Match-beginning-of-word Operator}).
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
462
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
463 @item
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
464 @samp{\>} represents the match-end-of-word operator
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
465 (@pxref{Match-end-of-word Operator}).
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
466
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
467 @item
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
468 @samp{\w} represents the match-word-constituent operator
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
469 (@pxref{Match-word-constituent Operator}).
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
470
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
471 @item
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
472 @samp{\W} represents the match-non-word-constituent operator
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
473 (@pxref{Match-non-word-constituent Operator}).
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
474
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
475 @item
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
476 @samp{\`} represents the match-beginning-of-buffer
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
477 operator and @samp{\'} represents the match-end-of-buffer operator
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
478 (@pxref{Buffer Operators}).
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
479
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
480 @item
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
481 If Regex was compiled with the C preprocessor symbol @code{emacs}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
482 defined, then @samp{\s@var{class}} represents the match-syntactic-class
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
483 operator and @samp{\S@var{class}} represents the
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
484 match-not-syntactic-class operator (@pxref{Syntactic Class Operators}).
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
485
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
486 @end itemize
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
487
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
488 @item
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
489 In all other cases, Regex ignores @samp{\}. For example,
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
490 @samp{\n} matches @samp{n}.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
491
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
492 @end enumerate
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
493
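As a rough illustration of the quoting case in the list above, the following sketch uses the POSIX interface (@code{regcomp} and @code{regexec}) rather than the GNU syntax bits. The helper @code{matches_whole} and the function @code{demo_backslash} are names made up for this example and are not part of Regex. It exercises only a backslash followed by a special character; the other cases depend on syntax bits or on GNU extensions and are not shown.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper: return 1 if PATTERN matches all of TEXT,
   0 if it does not, and -1 if PATTERN fails to compile.  */
int
matches_whole (const char *pattern, int cflags, const char *text)
{
  regex_t re;
  regmatch_t m;
  int ok;

  if (regcomp (&re, pattern, cflags) != 0)
    return -1;
  ok = (regexec (&re, text, 1, &m, 0) == 0
        && m.rm_so == 0
        && (size_t) m.rm_eo == strlen (text));
  regfree (&re);
  return ok;
}

void
demo_backslash (void)
{
  /* Unquoted, '.' is the match-any-character operator.  */
  assert (matches_whole ("a.b", REG_EXTENDED, "axb") == 1);

  /* Quoted, it matches only a literal period.  */
  assert (matches_whole ("a\\.b", REG_EXTENDED, "a.b") == 1);
  assert (matches_whole ("a\\.b", REG_EXTENDED, "axb") == 0);

  /* Likewise, a quoted '*' is an ordinary character.  */
  assert (matches_whole ("a\\*", REG_EXTENDED, "a*") == 1);
  assert (matches_whole ("a\\*", REG_EXTENDED, "aaa") == 0);
}
```

Note that each backslash is doubled in the C string literals, because the C compiler consumes one level of quoting before the pattern reaches the library. Calling @code{demo_backslash} from a @code{main} function runs the checks.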
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
494 @node Common Operators
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
495 @chapter Common Operators
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
496
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
497 You compose regular expressions from operators. In the following
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
498 sections, we describe the regular expression operators specified by
17274
69f030e5cec4 doc: avoid small caps
Paul Eggert <eggert@cs.ucla.edu>
parents: 16358
diff changeset
499 POSIX; GNU also uses these. Most operators have more than one
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
500 representation as characters. @xref{Regular Expression Syntax}, for
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
501 what characters represent what operators under what circumstances.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
502
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
503 For most operators that can be represented in two ways, one
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
504 representation is a single character and the other is that character
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
505 preceded by @samp{\}. For example, either @samp{(} or @samp{\(}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
506 represents the open-group operator. Which one does depends on the
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
507 setting of a syntax bit, in this case @code{RE_NO_BK_PARENS}. Why is
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
508 this so? Historical reasons dictate some of the varying
17274
69f030e5cec4 doc: avoid small caps
Paul Eggert <eggert@cs.ucla.edu>
parents: 16358
diff changeset
509 representations, while POSIX dictates others.
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
510
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
511 Finally, almost all characters lose any special meaning inside a list
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
512 (@pxref{List Operators}).
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
513
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
514 @menu
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
515 * Match-self Operator:: Ordinary characters.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
516 * Match-any-character Operator:: .
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
517 * Concatenation Operator:: Juxtaposition.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
518 * Repetition Operators:: * + ? @{@}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
519 * Alternation Operator:: |
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
520 * List Operators:: [...] [^...]
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
521 * Grouping Operators:: (...)
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
522 * Back-reference Operator:: \digit
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
523 * Anchoring Operators:: ^ $
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
524 @end menu
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
525
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
526 @node Match-self Operator
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
527 @section The Match-self Operator (@var{ordinary character})
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
528
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
529 This operator matches the character itself. All ordinary characters
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
530 (@pxref{Regular Expression Syntax}) represent this operator. For
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
531 example, @samp{f} is always an ordinary character, so the regular
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
532 expression @samp{f} matches only the string @samp{f}. In
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
533 particular, it does @emph{not} match the string @samp{ff}.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
534
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
535 @node Match-any-character Operator
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
536 @section The Match-any-character Operator (@code{.})
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
537
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
538 @cindex @samp{.}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
539
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
540 This operator matches any single printing or nonprinting character
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
541 except that it won't match a:
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
542
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
543 @table @asis
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
544 @item newline
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
545 if the syntax bit @code{RE_DOT_NEWLINE} isn't set.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
546
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
547 @item null
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
548 if the syntax bit @code{RE_DOT_NOT_NULL} is set.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
549
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
550 @end table
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
551
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
552 The @samp{.} (period) character represents this operator. For example,
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
553 @samp{a.b} matches any three-character string beginning with @samp{a}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
554 and ending with @samp{b}.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
555
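The @samp{a.b} example and the newline exception can be sketched through the POSIX interface. This is an illustration under assumptions, not a description of how Regex itself is configured: @code{matches_whole} and @code{demo_dot} are hypothetical names, and the POSIX flag @code{REG_NEWLINE} is used here as a stand-in for leaving @code{RE_DOT_NEWLINE} unset. The null exception is not shown, because a C string cannot contain a null character.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper: return 1 if PATTERN matches all of TEXT,
   0 if it does not, and -1 if PATTERN fails to compile.  */
int
matches_whole (const char *pattern, int cflags, const char *text)
{
  regex_t re;
  regmatch_t m;
  int ok;

  if (regcomp (&re, pattern, cflags) != 0)
    return -1;
  ok = (regexec (&re, text, 1, &m, 0) == 0
        && m.rm_so == 0
        && (size_t) m.rm_eo == strlen (text));
  regfree (&re);
  return ok;
}

void
demo_dot (void)
{
  /* '.' stands for exactly one character.  */
  assert (matches_whole ("a.b", REG_EXTENDED, "axb") == 1);
  assert (matches_whole ("a.b", REG_EXTENDED, "a b") == 1);
  assert (matches_whole ("a.b", REG_EXTENDED, "ab") == 0);
  assert (matches_whole ("a.b", REG_EXTENDED, "axxb") == 0);

  /* By default the POSIX interface lets '.' match a newline;
     with REG_NEWLINE it does not.  */
  assert (matches_whole ("a.b", REG_EXTENDED, "a\nb") == 1);
  assert (matches_whole ("a.b", REG_EXTENDED | REG_NEWLINE, "a\nb") == 0);
}
```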
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
556 @node Concatenation Operator
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
557 @section The Concatenation Operator
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
558
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
559 This operator concatenates two regular expressions @var{a} and @var{b}.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
560 No character represents this operator; you simply put @var{b} after
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
561 @var{a}. The result is a regular expression that will match a string if
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
562 @var{a} matches its first part and @var{b} matches the rest. For
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
563 example, @samp{xy} (two match-self operators) matches @samp{xy}.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
564
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
565 @node Repetition Operators
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
566 @section Repetition Operators
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
567
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
568 Repetition operators repeat the preceding regular expression a specified
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
569 number of times.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
570
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
571 @menu
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
572 * Match-zero-or-more Operator:: *
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
573 * Match-one-or-more Operator:: +
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
574 * Match-zero-or-one Operator:: ?
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
575 * Interval Operators:: @{@}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
576 @end menu
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
577
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
578 @node Match-zero-or-more Operator
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
579 @subsection The Match-zero-or-more Operator (@code{*})
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
580
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
581 @cindex @samp{*}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
582
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
583 This operator repeats the smallest possible preceding regular expression
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
584 as many times as necessary (including zero) to match the pattern.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
585 @samp{*} represents this operator. For example, @samp{o*}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
586 matches any string made up of zero or more @samp{o}s. Since this
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
587 operator operates on the smallest preceding regular expression,
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
588 @samp{fo*} has a repeating @samp{o}, not a repeating @samp{fo}. So,
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
589 @samp{fo*} matches @samp{f}, @samp{fo}, @samp{foo}, and so on.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
590
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
591 Since the match-zero-or-more operator is a suffix operator, it may be
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
592 useless as such when no regular expression precedes it. This is the
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
593 case when it:
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
594
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
595 @itemize @bullet
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
596 @item
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
597 is first in a regular expression, or
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
598
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
599 @item
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
600 follows a match-beginning-of-line, open-group, or alternation
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
601 operator.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
602
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
603 @end itemize
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
604
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
605 @noindent
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
606 Three different things can happen in these cases:
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
607
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
608 @enumerate
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
609 @item
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
610 If the syntax bit @code{RE_CONTEXT_INVALID_OPS} is set, then the
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
611 regular expression is invalid.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
612
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
613 @item
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
614 If @code{RE_CONTEXT_INVALID_OPS} isn't set, but
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
615 @code{RE_CONTEXT_INDEP_OPS} is, then @samp{*} represents the
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
616 match-zero-or-more operator (which then operates on the empty string).
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
617
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
618 @item
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
619 Otherwise, @samp{*} is ordinary.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
620
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
621 @end enumerate
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
622
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
623 @cindex backtracking
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
624 The matcher processes a match-zero-or-more operator by first matching as
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
625 many repetitions of the smallest preceding regular expression as it can.
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
626 Then it continues to match the rest of the pattern.
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
627
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
628 If it can't match the rest of the pattern, it backtracks (as many times
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
629 as necessary), each time discarding one of the matches until it can
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
630 either match the entire pattern or be certain that it cannot get a
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
631 match. For example, when matching @samp{ca*ar} against @samp{caaar},
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
632 the matcher first matches all three @samp{a}s of the string with the
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
633 @samp{a*} of the regular expression. However, it cannot then match the
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
634 final @samp{ar} of the regular expression against the final @samp{r} of
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
635 the string. So it backtracks, discarding the match of the last @samp{a}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
636 in the string. It can then match the remaining @samp{ar}.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
637
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
638
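The examples in this section can be checked with a small sketch that uses the POSIX interface in its basic syntax (no @code{REG_EXTENDED} flag). The names @code{matches_whole} and @code{demo_star} are invented for the illustration. The sketch shows only the results of matching; it does not reveal whether a particular implementation backtracks internally. The last two checks rely on POSIX basic syntax treating a leading @samp{*} as ordinary, which corresponds to the third case listed above.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper: return 1 if PATTERN matches all of TEXT,
   0 if it does not, and -1 if PATTERN fails to compile.  */
int
matches_whole (const char *pattern, int cflags, const char *text)
{
  regex_t re;
  regmatch_t m;
  int ok;

  if (regcomp (&re, pattern, cflags) != 0)
    return -1;
  ok = (regexec (&re, text, 1, &m, 0) == 0
        && m.rm_so == 0
        && (size_t) m.rm_eo == strlen (text));
  regfree (&re);
  return ok;
}

void
demo_star (void)
{
  /* '*' repeats only the 'o', not the whole of "fo".  */
  assert (matches_whole ("fo*", 0, "f") == 1);
  assert (matches_whole ("fo*", 0, "foo") == 1);
  assert (matches_whole ("fo*", 0, "fofo") == 0);

  /* 'a*' must give back one 'a' so that the final "ar" can match.  */
  assert (matches_whole ("ca*ar", 0, "caaar") == 1);

  /* A leading '*' has nothing to operate on; in POSIX basic
     syntax it is an ordinary character.  */
  assert (matches_whole ("*foo", 0, "*foo") == 1);
  assert (matches_whole ("*foo", 0, "foo") == 0);
}
```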
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
639 @node Match-one-or-more Operator
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
640 @subsection The Match-one-or-more Operator (@code{+} or @code{\+})
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
641
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
642 @cindex @samp{+}
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
643
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
644 If the syntax bit @code{RE_LIMITED_OPS} is set, then Regex doesn't recognize
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
645 this operator. Otherwise, if the syntax bit @code{RE_BK_PLUS_QM} isn't
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
646 set, then @samp{+} represents this operator; if it is, then @samp{\+}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
647 does.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
648
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
649 This operator is similar to the match-zero-or-more operator except that
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
650 it repeats the preceding regular expression at least once;
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
651 @pxref{Match-zero-or-more Operator}, for what it operates on, how some
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
652 syntax bits affect it, and how Regex backtracks to match it.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
653
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
654 For example, supposing that @samp{+} represents the match-one-or-more
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
655 operator, then @samp{ca+r} matches, e.g., @samp{car} and
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
656 @samp{caaaar}, but not @samp{cr}.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
657
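A brief sketch of the @samp{ca+r} example follows, again through the POSIX interface and with the hypothetical names @code{matches_whole} and @code{demo_plus}. It assumes that POSIX extended syntax behaves like the case where @code{RE_BK_PLUS_QM} isn't set, and that in POSIX basic syntax an unquoted @samp{+} is an ordinary character.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper: return 1 if PATTERN matches all of TEXT,
   0 if it does not, and -1 if PATTERN fails to compile.  */
int
matches_whole (const char *pattern, int cflags, const char *text)
{
  regex_t re;
  regmatch_t m;
  int ok;

  if (regcomp (&re, pattern, cflags) != 0)
    return -1;
  ok = (regexec (&re, text, 1, &m, 0) == 0
        && m.rm_so == 0
        && (size_t) m.rm_eo == strlen (text));
  regfree (&re);
  return ok;
}

void
demo_plus (void)
{
  /* Extended syntax: '+' requires at least one 'a'.  */
  assert (matches_whole ("ca+r", REG_EXTENDED, "car") == 1);
  assert (matches_whole ("ca+r", REG_EXTENDED, "caaaar") == 1);
  assert (matches_whole ("ca+r", REG_EXTENDED, "cr") == 0);

  /* Basic syntax: an unquoted '+' matches itself.  */
  assert (matches_whole ("ca+r", 0, "ca+r") == 1);
  assert (matches_whole ("ca+r", 0, "car") == 0);
}
```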
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
658 @node Match-zero-or-one Operator
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
659 @subsection The Match-zero-or-one Operator (@code{?} or @code{\?})
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
660 @cindex @samp{?}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
661
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
662 If the syntax bit @code{RE_LIMITED_OPS} is set, then Regex doesn't
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
663 recognize this operator. Otherwise, if the syntax bit
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
664 @code{RE_BK_PLUS_QM} isn't set, then @samp{?} represents this operator;
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
665 if it is, then @samp{\?} does.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
666
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
667 This operator is similar to the match-zero-or-more operator except that
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
668 it repeats the preceding regular expression once or not at all;
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
669 @pxref{Match-zero-or-more Operator}, to see what it operates on, how
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
670 some syntax bits affect it, and how Regex backtracks to match it.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
671
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
672 For example, supposing that @samp{?} represents the match-zero-or-one
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
673 operator, then @samp{ca?r} matches both @samp{car} and @samp{cr}, but
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
674 nothing else.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
675
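The @samp{ca?r} example can be checked in the same way. As before, this is a sketch through the POSIX extended syntax, assumed here to correspond to @code{RE_BK_PLUS_QM} not being set, and @code{matches_whole} and @code{demo_question} are names introduced only for the illustration.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper: return 1 if PATTERN matches all of TEXT,
   0 if it does not, and -1 if PATTERN fails to compile.  */
int
matches_whole (const char *pattern, int cflags, const char *text)
{
  regex_t re;
  regmatch_t m;
  int ok;

  if (regcomp (&re, pattern, cflags) != 0)
    return -1;
  ok = (regexec (&re, text, 1, &m, 0) == 0
        && m.rm_so == 0
        && (size_t) m.rm_eo == strlen (text));
  regfree (&re);
  return ok;
}

void
demo_question (void)
{
  /* '?' allows the 'a' once or not at all.  */
  assert (matches_whole ("ca?r", REG_EXTENDED, "car") == 1);
  assert (matches_whole ("ca?r", REG_EXTENDED, "cr") == 1);

  /* Two 'a's are one too many.  */
  assert (matches_whole ("ca?r", REG_EXTENDED, "caar") == 0);
}
```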
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
676 @node Interval Operators
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
677 @subsection Interval Operators (@code{@{} @dots{} @code{@}} or @code{\@{} @dots{} @code{\@}})
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
678
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
679 @cindex interval expression
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
680 @cindex @samp{@{}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
681 @cindex @samp{@}}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
682 @cindex @samp{\@{}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
683 @cindex @samp{\@}}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
684
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
685 If the syntax bit @code{RE_INTERVALS} is set, then Regex recognizes
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
686 @dfn{interval expressions}. They repeat the smallest possible preceding
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
687 regular expression a specified number of times.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
688
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
689 If the syntax bit @code{RE_NO_BK_BRACES} is set, @samp{@{} represents
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
690 the @dfn{open-interval operator} and @samp{@}} represents the
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
691 @dfn{close-interval operator}; otherwise, @samp{\@{} and @samp{\@}} do.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
692
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
693 Specifically, supposing that @samp{@{} and @samp{@}} represent the
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
694 open-interval and close-interval operators, then:
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
695
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
696 @table @code
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
697 @item @{@var{count}@}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
698 matches exactly @var{count} occurrences of the preceding regular
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
699 expression.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
700
13537
77dd6d58a96b erroneous commas inside @var
Karl Berry <karl@freefriends.org>
parents: 13533
diff changeset
701 @item @{@var{min},@}
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
702 matches @var{min} or more occurrences of the preceding regular
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
703 expression.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
704
13537
77dd6d58a96b erroneous commas inside @var
Karl Berry <karl@freefriends.org>
parents: 13533
diff changeset
705 @item @{@var{min},@var{max}@}
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
706 matches at least @var{min} but no more than @var{max} occurrences of
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
707 the preceding regular expression.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
708
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
709 @end table
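
The three interval forms above can be tried with a minimal sketch like the
following.  It assumes the POSIX @code{regcomp}/@code{regexec} interface
rather than the GNU-specific entry points: with @code{REG_EXTENDED} the
interval operators are written @samp{@{} and @samp{@}}, and without it they
are written @samp{\@{} and @samp{\@}}.  The helper name
@code{regex_matches} is hypothetical, and the patterns are anchored with
@samp{^} and @samp{$} so that the whole string must match.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if PATTERN matches TEXT, 0 if it does
   not, and -1 if PATTERN fails to compile.  CFLAGS is passed through to
   regcomp, e.g. REG_EXTENDED for unescaped braces or 0 for \{ and \}.  */
static int
regex_matches (const char *pattern, const char *text, int cflags)
{
  regex_t re;

  assert (pattern != NULL && text != NULL);
  if (regcomp (&re, pattern, cflags | REG_NOSUB) != 0)
    return -1;
  int status = regexec (&re, text, 0, NULL, 0);
  regfree (&re);
  return status == 0;
}
```

For instance, @code{regex_matches ("^ab@{2,3@}c$", "abbc", REG_EXTENDED)}
should report a match, while the same pattern applied to @samp{abbbbc}
should not, since four occurrences of @samp{b} exceed @var{max}.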
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
710
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
711 The interval expression (but not necessarily the regular expression that
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
712 contains it) is invalid if:
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
713
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
714 @itemize @bullet
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
715 @item
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
716 @var{min} is greater than @var{max}, or
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
717
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
718 @item
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
719 any of @var{count}, @var{min}, or @var{max} are outside the range
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
720 zero to @code{RE_DUP_MAX} (which symbol @file{regex.h}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
721 defines).
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
722
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
723 @end itemize
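
The two validity conditions above amount to a simple range check, sketched
below.  This illustrates the rule as stated here and is not the library's
own code; the function name @code{interval_is_valid} is hypothetical.  The
sketch takes @code{RE_DUP_MAX} from @file{limits.h}, where POSIX also
defines it, and falls back to the POSIX minimum of 255 if the symbol is
not visible.

```c
#include <limits.h>

/* Assumption: if the build environment hides RE_DUP_MAX, fall back to
   the smallest value POSIX allows for it.  */
#ifndef RE_DUP_MAX
# define RE_DUP_MAX 255
#endif

/* Hypothetical helper: return 1 if the bounds MIN and MAX describe a
   valid interval, 0 otherwise.  For the {COUNT} form, pass COUNT as
   both MIN and MAX.  */
static int
interval_is_valid (long min, long max)
{
  if (min < 0 || min > RE_DUP_MAX)
    return 0;                   /* MIN outside 0..RE_DUP_MAX */
  if (max < 0 || max > RE_DUP_MAX)
    return 0;                   /* MAX outside 0..RE_DUP_MAX */
  return min <= max;            /* MIN must not exceed MAX */
}
```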
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
724
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
725 If the interval expression is invalid and the syntax bit
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
726 @code{RE_NO_BK_BRACES} is set, then Regex considers all the
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
727 characters in the would-be interval to be ordinary. If that bit
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
728 isn't set, then the regular expression is invalid.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
729
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
730 If the interval expression is valid but there is no preceding regular
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
731 expression on which to operate, then if the syntax bit
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
732 @code{RE_CONTEXT_INVALID_OPS} is set, the regular expression is invalid.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
733 If that bit isn't set, then Regex considers all the characters---other
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
734 than backslashes, which it ignores---in the would-be interval to be
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
735 ordinary.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
736
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
737
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
738 @node Alternation Operator
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
739 @section The Alternation Operator (@code{|} or @code{\|})
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
740
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
741 @kindex |
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
742 @kindex \|
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
743 @cindex alternation operator
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
744 @cindex or operator
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
745
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
746 If the syntax bit @code{RE_LIMITED_OPS} is set, then Regex doesn't
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
747 recognize this operator. Otherwise, if the syntax bit
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
748 @code{RE_NO_BK_VBAR} is set, then @samp{|} represents this operator;
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
749 otherwise, @samp{\|} does.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
750
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
751 Alternatives match one of a choice of regular expressions:
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
752 if you put the character(s) representing the alternation operator between
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
753 any two regular expressions @var{a} and @var{b}, the result matches
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
754 the union of the strings that @var{a} and @var{b} match. For
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
755 example, supposing that @samp{|} is the alternation operator, then
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
756 @samp{foo|bar|quux} would match any of @samp{foo}, @samp{bar} or
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
757 @samp{quux}.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
758
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
759 The alternation operator operates on the @emph{largest} possible
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
760 surrounding regular expressions. (Put another way, it has the lowest
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
761 precedence of any regular expression operator.)
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
762 Thus, the only way you can
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
763 delimit its arguments is to use grouping. For example, if @samp{(} and
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
764 @samp{)} are the open and close-group operators, then @samp{fo(o|b)ar}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
765 would match either @samp{fooar} or @samp{fobar}. (@samp{foo|bar} would
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
766 match @samp{foo} or @samp{bar}.)
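
The following sketch shows one way to check these claims.  It assumes the
POSIX @code{regcomp}/@code{regexec} interface with @code{REG_EXTENDED},
under which @samp{|}, @samp{(} and @samp{)} are the alternation and
grouping operators; the helper name @code{count_full_matches} is
hypothetical.  Because alternation has the lowest precedence, the helper
wraps the pattern in a group before anchoring it: @samp{^foo|bar$} would
otherwise mean ``@samp{^foo} or @samp{bar$}''.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: count how many of the N strings in CANDIDATES
   are matched in their entirety by PATTERN (POSIX extended syntax).
   Return -1 if the pattern is too long or does not compile.  */
static int
count_full_matches (const char *pattern, const char *const *candidates,
                    size_t n)
{
  char anchored[256];
  regex_t re;
  int count = 0;

  assert (pattern != NULL && candidates != NULL);
  /* Group first, then anchor, so the anchors apply to every alternative.  */
  int len = snprintf (anchored, sizeof anchored, "^(%s)$", pattern);
  if (len < 0 || (size_t) len >= sizeof anchored)
    return -1;
  if (regcomp (&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  for (size_t i = 0; i < n; i++)
    if (regexec (&re, candidates[i], 0, NULL, 0) == 0)
      count++;
  regfree (&re);
  return count;
}
```

Given the candidates @samp{foo}, @samp{bar}, @samp{quux}, @samp{fooar},
@samp{fobar} and @samp{foobar}, the pattern @samp{fo(o|b)ar} should accept
only @samp{fooar} and @samp{fobar}.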
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
767
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
768 @cindex backtracking
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
769 The matcher usually tries all combinations of alternatives so as to
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
770 match the longest possible string. For example, when matching
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
771 @samp{(fooq|foo)*(qbarquux|bar)} against @samp{fooqbarquux}, it cannot
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
772 take, say, the first (``depth-first'') combination it could match, since
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
773 then it would be content to match just @samp{fooqbar}.
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
774
13647
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
775 Note that since the default behavior is to return the leftmost longest
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
776 match, when more than one of a series of alternatives matches, the actual
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
777 match will be the longest matching alternative, not necessarily the
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
778 first in the list.
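
The leftmost-longest rule can be observed by asking the matcher for the
extent of the overall match.  The sketch below assumes the POSIX
@code{regcomp}/@code{regexec} interface with @code{REG_EXTENDED}, and a
matcher that follows the leftmost-longest rule described above; the helper
name @code{match_length} is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return the length of the leftmost match of
   PATTERN (POSIX extended syntax) in TEXT, or -1 if there is no match
   or PATTERN does not compile.  */
static long
match_length (const char *pattern, const char *text)
{
  regex_t re;
  regmatch_t whole[1];          /* whole[0] receives the overall match */
  long length = -1;

  assert (pattern != NULL && text != NULL);
  if (regcomp (&re, pattern, REG_EXTENDED) != 0)
    return -1;
  if (regexec (&re, text, 1, whole, 0) == 0)
    length = (long) (whole[0].rm_eo - whole[0].rm_so);
  regfree (&re);
  return length;
}
```

Under that assumption, @code{match_length ("foo|foobar", "foobar")} should
yield 6 even though @samp{foo} is listed first, and the
@samp{(fooq|foo)*(qbarquux|bar)} example above should yield 11, the length
of @samp{fooqbarquux}.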
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
779
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
780
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
781 @node List Operators
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
782 @section List Operators (@code{[} @dots{} @code{]} and @code{[^} @dots{} @code{]})
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
783
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
784 @cindex matching list
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
785 @cindex @samp{[}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
786 @cindex @samp{]}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
787 @cindex @samp{^}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
788 @cindex @samp{-}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
789 @cindex @samp{\}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
790 @cindex @samp{[^}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
791 @cindex nonmatching list
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
792 @cindex matching newline
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
793 @cindex bracket expression
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
794
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
795 @dfn{Lists}, also called @dfn{bracket expressions}, are a set of one or
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
796 more items. An @dfn{item} is a character,
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
797 a collating symbol, an equivalence class expression,
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
798 a character class expression, or a range expression. The syntax bits
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
799 affect which kinds of items you can put in a list. We explain the last
13647
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
800 four items in subsections below. Empty lists are invalid.
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
801
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
802 A @dfn{matching list} matches a single character represented by one of
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
803 the list items. You form a matching list by enclosing one or more items
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
804 within an @dfn{open-matching-list operator} (represented by @samp{[})
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
805 and a @dfn{close-list operator} (represented by @samp{]}).
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
806
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
807 For example, @samp{[ab]} matches either @samp{a} or @samp{b}.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
808 @samp{[ad]*} matches the empty string and any string composed of just
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
809 @samp{a}s and @samp{d}s in any order. Regex considers invalid a regular
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
810 expression with a @samp{[} but no matching
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
811 @samp{]}.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
812
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
813 @dfn{Nonmatching lists} are similar to matching lists except that they
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
814 match a single character @emph{not} represented by one of the list
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
815 items. You use an @dfn{open-nonmatching-list operator} (represented by
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
816 @samp{[^}@footnote{Regex therefore doesn't consider the @samp{^} to be
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
817 the first character in the list. If you put a @samp{^} character first
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
818 in (what you think is) a matching list, you'll turn it into a
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
819 nonmatching list.}) instead of an open-matching-list operator to start a
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
820 nonmatching list.
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
821
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
822 For example, @samp{[^ab]} matches any character except @samp{a} or
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
823 @samp{b}.
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
824
13647
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
825 If the syntax bit @code{RE_HAT_LISTS_NOT_NEWLINE} is set, then
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
826 nonmatching lists do not match a newline.
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
827
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
828 Most characters lose any special meaning inside a list. The special
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
829 characters inside a list follow.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
830
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
831 @table @samp
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
832 @item ]
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
833 ends the list if it's not the first list item. So, if you want to make
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
834 the @samp{]} character a list item, you must put it first.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
835
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
836 @item \
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
837 quotes the next character if the syntax bit @code{RE_BACKSLASH_ESCAPE_IN_LISTS} is
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
838 set.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
839
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
840 @item [.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
841 represents the open-collating-symbol operator (@pxref{Collating Symbol
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
842 Operators}).
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
843
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
844 @item .]
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
845 represents the close-collating-symbol operator.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
846
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
847 @item [=
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
848 represents the open-equivalence-class operator (@pxref{Equivalence Class
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
849 Operators}).
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
850
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
851 @item =]
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
852 represents the close-equivalence-class operator.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
853
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
854 @item [:
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
855 represents the open-character-class operator (@pxref{Character Class
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
856 Operators}) if the syntax bit @code{RE_CHAR_CLASSES} is set and what
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
857 follows is a valid character class expression.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
858
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
859 @item :]
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
860 represents the close-character-class operator if the syntax bit
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
861 @code{RE_CHAR_CLASSES} is set and what precedes it is an
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
862 open-character-class operator followed by a valid character class name.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
863
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
864 @item -
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
865 represents the range operator (@pxref{Range Operator}) if it's
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
866 not first or last in a list or the ending point of a range.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
867
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
868 @end table
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
869
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
870 @noindent
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
871 All other characters are ordinary. For example, @samp{[.*]} matches
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
872 @samp{.} and @samp{*}.
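
The list examples in this section can be checked one character at a time
with a sketch like the following.  It assumes the POSIX
@code{regcomp}/@code{regexec} interface with @code{REG_EXTENDED}; the
helper name @code{list_accepts} is hypothetical.  Only locale-independent
lists are exercised, since collating symbols and equivalence classes
depend on the current locale.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: return 1 if the bracket expression LIST (written
   with its enclosing brackets) matches the single character C, 0 if it
   does not, and -1 if the resulting pattern does not compile.  */
static int
list_accepts (const char *list, char c)
{
  char anchored[128];
  char subject[2] = { c, '\0' };
  regex_t re;

  assert (list != NULL);
  /* Anchor the list so that it must account for the whole subject.  */
  int len = snprintf (anchored, sizeof anchored, "^%s$", list);
  if (len < 0 || (size_t) len >= sizeof anchored)
    return -1;
  if (regcomp (&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  int status = regexec (&re, subject, 0, NULL, 0);
  regfree (&re);
  return status == 0;
}
```

For example, @code{list_accepts ("[]a]", ']')} should succeed because the
@samp{]} comes first in the list, and @code{list_accepts ("[.*]", '*')}
should succeed because @samp{*} is ordinary inside a list.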
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
873
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
874 @menu
13647
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
875 * Collating Symbol Operators:: [.elem.]
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
876 * Equivalence Class Operators:: [=class=]
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
877 * Character Class Operators:: [:class:]
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
878 * Range Operator:: start-end
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
879 @end menu
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
880
13647
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
881
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
882 @node Collating Symbol Operators
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
883 @subsection Collating Symbol Operators (@code{[.} @dots{} @code{.]})
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
884
13648
40fe4f708fa8 regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13647
diff changeset
885 Collating symbols can be represented inside lists.
13647
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
886 You form a @dfn{collating symbol} by
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
887 putting a collating element between an @dfn{open-collating-symbol
14774
70d101744577 maint: correct misuse of "a" and "an"
Jim Meyering <meyering@redhat.com>
parents: 13648
diff changeset
888 operator} and a @dfn{close-collating-symbol operator}. @samp{[.}
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
889 represents the open-collating-symbol operator and @samp{.]} represents
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
890 the close-collating-symbol operator. For example, if @samp{ll} is a
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
891 collating element, then @samp{[[.ll.]]} would match @samp{ll}.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
892
13647
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
893 @node Equivalence Class Operators
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
894 @subsection Equivalence Class Operators (@code{[=} @dots{} @code{=]})
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
895 @cindex equivalence class expression in regex
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
896 @cindex @samp{[=} in regex
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
897 @cindex @samp{=]} in regex
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
898
13647
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
899 Regex recognizes equivalence class
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
900 expressions inside lists. An @dfn{equivalence class expression} is a set
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
901 of collating elements which all belong to the same equivalence class.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
902 You form an equivalence class expression by putting a collating
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
903 element between an @dfn{open-equivalence-class operator} and a
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
904 @dfn{close-equivalence-class operator}. @samp{[=} represents the
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
905 open-equivalence-class operator and @samp{=]} represents the
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
906 close-equivalence-class operator. For example, if @samp{a} and @samp{A}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
907 were an equivalence class, then both @samp{[[=a=]]} and @samp{[[=A=]]}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
908 would match both @samp{a} and @samp{A}. If the collating element in an
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
909 equivalence class expression isn't part of an equivalence class, then
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
910 the matcher considers the equivalence class expression to be a collating
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
911 symbol.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
912
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
913 @node Character Class Operators
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
914 @subsection Character Class Operators (@code{[:} @dots{} @code{:]})
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
915
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
916 @cindex character classes
15563
ebf52f657a28 avoid literal : in index entries
Karl Berry <karl@freefriends.org>
parents: 14775
diff changeset
917 @cindex @samp{[colon} in regex
ebf52f657a28 avoid literal : in index entries
Karl Berry <karl@freefriends.org>
parents: 14775
diff changeset
918 @cindex @samp{colon]} in regex
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
919
13647
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
920 If the syntax bit @code{RE_CHAR_CLASSES} is set, then Regex recognizes
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
921 character class expressions inside lists. A @dfn{character class
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
922 expression} matches one character from a given class. You form a
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
923 character class expression by putting a character class name between
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
924 an @dfn{open-character-class operator} (represented by @samp{[:}) and
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
925 a @dfn{close-character-class operator} (represented by @samp{:]}).
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
926 The character class names and their meanings are:
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
927
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
928 @table @code
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
929
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
930 @item alnum
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
931 letters and digits
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
932
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
933 @item alpha
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
934 letters
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
935
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
936 @item blank
17274
69f030e5cec4 doc: avoid small caps
Paul Eggert <eggert@cs.ucla.edu>
parents: 16358
diff changeset
937 system-dependent; for GNU, a space or tab
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
938
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
939 @item cntrl
17274
69f030e5cec4 doc: avoid small caps
Paul Eggert <eggert@cs.ucla.edu>
parents: 16358
diff changeset
940 control characters (in the ASCII encoding, code 0177 and codes
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
941 less than 040)
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
942
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
943 @item digit
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
944 digits
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
945
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
946 @item graph
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
947 same as @code{print} except omits space
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
948
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
949 @item lower
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
950 lowercase letters
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
951
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
952 @item print
17274
69f030e5cec4 doc: avoid small caps
Paul Eggert <eggert@cs.ucla.edu>
parents: 16358
diff changeset
953 printable characters (in the ASCII encoding, space through
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
954 tilde---codes 040 through 0176)
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
955
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
956 @item punct
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
957 printable characters that are neither spaces nor alphanumeric
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
958
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
959 @item space
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
960 space, horizontal tab, carriage return, newline, vertical tab, and form feed
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
961
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
962 @item upper
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
963 uppercase letters
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
964
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
965 @item xdigit
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
966 hexadecimal digits: @code{0}--@code{9}, @code{a}--@code{f}, @code{A}--@code{F}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
967
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
968 @end table
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
969
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
970 @noindent
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
971 These correspond to the definitions in the C library's @file{<ctype.h>}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
972 facility. For example, @samp{[:alpha:]} corresponds to the standard
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
973 facility @code{isalpha}. Regex recognizes character class expressions
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
974 only inside of lists; so @samp{[[:alpha:]]} matches any letter, but
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
975 @samp{[:alpha:]} outside of a bracket expression and not followed by a
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
976 repetition operator matches just itself.
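
As a minimal sketch of character class expressions in use, the following
assumes the POSIX interface (@code{regcomp} and @code{regexec} with
@code{REG_EXTENDED}), under which character classes are recognized, and
the default C locale. The helper name @code{class_match} is hypothetical
and exists only for this illustration.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if TEXT contains a match for the POSIX
   extended regexp PATTERN, 0 if it does not, and -1 if PATTERN fails
   to compile.  */
int
class_match (const char *pattern, const char *text)
{
  regex_t re;
  int rc;

  /* REG_NOSUB: only a yes/no answer is wanted, not match offsets.  */
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  rc = regexec (&re, text, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}
```

For instance, @code{class_match ("^[[:alpha:]]+$", "abc")} reports a
match, while the same pattern applied to @samp{abc1} does not, because
@samp{1} is not in the @code{alpha} class.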
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
977
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
978 @node Range Operator
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
979 @subsection The Range Operator (@code{-})
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
980
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
981 Regex recognizes @dfn{range expressions} inside a list. They represent
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
982 those characters
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
983 that fall between two elements in the current collating sequence. You
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
984 form a range expression by putting a @dfn{range operator} between two
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
985 of any of the following: characters, collating elements, collating symbols,
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
986 and equivalence class expressions. The starting point of the range and
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
987 the ending point of the range don't have to be the same kind of item,
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
988 e.g., the starting point could be a collating element and the ending
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
989 point could be an equivalence class expression. If a range's ending
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
990 point is an equivalence class, then all the collating elements in that
13647
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
991 class will be in the range.@footnote{You can't use a character class for the starting
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
992 or ending point of a range, since a character class is not a single
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
993 character.} @samp{-} represents the range operator. For example,
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
994 @samp{a-f} within a list represents all the characters from @samp{a}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
995 through @samp{f}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
996 inclusively.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
997
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
998 If the syntax bit @code{RE_NO_EMPTY_RANGES} is set, then if the range's
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
999 ending point collates less than its starting point, the range (and the
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1000 regular expression containing it) is invalid. For example, the regular
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1001 expression @samp{[z-a]} would be invalid. If this bit isn't set, then
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1002 Regex considers such a range to be empty.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1003
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1004 Since @samp{-} represents the range operator, if you want to make a
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1005 @samp{-} character itself
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1006 a list item, you must do one of the following:
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1007
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1008 @itemize @bullet
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1009 @item
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1010 Put the @samp{-} either first or last in the list.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1011
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1012 @item
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1013 Include a range whose starting point collates strictly lower than
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1014 @samp{-} and whose ending point collates equal or higher. Unless a
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1015 range is the first item in a list, a @samp{-} can't be its starting
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1016 point, but @emph{can} be its ending point. That is because Regex
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1017 considers @samp{-} to be the range operator unless it is preceded by
17274
69f030e5cec4 doc: avoid small caps
Paul Eggert <eggert@cs.ucla.edu>
parents: 16358
diff changeset
1018 another @samp{-}. For example, in the ASCII encoding, @samp{)},
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1019 @samp{*}, @samp{+}, @samp{,}, @samp{-}, @samp{.}, and @samp{/} are
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1020 contiguous characters in the collating sequence. You might think that
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1021 @samp{[)-+--/]} has two ranges: @samp{)-+} and @samp{--/}. Rather, it
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1022 has the ranges @samp{)-+} and @samp{+--}, plus the character @samp{/}, so
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1023 it matches, e.g., @samp{,}, not @samp{.}.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1024
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1025 @item
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1026 Put a range whose starting point is @samp{-} first in the list.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1027
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1028 @end itemize
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1029
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1030 For example, @samp{[-a-z]} matches a lowercase letter or a hyphen (in
17274
69f030e5cec4 doc: avoid small caps
Paul Eggert <eggert@cs.ucla.edu>
parents: 16358
diff changeset
1031 English, in ASCII).
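
The placement rules for @samp{-} can be tried out with a short sketch.
It assumes the POSIX interface with @code{REG_EXTENDED} (a syntax in
which, as far as we know, empty ranges are rejected) and the default C
locale; the helper name @code{try_list} is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compile PATTERN as a POSIX extended regexp and
   search TEXT for it.  Returns 0 on a match, REG_NOMATCH when nothing
   matches, or the nonzero error code from regcomp when PATTERN is
   invalid.  */
int
try_list (const char *pattern, const char *text)
{
  regex_t re;
  int rc = regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB);

  if (rc != 0)
    return rc;                  /* e.g. an invalid range */
  rc = regexec (&re, text, 0, NULL, 0);
  regfree (&re);
  return rc;
}
```

With this helper, @samp{^[-a-z]$} and @samp{^[a-z-]$} both accept a
lone @samp{-}, since the hyphen is first or last in the list, whereas
@samp{[z-a]} should be rejected at compile time because its ending
point collates before its starting point.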
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1032
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1033
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
1034 @node Grouping Operators
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1035 @section Grouping Operators (@code{(} @dots{} @code{)} or @code{\(} @dots{} @code{\)})
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1036
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1037 @kindex (
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1038 @kindex )
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1039 @kindex \(
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1040 @kindex \)
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1041 @cindex grouping
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1042 @cindex subexpressions
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1043 @cindex parenthesizing
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1044
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1045 A @dfn{group}, also known as a @dfn{subexpression}, consists of an
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1046 @dfn{open-group operator}, any number of other operators, and a
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1047 @dfn{close-group operator}. Regex treats this sequence as a unit, just
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1048 as mathematics and programming languages treat a parenthesized
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1049 expression as a unit.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1050
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1051 Therefore, using @dfn{groups}, you can:
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1052
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1053 @itemize @bullet
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1054 @item
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1055 delimit the argument(s) to an alternation operator (@pxref{Alternation
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1056 Operator}) or a repetition operator (@pxref{Repetition
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1057 Operators}).
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1058
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
1059 @item
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1060 keep track of the indices of the substring that matched a given group.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1061 @xref{Using Registers}, for a precise explanation.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1062 This lets you:
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1063
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1064 @itemize @bullet
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1065 @item
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1066 use the back-reference operator (@pxref{Back-reference Operator}).
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1067
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
1068 @item
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1069 use registers (@pxref{Using Registers}).
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1070
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1071 @end itemize
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1072
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1073 @end itemize
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1074
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1075 If the syntax bit @code{RE_NO_BK_PARENS} is set, then @samp{(} represents
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1076 the open-group operator and @samp{)} represents the
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1077 close-group operator; otherwise, @samp{\(} and @samp{\)} do.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1078
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1079 If the syntax bit @code{RE_UNMATCHED_RIGHT_PAREN_ORD} is set and a
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1080 close-group operator has no matching open-group operator, then Regex
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1081 considers it to match @samp{)}.
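
To illustrate how groups record the indices of the substrings they
match, here is a minimal sketch using the POSIX interface, where
@code{REG_EXTENDED} makes @samp{(} and @samp{)} the group operators and
@code{regexec} fills an array of @code{regmatch_t} offsets. The helper
name @code{group_offsets} is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: match TEXT against the POSIX extended regexp
   PATTERN and store up to NGROUPS offset pairs in GROUPS.  GROUPS[0]
   describes the whole match and GROUPS[i] describes group i.  Returns
   1 on a match, 0 on no match, and -1 if PATTERN does not compile.  */
int
group_offsets (const char *pattern, const char *text,
               regmatch_t *groups, size_t ngroups)
{
  regex_t re;
  int rc;

  if (regcomp (&re, pattern, REG_EXTENDED) != 0)
    return -1;
  rc = regexec (&re, text, ngroups, groups, 0);
  regfree (&re);
  return rc == 0;
}
```

Matching @samp{(foo|bar)-([0-9]+)} against @samp{see bar-42 now} with a
three-element array leaves the offsets of @samp{bar} in element 1 and
those of @samp{42} in element 2; each @code{rm_so} is the index of the
first matched character and each @code{rm_eo} is one past the last.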
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1082
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1083
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
1084 @node Back-reference Operator
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1085 @section The Back-reference Operator (@code{\}@var{digit})
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1086
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1087 @cindex back references
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1088
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1089 If the syntax bit @code{RE_NO_BK_REF} isn't set, then Regex recognizes
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1090 back references. A back reference matches a specified preceding group.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1091 The back reference operator is represented by @samp{\@var{digit}}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1092 anywhere after the end of a regular expression's @w{@var{digit}-th}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1093 group (@pxref{Grouping Operators}).
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1094
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1095 @var{digit} must be between @samp{1} and @samp{9}. The matcher assigns
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1096 numbers 1 through 9 to the first nine groups it encounters. By using
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1097 one of @samp{\1} through @samp{\9} after the corresponding group's
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1098 close-group operator, you can match a substring identical to the
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1099 one that the group does.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1100
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1101 Back references match according to the following (in all examples below,
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1102 @samp{(} represents the open-group, @samp{)} the close-group, @samp{@{}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1103 the open-interval and @samp{@}} the close-interval operator):
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1104
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1105 @itemize @bullet
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1106 @item
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1107 If the group matches a substring, the back reference matches an
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1108 identical substring. For example, @samp{(a)\1} matches @samp{aa} and
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1109 @samp{(bana)na\1bo\1} matches @samp{bananabanabobana}. Likewise,
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1110 @samp{(.*)\1} matches any (newline-free if the syntax bit
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1111 @code{RE_DOT_NEWLINE} isn't set) string that is composed of two
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1112 identical halves; the @samp{(.*)} matches the first half and the
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1113 @samp{\1} matches the second half.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1114
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1115 @item
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1116 If the group matches more than once (as it might if followed
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1117 by, e.g., a repetition operator), then the back reference matches the
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1118 substring the group @emph{last} matched. For example,
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1119 @samp{((a*)b)*\1\2} matches @samp{aabababa}; first @w{group 1} (the
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1120 outer one) matches @samp{aab} and @w{group 2} (the inner one) matches
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1121 @samp{aa}. Then @w{group 1} matches @samp{ab} and @w{group 2} matches
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1122 @samp{a}. So, @samp{\1} matches @samp{ab} and @samp{\2} matches
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1123 @samp{a}.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1124
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1125 @item
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1126 If the group doesn't participate in a match, i.e., it is part of an
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1127 alternative not taken or a repetition operator allows zero repetitions
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1128 of it, then the back reference makes the whole match fail. For example,
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1129 @samp{(one()|two())-and-(three\2|four\3)} matches @samp{one-and-three}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1130 and @samp{two-and-four}, but not @samp{one-and-four} or
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1131 @samp{two-and-three}. To see why, note that if the pattern matches
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1132 @samp{one-and-}, then its @w{group 2} matches the empty string and its
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1133 @w{group 3} doesn't participate in the match. So, if it then matches
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1134 @samp{four}, then when it tries to back reference @w{group 3}---which it
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1135 will attempt to do because @samp{\3} follows the @samp{four}---the match
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1136 will fail because @w{group 3} didn't participate in the match.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1137
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1138 @end itemize
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1139
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1140 You can use a back reference as an argument to a repetition operator. For
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1141 example, @samp{(a(b))\2*} matches @samp{a} followed by one or more
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1142 @samp{b}s. Similarly, @samp{(a(b))\2@{3@}} matches @samp{abbbb}.
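The repetition examples above can be checked through the POSIX interface. The following is a minimal sketch rather than part of Regex itself: the helper name matches is hypothetical, and the patterns are assumed to be rewritten in POSIX basic syntax (groups as \( \), intervals as \{ \}), where back references are standard.

```c
#include <regex.h>
#include <stddef.h>

/* Return 1 if PATTERN (POSIX basic syntax) matches somewhere in STR,
   0 if it does not, and -1 if PATTERN fails to compile.
   `matches' is a hypothetical helper, not a Regex function.  */
int matches(const char *pattern, const char *str)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, 0) != 0)
        return -1;
    /* No subexpression offsets are requested, only success or failure.  */
    rc = regexec(&re, str, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

Anchoring the pattern with ^ and $ forces it to cover the whole string. With that, "^\\(a\\(b\\)\\)\\2*$" accepts both "ab" and "abbb", because the repeated back reference may match zero times, while "^\\(a\\(b\\)\\)\\2\\{3\\}$" accepts "abbbb" but rejects "abbb".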
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1143
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1144 If there is no preceding @w{@var{digit}-th} subexpression, the regular
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1145 expression is invalid.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1146
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1147
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
1148 @node Anchoring Operators
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
1149 @section Anchoring Operators
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1150
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1151 @cindex anchoring
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1152 @cindex regexp anchoring
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1153
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1154 These operators can constrain a pattern to match only at the beginning or
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1155 end of the entire string or at the beginning or end of a line.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1156
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1157 @menu
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1158 * Match-beginning-of-line Operator:: ^
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1159 * Match-end-of-line Operator:: $
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1160 @end menu
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1161
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1162
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
1163 @node Match-beginning-of-line Operator
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1164 @subsection The Match-beginning-of-line Operator (@code{^})
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1165
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1166 @kindex ^
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1167 @cindex beginning-of-line operator
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1168 @cindex anchors
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1169
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1170 This operator can match the empty string either at the beginning of the
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1171 string or after a newline character. Thus, it is said to @dfn{anchor}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1172 the pattern to the beginning of a line.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1173
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1174 In the following cases, @samp{^} represents this operator. (Otherwise,
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1175 @samp{^} is ordinary.)
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1176
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1177 @itemize @bullet
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1178
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1179 @item
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1180 It (the @samp{^}) is first in the pattern, as in @samp{^foo}.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1181
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1182 @cnindex RE_CONTEXT_INDEP_ANCHORS @r{(and @samp{^})}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1183 @item
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1184 The syntax bit @code{RE_CONTEXT_INDEP_ANCHORS} is set, and it is outside
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1185 a bracket expression.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1186
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1187 @cindex open-group operator and @samp{^}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1188 @cindex alternation operator and @samp{^}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1189 @item
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1190 It follows an open-group or alternation operator, as in @samp{a\(^b\)}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1191 and @samp{a\|^b}. @xref{Grouping Operators}, and @ref{Alternation
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1192 Operator}.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1193
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1194 @end itemize
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1195
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1196 These rules imply that some valid patterns containing @samp{^} cannot be
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1197 matched; for example, @samp{foo^bar} if @code{RE_CONTEXT_INDEP_ANCHORS}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1198 is set.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1199
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1200 @vindex not_bol @r{field in pattern buffer}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1201 If the @code{not_bol} field is set in the pattern buffer (@pxref{GNU
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1202 Pattern Buffers}), then @samp{^} fails to match at the beginning of the
13647
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1203 string. This lets you match against pieces of a line, as you would need to if,
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1204 say, searching for repeated instances of a given pattern in a line; it
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1205 would work correctly for patterns both with and without
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1206 match-beginning-of-line operators.
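The repeated-search use case described above can be sketched with the POSIX interface, assuming its REG_NOTBOL execution flag as the counterpart of the not_bol field. The function name count_matches is hypothetical, and the loop is only one plausible way to walk a line.

```c
#include <regex.h>
#include <stddef.h>

/* Count the non-overlapping matches of PATTERN (POSIX extended syntax)
   in LINE.  Returns -1 if PATTERN fails to compile.
   `count_matches' is a hypothetical helper, not a Regex function.  */
int count_matches(const char *pattern, const char *line)
{
    regex_t re;
    regmatch_t m;
    const char *p = line;
    int eflags = 0;     /* the first search starts at the real line start */
    int count = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    while (regexec(&re, p, 1, &m, eflags) == 0) {
        count++;
        if (m.rm_eo > 0)
            p += m.rm_eo;       /* continue after the match */
        else if (*p != '\0')
            p++;                /* empty match at P: step one character */
        else
            break;              /* empty match at the end of the line */
        /* Later searches start in the middle of the line, so `^'
           must not match at the start of the remaining piece.  */
        eflags = REG_NOTBOL;
    }

    regfree(&re);
    return count;
}
```

On the line "ababab", the pattern "ab" is counted three times, but "^ab" only once: after the first match, REG_NOTBOL keeps the anchor from matching at the start of each remaining piece.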
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1207
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1208
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
1209 @node Match-end-of-line Operator
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1210 @subsection The Match-end-of-line Operator (@code{$})
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1211
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1212 @kindex $
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1213 @cindex end-of-line operator
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1214 @cindex anchors
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1215
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1216 This operator can match the empty string either at the end of
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1217 the string or before a newline character in the string. Thus, it is
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1218 said to @dfn{anchor} the pattern to the end of a line.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1219
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1220 It is always represented by @samp{$}. For example, @samp{foo$} usually
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1221 matches both @samp{foo} and the first three characters of
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1222 @samp{foo\nbar}.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1223
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1224 Its interaction with the syntax bits and pattern buffer fields is
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1225 exactly the dual of @samp{^}'s; see the previous section. (That is,
13554
3a3b9d29af1b Document not_eol.
Reuben Thomas <rrt@sc3d.org>
parents: 13553
diff changeset
1226 ``@samp{^}'' becomes ``@samp{$}'', ``beginning'' becomes ``end'',
3a3b9d29af1b Document not_eol.
Reuben Thomas <rrt@sc3d.org>
parents: 13553
diff changeset
1227 ``next'' becomes ``previous'', ``after'' becomes ``before'', and
3a3b9d29af1b Document not_eol.
Reuben Thomas <rrt@sc3d.org>
parents: 13553
diff changeset
1228 ``@code{not_bol}'' becomes ``@code{not_eol}''.)
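The line-anchoring behavior of both operators can be observed with a short sketch through the POSIX interface. It assumes that the REG_NEWLINE compilation flag is what makes the anchors match next to an embedded newline there; the function name find_span and its parameters are hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Search STR for PATTERN (POSIX extended syntax).  If NEWLINE_ANCHORS
   is nonzero, `^' and `$' also match after and before a newline.
   On success store the match offsets in *START and *END and return 1;
   return 0 if nothing matches and -1 if PATTERN fails to compile.
   `find_span' is a hypothetical helper, not a Regex function.  */
int find_span(const char *pattern, const char *str, int newline_anchors,
              int *start, int *end)
{
    regex_t re;
    regmatch_t m;
    int cflags = REG_EXTENDED | (newline_anchors ? REG_NEWLINE : 0);
    int rc;

    if (regcomp(&re, pattern, cflags) != 0)
        return -1;
    rc = regexec(&re, str, 1, &m, 0);
    regfree(&re);
    if (rc != 0)
        return 0;

    *start = (int) m.rm_so;
    *end = (int) m.rm_eo;
    return 1;
}
```

With newline anchoring enabled, "foo$" matches offsets 0 to 3 of "foo\nbar" and "^bar" matches offsets 4 to 7. With it disabled, "foo$" does not match "foo\nbar" at all, since the only end it can anchor to is the end of the string.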
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1229
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1230
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
1231 @node GNU Operators
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1232 @chapter GNU Operators
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1233
17274
69f030e5cec4 doc: avoid small caps
Paul Eggert <eggert@cs.ucla.edu>
parents: 16358
diff changeset
1234 Following are operators that GNU defines (and POSIX doesn't).
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1235
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1236 @menu
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1237 * Word Operators::
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1238 * Buffer Operators::
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1239 @end menu
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1240
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
1241 @node Word Operators
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1242 @section Word Operators
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1243
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1244 The operators in this section require Regex to recognize parts of words.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1245 Regex uses a syntax table to determine whether or not a character is
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1246 part of a word, i.e., whether or not it is @dfn{word-constituent}.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1247
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1248 @menu
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1249 * Non-Emacs Syntax Tables::
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1250 * Match-word-boundary Operator:: \b
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1251 * Match-within-word Operator:: \B
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1252 * Match-beginning-of-word Operator:: \<
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1253 * Match-end-of-word Operator:: \>
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1254 * Match-word-constituent Operator:: \w
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1255 * Match-non-word-constituent Operator:: \W
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1256 @end menu
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1257
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
1258 @node Non-Emacs Syntax Tables
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
1259 @subsection Non-Emacs Syntax Tables
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1260
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1261 A @dfn{syntax table} is an array indexed by the characters in your
17274
69f030e5cec4 doc: avoid small caps
Paul Eggert <eggert@cs.ucla.edu>
parents: 16358
diff changeset
1262 character set. For 8-bit characters, therefore, a syntax table
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1263 has 256 elements. Regex always uses a @code{char *} variable
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1264 @code{re_syntax_table} as its syntax table. In some cases, it
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1265 initializes this variable and in others it expects you to initialize it.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1266
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1267 @itemize @bullet
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1268 @item
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1269 If Regex is compiled with the preprocessor symbols @code{emacs} and
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1270 @code{SYNTAX_TABLE} both undefined, then Regex allocates
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1271 @code{re_syntax_table} and initializes an element @var{i} either to
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1272 @code{Sword} (which it defines) if @var{i} is a letter, digit, or
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1273 @samp{_}, or to zero if it's not.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1274
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1275 @item
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1276 If Regex is compiled with @code{emacs} undefined but @code{SYNTAX_TABLE}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1277 defined, then Regex expects you to define a @code{char *} variable
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1278 @code{re_syntax_table} to be a valid syntax table.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1279
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1280 @item
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1281 @xref{Emacs Syntax Tables}, for what happens when Regex is compiled with
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1282 the preprocessor symbol @code{emacs} defined.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1283
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1284 @end itemize
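The default initialization described in the first case above can be pictured with a small stand-alone sketch. It does not touch Regex's own re_syntax_table: the names demo_syntax_table, DEMO_SWORD, init_demo_syntax_table and is_word_constituent are hypothetical, and the value 1 is only an assumed stand-in for Sword.

```c
#include <ctype.h>

/* Assumed stand-in for the `Sword' value; the real value is whatever
   Regex defines.  */
#define DEMO_SWORD 1

/* One entry per possible 8-bit character code.  */
static char demo_syntax_table[256];

/* Mark letters, digits and `_' as word-constituent and clear
   everything else, as the text describes for the default table.  */
void init_demo_syntax_table(void)
{
    int c;

    for (c = 0; c < 256; c++)
        demo_syntax_table[c] = (isalnum(c) || c == '_') ? DEMO_SWORD : 0;
}

/* Nonzero if C is word-constituent according to the demo table.  */
int is_word_constituent(unsigned char c)
{
    return demo_syntax_table[c] == DEMO_SWORD;
}
```

Indexing with an unsigned char keeps codes above 127 from turning into negative array indexes on platforms where plain char is signed.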
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1285
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
1286 @node Match-word-boundary Operator
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1287 @subsection The Match-word-boundary Operator (@code{\b})
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1288
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1289 @cindex @samp{\b}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1290 @cindex word boundaries, matching
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1291
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1292 This operator (represented by @samp{\b}) matches the empty string at
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1293 either the beginning or the end of a word. For example, @samp{\brat\b}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1294 matches the separate word @samp{rat}.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1295
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
1296 @node Match-within-word Operator
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1297 @subsection The Match-within-word Operator (@code{\B})
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1298
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1299 @cindex @samp{\B}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1300
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1301 This operator (represented by @samp{\B}) matches the empty string within
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1302 a word. For example, @samp{c\Brat\Be} matches @samp{crate}, but
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1303 @samp{dirty \Brat} doesn't match @samp{dirty rat}.
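The word-boundary and within-word examples can be tried through the POSIX interface, assuming an implementation such as the GNU C Library whose regcomp accepts these GNU operators. The helper name pattern_found is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Return 1 if PATTERN (extended syntax plus GNU operators) matches
   somewhere in STR, 0 if not, and -1 if PATTERN fails to compile.
   `pattern_found' is a hypothetical helper, not a Regex function.  */
int pattern_found(const char *pattern, const char *str)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, str, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

Under that assumption, "\\brat\\b" is found in "a rat here" but not in "crate", "c\\Brat\\Be" is found in "crate", and "dirty \\Brat" is not found in "dirty rat", because the position before the "r" there is a word boundary rather than the inside of a word.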
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1304
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
1305 @node Match-beginning-of-word Operator
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1306 @subsection The Match-beginning-of-word Operator (@code{\<})
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1307
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1308 @cindex @samp{\<}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1309
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1310 This operator (represented by @samp{\<}) matches the empty string at the
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1311 beginning of a word.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1312
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
1313 @node Match-end-of-word Operator
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1314 @subsection The Match-end-of-word Operator (@code{\>})
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1315
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1316 @cindex @samp{\>}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1317
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1318 This operator (represented by @samp{\>}) matches the empty string at the
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1319 end of a word.
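A short sketch can show where the beginning-of-word and end-of-word operators allow a match to start. It again assumes a regcomp that accepts the GNU operators, as the GNU C Library's does; the helper name first_match_at is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Return the offset in STR at which the first match of PATTERN
   (extended syntax plus GNU operators) starts, or -1 if there is no
   match or PATTERN fails to compile.
   `first_match_at' is a hypothetical helper, not a Regex function.  */
int first_match_at(const char *pattern, const char *str)
{
    regex_t re;
    regmatch_t m;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&re, str, 1, &m, 0);
    regfree(&re);
    return rc == 0 ? (int) m.rm_so : -1;
}
```

In "crate rat", the pattern "\\<rat" skips the "rat" inside "crate" and matches at offset 6, where a word begins. In "rats muskrat", the pattern "rat\\>" skips the leading "rat" of "rats" and matches at offset 9, where the match ends together with the word.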
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1320
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
1321 @node Match-word-constituent Operator
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1322 @subsection The Match-word-constituent Operator (@code{\w})
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1323
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1324 @cindex @samp{\w}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1325
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1326 This operator (represented by @samp{\w}) matches any word-constituent
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1327 character.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1328
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
1329 @node Match-non-word-constituent Operator
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1330 @subsection The Match-non-word-constituent Operator (@code{\W})
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1331
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1332 @cindex @samp{\W}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1333
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1334 This operator (represented by @samp{\W}) matches any character that is
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1335 not word-constituent.
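The two character-matching operators can be compared by measuring what they match. This sketch assumes a regcomp that accepts the GNU operators and a default syntax table in which letters, digits and @samp{_} are word-constituent; the helper name match_length is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Return the length of the first match of PATTERN (extended syntax
   plus GNU operators) in STR, or -1 if there is no match or PATTERN
   fails to compile.
   `match_length' is a hypothetical helper, not a Regex function.  */
int match_length(const char *pattern, const char *str)
{
    regex_t re;
    regmatch_t m;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&re, str, 1, &m, 0);
    regfree(&re);
    return rc == 0 ? (int) (m.rm_eo - m.rm_so) : -1;
}
```

For the string "foo_1 -- bar", "\\w+" matches the five characters "foo_1", and "\\W+" matches the four characters between the two words. A string made up only of word-constituent characters, such as "foo_1", gives "\\W" nothing to match.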
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1336
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1337
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
1338 @node Buffer Operators
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
1339 @section Buffer Operators
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1340
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1341 Following are operators which work on buffers. In Emacs, a @dfn{buffer}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1342 is, naturally, an Emacs buffer. For other programs, Regex considers the
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1343 entire string to be matched as the buffer.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1344
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1345 @menu
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1346 * Match-beginning-of-buffer Operator:: \`
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1347 * Match-end-of-buffer Operator:: \'
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1348 @end menu
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1349
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1350
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
1351 @node Match-beginning-of-buffer Operator
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1352 @subsection The Match-beginning-of-buffer Operator (@code{\`})
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1353
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1354 @cindex @samp{\`}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1355
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1356 This operator (represented by @samp{\`}) matches the empty string at the
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1357 beginning of the buffer.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1358
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
1359 @node Match-end-of-buffer Operator
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1360 @subsection The Match-end-of-buffer Operator (@code{\'})
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1361
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1362 @cindex @samp{\'}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1363
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1364 This operator (represented by @samp{\'}) matches the empty string at the
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1365 end of the buffer.
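The difference between the buffer operators and the line anchors shows up once the string contains a newline. The sketch below assumes a regcomp that accepts the GNU buffer operators, as the GNU C Library's does, and compiles with REG_NEWLINE so that the line anchors match next to embedded newlines. The helper name buffer_found is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Return 1 if PATTERN (extended syntax plus GNU operators) matches
   somewhere in STR with newline-sensitive line anchors, 0 if not,
   and -1 if PATTERN fails to compile.
   `buffer_found' is a hypothetical helper, not a Regex function.  */
int buffer_found(const char *pattern, const char *str)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NEWLINE | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, str, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

Against "foo\nbar", the line anchors in "^bar" and "foo$" both succeed, but "\\`bar" and "foo\\'" fail, because "bar" does not start the buffer and "foo" does not end it. The patterns "\\`foo" and "bar\\'" succeed.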
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1366
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1367
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
1368 @node GNU Emacs Operators
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1369 @chapter GNU Emacs Operators
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1370
17274
69f030e5cec4 doc: avoid small caps
Paul Eggert <eggert@cs.ucla.edu>
parents: 16358
diff changeset
1371 Following are operators that GNU defines (and POSIX doesn't)
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1372 that you can use only when Regex is compiled with the preprocessor
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
1373 symbol @code{emacs} defined.
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1374
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1375 @menu
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1376 * Syntactic Class Operators::
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1377 @end menu
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1378
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1379
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
1380 @node Syntactic Class Operators
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1381 @section Syntactic Class Operators
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1382
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1383 The operators in this section require Regex to recognize the syntactic
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1384 classes of characters. Regex uses a syntax table to determine this.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1385
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1386 @menu
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1387 * Emacs Syntax Tables::
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1388 * Match-syntactic-class Operator:: \sCLASS
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1389 * Match-not-syntactic-class Operator:: \SCLASS
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1390 @end menu
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1391
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
1392 @node Emacs Syntax Tables
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1393 @subsection Emacs Syntax Tables
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1394
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1395 A @dfn{syntax table} is an array indexed by the characters in your
17274
69f030e5cec4 doc: avoid small caps
Paul Eggert <eggert@cs.ucla.edu>
parents: 16358
diff changeset
1396 character set. With 8-bit characters, therefore, a syntax table
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1397 has 256 elements.
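
As a rough illustration of the idea, a syntax table can be modelled in C as an array with one entry per possible @code{unsigned char} value. This is a sketch only: the names @code{sketch_syntax_table}, @code{init_sketch_syntax_table}, @code{sketch_syntax_class} and the two class codes are hypothetical, and are not the definitions Regex or Emacs use.

```c
#include <ctype.h>
#include <limits.h>

/* Hypothetical class codes, chosen for this sketch only. */
enum { SYNTAX_OTHER = 0, SYNTAX_WORD = 1 };

/* One entry per possible unsigned char value: 256 when CHAR_BIT is 8. */
char sketch_syntax_table[UCHAR_MAX + 1];

/* Mark letters, digits and underscore as word constituents;
   every other character gets SYNTAX_OTHER. */
void init_sketch_syntax_table(void)
{
    for (int c = 0; c <= UCHAR_MAX; c++)
        sketch_syntax_table[c] =
            (isalnum(c) || c == '_') ? SYNTAX_WORD : SYNTAX_OTHER;
}

/* Look up the class of a character.  The cast keeps the array index
   non-negative even where plain char is signed. */
int sketch_syntax_class(char c)
{
    return sketch_syntax_table[(unsigned char) c];
}
```

After calling @code{init_sketch_syntax_table ()}, a lookup such as @code{sketch_syntax_class ('a')} is a single array access, which is why a table indexed by character code suits a matcher's inner loop.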
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1398
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1399 If Regex is compiled with the preprocessor symbol @code{emacs} defined,
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1400 then Regex expects you to define and initialize the variable
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1401 @code{re_syntax_table} to be an Emacs syntax table. Emacs' syntax
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1402 tables are more complicated than Regex's own (@pxref{Non-Emacs Syntax
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1403 Tables}). @xref{Syntax, , Syntax, emacs, The GNU Emacs User's Manual},
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1404 for a description of Emacs' syntax tables.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1405
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
1406 @node Match-syntactic-class Operator
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1407 @subsection The Match-syntactic-class Operator (@code{\s}@var{class})
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1408
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1409 @cindex @samp{\s}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1410
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1411 This operator matches any character whose syntactic class is represented
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1412 by a specified character. @samp{\s@var{class}} represents this operator
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1413 where @var{class} is the character representing the syntactic class you
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1414 want. For example, @samp{w} represents the syntactic
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1415 class of word-constituent characters, so @samp{\sw} matches any
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1416 word-constituent character.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1417
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
1418 @node Match-not-syntactic-class Operator
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1419 @subsection The Match-not-syntactic-class Operator (@code{\S}@var{class})
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1420
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1421 @cindex @samp{\S}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1422
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1423 This operator is similar to the match-syntactic-class operator except
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1424 that it matches any character whose syntactic class is @emph{not}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1425 represented by the specified character. @samp{\S@var{class}} represents
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1426 this operator. For example, @samp{w} represents the syntactic class of
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1427 word-constituent characters, so @samp{\Sw} matches any character that is
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1428 not word-constituent.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1429
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1430
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
1431 @node What Gets Matched?
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1432 @chapter What Gets Matched?
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1433
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1434 Regex usually matches strings according to the ``leftmost longest''
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1435 rule; that is, it chooses the longest of the leftmost matches. This
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1436 does not mean that, for a regular expression containing subexpressions,
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1437 it simply chooses the longest match for each subexpression, left to
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1438 right; the overall match must also be the longest possible one.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1439
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1440 For example, @samp{(ac*)(c*d[ac]*)\1} matches @samp{acdacaaa}, not
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1441 @samp{acdac}, as it would if it were to choose the longest match for the
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1442 first subexpression.
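
The same rule can be observed through the POSIX interface. The sketch below uses a different, simpler pattern than the one above, so that it needs no back-reference: @samp{(a|ab)(c|bcd)} against @samp{abcd}. The helper name @code{overall_match_length} is hypothetical. If the matcher took the longest match for the first subexpression (@samp{ab}), the overall match would be @samp{abc}; a leftmost-longest matcher should instead report all of @samp{abcd}, with the first subexpression matching only @samp{a}.

```c
#include <regex.h>
#include <stddef.h>

/* Match the extended regexp PATTERN against TEXT and return the length
   of the overall match, or -1 if compilation fails or nothing matches.
   If FIRST_LEN is non-null and a match is found, store the length of
   subexpression 1 there. */
int overall_match_length(const char *pattern, const char *text,
                         int *first_len)
{
    regex_t re;
    regmatch_t m[2];   /* m[0]: whole match, m[1]: first subexpression */
    int len = -1;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, text, 2, m, 0) == 0) {
        len = (int) (m[0].rm_eo - m[0].rm_so);
        if (first_len)
            *first_len = (int) (m[1].rm_eo - m[1].rm_so);
    }
    regfree(&re);
    return len;
}
```

Calling @code{overall_match_length ("(a|ab)(c|bcd)", "abcd", &n)} on a conforming implementation should return 4 and leave 1 in @code{n}.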
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1443
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1444
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
1445 @node Programming with Regex
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1446 @chapter Programming with Regex
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1447
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1448 Here we describe how you use the Regex data structures and functions in
17274
69f030e5cec4 doc: avoid small caps
Paul Eggert <eggert@cs.ucla.edu>
parents: 16358
diff changeset
1449 C programs. Regex has three interfaces: one designed for GNU, one
69f030e5cec4 doc: avoid small caps
Paul Eggert <eggert@cs.ucla.edu>
parents: 16358
diff changeset
1450 compatible with POSIX (as specified by POSIX, draft
69f030e5cec4 doc: avoid small caps
Paul Eggert <eggert@cs.ucla.edu>
parents: 16358
diff changeset
1451 1003.2/D11.2), and one compatible with Berkeley Unix. The
69f030e5cec4 doc: avoid small caps
Paul Eggert <eggert@cs.ucla.edu>
parents: 16358
diff changeset
1452 POSIX interface is not documented here; see the documentation of
69f030e5cec4 doc: avoid small caps
Paul Eggert <eggert@cs.ucla.edu>
parents: 16358
diff changeset
1453 GNU libc, or the POSIX man pages. The Berkeley Unix interface is
13647
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1454 documented here for convenience, since its documentation is not
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1455 otherwise readily available on GNU systems.
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1456
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1457 @menu
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1458 * GNU Regex Functions::
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1459 * BSD Regex Functions::
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1460 @end menu
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1461
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1462
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
1463 @node GNU Regex Functions
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1464 @section GNU Regex Functions
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1465
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1466 If you're writing code that doesn't need to be compatible with either
17274
69f030e5cec4 doc: avoid small caps
Paul Eggert <eggert@cs.ucla.edu>
parents: 16358
diff changeset
1467 POSIX or Berkeley Unix, you can use these functions. They
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1468 provide more options than the other interfaces.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1469
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1470 @menu
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1471 * GNU Pattern Buffers:: The re_pattern_buffer type.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1472 * GNU Regular Expression Compiling:: re_compile_pattern ()
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1473 * GNU Matching:: re_match ()
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1474 * GNU Searching:: re_search ()
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1475 * Matching/Searching with Split Data:: re_match_2 (), re_search_2 ()
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1476 * Searching with Fastmaps:: re_compile_fastmap ()
16236
8d0c35a0ae1d doc: fix minor quoting issues, mostly with `
Paul Eggert <eggert@cs.ucla.edu>
parents: 15563
diff changeset
1477 * GNU Translate Tables:: The @code{translate} field.
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1478 * Using Registers:: The re_registers type and related fns.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1479 * Freeing GNU Pattern Buffers:: regfree ()
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1480 @end menu
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1481
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1482
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
1483 @node GNU Pattern Buffers
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1484 @subsection GNU Pattern Buffers
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1485
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1486 @cindex pattern buffer, definition of
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1487 @tindex re_pattern_buffer @r{definition}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1488 @tindex struct re_pattern_buffer @r{definition}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1489
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1490 To compile, match, or search for a given regular expression, you must
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1491 supply a pattern buffer. A @dfn{pattern buffer} holds one compiled
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1492 regular expression.@footnote{Regular expressions are also referred to as
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1493 ``patterns,'' hence the name ``pattern buffer.''}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1494
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1495 You can have several different pattern buffers simultaneously, each
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1496 holding a compiled pattern for a different regular expression.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1497
13647
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1498 @file{regex.h} defines the pattern buffer @code{struct} with the
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1499 following public fields:
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1500
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1501 @example
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1502 unsigned char *buffer;
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1503 unsigned long allocated;
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1504 char *fastmap;
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1505 char *translate;
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1506 size_t re_nsub;
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1507 unsigned no_sub : 1;
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1508 unsigned not_bol : 1;
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1509 unsigned not_eol : 1;
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1510 @end example
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1511
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1512
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
1513 @node GNU Regular Expression Compiling
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1514 @subsection GNU Regular Expression Compiling
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1515
17274
69f030e5cec4 doc: avoid small caps
Paul Eggert <eggert@cs.ucla.edu>
parents: 16358
diff changeset
1516 In GNU, you can both match and search for a given regular
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1517 expression. To do either, you must first compile it in a pattern buffer
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1518 (@pxref{GNU Pattern Buffers}).
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1519
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1520 @cindex syntax initialization
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1521 @vindex re_syntax_options @r{initialization}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1522 Regular expressions match according to the syntax with which they were
17274
69f030e5cec4 doc: avoid small caps
Paul Eggert <eggert@cs.ucla.edu>
parents: 16358
diff changeset
1523 compiled; with GNU, you indicate what syntax you want by setting
13553
8fc3314fe460 Document not_eol and remove mention of regex.c.
Reuben Thomas <rrt@sc3d.org>
parents: 13549
diff changeset
1524 the variable @code{re_syntax_options} (declared in @file{regex.h})
8fc3314fe460 Document not_eol and remove mention of regex.c.
Reuben Thomas <rrt@sc3d.org>
parents: 13549
diff changeset
1525 before calling the compiling function, @code{re_compile_pattern} (see
8fc3314fe460 Document not_eol and remove mention of regex.c.
Reuben Thomas <rrt@sc3d.org>
parents: 13549
diff changeset
1526 below). @xref{Syntax Bits}, and @ref{Predefined Syntaxes}.
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1527
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1528 You can change the value of @code{re_syntax_options} at any time.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1529 Usually, however, you set its value once and then never change it.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1530
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1531 @cindex pattern buffer initialization
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1532 @code{re_compile_pattern} takes a pattern buffer as an argument. You
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1533 must initialize the following fields:
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1534
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1535 @table @code
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1536
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1537
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1538
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1539 @item translate
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1540 @vindex translate @r{initialization}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1541 Initialize this to point to a translate table if you want one, or to
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1542 zero if you don't. We explain translate tables in @ref{GNU Translate
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1543 Tables}.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1544
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1545 @item fastmap
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1546 @vindex fastmap @r{initialization}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1547 Initialize this to nonzero if you want a fastmap, or to zero if you
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1548 don't.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1549
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1550 @item buffer
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1551 @itemx allocated
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1552 @vindex buffer @r{initialization}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1553 @vindex allocated @r{initialization}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1554 @findex malloc
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1555 If you want @code{re_compile_pattern} to allocate memory for the
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1556 compiled pattern, set both of these to zero. If you have an existing
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1557 block of memory (allocated with @code{malloc}) you want Regex to use,
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1558 set @code{buffer} to its address and @code{allocated} to its size (in
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1559 bytes).
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1560
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1561 @code{re_compile_pattern} uses @code{realloc} to extend the space for
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1562 the compiled pattern as necessary.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1563
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1564 @end table
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1565
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1566 To compile a regular expression into a pattern buffer, use:
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1567
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1568 @findex re_compile_pattern
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1569 @example
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
1570 const char *
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
1571 re_compile_pattern (const char *@var{regex}, const int @var{regex_size},
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1572 struct re_pattern_buffer *@var{pattern_buffer})
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1573 @end example
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1574
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1575 @noindent
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1576 @var{regex} is the regular expression's address, @var{regex_size} is its
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1577 length, and @var{pattern_buffer} is the pattern buffer's address.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1578
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1579 If @code{re_compile_pattern} successfully compiles the regular
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1580 expression, it returns zero and sets @code{*@var{pattern_buffer}} to the
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1581 compiled pattern. It sets the pattern buffer's fields as follows:
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1582
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1583 @table @code
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1584 @item buffer
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1585 @vindex buffer @r{field, set by @code{re_compile_pattern}}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1586 to the compiled pattern.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1587
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1588 @item syntax
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1589 @vindex syntax @r{field, set by @code{re_compile_pattern}}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1590 to the current value of @code{re_syntax_options}.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1591
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1592 @item re_nsub
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1593 @vindex re_nsub @r{field, set by @code{re_compile_pattern}}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1594 to the number of subexpressions in @var{regex}.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1595
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1596 @end table
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1597
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1598 If @code{re_compile_pattern} can't compile @var{regex}, it returns an
17274
69f030e5cec4 doc: avoid small caps
Paul Eggert <eggert@cs.ucla.edu>
parents: 16358
diff changeset
1599 error string corresponding to a POSIX error code.
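
The steps in this section can be sketched as follows. This is a minimal example under stated assumptions, not the only way to do it: it assumes the GNU C Library's @file{regex.h}, where the GNU interface is declared only when @code{_GNU_SOURCE} is defined, and the wrapper name @code{compile_extended} is hypothetical. Zeroing the whole pattern buffer asks the library to allocate the space itself, and requests neither a fastmap nor a translate table.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* assumption: needed on glibc for the GNU interface */
#endif
#include <regex.h>
#include <string.h>

/* Compile REGEX into *BUF using the predefined POSIX extended syntax.
   Returns a null pointer on success, or the library's error string
   if REGEX cannot be compiled. */
const char *compile_extended(const char *regex,
                             struct re_pattern_buffer *buf)
{
    /* buffer = 0 and allocated = 0: let re_compile_pattern allocate.
       fastmap = 0 and translate = 0: no fastmap, no translate table. */
    memset(buf, 0, sizeof *buf);

    /* Select the syntax before compiling. */
    re_syntax_options = RE_SYNTAX_POSIX_EXTENDED;

    return re_compile_pattern(regex, strlen(regex), buf);
}
```

A caller checks the return value against a null pointer, can then read @code{re_nsub} from the buffer, and releases the compiled pattern with @code{regfree} once it is no longer needed.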
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1600
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1601
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
1602 @node GNU Matching
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
1603 @subsection GNU Matching
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1604
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1605 @cindex matching with GNU functions
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1606
17274
69f030e5cec4 doc: avoid small caps
Paul Eggert <eggert@cs.ucla.edu>
parents: 16358
diff changeset
1607 Matching the GNU way means trying to match as much of a string as
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1608 possible, starting at a position within it that you specify. Once you've compiled
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1609 a pattern into a pattern buffer (@pxref{GNU Regular Expression
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1610 Compiling}), you can ask the matcher to match that pattern against a
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1611 string using:
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1612
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1613 @findex re_match
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1614 @example
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1615 int
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
1616 re_match (struct re_pattern_buffer *@var{pattern_buffer},
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
1617 const char *@var{string}, const int @var{size},
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1618 const int @var{start}, struct re_registers *@var{regs})
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1619 @end example
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1620
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1621 @noindent
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1622 @var{pattern_buffer} is the address of a pattern buffer containing a
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1623 compiled pattern. @var{string} is the string you want to match; it can
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1624 contain newline and null characters. @var{size} is the length of that
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1625 string. @var{start} is the string index at which you want to
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1626 begin matching; the first character of @var{string} is at index zero.
14775
a152da4489c4 maint: replace misused "a" with "an"
Jim Meyering <meyering@redhat.com>
parents: 14774
diff changeset
1627 @xref{Using Registers}, for an explanation of @var{regs}; you can safely
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1628 pass zero.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1629
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1630 @code{re_match} matches the regular expression in @var{pattern_buffer}
13647
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1631 against the string @var{string} according to the syntax of
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1632 @var{pattern_buffer}. (@xref{GNU Regular Expression Compiling}, for how
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1633 to set it.) The function returns @math{-1} if the compiled pattern does
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1634 not match @var{string} at index @var{start} and @math{-2} if an internal error
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1635 happens; otherwise, it returns how many (possibly zero) characters of
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1636 @var{string} the pattern matched.
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1637
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1638 An example: suppose @var{pattern_buffer} points to a pattern buffer
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1639 containing the compiled pattern for @samp{a*}, and @var{string} points
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1640 to @samp{aaaaab} (so @var{size} should be 6). Then if @var{start}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1641 is 2, @code{re_match} returns 3, i.e., @samp{a*} would have matched the
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1642 last three @samp{a}s in @var{string}. If @var{start} is 0,
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1643 @code{re_match} returns 5, i.e., @samp{a*} would have matched all the
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1644 @samp{a}s in @var{string}. If @var{start} is either 5 or 6, it returns
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1645 zero.
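
The @samp{a*} example above can be tried with a short C sketch. It assumes the GNU C Library's @file{regex.h}, which declares the GNU functions only when @code{_GNU_SOURCE} is defined before the first header is included. The helper name @code{match_at} and its @math{-3} error value are inventions for this illustration and are not part of Regex.

```c
#define _GNU_SOURCE             /* assumption: glibc, which hides the GNU API otherwise */
#include <regex.h>
#include <string.h>

/* Compile PATTERN, try to match it against STRING beginning exactly at
   index START, and return whatever re_match returns.  The value -3
   (hypothetical, chosen for this sketch) means PATTERN did not compile.  */
int
match_at (const char *pattern, const char *string, int start)
{
  struct re_pattern_buffer buf;
  int result;

  /* A zeroed buffer has no pattern storage, no fastmap and no
     translate table; re_compile_pattern allocates what it needs.  */
  memset (&buf, 0, sizeof buf);

  /* Select a syntax explicitly so the sketch does not depend on the
     current value of re_syntax_options.  */
  re_set_syntax (RE_SYNTAX_POSIX_EXTENDED);

  if (re_compile_pattern (pattern, strlen (pattern), &buf) != NULL)
    return -3;

  /* Passing a null REGS means no register information is wanted.  */
  result = re_match (&buf, string, (int) strlen (string), start, NULL);

  regfree (&buf);
  return result;
}
```

Under these assumptions, @code{match_at ("a*", "aaaaab", 2)} should return 3 and @code{match_at ("a*", "aaaaab", 0)} should return 5, as described above.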
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1646
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1647 If @var{start} is not between zero and @var{size}, then
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1648 @code{re_match} returns @math{-1}.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1649
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1650
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
1651 @node GNU Searching
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
1652 @subsection GNU Searching
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1653
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1654 @cindex searching with GNU functions
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1655
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1656 @dfn{Searching} means trying to match starting at successive positions
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1657 within a string. The function @code{re_search} does this.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1658
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1659 Before calling @code{re_search}, you must compile your regular
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1660 expression. @xref{GNU Regular Expression Compiling}.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1661
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1662 Here is the function declaration:
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1663
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1664 @findex re_search
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1665 @example
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
1666 int
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
1667 re_search (struct re_pattern_buffer *@var{pattern_buffer},
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
1668            const char *@var{string}, const int @var{size},
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
1669            const int @var{start}, const int @var{range},
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1670            struct re_registers *@var{regs})
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1671 @end example
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1672
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1673 @noindent
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1674 @vindex start @r{argument to @code{re_search}}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1675 @vindex range @r{argument to @code{re_search}}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1676 whose arguments are the same as those to @code{re_match} (@pxref{GNU
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1677 Matching}) except that the two arguments @var{start} and @var{range}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1678 replace @code{re_match}'s argument @var{start}.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1679
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1680 If @var{range} is positive, then @code{re_search} attempts a match
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1681 starting first at index @var{start}, then at @math{@var{start} + 1} if
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1682 that fails, and so on, up to @math{@var{start} + @var{range}}; if
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1683 @var{range} is negative, then it attempts a match starting first at
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1684 index @var{start}, then at @math{@var{start} - 1} if that fails, and so
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
1685 on, down to @math{@var{start} + @var{range}}.
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1686
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1687 If @var{start} is not between zero and @var{size}, then @code{re_search}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1688 returns @math{-1}. When @var{range} is positive, @code{re_search}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1689 adjusts @var{range} so that @math{@var{start} + @var{range} - 1} is
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1690 between zero and @var{size}, if necessary; that way it won't search
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1691 outside of @var{string}. Similarly, when @var{range} is negative,
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1692 @code{re_search} adjusts @var{range} so that @math{@var{start} +
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1693 @var{range} + 1} is between zero and @var{size}, if necessary.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1694
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1695 If the @code{fastmap} field of @var{pattern_buffer} is zero,
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1696 @code{re_search} matches starting at consecutive positions; otherwise,
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1697 it uses @code{fastmap} to make the search more efficient.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1698 @xref{Searching with Fastmaps}.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1699
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1700 If no match is found, @code{re_search} returns @math{-1}. If
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1701 a match is found, it returns the index where the match began. If an
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1702 internal error happens, it returns @math{-2}.
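
The effect of @var{start} and @var{range} can be illustrated with a small C sketch. It assumes the GNU C Library's @file{regex.h} with @code{_GNU_SOURCE} defined; the helper name @code{search_from} and its @math{-3} error value are hypothetical and exist only for this example.

```c
#define _GNU_SOURCE             /* assumption: glibc, which hides the GNU API otherwise */
#include <regex.h>
#include <string.h>

/* Compile PATTERN and search STRING for it, trying the starting
   positions START, START + 1, ... up to START + RANGE (or downward
   when RANGE is negative).  Returns what re_search returns, or the
   hypothetical value -3 if PATTERN does not compile.  */
int
search_from (const char *pattern, const char *string, int start, int range)
{
  struct re_pattern_buffer buf;
  int result;

  memset (&buf, 0, sizeof buf);         /* no fastmap, no translate table */
  re_set_syntax (RE_SYNTAX_POSIX_EXTENDED);

  if (re_compile_pattern (pattern, strlen (pattern), &buf) != NULL)
    return -3;

  result = re_search (&buf, string, (int) strlen (string),
                      start, range, NULL);

  regfree (&buf);
  return result;
}
```

For instance, searching @samp{aaaaab} for @samp{b} with @var{start} 0 and @var{range} 5 should find the match at index 5, whereas a @var{range} of 3 stops before reaching it and should yield @math{-1}. A backward search for @samp{a} from index 5 with @var{range} @math{-5} should report index 4.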
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1703
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1704
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
1705 @node Matching/Searching with Split Data
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1706 @subsection Matching and Searching with Split Data
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1707
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1708 Using the functions @code{re_match_2} and @code{re_search_2}, you can
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
1709 match or search in data that is divided into two strings.
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1710
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1711 The function:
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1712
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1713 @findex re_match_2
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1714 @example
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1715 int
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
1716 re_match_2 (struct re_pattern_buffer *@var{buffer},
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
1717             const char *@var{string1}, const int @var{size1},
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
1718             const char *@var{string2}, const int @var{size2},
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
1719             const int @var{start},
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
1720             struct re_registers *@var{regs},
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1721             const int @var{stop})
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1722 @end example
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1723
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1724 @noindent
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1725 is similar to @code{re_match} (@pxref{GNU Matching}) except that you
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1726 pass @emph{two} data strings and sizes, and an index @var{stop} beyond
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1727 which you don't want the matcher to try matching. As with
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1728 @code{re_match}, if it succeeds, @code{re_match_2} returns how many
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1729 characters of the combined data it matched. Regard @var{string1} and
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1730 @var{string2} as concatenated when you set the arguments @var{start} and
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1731 @var{stop} and use the contents of @var{regs}; @code{re_match_2} never
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
1732 returns a value larger than @math{@var{size1} + @var{size2}}.
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1733
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1734 The function:
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1735
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1736 @findex re_search_2
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1737 @example
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1738 int
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
1739 re_search_2 (struct re_pattern_buffer *@var{buffer},
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
1740              const char *@var{string1}, const int @var{size1},
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
1741              const char *@var{string2}, const int @var{size2},
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
1742              const int @var{start}, const int @var{range},
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
1743              struct re_registers *@var{regs},
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1744              const int @var{stop})
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1745 @end example
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1746
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1747 @noindent
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1748 is similarly related to @code{re_search}.
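
As a rough illustration of searching split data, the following C sketch looks for a pattern that may straddle the boundary between two strings. It assumes the GNU C Library's @file{regex.h} with @code{_GNU_SOURCE} defined; the helper name @code{search_split} and its @math{-3} error value are hypothetical.

```c
#define _GNU_SOURCE             /* assumption: glibc, which hides the GNU API otherwise */
#include <regex.h>
#include <string.h>

/* Search for PATTERN in the data formed by STRING1 followed by
   STRING2, without the caller joining them.  Returns the index of the
   match within the combined data, -1 if there is none, -2 on an
   internal error, or the hypothetical value -3 if PATTERN does not
   compile.  */
int
search_split (const char *pattern, const char *string1, const char *string2)
{
  struct re_pattern_buffer buf;
  int size1 = (int) strlen (string1);
  int size2 = (int) strlen (string2);
  int total = size1 + size2;
  int result;

  memset (&buf, 0, sizeof buf);         /* no fastmap, no translate table */
  re_set_syntax (RE_SYNTAX_POSIX_EXTENDED);

  if (re_compile_pattern (pattern, strlen (pattern), &buf) != NULL)
    return -3;

  /* START (0), RANGE (total) and STOP (total) all refer to positions
     in the two strings regarded as concatenated.  */
  result = re_search_2 (&buf, string1, size1, string2, size2,
                        0, total, NULL, total);

  regfree (&buf);
  return result;
}
```

With this helper, searching for @samp{world} in the pair @samp{hello wo} and @samp{rld!} should report index 6, even though the match crosses the boundary between the two strings.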
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1749
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1750
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
1751 @node Searching with Fastmaps
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1752 @subsection Searching with Fastmaps
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1753
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1754 @cindex fastmaps
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1755 If you're searching through a long string, you should use a fastmap.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1756 Without one, the searcher tries to match at consecutive positions in the
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1757 string. Generally, most of the characters in the string could not start
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1758 a match. It takes much longer to try matching at a given position in the
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1759 string than it does to check in a table whether or not the character at
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1760 that position could start a match. A @dfn{fastmap} is such a table.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1761
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1762 More specifically, a fastmap is an array indexed by the characters in
17274
69f030e5cec4 doc: avoid small caps
Paul Eggert <eggert@cs.ucla.edu>
parents: 16358
diff changeset
1763 your character set. With 8-bit characters, as in ASCII-based encodings, a fastmap
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1764 has 256 elements. If you want the searcher to use a fastmap with a
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1765 given pattern buffer, you must allocate the array and assign the array's
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1766 address to the pattern buffer's @code{fastmap} field. You can either
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1767 compile the fastmap yourself or have @code{re_search} do it for you;
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1768 when @code{fastmap} is nonzero, it automatically compiles a fastmap the
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
1769 first time you search using a particular compiled pattern.
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1770
13647
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1771 By setting the buffer's @code{fastmap} field before calling
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1772 @code{re_compile_pattern}, you can reuse a buffer data structure across
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1773 multiple searches with different patterns, and allocate the fastmap only
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1774 once. Nonetheless, the fastmap must be recompiled each time the buffer
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1775 has a new pattern compiled into it.
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1776
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1777 To compile a fastmap yourself, use:
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1778
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1779 @findex re_compile_fastmap
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1780 @example
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1781 int
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1782 re_compile_fastmap (struct re_pattern_buffer *@var{pattern_buffer})
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1783 @end example
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1784
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1785 @noindent
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1786 @var{pattern_buffer} is the address of a pattern buffer. If the
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1787 character @var{c} could start a match for the pattern,
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1788 @code{re_compile_fastmap} makes
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1789 @code{@var{pattern_buffer}->fastmap[@var{c}]} nonzero. It returns
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1790 @math{0} if it can compile a fastmap and @math{-2} if there is an
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1791 internal error. For example, if @samp{|} is the alternation operator
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1792 and @var{pattern_buffer} holds the compiled pattern for @samp{a|b}, then
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1793 @code{re_compile_fastmap} sets @code{fastmap['a']} and
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1794 @code{fastmap['b']} (and no others).
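
The @samp{a|b} case can be checked with a C sketch along these lines. It assumes the GNU C Library's @file{regex.h} with @code{_GNU_SOURCE} defined, and it allocates the fastmap with @code{malloc} because glibc's @code{regfree} frees the @code{fastmap} field. The helper name @code{fastmap_allows} and its @math{-3} error value are hypothetical.

```c
#define _GNU_SOURCE             /* assumption: glibc, which hides the GNU API otherwise */
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if the fastmap compiled for PATTERN says that character C
   could start a match, 0 if it could not, and the hypothetical value
   -3 if anything fails.  */
int
fastmap_allows (const char *pattern, unsigned char c)
{
  struct re_pattern_buffer buf;
  int result;

  memset (&buf, 0, sizeof buf);

  /* The caller supplies the array: one element per 8-bit character.  */
  buf.fastmap = malloc (256);
  if (buf.fastmap == NULL)
    return -3;

  re_set_syntax (RE_SYNTAX_POSIX_EXTENDED);     /* '|' is alternation */

  if (re_compile_pattern (pattern, strlen (pattern), &buf) != NULL
      || re_compile_fastmap (&buf) != 0)
    {
      regfree (&buf);           /* in glibc this also frees buf.fastmap */
      return -3;
    }

  result = buf.fastmap[c] != 0;

  regfree (&buf);               /* in glibc this also frees buf.fastmap */
  return result;
}
```

For the pattern @samp{a|b}, the helper should return 1 for @samp{a} and @samp{b} and 0 for any other character such as @samp{c}.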
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1795
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1796 @code{re_search} uses a fastmap as it moves along in the string: it
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1797 checks the string's characters until it finds one that's in the fastmap.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1798 Then it tries matching at that character. If the match fails, it
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1799 repeats the process. So, by using a fastmap, @code{re_search} doesn't
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1800 waste time trying to match at positions in the string that couldn't
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1801 start a match.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1802
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1803 If you don't want @code{re_search} to use a fastmap,
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1804 store zero in the @code{fastmap} field of the pattern buffer before
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1805 calling @code{re_search}.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1806
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1807 Once you've initialized a pattern buffer's @code{fastmap} field, you
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1808 need never do so again---even if you compile a new pattern in
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1809 it---provided the way the field is set still reflects whether or not you
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1810 want a fastmap. @code{re_search} will still either do nothing if
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1811 @code{fastmap} is null or, if it isn't, compile a new fastmap for the
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1812 new pattern.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1813
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
1814 @node GNU Translate Tables
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1815 @subsection GNU Translate Tables
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1816
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1817 If you set the @code{translate} field of a pattern buffer to a translate
17274
69f030e5cec4 doc: avoid small caps
Paul Eggert <eggert@cs.ucla.edu>
parents: 16358
diff changeset
1818 table, then the GNU Regex functions to which you've passed that
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1819 pattern buffer use it to apply a simple transformation
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1820 to every regular expression and string character they examine.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1821
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1822 A @dfn{translate table} is an array indexed by the characters in your
17274
69f030e5cec4 doc: avoid small caps
Paul Eggert <eggert@cs.ucla.edu>
parents: 16358
diff changeset
1823 character set. With 8-bit characters, as in ASCII-based encodings, a translate
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1824 table has 256 elements. The array's elements are also characters in
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1825 your character set. When the Regex functions see a character @var{c},
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1826 they use @code{translate[@var{c}]} in its place, with one exception: the
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1827 character after a @samp{\} is not translated. (This ensures that the
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1828 operators, e.g., @samp{\B} and @samp{\b}, are always distinguishable.)
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1829
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1830 For example, a table that maps all lowercase letters to the
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1831 corresponding uppercase ones would cause the matcher to ignore
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1832 differences in case.@footnote{A table that maps all uppercase letters to
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1833 the corresponding lowercase ones would work just as well for this
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1834 purpose.} Such a table would map all characters except lowercase letters
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1835 to themselves, and lowercase letters to the corresponding uppercase
17274
69f030e5cec4 doc: avoid small caps
Paul Eggert <eggert@cs.ucla.edu>
parents: 16358
diff changeset
1836 ones. Under the ASCII encoding, here's how you could initialize
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1837 such a table (we'll call it @code{case_fold}):
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1838
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1839 @example
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1840 for (i = 0; i < 256; i++)
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1841   case_fold[i] = i;
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1842 for (i = 'a'; i <= 'z'; i++)
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1843   case_fold[i] = i - ('a' - 'A');
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1844 @end example
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1845
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1846 You tell Regex to use a translate table on a given pattern buffer by
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1847 assigning that table's address to the @code{translate} field of that
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1848 buffer. If you don't want Regex to do any translation, put zero into
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1849 this field. You'll get unpredictable results if you change the table's contents
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1850 at any time between compiling the pattern buffer, compiling its fastmap, and
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1851 matching or searching with the pattern buffer.
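As a rough sketch of the steps just described (not taken from the Regex sources): the helper @code{fold_search} is a hypothetical name, and the code assumes the GNU interface that glibc's @file{regex.h} declares when @code{_GNU_SOURCE} is defined, running in the default C locale. It fills in @code{case_fold}, stores the table's address in the @code{translate} field before compiling, and clears that field before calling @code{regfree}, on the assumption that an implementation (glibc's, as far as we know) may otherwise try to free the table itself.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper: search TEXT for PATTERN, ignoring ASCII case
   by way of a translate table.  Returns the index of the first match,
   -1 if there is no match, or -2 on a compile or internal error.  */
static int
fold_search (const char *pattern, const char *text)
{
  static unsigned char case_fold[256];
  struct re_pattern_buffer buf;
  int i, pos;

  assert (pattern != NULL && text != NULL);

  /* Identity mapping, then fold lowercase onto uppercase (ASCII).  */
  for (i = 0; i < 256; i++)
    case_fold[i] = i;
  for (i = 'a'; i <= 'z'; i++)
    case_fold[i] = i - ('a' - 'A');

  /* Zero the buffer so the compiler allocates everything it needs,
     then install the table *before* compiling.  */
  memset (&buf, 0, sizeof buf);
  buf.translate = case_fold;

  if (re_compile_pattern (pattern, strlen (pattern), &buf) != NULL)
    return -2;

  /* The table must stay unchanged from compilation through searching.  */
  pos = re_search (&buf, text, (int) strlen (text), 0,
                   (int) strlen (text), NULL);

  /* Assumption: regfree may free the translate field, and our table
     is static, so detach it first.  */
  buf.translate = NULL;
  regfree (&buf);
  return pos;
}
```

With this table in place, @code{fold_search ("hello", "say HELLO")} should report a match at index 4, while a pattern that occurs in neither case yields @math{-1}.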
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1852
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
1853 @node Using Registers
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1854 @subsection Using Registers
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1855
16358
a712776b11ce maint: spelling fixes
Paul Eggert <eggert@cs.ucla.edu>
parents: 16236
diff changeset
1856 A group in a regular expression can match a (possibly empty) substring
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1857 of the string that the regular expression as a whole matched. The matcher
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1858 remembers the beginning and end of the substring matched by
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1859 each group.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1860
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1861 To find out what the groups matched, pass a nonzero @var{regs} argument to a
17274
69f030e5cec4 doc: avoid small caps
Paul Eggert <eggert@cs.ucla.edu>
parents: 16358
diff changeset
1862 GNU matching or searching function (@pxref{GNU Matching} and
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1863 @ref{GNU Searching}), i.e., the address of a structure of this type, as
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1864 defined in @file{regex.h}:
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1865
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1866 @c We don't bother to include this directly from regex.h,
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1867 @c since it changes so rarely.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1868 @example
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1869 @tindex re_registers
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1870 @vindex num_regs @r{in @code{struct re_registers}}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1871 @vindex start @r{in @code{struct re_registers}}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1872 @vindex end @r{in @code{struct re_registers}}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1873 struct re_registers
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1874 @{
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1875 unsigned num_regs;
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1876 regoff_t *start;
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1877 regoff_t *end;
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1878 @};
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1879 @end example
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1880
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1881 Except for (possibly) the @var{num_regs}'th element (see below), the
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1882 @var{i}th element of the @code{start} and @code{end} arrays records
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1883 information about the @var{i}th group in the pattern. (They're declared
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1884 as C pointers, but this is only because not all C compilers accept
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1885 zero-length arrays; conceptually, it is simplest to think of them as
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1886 arrays.)
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1887
13647
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1888 The @code{start} and @code{end} arrays are allocated in one of two ways.
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1889 The simplest and perhaps most useful is to let the matcher (re)allocate
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1890 enough space to record information for all the groups in the regular
13647
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1891 expression. If @code{re_set_registers} is not called before searching
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1892 or matching, then the matcher allocates two arrays each of @math{1 +
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1893 @code{re_nsub}} elements (@code{re_nsub} is another field in the pattern
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1894 buffer; @pxref{GNU Pattern Buffers}). The extra element is set to
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1895 @math{-1}. Then on subsequent calls with the same pattern buffer and
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1896 @var{regs} arguments, the matcher reallocates more space if necessary.
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1897
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1898 The function:
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1899
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1900 @findex re_set_registers
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1901 @example
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1902 void
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1903 re_set_registers (struct re_pattern_buffer *@var{buffer},
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1904 struct re_registers *@var{regs},
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1905 size_t @var{num_regs},
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1906 regoff_t *@var{starts}, regoff_t *@var{ends})
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1907 @end example
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1908
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1909 @noindent sets @var{regs} to hold @var{num_regs} registers, storing
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1910 them in @var{starts} and @var{ends}. Subsequent matches using
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1911 @var{buffer} and @var{regs} will use this memory for recording
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1912 register information. @var{starts} and @var{ends} must be allocated
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1913 with @code{malloc}, and must each be at least @math{@var{num_regs} *
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1914 @code{sizeof (regoff_t)}} bytes long.
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1915
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1916 If @var{num_regs} is zero, then subsequent matches should allocate
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1917 their own register data.
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1918
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1919 Unless this function is called, the first search or match using
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1920 @var{buffer} will allocate its own register data, without freeing the
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1921 old data.
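The second way of allocating registers might look like the sketch below. It is illustrative only: @code{match_with_own_registers} is a hypothetical helper, the code assumes glibc's @file{regex.h} with @code{_GNU_SOURCE} defined, and it asks for two elements more than @code{re_nsub} simply to leave generous room. Because the matcher may reallocate the arrays, the sketch reads and frees them through @code{regs} rather than through the original pointers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: match PATTERN (POSIX extended syntax) at the
   start of TEXT using caller-supplied register arrays.  On a match,
   store the span of group 1 in *START1 and *END1 and return the
   number of characters matched; otherwise return a negative value.  */
static int
match_with_own_registers (const char *pattern, const char *text,
                          int *start1, int *end1)
{
  struct re_pattern_buffer buf;
  struct re_registers regs;
  regoff_t *starts, *ends;
  size_t n;
  int matched;

  assert (pattern != NULL && text != NULL);

  memset (&buf, 0, sizeof buf);
  re_set_syntax (RE_SYNTAX_POSIX_EXTENDED);
  if (re_compile_pattern (pattern, strlen (pattern), &buf) != NULL)
    return -2;

  /* Both arrays must come from malloc, since the matcher may
     reallocate them.  */
  n = buf.re_nsub + 2;
  starts = malloc (n * sizeof (regoff_t));
  ends = malloc (n * sizeof (regoff_t));
  if (starts == NULL || ends == NULL || buf.re_nsub < 1)
    {
      free (starts);
      free (ends);
      regfree (&buf);
      return -2;
    }

  /* Called after compiling, so the setting applies to this buffer's
     subsequent matches.  */
  re_set_registers (&buf, &regs, n, starts, ends);

  matched = re_match (&buf, text, (int) strlen (text), 0, &regs);
  if (matched >= 0)
    {
      *start1 = regs.start[1];
      *end1 = regs.end[1];
    }

  /* Free via regs: the matcher may have moved the arrays.  */
  free (regs.start);
  free (regs.end);
  regfree (&buf);
  return matched;
}
```

Calling @code{re_set_registers} once and reusing @code{regs} across many matches avoids repeated allocation, which is the main reason to prefer this approach over letting the matcher allocate.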
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1922
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1923 The following examples illustrate the information recorded in the
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1924 @code{re_registers} structure. (In all of them, @samp{(} represents the
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1925 open-group and @samp{)} the close-group operator. The first character
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1926 in the string @var{string} is at index 0.)
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1927
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1928 @itemize @bullet
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1929
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
1930 @item
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1931 If the regular expression has an @w{@var{i}-th}
13647
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
1932 group that matches a
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1933 substring of @var{string}, then the function sets
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1934 @code{@w{@var{regs}->}start[@var{i}]} to the index in @var{string} where
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1935 the substring matched by the @w{@var{i}-th} group begins, and
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1936 @code{@w{@var{regs}->}end[@var{i}]} to the index just beyond that
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1937 substring's end. The function sets @code{@w{@var{regs}->}start[0]} and
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1938 @code{@w{@var{regs}->}end[0]} to analogous information about the entire
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1939 pattern.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1940
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1941 For example, when you match @samp{((a)(b))} against @samp{ab}, you get:
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1942
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1943 @itemize
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1944 @item
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
1945 0 in @code{@w{@var{regs}->}start[0]} and 2 in @code{@w{@var{regs}->}end[0]}
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1946
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1947 @item
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
1948 0 in @code{@w{@var{regs}->}start[1]} and 2 in @code{@w{@var{regs}->}end[1]}
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1949
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1950 @item
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
1951 0 in @code{@w{@var{regs}->}start[2]} and 1 in @code{@w{@var{regs}->}end[2]}
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1952
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1953 @item
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
1954 1 in @code{@w{@var{regs}->}start[3]} and 2 in @code{@w{@var{regs}->}end[3]}
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1955 @end itemize
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1956
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1957 @item
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1958 If a group matches more than once (as it might if followed by,
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1959 e.g., a repetition operator), then the function reports the information
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1960 about what the group @emph{last} matched.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1961
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1962 For example, when you match the pattern @samp{(a)*} against the string
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1963 @samp{aa}, you get:
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1964
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1965 @itemize
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1966 @item
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
1967 0 in @code{@w{@var{regs}->}start[0]} and 2 in @code{@w{@var{regs}->}end[0]}
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1968
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1969 @item
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
1970 1 in @code{@w{@var{regs}->}start[1]} and 2 in @code{@w{@var{regs}->}end[1]}
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1971 @end itemize
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1972
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1973 @item
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1974 If the @w{@var{i}-th} group does not participate in a
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1975 successful match, e.g., it is an alternative not taken or a
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1976 repetition operator allows zero repetitions of it, then the function
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1977 sets @code{@w{@var{regs}->}start[@var{i}]} and
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1978 @code{@w{@var{regs}->}end[@var{i}]} to @math{-1}.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1979
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1980 For example, when you match the pattern @samp{(a)*b} against
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1981 the string @samp{b}, you get:
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1982
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1983 @itemize
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1984 @item
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
1985 0 in @code{@w{@var{regs}->}start[0]} and 1 in @code{@w{@var{regs}->}end[0]}
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1986
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1987 @item
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
1988 @math{-1} in @code{@w{@var{regs}->}start[1]} and @math{-1} in @code{@w{@var{regs}->}end[1]}
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1989 @end itemize
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1990
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1991 @item
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1992 If the @w{@var{i}-th} group matches a zero-length string, then the
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1993 function sets @code{@w{@var{regs}->}start[@var{i}]} and
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1994 @code{@w{@var{regs}->}end[@var{i}]} to the index just beyond that
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
1995 zero-length string.
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1996
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1997 For example, when you match the pattern @samp{(a*)b} against the string
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1998 @samp{b}, you get:
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
1999
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2000 @itemize
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2001 @item
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
2002 0 in @code{@w{@var{regs}->}start[0]} and 1 in @code{@w{@var{regs}->}end[0]}
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2003
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2004 @item
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
2005 0 in @code{@w{@var{regs}->}start[1]} and 0 in @code{@w{@var{regs}->}end[1]}
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2006 @end itemize
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2007
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2008 @item
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
2009 If an @w{@var{i}-th} group contains a @w{@var{j}-th} group
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2010 in turn not contained within any other group within group @var{i} and
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2011 the function reports a match of the @w{@var{i}-th} group, then it
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2012 records in @code{@w{@var{regs}->}start[@var{j}]} and
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2013 @code{@w{@var{regs}->}end[@var{j}]} the last match (if it matched) of
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2014 the @w{@var{j}-th} group.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2015
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2016 For example, when you match the pattern @samp{((a*)b)*} against the
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2017 string @samp{abb}, @w{group 2} last matches the empty string, so you
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2018 get the registers of that empty match:
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2019
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2020 @itemize
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2021 @item
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
2022 0 in @code{@w{@var{regs}->}start[0]} and 3 in @code{@w{@var{regs}->}end[0]}
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2023
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2024 @item
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
2025 2 in @code{@w{@var{regs}->}start[1]} and 3 in @code{@w{@var{regs}->}end[1]}
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2026
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2027 @item
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
2028 2 in @code{@w{@var{regs}->}start[2]} and 2 in @code{@w{@var{regs}->}end[2]}
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2029 @end itemize
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2030
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2031 When you match the pattern @samp{((a)*b)*} against the string
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2032 @samp{abb}, @w{group 2} doesn't participate in the last match, so you
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2033 get:
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2034
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2035 @itemize
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2036 @item
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
2037 0 in @code{@w{@var{regs}->}start[0]} and 3 in @code{@w{@var{regs}->}end[0]}
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2038
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2039 @item
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
2040 2 in @code{@w{@var{regs}->}start[1]} and 3 in @code{@w{@var{regs}->}end[1]}
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2041
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2042 @item
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
2043 0 in @code{@w{@var{regs}->}start[2]} and 1 in @code{@w{@var{regs}->}end[2]}
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2044 @end itemize
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2045
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2046 @item
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2047 If an @w{@var{i}-th} group contains a @w{@var{j}-th} group
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2048 in turn not contained within any other group within group @var{i}
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
2049 and the function sets
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
2050 @code{@w{@var{regs}->}start[@var{i}]} and
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2051 @code{@w{@var{regs}->}end[@var{i}]} to @math{-1}, then it also sets
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2052 @code{@w{@var{regs}->}start[@var{j}]} and
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2053 @code{@w{@var{regs}->}end[@var{j}]} to @math{-1}.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2054
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2055 For example, when you match the pattern @samp{((a)*b)*c} against the
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2056 string @samp{c}, you get:
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2057
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2058 @itemize
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2059 @item
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
2060 0 in @code{@w{@var{regs}->}start[0]} and 1 in @code{@w{@var{regs}->}end[0]}
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2061
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2062 @item
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
2063 @math{-1} in @code{@w{@var{regs}->}start[1]} and @math{-1} in @code{@w{@var{regs}->}end[1]}
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2064
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2065 @item
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
2066 @math{-1} in @code{@w{@var{regs}->}start[2]} and @math{-1} in @code{@w{@var{regs}->}end[2]}
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2067 @end itemize
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2068
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2069 @end itemize
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2070
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
2071 @node Freeing GNU Pattern Buffers
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2072 @subsection Freeing GNU Pattern Buffers
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2073
17274
69f030e5cec4 doc: avoid small caps
Paul Eggert <eggert@cs.ucla.edu>
parents: 16358
diff changeset
2074 To free any allocated fields of a pattern buffer, use the POSIX
13647
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
2075 function @code{regfree}:
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2076
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2077 @findex regfree
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2078 @example
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
2079 void
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2080 regfree (regex_t *@var{preg})
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2081 @end example
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2082
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2083 @noindent
13647
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
2084 @var{preg} is the pattern buffer whose allocated fields you want freed;
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
2085 this works because the type @code{regex_t}---the type for
17274
69f030e5cec4 doc: avoid small caps
Paul Eggert <eggert@cs.ucla.edu>
parents: 16358
diff changeset
2086 POSIX pattern buffers---is equivalent to the type
13647
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
2087 @code{re_pattern_buffer}.
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
2088
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
2089 @code{regfree} also sets @var{preg}'s @code{allocated} field to zero.
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
2090 After a buffer has been freed, it must have a regular expression
e5c0e28232bc regex documentation update from Reuben Thomas <rrt@sc3d.org>, 20 Aug 2010 12:04:39 +0100
Karl Berry <karl@freefriends.org>
parents: 13554
diff changeset
2091 compiled in it before you pass it to a matching or searching function.
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2092
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2093
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
2094 @node BSD Regex Functions
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2095 @section BSD Regex Functions
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2096
17274
69f030e5cec4 doc: avoid small caps
Paul Eggert <eggert@cs.ucla.edu>
parents: 16358
diff changeset
2097 If you're writing code that has to be Berkeley Unix compatible,
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2098 you'll need to use these functions, whose interfaces are the same as those
17274
69f030e5cec4 doc: avoid small caps
Paul Eggert <eggert@cs.ucla.edu>
parents: 16358
diff changeset
2099 in Berkeley Unix.
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2100
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2101 @menu
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2102 * BSD Regular Expression Compiling:: re_comp ()
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2103 * BSD Searching:: re_exec ()
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2104 @end menu
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2105
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
2106 @node BSD Regular Expression Compiling
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2107 @subsection BSD Regular Expression Compiling
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2108
17274
69f030e5cec4 doc: avoid small caps
Paul Eggert <eggert@cs.ucla.edu>
parents: 16358
diff changeset
2109 With Berkeley Unix, you can only search for a given regular
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2110 expression; you can't match one. To search for it, you must first
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2111 compile it. Before you compile it, you must indicate the regular
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
2112 expression syntax to use when compiling it by setting the
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2113 variable @code{re_syntax_options} (declared in @file{regex.h}) to some
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2114 syntax (@pxref{Regular Expression Syntax}).
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2115
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2116 To compile a regular expression, use:
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2117
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2118 @findex re_comp
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2119 @example
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2120 char *
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2121 re_comp (char *@var{regex})
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2122 @end example
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2123
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2124 @noindent
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2125 @var{regex} is the address of a null-terminated regular expression.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2126 @code{re_comp} uses an internal pattern buffer, so you can use only the
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2127 most recently compiled pattern buffer. This means that if you want to
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2128 use a given regular expression that you've already compiled---but it
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2129 isn't the latest one you've compiled---you'll have to recompile it. If
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2130 you call @code{re_comp} with a null pointer (@emph{not} the empty
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2131 string) as the argument, it doesn't change the contents of the pattern
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2132 buffer.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2133
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2134 If @code{re_comp} successfully compiles the regular expression, it
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2135 returns a null pointer. If it can't compile the regular expression, it returns
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2136 an error string. @code{re_comp}'s error messages are identical to those
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2137 of @code{re_compile_pattern} (@pxref{GNU Regular Expression
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2138 Compiling}).
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2139
13533
ca70a11e70e2 Integrate the regex documentation.
Bruno Haible <bruno@clisp.org>
parents: 13532
diff changeset
2140 @node BSD Searching
13532
b0bea693e638 Whitespace cleanup.
Bruno Haible <bruno@clisp.org>
parents: 13531
diff changeset
2141 @subsection BSD Searching
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2142
17274
69f030e5cec4 doc: avoid small caps
Paul Eggert <eggert@cs.ucla.edu>
parents: 16358
diff changeset
2143 Searching the Berkeley Unix way means searching in a string
13531
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2144 starting at its first character and trying successive positions within
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2145 it to find a match. Once you've compiled a pattern using @code{re_comp}
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2146 (@pxref{BSD Regular Expression Compiling}), you can ask Regex
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2147 to search for that pattern in a string using:
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2148
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2149 @findex re_exec
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2150 @example
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2151 int
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2152 re_exec (char *@var{string})
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2153 @end example
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2154
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2155 @noindent
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2156 @var{string} is the address of the null-terminated string in which you
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2157 want to search.
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2158
de7ebb2f1530 Add regex documentation.
Bruno Haible <bruno@clisp.org>
parents:
diff changeset
2159 @code{re_exec} returns either 1 for success or 0 for failure. It
17274
69f030e5cec4 doc: avoid small caps
Paul Eggert <eggert@cs.ucla.edu>
parents: 16358
diff changeset
2160 automatically uses a GNU fastmap (@pxref{Searching with Fastmaps}).