1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
|
.TH REGEXP 6
.SH NAME
regexp \- regular expression notation
.SH DESCRIPTION
A
.I "regular expression"
specifies
a set of strings of characters.
A member of this set of strings is said to be
.I matched
by the regular expression. In many applications
a delimiter character, commonly
.LR / ,
bounds a regular expression.
In the following specification for regular expressions
the word `character' means any character (rune) but newline.
.PP
The syntax for a regular expression
.B e0
is
.IP
.EX
e3: literal | charclass | '.' | '^' | '$' | '(' e0 ')'
e2: e3
| e2 REP
REP: '*' | '+' | '?'
e1: e2
| e1 e2
e0: e1
| e0 '|' e1
.EE
.PP
A
.B literal
is any non-metacharacter, or a metacharacter
(one of
.BR .*+?[]()|\e^$ ),
or the delimiter
preceded by
.LR \e .
.PP
A
.B charclass
is a nonempty string
.I s
bracketed
.BI [ \|s\| ]
(or
.BI [^ s\| ]\fR);
it matches any character in (or not in)
.IR s .
A negated character class never
matches newline.
A substring
.IB a - b\f1,
with
.I a
and
.I b
in ascending
order, stands for the inclusive
range of
characters between
.I a
and
.IR b .
In
.IR s ,
the metacharacters
.LR - ,
.LR ] ,
an initial
.LR ^ ,
and the regular expression delimiter
must be preceded by a
.LR \e ;
other metacharacters
have no special meaning and
may appear unescaped.
.PP
A
.L .
matches any character.
.PP
A
.L ^
matches the beginning of a line;
.L $
matches the end of the line.
.PP
The
.B REP
operators match zero or more
.RB ( * ),
one or more
.RB ( + ),
zero or one
.RB ( ? ),
instances respectively of the preceding regular expression
.BR e2 .
.PP
A concatenated regular expression,
.BR "e1\|e2" ,
matches a match to
.B e1
followed by a match to
.BR e2 .
.PP
An alternative regular expression,
.BR "e0\||\|e1" ,
matches either a match to
.B e0
or a match to
.BR e1 .
.PP
A match to any part of a regular expression
extends as far as possible without preventing
a match to the remainder of the regular expression.
.SH "SEE ALSO"
.IR awk (1),
.IR ed (1),
.IR grep (1),
.IR sam (1),
.IR sed (1),
.IR regexp (2)
|