Here are the quote-like operators that apply to pattern matching and
related activities.
- ?PATTERN?
This is just like the /pattern/ search, except that it matches only once between calls to the reset() operator. This is a useful optimization when you want to see only the first occurrence of something in each file of a set of files, for instance. Only ?? patterns local to the current package are reset.
This usage is vaguely deprecated, and may be removed in some future version of Perl.
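The match-only-once rule is easiest to see in a small script. This sketch assumes a current Perl, which requires the explicit m in front of ?...? (the bare ?PATTERN? form became a syntax error in Perl 5.22). The list of lines is invented for illustration. The script scans the same data twice and calls reset between the two passes.

```perl
use strict;
use warnings;

my @lines = ("foo 1", "bar", "foo 2", "foo 3");   # made-up input
my @first;

for my $pass (1 .. 2) {
    for (@lines) {
        # m?...? succeeds at most once until reset() re-arms it
        push @first, $_ if m?^foo?;
    }
    reset;    # re-arm the m?? searches in the current package
}

print "$_\n" for @first;    # "foo 1" twice: once per pass
```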
- m/PATTERN/gimosx
- /PATTERN/gimosx
Searches a string for a pattern match, and in a scalar context returns true (1) or false (''). If no string is specified via the =~ or !~ operator, the $_ string is searched. (The string specified with =~ need not be an lvalue; it may be the result of an expression evaluation, but remember that =~ binds rather tightly.) See also the perlre manpage. See the perllocale manpage for discussion of additional considerations which apply when use locale is in effect.
Options are:
g Match globally, i.e., find all occurrences.
i Do case-insensitive pattern matching.
m Treat string as multiple lines.
o Compile pattern only once.
s Treat string as single line.
x Use extended regular expressions.
If "/" is the delimiter, then the initial m is optional. With the m you can use any pair of non-alphanumeric, non-whitespace characters as delimiters. This is particularly useful for matching Unix path names that contain "/", to avoid LTS (leaning toothpick syndrome). If "?" is the delimiter, then the match-only-once rule of ?PATTERN? applies.
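Here is a short comparison of delimiters on a made-up path. All three matches test the same thing, but only the first suffers from leaning toothpicks.

```perl
use strict;
use warnings;

my $path = "/usr/spool/uucp/LOG";    # sample path, made up

# Slash delimiters: every "/" in the pattern must be escaped
my $toothpicks = ($path =~ /^\/usr\/spool\/uucp/) ? 1 : 0;

# With an explicit m, other delimiters keep the pattern readable
my $braces = ($path =~ m{^/usr/spool/uucp}) ? 1 : 0;
my $hashes = ($path =~ m#^/usr/spool/uucp#) ? 1 : 0;

print "$toothpicks $braces $hashes\n";    # 1 1 1
```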
PATTERN may contain variables, which will be interpolated (and the pattern recompiled) every time the pattern search is evaluated. (Note that $) and $| might not be interpolated because they look like end-of-string tests.) If you want such a pattern to be compiled only once, add a /o after the trailing delimiter. This avoids expensive run-time recompilations, and is useful when the value you are interpolating won't change over the life of the script. However, mentioning /o constitutes a promise that you won't change the variables in the pattern. If you change them, Perl won't even notice.
If the PATTERN evaluates to an empty string, the last successfully executed regular expression is used instead.
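This sketch shows what the /o promise means in practice, along with the empty-pattern rule. The words and strings are arbitrary sample data. The same loop matches with and without /o, and only the version without /o notices that $word changed.

```perl
use strict;
use warnings;

my $target = "banana";
my (@with_o, @without_o);

for my $word (qw(apple banana)) {
    # /o: compiled once, with the first value of $word ("apple")
    push @with_o,    ($target =~ /$word/o) ? 1 : 0;
    # no /o: recompiled when $word changes
    push @without_o, ($target =~ /$word/)  ? 1 : 0;
}
print "with /o: @with_o; without /o: @without_o\n";   # 0 0 and 0 1

# An interpolated pattern that turns out empty reuses the last successful one
my $primed = ("foo bar" =~ /bar/);            # succeeds, so /bar/ is remembered
my $empty  = '';
my $reused = ("xbarx" =~ /$empty/) ? 1 : 0;   # behaves like /bar/
print "reused: $reused\n";                    # 1
```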
If used in a context that requires a list value, a pattern match returns a list consisting of the subexpressions matched by the parentheses in the pattern, i.e., ($1, $2, $3...). (Note that here $1 etc. are also set, and that this differs from Perl 4's behavior.) If the match fails, an empty list is returned. If the match succeeds but there are no parentheses, the list (1) is returned.
Examples:
open(TTY, '/dev/tty');
<TTY> =~ /^y/i && foo(); # do foo if desired
if (/Version: *([0-9.]*)/) { $version = $1; }
next if m#^/usr/spool/uucp#;
# poor man's grep
$arg = shift;
while (<>) {
    print if /$arg/o; # compile only once
}
if (($F1, $F2, $Etc) = ($foo =~ /^(\S+)\s+(\S+)\s*(.*)/))
This last example splits $foo into the first two words and the remainder of the line, and assigns those three fields to $F1, $F2, and $Etc. The conditional is true if any variables were assigned, i.e., if the pattern matched.
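The return values described above can be checked on a sample string. The string and variable names here are for illustration only.

```perl
use strict;
use warnings;

my $line = "Version: 5.005 beta";

my $ok = ($line =~ /Version/);                   # scalar context: 1
my ($major, $minor) = $line =~ /(\d+)\.(\d+)/;   # list context: ($1, $2)
my @no_parens = $line =~ /beta/;                 # success, no parens: (1)
my @failed    = $line =~ /gamma/;                # failure: empty list

# The $F1/$F2/$Etc idiom: the list assignment is true only if the match succeeded
my ($F1, $F2, $Etc);
if (($F1, $F2, $Etc) = ("alpha beta gamma delta" =~ /^(\S+)\s+(\S+)\s*(.*)/)) {
    print "F1=$F1 F2=$F2 Etc=$Etc\n";            # F1=alpha F2=beta Etc=gamma delta
}
```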
The /g modifier specifies global pattern matching--that is, matching as many times
as possible within the string. How it behaves depends on the context. In a
list context, it returns a list of all the substrings matched by all the
parentheses in the regular expression. If there are no parentheses, it
returns a list of all the matched strings, as if there were parentheses
around the whole pattern.
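This sketch shows list-context m//g with and without capturing parentheses. The input string is made up.

```perl
use strict;
use warnings;

my $text = "a=1, b=22, c=333";

# No parentheses: /g returns every complete match
my @nums = $text =~ /\d+/g;            # ("1", "22", "333")

# With parentheses: /g returns all captured groups, in order
my @pairs = $text =~ /(\w)=(\d+)/g;    # ("a", "1", "b", "22", "c", "333")
my %value_of = @pairs;

print "@nums | $value_of{b}\n";        # 1 22 333 | 22
```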
In a scalar context, m//g iterates through the string, returning TRUE each time it matches and FALSE when it eventually runs out of matches. (In other words, it remembers where it left off last time and restarts the search at that point. You can find the current match position of a string, or set it, using the pos() function; see pos in the perlfunc manpage.)
A failed match normally resets the search position to the beginning of the string, but you can avoid that by adding the "c" modifier (e.g. m//gc). Modifying the target string also resets the search position.
You can intermix m//g matches with m/\G.../g, where \G is a zero-width assertion that matches the exact position where the previous m//g, if any, left off. The \G assertion is not supported without the /g modifier; currently, without /g, \G behaves just like \A, but that's accidental and may change in the future.
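The next sketch walks through pos(), /c and \G on a made-up string. It illustrates the rules in the paragraphs above and is not meant as a complete scanner.

```perl
use strict;
use warnings;

my $s = "aXbXc";
my @ends;

# Scalar-context m//g: each match resumes where the previous one ended
while ($s =~ /X/g) {
    push @ends, pos($s);              # just past each "X": 2, then 4
}

# The failed match that ended the loop reset the position
my $after_loop = defined pos($s) ? pos($s) : 'undef';

# With /c, a failed match keeps the current position
my $hit  = ($s =~ /X/g);              # pos($s) is now 2
my $miss = ($s =~ /Z/gc);             # fails, pos($s) stays 2
my $kept = pos($s);

# \G anchors the next match exactly at the current position
my $after_x = ($s =~ /\G(.)/gc) ? $1 : '';    # "b", right after the first "X"

# pos() is also an lvalue, so the position can be moved explicitly
pos($s) = 0;
my $at_start = ($s =~ /\G(.)/gc) ? $1 : '';   # "a"

# Assigning to the string resets its position
$s = "aXbXc";
my $after_assign = defined pos($s) ? pos($s) : 'undef';

print "@ends $after_loop $kept $after_x $at_start $after_assign\n";
```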
Examples:
# list context
($one,$five,$fifteen) = (`uptime` =~ /(\d+\.\d+)/g);
# scalar context
$/ = ""; # paragraph mode ($* is not needed here and is gone from modern perls)
while (defined($paragraph = <>)) {
    while ($paragraph =~ /[a-z]['")]*[.!?]+['")]*\s/g) {
        $sentences++;
    }
}
print "$sentences\n";
# using m//gc with \G
$_ = "ppooqppqq";
while ($i++ < 2) {
    print "1: '";
    print $1 while /(o)/gc; print "', pos=", pos, "\n";
    print "2: '";
    print $1 if /\G(q)/gc; print "', pos=", pos, "\n";
    print "3: '";
    print $1 while /(p)/gc; print "', pos=", pos, "\n";
}
The last example should print:
1: 'oo', pos=4
2: 'q', pos=5
3: 'pp', pos=7
1: '', pos=7
2: 'q', pos=8
3: '', pos=8
A useful idiom for lex-like scanners is /\G.../gc. You can combine several regexps like this to process a string part-by-part, doing different actions depending on which regexp matched. Each regexp tries to match where the previous one leaves off.
$_ = <<'EOL';
$url = new URI::URL "http://www/"; die if $url eq "xXx";
EOL
LOOP:
{
    print(" digits"), redo LOOP if /\G\d+\b[,.;]?\s*/gc;
    print(" lowercase"), redo LOOP if /\G[a-z]+\b[,.;]?\s*/gc;
    print(" UPPERCASE"), redo LOOP if /\G[A-Z]+\b[,.;]?\s*/gc;
    print(" Capitalized"), redo LOOP if /\G[A-Z][a-z]+\b[,.;]?\s*/gc;
    print(" MiXeD"), redo LOOP if /\G[A-Za-z]+\b[,.;]?\s*/gc;
    print(" alphanumeric"), redo LOOP if /\G[A-Za-z0-9]+\b[,.;]?\s*/gc;
    print(" line-noise"), redo LOOP if /\G[^A-Za-z0-9]+/gc;
    print ". That's all!\n";
}
Here is the output (split into several lines):
line-noise lowercase line-noise lowercase UPPERCASE line-noise
UPPERCASE line-noise lowercase line-noise lowercase line-noise
lowercase lowercase line-noise lowercase lowercase line-noise
MiXeD line-noise. That's all!
- q/STRING/
- 'STRING'
A single-quoted, literal string.
A backslash represents a backslash unless followed by the delimiter or another backslash, in which case the delimiter or backslash is interpolated.
$foo = q!I said, "You said, 'She said it.'"!;
$bar = q('This is it.');
$baz = '\n'; # a two-character string
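Here is a quick check of the backslash rule, using ! as the delimiter. Any delimiter allowed for q// would do.

```perl
use strict;
use warnings;

# Inside q//, only \<delimiter> and \\ are special
my $bang  = q!Say \! once; a backslash is \\ or just \ alone!;
my $plain = '\n';    # backslash followed by n: two characters

print "$bang\n";              # Say ! once; a backslash is \ or just \ alone
print length($plain), "\n";   # 2
```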
- qq/STRING/
- "STRING"
A double-quoted, interpolated string.
$_ .= qq
    (*** The previous line contains the naughty word "$1".\n)
        if /(tcl|rexx|python)/; # :-)
$baz = "\n"; # a one-character string
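This sketch shows qq{} with an arbitrary sample value. The braces let the string contain double quotes without escaping them.

```perl
use strict;
use warnings;

my $word = "python";    # sample value

# qq{} interpolates like "...", so the embedded double quotes need no escaping
my $msg = qq{The previous line contains the naughty word "$word".\n};
my $nl  = "\n";         # one character: a newline

print $msg;
print length($nl), "\n";    # 1
```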
- qx/STRING/
- `STRING`
A string which is interpolated and then executed as a system command. The collected standard output of the command is returned. In scalar context, it comes back as a single (potentially multi-line) string. In list context, it returns a list of lines (however you've defined lines with $/ or $INPUT_RECORD_SEPARATOR).
$today = qx{ date };
Note that how the string gets evaluated is entirely subject to the command
interpreter on your system. On most platforms, you will have to protect
shell metacharacters if you want them treated literally. On some platforms
(notably DOS-like ones), the shell may not be capable of dealing with
multiline commands, so putting newlines in the string may not get you what
you want. You may be able to evaluate multiple commands in a single line by
separating them with the command separator character, if your shell
supports that (e.g. ; on many Unix shells; & on the Windows
NT cmd shell).
Beware that some command shells may place restrictions on the length of the
command line. You must ensure your strings don't exceed this limit after
any necessary interpolations. See the platform-specific release notes for
more details about your particular environment.
Also realize that using this operator frequently leads to unportable
programs.
See I/O Operators for more discussion.
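A minimal sketch of the two contexts follows. It assumes a Unix-like shell, since the single quotes would not survive the Windows cmd shell. It uses $^X, the path of the running perl, so the command exists wherever the script runs.

```perl
use strict;
use warnings;

# Assumes a Unix-like shell; $^X is the running perl, so the command always exists
my $cmd = qq{"$^X" -le 'print for 1..3'};

my $all = qx{$cmd};      # scalar context: one string, "1\n2\n3\n"
die "command failed: $?" if $?;
my @lines = qx{$cmd};    # list context: ("1\n", "2\n", "3\n"), split on $/
die "command failed: $?" if $?;

print "scalar: ", length($all), " chars; list: ", scalar(@lines), " lines\n";
```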
- qw/STRING/
Returns a list of the words extracted out of STRING, using embedded whitespace as the word delimiters. It is exactly equivalent to
split(' ', q/STRING/);
Some frequently seen examples:
use POSIX qw( setlocale localeconv );
@EXPORT = qw( foo bar baz );
A common mistake is to try to separate the words with commas or to put comments into a multi-line qw-string. For this reason, the -w switch produces warnings if the STRING contains the "," or the "#" character.
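This checks the claimed equivalence with split on a multi-line qw-string with irregular spacing. The words are placeholders.

```perl
use strict;
use warnings;

# qw// splits on any whitespace, including newlines
my @w1 = qw( foo  bar
             baz );

# The documented equivalent: split(' ', ...) also ignores leading whitespace
my @w2 = split(' ', q/ foo  bar
             baz /);

print scalar(@w1), " words: @w1\n";    # 3 words: foo bar baz
```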
- s/PATTERN/REPLACEMENT/egimosx
Searches a string for a pattern, and if found, replaces that pattern with
the replacement text and returns the number of substitutions made.
Otherwise it returns false (specifically, the empty string).
If no string is specified via the =~ or !~ operator, the $_
variable is searched and modified. (The string specified with =~ must be a scalar variable, an array element, a hash element, or an
assignment to one of those, i.e., an lvalue.)
If the delimiter chosen is a single quote, no variable interpolation is done on either the PATTERN or the REPLACEMENT. Otherwise, if the PATTERN contains a $ that looks like a variable rather than an end-of-string test, the variable will be interpolated into the pattern at run-time. If you want the pattern compiled only once, the first time the variable is interpolated, use the /o option. If the pattern evaluates to an empty string, the last successfully executed regular expression is used instead. See the perlre manpage for further explanation on these. See the perllocale manpage for discussion of additional considerations which apply when use locale is in effect.
Options are:
e Evaluate the right side as an expression.
g Replace globally, i.e., all occurrences.
i Do case-insensitive pattern matching.
m Treat string as multiple lines.
o Compile pattern only once.
s Treat string as single line.
x Use extended regular expressions.
Any non-alphanumeric, non-whitespace delimiter may replace the slashes. If single quotes are used, no interpretation is done on the replacement string (the /e modifier overrides this, however). Unlike Perl 4, Perl 5 treats backticks as normal delimiters; the replacement text is not evaluated as a command. If the PATTERN is delimited by bracketing quotes, the REPLACEMENT has its own pair of quotes, which may or may not be bracketing quotes, e.g., s(foo)(bar) or s<foo>/bar/.
A /e will cause the replacement portion to be interpreted as a full-fledged Perl expression and eval()ed right then and there. It is, however, syntax checked at compile-time.
Examples:
s/\bgreen\b/mauve/g; # don't change wintergreen
$path =~ s|/usr/bin|/usr/local/bin|;
s/Login: $foo/Login: $bar/; # run-time pattern
($foo = $bar) =~ s/this/that/;
$count = ($paragraph =~ s/Mister\b/Mr./g);
$_ = 'abc123xyz';
s/\d+/$&*2/e; # yields 'abc246xyz'
s/\d+/sprintf("%5d",$&)/e; # yields 'abc  246xyz'
s/\w/$& x 2/eg; # yields 'aabbcc  224466xxyyzz'
s/%(.)/$percent{$1}/g; # change percent escapes; no /e
s/%(.)/$percent{$1} || $&/ge; # expr now, so /e
s/^=(\w+)/&pod($1)/ge; # use function call
# /e's can even nest; this will expand
# simple embedded variables in $_
s/(\$\w+)/$1/eeg;
# Delete C comments.
$program =~ s {
    /\*     # Match the opening delimiter.
    .*?     # Match a minimal number of characters.
    \*/     # Match the closing delimiter.
} []gsx;
s/^\s*(.*?)\s*$/$1/; # trim white space
s/([^ ]*) *([^ ]*)/$2 $1/; # reverse 1st two fields
Note the use of $ instead of \ in the last example. Unlike sed, we use the \<digit> form only on the left-hand side. Anywhere else it's $<digit>.
Occasionally, you can't use just a /g to get all the changes to occur. Here are two common cases:
# put commas in the right places in an integer
1 while s/(.*\d)(\d\d\d)/$1,$2/g; # perl4
1 while s/(\d)(\d\d\d)(?!\d)/$1,$2/g; # perl5
# expand tabs to 8-column spacing
1 while s/\t+/' ' x (length($&)*8 - length($`)%8)/e;
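The sketch below collects several of the points above into one runnable script on sample strings. It covers the substitution count, the false return when nothing matches, /e, bracketing delimiters, and the repeat-until-done idiom. The strings are made up. The commify and tab-expansion lines are the Perl 5 versions shown above.

```perl
use strict;
use warnings;

my $para  = "Mister Smith met Mister Jones.";
my $count = ($para =~ s/Mister\b/Mr./g);    # 2 substitutions
my $none  = ($para =~ s/Missus\b/Mrs./g);   # '' (false): nothing matched

# /e: the replacement is Perl code, evaluated for each match
(my $doubled = 'abc123xyz') =~ s/\d+/$&*2/e;    # 'abc246xyz'

# Bracketing delimiters: the replacement gets its own pair of quotes
(my $swapped = 'foo food') =~ s{foo}{bar};      # 'bar food' (no /g: first match only)

# Repeat until no more commas can be inserted
my $n = "1234567";
1 while $n =~ s/(\d)(\d\d\d)(?!\d)/$1,$2/g;     # '1,234,567'

# Expand tabs to 8-column stops (works on $_)
$_ = "a\tb";
1 while s/\t+/' ' x (length($&)*8 - length($`)%8)/e;   # 'a', 7 spaces, 'b'

print "$para | $count | $doubled | $swapped | $n\n";
```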
- tr/SEARCHLIST/REPLACEMENTLIST/cds
- y/SEARCHLIST/REPLACEMENTLIST/cds
Translates all occurrences of the characters found in the search list with the corresponding character in the replacement list. It returns the number of characters replaced or deleted. If no string is specified via the =~ or !~ operator, the $_ string is translated. (The string specified with =~ must be a scalar variable, an array element, a hash element, or an assignment to one of those, i.e., an lvalue.) For sed devotees, y is provided as a synonym for tr. If the SEARCHLIST is delimited by bracketing quotes, the REPLACEMENTLIST has its own pair of quotes, which may or may not be bracketing quotes, e.g., tr[A-Z][a-z] or tr(+\-*/)/ABCD/.
Options:
c Complement the SEARCHLIST.
d Delete found but unreplaced characters.
s Squash duplicate replaced characters.
If the /c modifier is specified, the SEARCHLIST character set is complemented. If the /d modifier is specified, any characters specified by SEARCHLIST not found in REPLACEMENTLIST are deleted. (Note that this is slightly more flexible than the behavior of some tr programs, which delete anything they find in the SEARCHLIST, period.) If the /s modifier is specified, sequences of characters that were translated to the same character are squashed down to a single instance of the character.
If the /d modifier is used, the REPLACEMENTLIST is always interpreted exactly as specified. Otherwise, if the REPLACEMENTLIST is shorter than the SEARCHLIST, the final character is replicated until it is long enough. If the REPLACEMENTLIST is empty, the SEARCHLIST is replicated. This latter is useful for counting characters in a class or for squashing character sequences in a class.
Examples:
$ARGV[1] =~ tr/A-Z/a-z/; # canonicalize to lower case
$cnt = tr/*/*/; # count the stars in $_
$cnt = $sky =~ tr/*/*/; # count the stars in $sky
$cnt = tr/0-9//; # count the digits in $_
tr/a-zA-Z//s; # bookkeeper -> bokeper
($HOST = $host) =~ tr/a-z/A-Z/;
tr/a-zA-Z/ /cs; # change non-alphas to single space
tr [\200-\377]
[\000-\177]; # delete 8th bit
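This sketch illustrates the /d, /c and /s rules and the replication of a short replacement list, using invented sample strings. The trailing comments give the resulting values.

```perl
use strict;
use warnings;

# /d: search characters with no partner in the replacement list are deleted
(my $deleted = "a1b2c3") =~ tr/a-c0-9/A-C/d;       # "ABC"

# Without /d, the last replacement character ("C") is replicated
(my $padded = "a1b2c3") =~ tr/a-c0-9/A-C/;         # "ACBCCC"

# /c complements the search list; /s squashes the resulting runs
(my $spaced = "hello, world!!") =~ tr/a-zA-Z/ /cs; # "hello world "

# An empty replacement list leaves the string alone but still counts matches
my $mixed  = "a1b22c333";
my $ndigit = ($mixed =~ tr/0-9//);                 # 6

print "$deleted $padded [$spaced] $ndigit\n";
```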
If multiple translations are given for a character, only the first one is used:
tr/AAA/XYZ/
will translate any A to X.
Note that because the translation table is built at compile time, neither the SEARCHLIST nor the REPLACEMENTLIST is subjected to double-quote interpolation. That means that if you want to use variables, you must use an eval():
eval "tr/$oldlist/$newlist/";
die $@ if $@;
eval "tr/$oldlist/$newlist/, 1" or die $@;
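This sketch covers the first-translation-wins rule and the eval() workaround. It assumes $oldlist and $newlist hold plain letters. A "/" or a backslash in either would change the meaning of the generated tr/// code.

```perl
use strict;
use warnings;

# Only the first translation listed for a character is used
(my $first_wins = "AAA") =~ tr/AAA/XYZ/;    # "XXX"

# Build the lists at run time; tr/// itself never interpolates them.
# Assumes $oldlist and $newlist contain no "/" or backslash.
my ($oldlist, $newlist) = ('abc', 'xyz');
my $text = "aabbcc";
eval "\$text =~ tr/$oldlist/$newlist/, 1" or die $@;   # ", 1" makes eval return true

print "$first_wins $text\n";    # XXX xxyyzz
```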
Source: Perl operators and precedence. Copyright: Larry Wall, et al.
RocketAware.com is a service of Mib Software Copyright 2000, Forrest J. Cavalier III. All Rights Reserved. We welcome submissions and comments