Regexp Quote-Like Operators

Here are the quote-like operators that apply to pattern matching and related activities.

?PATTERN?

This is just like the /pattern/ search, except that it matches only once between calls to the reset() operator. This is a useful optimization when you want to see only the first occurrence of something in each file of a set of files, for instance. Only ?? patterns local to the current package are reset.

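This usage is vaguely deprecated, and may be removed in some future version of Perl. Later releases did drop the bare ?PATTERN? spelling; the m?PATTERN? form keeps the same match-once behavior.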
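As a minimal sketch of the match-once behavior, the following uses the m?PATTERN? spelling. The sample lines and the two-round loop are illustrative assumptions; each round stands in for one file, and reset() re-arms the pattern between rounds.

```perl
use strict;
use warnings;

# Hypothetical sample data; each pass over @lines stands in for one file.
my @lines = ("alpha", "beta", "alpha again", "alpha thrice");
my @hits;

for my $round (1, 2) {
    for my $line (@lines) {
        # m?...? succeeds at most once until reset() is called.
        push @hits, "$round:$line" if $line =~ m?alpha?;
    }
    reset;    # no argument: re-arm the ?? patterns in this package
}

print join(", ", @hits), "\n";    # prints "1:alpha, 2:alpha"
```

Only the first matching line of each round is recorded, which is the "first occurrence per file" use described above.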

m/PATTERN/cgimosx

/PATTERN/cgimosx

Searches a string for a pattern match, and in a scalar context returns true (1) or false (''). If no string is specified via the =~ or !~ operator, the $_ string is searched. (The string specified with =~ need not be an lvalue--it may be the result of an expression evaluation, but remember the =~ binds rather tightly.) See also the perlre manpage. See the perllocale manpage for discussion of additional considerations which apply when use locale is in effect.

Options are:

    c   Do not reset search position on a failed match when /g is in effect.
    g   Match globally, i.e., find all occurrences.
    i   Do case-insensitive pattern matching.
    m   Treat string as multiple lines.
    o   Compile pattern only once.
    s   Treat string as single line.
    x   Use extended regular expressions.

If ``/'' is the delimiter then the initial m is optional. With the m you can use any pair of non-alphanumeric, non-whitespace characters as delimiters. This is particularly useful for matching Unix path names that contain ``/'', to avoid LTS (leaning toothpick syndrome). If ``?'' is the delimiter, then the match-only-once rule of ?PATTERN? applies.

PATTERN may contain variables, which will be interpolated (and the pattern recompiled) every time the pattern search is evaluated. (Note that $) and $| might not be interpolated because they look like end-of-string tests.) If you want such a pattern to be compiled only once, add a /o after the trailing delimiter. This avoids expensive run-time recompilations, and is useful when the value you are interpolating won't change over the life of the script. However, mentioning /o constitutes a promise that you won't change the variables in the pattern. If you change them, Perl won't even notice.
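The promise that /o implies can be seen in a small sketch. The word list and the target string "cat" are made up for illustration. The first loop uses /o and keeps testing against the pattern compiled on its first pass; the second loop omits /o and follows the variable.

```perl
use strict;
use warnings;

my @with_o;
for my $word (qw(cat dog)) {
    # /o compiles the pattern on the first pass (as /cat/) and never again,
    # so the second pass still tests against /cat/ although $word is "dog".
    push @with_o, $word if "cat" =~ /$word/o;
}

my @without_o;
for my $word (qw(cat dog)) {
    # Without /o the pattern tracks the current value of $word.
    push @without_o, $word if "cat" =~ /$word/;
}

print "with /o: @with_o\n";
print "without /o: @without_o\n";
```

In the first loop "dog" is recorded as a match because Perl never notices that $word changed, which is the hazard the paragraph above warns about.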

If the PATTERN evaluates to a null string, the last successfully executed regular expression is used instead.

If used in a context that requires a list value, a pattern match returns a list consisting of the subexpressions matched by the parentheses in the pattern, i.e., ($1, $2, $3...). (Note that here $1 etc. are also set, and that this differs from Perl 4's behavior.) If the match fails, a null array is returned. If the match succeeds, but there were no parentheses, a list value of (1) is returned.

Examples:

    open(TTY, '/dev/tty');
    <TTY> =~ /^y/i && foo();    # do foo if desired

    if (/Version: *([0-9.]*)/) { $version = $1; }

    next if m#^/usr/spool/uucp#;

    # poor man's grep
    $arg = shift;
    while (<>) {
        print if /$arg/o;       # compile only once
    }

    if (($F1, $F2, $Etc) = ($foo =~ /^(\S+)\s+(\S+)\s*(.*)/))

This last example splits $foo into the first two words and the remainder of the line, and assigns those three fields to $F1, $F2, and $Etc. The conditional is true if any variables were assigned, i.e., if the pattern matched.
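The return values described above for scalar and list context can be compared side by side. This is a small sketch; the sample line and the variable names are illustrative only.

```perl
use strict;
use warnings;

my $line = "Version: 5.004";

my $ok        = ($line =~ /Version: *([0-9.]*)/);   # scalar context
my ($version) = ($line =~ /Version: *([0-9.]*)/);   # list context, one capture
my @no_parens = ($line =~ /Version/);               # list context, no parentheses
my @failed    = ($line =~ /Release: *([0-9.]*)/);   # list context, no match

# prints "ok=1 version=5.004 no_parens=(1) failed=0"
print "ok=$ok version=$version no_parens=(@no_parens) failed=", scalar(@failed), "\n";
```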

The /g modifier specifies global pattern matching--that is, matching as many times as possible within the string. How it behaves depends on the context. In a list context, it returns a list of all the substrings matched by all the parentheses in the regular expression. If there are no parentheses, it returns a list of all the matched strings, as if there were parentheses around the whole pattern.

In a scalar context, m//g iterates through the string, returning TRUE each time it matches, and FALSE when it eventually runs out of matches. (In other words, it remembers where it left off last time and restarts the search at that point. You can actually find the current match position of a string or set it using the pos() function; see pos.) A failed match normally resets the search position to the beginning of the string, but you can avoid that by adding the ``c'' modifier (e.g. m//gc). Modifying the target string also resets the search position.

You can intermix m//g matches with m/\G.../g, where \G is a zero-width assertion that matches the exact position where the previous m//g, if any, left off. The \G assertion is not supported without the /g modifier; currently, without /g, \G behaves just like \A, but that's accidental and may change in the future.

Examples:

    # list context
    ($one,$five,$fifteen) = (`uptime` =~ /(\d+\.\d+)/g);

    # scalar context
    $/ = "";  # paragraph mode; the pattern has no ^ or $, so $* is not needed
    while (defined($paragraph = <>)) {
        while ($paragraph =~ /[a-z]['")]*[.!?]+['")]*\s/g) {
            $sentences++;
        }
    }
    print "$sentences\n";

    # using m//gc with \G
    $_ = "ppooqppqq";
    while ($i++ < 2) {
        print "1: '";
        print $1 while /(o)/gc; print "', pos=", pos, "\n";
        print "2: '";
        print $1 if /\G(q)/gc;  print "', pos=", pos, "\n";
        print "3: '";
        print $1 while /(p)/gc; print "', pos=", pos, "\n";
    }

The last example should print:

    1: 'oo', pos=4
    2: 'q', pos=5
    3: 'pp', pos=7
    1: '', pos=7
    2: 'q', pos=8
    3: '', pos=8
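The difference between list and scalar context for m//g, and the way pos() tracks the scalar-context iteration, can be sketched as follows. The uptime-style string is made up for the example.

```perl
use strict;
use warnings;

my $load = "load average: 1.20, 0.95, 0.80";   # made-up uptime-style text

# List context: every capture from every match comes back at once.
my @averages = ($load =~ /(\d+\.\d+)/g);

# Scalar context: one match per evaluation; pos() reports where it ended.
my @ends;
while ($load =~ /,/g) {
    push @ends, pos($load);
}

# The final failed match (no /c modifier) cleared the position again.
my $after = defined pos($load) ? pos($load) : 'undef';

print "averages: @averages\n";
print "comma matches ended at: @ends; pos afterwards: $after\n";
```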

A useful idiom for lex-like scanners is /\G.../gc. You can combine several regexps like this to process a string part-by-part, doing different actions depending on which regexp matched. Each regexp tries to match where the previous one leaves off.

    $_ = <<'EOL';
        $url = new URI::URL "http://www/";   die if $url eq "xXx";
    EOL
    LOOP:
    {
        print(" digits"),         redo LOOP if /\G\d+\b[,.;]?\s*/gc;
        print(" lowercase"),      redo LOOP if /\G[a-z]+\b[,.;]?\s*/gc;
        print(" UPPERCASE"),      redo LOOP if /\G[A-Z]+\b[,.;]?\s*/gc;
        print(" Capitalized"),    redo LOOP if /\G[A-Z][a-z]+\b[,.;]?\s*/gc;
        print(" MiXeD"),          redo LOOP if /\G[A-Za-z]+\b[,.;]?\s*/gc;
        print(" alphanumeric"),   redo LOOP if /\G[A-Za-z0-9]+\b[,.;]?\s*/gc;
        print(" line-noise"),     redo LOOP if /\G[^A-Za-z0-9]+/gc;
        print ". That's all!\n";
    }

Here is the output (split into several lines):

 line-noise lowercase line-noise lowercase UPPERCASE line-noise
 UPPERCASE line-noise lowercase line-noise lowercase line-noise
 lowercase lowercase line-noise lowercase lowercase line-noise
 MiXeD line-noise. That's all!
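The same /\G.../gc idiom can collect tokens instead of printing labels. The sketch below assumes a tiny arithmetic language; the tokenize() helper and its token classes are hypothetical names chosen for the example.

```perl
use strict;
use warnings;

# Hypothetical token classes for a tiny arithmetic language.
sub tokenize {
    my ($src) = @_;
    my @tokens;
    while (1) {
        # Each test starts where the previous successful match ended (\G),
        # and /c keeps that position when a test fails.
        if    ($src =~ /\G\s+/gc)             { next }   # skip whitespace
        elsif ($src =~ /\G(\d+)/gc)           { push @tokens, "NUM:$1" }
        elsif ($src =~ /\G([A-Za-z_]\w*)/gc)  { push @tokens, "IDENT:$1" }
        elsif ($src =~ m{\G([-+*/=()])}gc)    { push @tokens, "OP:$1" }
        elsif ($src =~ /\G\z/gc)              { last }   # reached the end
        else  { die "unexpected character at offset " . (pos($src) || 0) . "\n" }
    }
    return @tokens;
}

# prints "IDENT:x OP:= NUM:42 OP:+ IDENT:y1"
print join(" ", tokenize("x = 42 + y1")), "\n";
```

Because every alternative is anchored with \G, a character that no rule accepts is reported at its offset rather than silently skipped.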

q/STRING/

'STRING'

A single-quoted, literal string. A backslash represents a backslash unless followed by the delimiter or another backslash, in which case the delimiter or backslash is interpolated.

    $foo = q!I said, "You said, 'She said it.'"!;
    $bar = q('This is it.');
    $baz = '\n';                # a two-character string

qq/STRING/

"STRING"

A double-quoted, interpolated string.

    $_ .= qq
     (*** The previous line contains the naughty word "$1".\n)
                if /(tcl|rexx|python)/;      # :-)
    $baz = "\n";                # a one-character string
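The difference between q// and qq// is easiest to see on the same text. This is a small sketch; the variable names and strings are illustrative.

```perl
use strict;
use warnings;

my $name = "world";

my $single  = q(Hello, $name\n);    # no interpolation: $name and \n stay literal
my $double  = qq(Hello, $name\n);   # $name is interpolated, \n becomes a newline
my $escaped = q!wow\! a\\b!;        # \! gives the delimiter, \\ gives one backslash

# prints "14 13 wow! a\b"
printf "%d %d %s\n", length($single), length($double), $escaped;
```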

qx/STRING/

`STRING`

A string which is interpolated and then executed as a system command. The collected standard output of the command is returned. In scalar context, it comes back as a single (potentially multi-line) string. In list context, returns a list of lines (however you've defined lines with $/ or $INPUT_RECORD_SEPARATOR).

    $today = qx{ date };

Note that how the string gets evaluated is entirely subject to the command interpreter on your system. On most platforms, you will have to protect shell metacharacters if you want them treated literally. On some platforms (notably DOS-like ones), the shell may not be capable of dealing with multiline commands, so putting newlines in the string may not get you what you want. You may be able to evaluate multiple commands in a single line by separating them with the command separator character, if your shell supports that (e.g. ; on many Unix shells; & on the Windows NT cmd shell).

Beware that some command shells may place restrictions on the length of the command line. You must ensure your strings don't exceed this limit after any necessary interpolations. See the platform-specific release notes for more details about your particular environment.

Also realize that using this operator frequently leads to unportable programs.

See I/O Operators for more discussion.
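A minimal sketch of the two calling contexts follows. It assumes a Unix-like shell in which echo exists and ";" separates commands; on other platforms the command string would need adjusting.

```perl
use strict;
use warnings;

# Assumes a Unix-like shell where "echo" exists and ";" separates commands.
my $text  = qx{echo one; echo two};    # scalar context: one multi-line string
my @lines = qx{echo one; echo two};    # list context: one element per line

chomp(my @clean = @lines);
print "scalar: ", length($text), " characters; list: @clean\n";
```

Each list element still carries its trailing newline until it is chomped, because lines are split according to $/.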

qw/STRING/

Returns a list of the words extracted out of STRING, using embedded whitespace as the word delimiters. It is exactly equivalent to

    split(' ', q/STRING/);

Some frequently seen examples:

    use POSIX qw( setlocale localeconv );
    @EXPORT = qw( foo bar baz );

A common mistake is to try to separate the words with commas or to put comments into a multi-line qw-string. For this reason the -w switch produces warnings if the STRING contains the ``,'' or the ``#'' character.
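The equivalence with split, and what the comma mistake actually produces, can be checked with a short sketch. The word lists are illustrative, and the qw warning is switched off in one place only so that the example runs quietly.

```perl
use strict;
use warnings;

my @words = qw( foo bar baz );
my @same  = split(' ', q( foo bar baz ));

# The comma mistake: the comma becomes part of the first word.
# (The qw warning is silenced here only so the sketch runs quietly.)
my @oops = do { no warnings 'qw'; qw( foo, bar ) };

# prints "foo bar baz | foo bar baz | foo,"
print "@words | @same | $oops[0]\n";
```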

s/PATTERN/REPLACEMENT/egimosx

Searches a string for a pattern, and if found, replaces that pattern with the replacement text and returns the number of substitutions made. Otherwise it returns false (specifically, the empty string).

If no string is specified via the =~ or !~ operator, the $_ variable is searched and modified. (The string specified with =~ must be a scalar variable, an array element, a hash element, or an assignment to one of those, i.e., an lvalue.)

If the delimiter chosen is single quote, no variable interpolation is done on either the PATTERN or the REPLACEMENT. Otherwise, if the PATTERN contains a $ that looks like a variable rather than an end-of-string test, the variable will be interpolated into the pattern at run-time. If you want the pattern compiled only once the first time the variable is interpolated, use the /o option. If the pattern evaluates to a null string, the last successfully executed regular expression is used instead. See the perlre manpage for further explanation on these. See the perllocale manpage for discussion of additional considerations which apply when use locale is in effect.

Options are:

    e   Evaluate the right side as an expression.
    g   Replace globally, i.e., all occurrences.
    i   Do case-insensitive pattern matching.
    m   Treat string as multiple lines.
    o   Compile pattern only once.
    s   Treat string as single line.
    x   Use extended regular expressions.

Any non-alphanumeric, non-whitespace delimiter may replace the slashes. If single quotes are used, no interpretation is done on the replacement string (the /e modifier overrides this, however). Unlike Perl 4, Perl 5 treats backticks as normal delimiters; the replacement text is not evaluated as a command. If the PATTERN is delimited by bracketing quotes, the REPLACEMENT has its own pair of quotes, which may or may not be bracketing quotes, e.g., s(foo)(bar) or s<foo>/bar/. A /e will cause the replacement portion to be interpreted as a full-fledged Perl expression and eval()ed right then and there. It is, however, syntax checked at compile-time.

Examples:

    s/\bgreen\b/mauve/g;                # don't change wintergreen

    $path =~ s|/usr/bin|/usr/local/bin|;

    s/Login: $foo/Login: $bar/; # run-time pattern

    ($foo = $bar) =~ s/this/that/;

    $count = ($paragraph =~ s/Mister\b/Mr./g);

    $_ = 'abc123xyz';
    s/\d+/$&*2/e;               # yields 'abc246xyz'
    s/\d+/sprintf("%5d",$&)/e;  # yields 'abc  246xyz'
    s/\w/$& x 2/eg;             # yields 'aabbcc  224466xxyyzz'

    s/%(.)/$percent{$1}/g;      # change percent escapes; no /e
    s/%(.)/$percent{$1} || $&/ge;       # expr now, so /e
    s/^=(\w+)/&pod($1)/ge;      # use function call

    # /e's can even nest;  this will expand
    # simple embedded variables in $_
    s/(\$\w+)/$1/eeg;

    # Delete C comments.
    $program =~ s {
        /\*     # Match the opening delimiter.
        .*?     # Match a minimal number of characters.
        \*/     # Match the closing delimiter.
    } []gsx;

    s/^\s*(.*?)\s*$/$1/;        # trim white space

    s/([^ ]*) *([^ ]*)/$2 $1/;  # reverse 1st two fields

Note the use of $ instead of \ in the last example. Unlike sed, we use the \<digit> form in only the left hand side. Anywhere else it's $<digit>.
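The return value of s/// and the copy-then-modify idiom from the examples can be checked together. This is a sketch with made-up sample text and variable names.

```perl
use strict;
use warnings;

my $paragraph = "Mister Smith met Mister Jones.";

my $count = ($paragraph =~ s/Mister\b/Mr./g);   # number of substitutions made
my $none  = ($paragraph =~ s/Madam\b/Ms./);     # no match: false (empty string)

# Copy first, then edit the copy; $paragraph itself is left alone.
(my $copy = $paragraph) =~ s/Smith/Brown/;

# Bracketing delimiters plus /e: the replacement is a Perl expression.
(my $doubled = "abc123xyz") =~ s{(\d+)}{$1 * 2}e;

print "$count substitutions: $paragraph\n";
print "copy: $copy\n";
print "doubled: $doubled\n";
```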

Occasionally, you can't use just a /g to get all the changes to occur. Here are two common cases:

    # put commas in the right places in an integer
    1 while s/(.*\d)(\d\d\d)/$1,$2/g;      # perl4
    1 while s/(\d)(\d\d\d)(?!\d)/$1,$2/g;  # perl5

    # expand tabs to 8-column spacing
    1 while s/\t+/' ' x (length($&)*8 - length($`)%8)/e;
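Wrapped in subroutines, the two loops above can be tried on sample values. The helper names commify() and expand_tabs() are hypothetical, and expand_tabs() assumes a single line of input, since $` counts from the start of the string.

```perl
use strict;
use warnings;

# Hypothetical helper names; the substitutions are the ones shown above.
sub commify {
    my ($n) = @_;
    # Within a single number each pass adds only one comma (the rightmost
    # missing one), so repeat until the substitution stops matching.
    1 while $n =~ s/(\d)(\d\d\d)(?!\d)/$1,$2/g;
    return $n;
}

sub expand_tabs {
    my ($text) = @_;
    # $` is the text before the match, so its length gives the current column.
    1 while $text =~ s/\t+/' ' x (length($&)*8 - length($`)%8)/e;
    return $text;
}

print commify(1234567), "\n";               # prints "1,234,567"
print length(expand_tabs("ab\tc")), "\n";   # prints "9"
```

A single /g pass is not enough for the commas because each inserted comma changes what the next match needs to see. The tab expansion has to repeat because every replacement shifts the columns of the text that follows.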

tr/SEARCHLIST/REPLACEMENTLIST/cds

y/SEARCHLIST/REPLACEMENTLIST/cds

Replaces every occurrence of a character found in the search list with the corresponding character in the replacement list. It returns the number of characters replaced or deleted. If no string is specified via the =~ or !~ operator, the $_ string is translated. (The string specified with =~ must be a scalar variable, an array element, a hash element, or an assignment to one of those, i.e., an lvalue.) For sed devotees, y is provided as a synonym for tr. If the SEARCHLIST is delimited by bracketing quotes, the REPLACEMENTLIST has its own pair of quotes, which may or may not be bracketing quotes, e.g., tr[A-Z][a-z] or tr(+-*/)/ABCD/.

Options:

    c   Complement the SEARCHLIST.
    d   Delete found but unreplaced characters.
    s   Squash duplicate replaced characters.

If the /c modifier is specified, the SEARCHLIST character set is complemented. If the /d modifier is specified, any characters specified by SEARCHLIST not found in REPLACEMENTLIST are deleted. (Note that this is slightly more flexible than the behavior of some tr programs, which delete anything they find in the SEARCHLIST, period.) If the /s modifier is specified, sequences of characters that were translated to the same character are squashed down to a single instance of the character.

If the /d modifier is used, the REPLACEMENTLIST is always interpreted exactly as specified. Otherwise, if the REPLACEMENTLIST is shorter than the SEARCHLIST, the final character is replicated till it is long enough. If the REPLACEMENTLIST is null, the SEARCHLIST is replicated. This latter is useful for counting characters in a class or for squashing character sequences in a class.

Examples:

    $ARGV[1] =~ tr/A-Z/a-z/;    # canonicalize to lower case

    $cnt = tr/*/*/;             # count the stars in $_

    $cnt = $sky =~ tr/*/*/;     # count the stars in $sky

    $cnt = tr/0-9//;            # count the digits in $_

    tr/a-zA-Z//s;               # bookkeeper -> bokeper

    ($HOST = $host) =~ tr/a-z/A-Z/;

    tr/a-zA-Z/ /cs;             # change non-alphas to single space

    tr [\200-\377]
       [\000-\177];             # delete 8th bit

If multiple translations are given for a character, only the first one is used:

    tr/AAA/XYZ/

will translate any A to X.
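The modifier and list-length rules described above can be checked with a short sketch. The sample strings and variable names are illustrative.

```perl
use strict;
use warnings;

(my $letters = "a1b2-c3")       =~ tr/a-zA-Z//cd;   # /c with /d: delete everything but letters
(my $spaced  = "foo, bar!!baz") =~ tr/a-zA-Z/ /cs;  # /c with /s: runs of non-letters become one space
(my $padded  = "aabbcc")        =~ tr/abc/x/;       # short list: the final "x" is replicated
(my $exact   = "abcabc")        =~ tr/abc/x/d;      # /d: list used as given, b and c deleted

my $phone  = "tel 555-0100";
my $digits = ($phone =~ tr/0-9//);                  # empty list: count, change nothing

print "$letters | $spaced | $padded | $exact | $digits\n";
```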

Note that because the translation table is built at compile time, neither the SEARCHLIST nor the REPLACEMENTLIST is subjected to double quote interpolation. That means that if you want to use variables, you must use an eval():

    eval "tr/$oldlist/$newlist/";
    die $@ if $@;

    eval "tr/$oldlist/$newlist/, 1" or die $@;
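A self-contained version of the eval approach, applied to a named variable rather than $_, might look like this. The lists and the sample text are hypothetical; lists that come from outside the program would need validating before being handed to eval.

```perl
use strict;
use warnings;

# Hypothetical run-time lists; untrusted values must be validated first.
my ($oldlist, $newlist) = ('a-c', 'A-C');
my $text = "abcabc-d";

# \$text is escaped so the variable name, not its value, reaches eval;
# $oldlist and $newlist are interpolated to build the tr/// source text.
my $count = eval "\$text =~ tr/$oldlist/$newlist/";
die $@ if $@;

print "$count changed: $text\n";    # prints "6 changed: ABCABC-d"
```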

Source: Perl operators and precedence
Copyright: Larry Wall, et al.

(Corrections, notes, and links courtesy of RocketAware.com)
