
What good is \G in a regular expression?


The notation \G is used in a match or substitution in conjunction with the /g modifier (and is ignored if there's no /g) to anchor the regular expression to the point just past where the last match occurred, i.e. the pos() point.
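To make the pos() point concrete, here is a minimal sketch. The sample string and variable names are invented for illustration. It counts how many times /a/g matches with and without the \G anchor.

```perl
use strict;
use warnings;

my $text = "aaabaa";    # made-up sample data for this sketch

# A scalar-context /g match remembers where it stopped;
# pos() reports that offset.
my $matched     = $text =~ /a/g;
my $after_first = pos($text);

# With \G, each match must begin exactly where the previous one ended,
# so the run of matches stops at the "b".
my $anchored = 0;
pos($text) = 0;
$anchored++ while $text =~ /\Ga/g;

# Without \G, /g is free to skip over the "b" and keep going.
my $floating = 0;
$floating++ while $text =~ /a/g;

print "pos after first match: $after_first\n";
print "anchored: $anchored, floating: $floating\n";
```

The anchored loop sees only the three leading a's, while the unanchored loop finds all five.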

For example, suppose you had a line of text quoted in standard mail and Usenet notation (that is, with leading > characters), and you want to change each leading > into a corresponding :. You could do so in this way:

    s/^(>+)/':' x length($1)/gem;

Or, using \G, the much simpler (and faster):

    s/\G>/:/g;
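As a quick check that the two substitutions agree on a single line, the sketch below applies both to the same string. The sample line and variable names are assumptions made for this illustration.

```perl
use strict;
use warnings;

my $line = ">>> quoted > text";    # made-up sample line

# Copy the line, then rewrite the copy with each form of the substitution.
(my $with_e = $line) =~ s/^(>+)/':' x length($1)/gem;
(my $with_G = $line) =~ s/\G>/:/g;

print "$with_e\n";
print "$with_G\n";
```

Both copies come out as "::: quoted > text". The > in the middle of the line is left alone because \G only lets each match start where the previous one ended. The \G form stops at the first character that is not a >, so it only handles the start of the string. If a string holds several lines, only the /m form rewrites the leading > characters on every line.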

A more sophisticated use might involve a tokenizer. The following lex-like example is courtesy of Jeffrey Friedl. It did not work in 5.003 due to bugs in that release, but does work in 5.004 or better. (Note the use of /c, which prevents a failed match with /g from resetting the search position back to the beginning of the string.)

    while (<>) {
      chomp;
      PARSER: {
           m/ \G( \d+\b    )/gcx    && do { print "number: $1\n";  redo; };
           m/ \G( \w+      )/gcx    && do { print "word:   $1\n";  redo; };
           m/ \G( \s+      )/gcx    && do { print "space:  $1\n";  redo; };
           m/ \G( [^\w\d]+ )/gcx    && do { print "other:  $1\n";  redo; };
      }
    }
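To try the same idea without reading from standard input, the following sketch runs the four patterns over a fixed string and collects the tokens in an array. The input string, the @tokens array, and the TOKEN label are all made up for this illustration, and the last pattern is written as [^\w\s]+ rather than the original [^\w\d]+.

```perl
use strict;
use warnings;

my $input = "abc 42 +x";    # hypothetical sample input
my @tokens;

pos($input) = 0;
TOKEN: {
    # /c keeps pos() where it is when a pattern fails, so the next
    # pattern is tried at the same offset instead of at the start.
    if ($input =~ / \G( \d+\b   )/gcx) { push @tokens, "number:$1"; redo TOKEN; }
    if ($input =~ / \G( \w+     )/gcx) { push @tokens, "word:$1";   redo TOKEN; }
    if ($input =~ / \G( \s+     )/gcx) { push @tokens, "space:$1";  redo TOKEN; }
    if ($input =~ / \G( [^\w\s]+)/gcx) { push @tokens, "other:$1";  redo TOKEN; }
    # Falling out of the block means nothing matched: end of input.
}

print join("|", @tokens), "\n";
```

For this input the tokens come out in order as word "abc", a space, number "42", a space, other "+", and word "x". Once pos() reaches the end of the string none of the patterns can match, so the block finishes without another redo.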

Of course, that could have been written as

    while (<>) {
      chomp;
      PARSER: {
           if ( /\G( \d+\b    )/gcx ) {
                print "number: $1\n";
                redo PARSER;
           }
           if ( /\G( \w+      )/gcx ) {
                print "word:   $1\n";
                redo PARSER;
           }
           if ( /\G( \s+      )/gcx ) {
                print "space:  $1\n";
                redo PARSER;
           }
           if ( /\G( [^\w\d]+ )/gcx ) {
                print "other:  $1\n";
                redo PARSER;
           }
      }
    }

But then you lose the vertical alignment of the regular expressions.


Source: Perl FAQ: Regexps
Copyright: Copyright (c) 1997 Tom Christiansen and Nathan Torkington.



(Corrections, notes, and links courtesy of RocketAware.com)


RocketAware.com is a service of Mib Software
Copyright 2000, Forrest J. Cavalier III. All Rights Reserved.