icon Top 9 categories map      RocketAware > Perl >

How do I extract URLs?

Tips: Browse or Search all pages for efficient awareness of Perl functions, operators, and FAQs.



Home

Search Perl pages


Subjects

By activity
Professions, Sciences, Humanities, Business, ...

User Interface
Text-based, GUI, Audio, Video, Keyboards, Mouse, Images,...

Text Strings
Conversions, tests, processing, manipulation,...

Math
Integer, Floating point, Matrix, Statistics, Boolean, ...

Processing
Algorithms, Memory, Process control, Debugging, ...

Stored Data
Data storage, Integrity, Encryption, Compression, ...

Communications
Networks, protocols, Interprocess, Remote, Client Server, ...

Hard World
Timing, Calendar and Clock, Audio, Video, Printer, Controls...

File System
Management, Filtering, File & Directory access, Viewers, ...

    

How do I extract URLs?

A quick but imperfect approach is

    #!/usr/bin/perl -n00
    # qxurl - tchrist@perl.com
    print "$2\n" while m{
        < \s*
          A \s+ HREF \s* = \s* (["']) (.*?) \1
        \s* >
    }gsix;

This version does not adjust relative URLs, understand alternate bases, deal with HTML comments, deal with HREF and NAME attributes in the same tag, or accept URLs themselves as arguments. It also runs about 100x faster than a more ``complete'' solution using the LWP suite of modules, such as the http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/xurl.gz program.


Source: Perl FAQ: Networking
Copyright: Copyright (c) 1997 Tom Christiansen and Nathan Torkington.
Next: How do I download a file from the user's machine? How do I open a file on another machine?

Previous: How do I remove HTML from a string?



(Corrections, notes, and links courtesy of RocketAware.com)


[Overview Topics]

Up to: WWW robots and proxies
Up to: WWW authoring




Rapid-Links: Search | About | Comments | Submit Path: RocketAware > Perl > perlfaq9/How_do_I_extract_URLs_.htm
RocketAware.com is a service of Mib Software
Copyright 2000, Forrest J. Cavalier III. All Rights Reserved.
We welcome submissions and comments