WWW robots and proxies

Automatic processing of HTTP requests

Information and Publications

RFC2186 Internet Cache Protocol (ICP), version 2. [c. 1997/09/01]

RFC2187 Application of Internet Cache Protocol (ICP), version 2. [c. 1997/09/01]

SOFTWARE: Squid proxy caching server (monthly posting)

ftp://rtfm.mit.edu/pub/faqs/www/squid-info (At MIT)


Books

Web Content Mining with Java: Techniques for Exploiting the World's Biggest Information Resource
[Tony Loton; 2002-04-11] ISBN 047084311X
- At Barnes & Noble - At Amazon - At Half

Programming Spiders, Bots, and Aggregators in Java
[Jeff Heaton; 2002-02] ISBN 0782140408
- At Barnes & Noble - At Amazon - At Half

Web Caching (O'Reilly Internet Series)
[Duane Wessels; 2001-06] ISBN 156592536X
- At Barnes & Noble - At Amazon - At Half

Avatars of the Word: From Papyrus to Cyberspace
[James Joseph O'Donnell; 2000-05-05] ISBN 067400194X
- At Barnes & Noble - At Amazon - At Half

Software Agents
[Jeffrey M. Bradshaw (Editor); 1997-04-18] ISBN 0262522349
- At Barnes & Noble - At Amazon - At Half

Web Client Programming With Perl
[Clinton Wong; 1997-04] ISBN 156592214X
- At Barnes & Noble - At Amazon - At Half

Mobile Agents
[Classroom Connect; Zyda, Michael] ISBN 0138582424
- At Barnes & Noble - At Amazon - At Half


Articles

BASAR: A framework for integrating agents in the World Wide Web (Christoph G. Thomas; IEEE Computer Magazine, 1995-05)


Questions and Answers

Topical AND Keyword Based Search Engines? [2000/10/16]

At Ask Slashdot

Unusual HTTP Requests For robots.txt? [2000/09/22]

At Ask Slashdot

How About an Intelligent Open Source Filter? [2000/04/04]

At Ask Slashdot

Is Spidering Content from the Web Illegal? [1999/11/09]

At Ask Slashdot

How do I configure my machine to use a proxy server? [1999/09/01]

At DaemonNews


Applications and Utilities

Dead Link Check (DLC) - HTTP link checker written in Perl. Can generate HTML output for easy checking of results and process a link cache file to hasten multiple requests. Initially created as an extension to Public Bookmark Generator (PBM); can be used alone. {(L)GPL}

At Sourceforge (Production/Stable)

p5-WWW-Link-0.030 - Maintain information about the state of links

At FreeBSD Ports

narval-1.1 - Network Assistant Reasoning with a Validating Agent Language

At FreeBSD Ports

larbin-2.6.1 - A powerful HTTP crawler with an easy interface

At FreeBSD Ports

ja-navi2ch-emacs21-1.5.1,1 - 2ch.net client for Emacsen

At FreeBSD Ports

googolplex-0.1.0 - Query Google, parse the response, and return the result as a list

At FreeBSD Ports

dailystrips-1.0.22 - Utility to download or view your favorite online comic strips daily

At FreeBSD Ports

crawl-0.3 - A small, efficient web crawler with advanced features

At FreeBSD Ports

p5-Image-Grab-1.4 - Perl extension for Grabbing images off the Internet

At FreeBSD Ports

p5-WWW-Robot-0.022 - Perl interface to a generic web traversal engine

At FreeBSD Ports

junkbuster-2.0.2 - An HTTP proxy server that eliminates ads

At FreeBSD Ports

adzapper-0.4.0 - A filtering proxy that can block ads from being displayed

At FreeBSD Ports

surfraw-1.0.7 - Command line interface to popular WWW search engines

At FreeBSD Ports
surfraw-1.0.2.tgz (At OpenBSD 2.8_packages i386)
surfraw-1.0.2.tgz (At OpenBSD 2.8_packages sparc)
surfraw-1.0.3 - Shell Users' Revolutionary Front Rage Against the Web (At NetBSD packages collection)

htdump-0.9x - A tool to retrieve WWW data

At FreeBSD Ports

decss-1.0 - Strip cascading style sheets from webpages

At FreeBSD Ports
decss-0.0.5.tgz (At OpenBSD 2.7_packages i386)
decss-0.0.6.tgz (At OpenBSD 2.8_packages i386)
decss-0.0.5.tgz (At OpenBSD 2.7_packages sparc)
decss-0.0.6.tgz (At OpenBSD 2.8_packages sparc)

p5-HTML-Summary-0.017 - Produces summaries from the textual content of web pages

At FreeBSD Ports

puf-0.91b6a - puf is a "parallel url fetcher" for UN*X systems

At FreeBSD Ports

quotes-1.7.2 - Quote, currency, and Slashdot headline fetcher based on Perl

At FreeBSD Ports

downloader-1.30 - Program for downloading via ftp or http with GUI

At FreeBSD Ports

checkbot-1.67 - A WWW link verifier, similar to momspider

At FreeBSD Ports
checkbot-1.64 - Verify links on a set of HTML pages (At NetBSD packages collection)

momspider-1.00 - WWW Spider for multi-owner maintenance.

At FreeBSD Ports

xquote-1.1 - A quote retrieval tool for X. [X]

At FreeBSD Ports
xquote-2.2 - WWW ticker symbol quote retrieval program (At NetBSD packages collection)

wget-1.8.1_1 - Retrieve files from the 'net via HTTP and FTP

At FreeBSD Ports
wget-1.5.3.tgz (At OpenBSD 2.7_packages i386)
wget-1.5.3.tgz (At OpenBSD 2.8_packages i386)
wget-msgs-1.5.3-ru.tgz - Russian messages for GNU wget (At OpenBSD 2.8_packages i386)
wget-1.5.3.tgz (At OpenBSD 2.7_packages m68k)
wget-1.5.3.tgz (At OpenBSD 2.7_packages sparc)
wget-1.5.3.tgz (At OpenBSD 2.8_packages powerpc)
wget-1.5.3.tgz (At OpenBSD 2.8_packages m68k)
wget-1.5.3.tgz (At OpenBSD 2.8_packages sparc)
wget-1.6nb1 (At NetBSD packages collection)

urlview-0.9 - URL extractor/launcher

At FreeBSD Ports
urlview-0.9.tgz - curses-based URL ripper (At OpenBSD 2.8_packages i386)
urlview-0.9.tgz - curses-based URL ripper (At OpenBSD 2.8_packages sparc)
extract URLs from text files and display them in a menu (At NetBSD packages collection)

comline-4.0D - W3C Command Line WWW Tool

At FreeBSD Ports

harvest-1.5 - Collect information from all over the Internet

At FreeBSD Ports

transproxy-1.2 - Transparent HTTP proxy for ipfw's fwd rule (or IPFILTER's ipnat command)

At FreeBSD Ports
transproxy-0.4.tgz - transparent www proxy driver for IPFILTER (At OpenBSD 2.7_packages i386)
transproxy-1.3.tgz - transparent www proxy driver for IPFILTER (At OpenBSD 2.8_packages i386)
transproxy-0.4.tgz - transparent www proxy driver for IPFILTER (At OpenBSD 2.7_packages sparc)
transproxy-1.3.tgz - transparent www proxy driver for IPFILTER (At OpenBSD 2.8_packages sparc)

wcolEpre-1999.01.10 - A prefetching proxy server for WWW

At FreeBSD Ports

webcopy-0.98b7 - A Web Mirroring Program

At FreeBSD Ports

webglimpse-1.6 - WWW interface to Glimpse search engine.

At FreeBSD Ports

site-dater.pl - Generates a table of web links within a local hierarchy sorted by date. {PD}

(Info at freshmeat)

SiteMap - Creates an HTML SiteMap of your *.*htm* files {GPL}

(Info at freshmeat)

ht://Dig - Complete world wide web indexing and searching system {GPL}

Checklinks - HTML link checker that supports SSI, many Apache options, and more (in Perl 5) {OpenSource}

(Info at freshmeat)

notify - Notify (website) visitors of changes to your site. {GPL}

(Info at freshmeat)

CheckURL - Sends notification e-mails for changed URLs {GPL}

(Info at freshmeat)

DejaSearch - DejaSearch is a frontend to DejaNews, the leading Usenet archive {GPL}

dejasearch-1.8.6 - A frontend to DejaNews for searching Usenet archives (At FreeBSD Ports)
(Info at freshmeat)

Web Secretary - Web page monitoring software {GPL}

(Info at freshmeat)

netcomics - A perl script that downloads today's comics from the Web {GPL}

(Info at freshmeat)
Netcomics - A modularized program that downloads comic strips from the 'Net that are updated regularly. It can be used to download any time-based image that is updated periodically. (At Sourceforge)

sitecopy - Maintain remote copies of locally stored web sites {GPL}

sitecopy-0.11.4_2 - Maintains remote websites, uses FTP or WebDAV to sync up with local copy (At FreeBSD Ports)
sitecopy-0.10.15 - utility for synchronizing remote and local web sites (At NetBSD packages collection)
(Info at freshmeat)

DraE Tracking - Allows servers to provide free tracking to web sites {GPL}

(Info at freshmeat)

FastLink - FastLink is a free Java Applet that displays mirror sites sorted by their respon {GPL}

(Info at freshmeat)

The Internet Junkbuster - The Internet Junkbuster v2.0.2 {GPL}

EHeadlines - Root Menu news system. {x,GPL}

(Info at freshmeat)

gtkMeat - A Freshmeat new submissions ticker {x,GPL}

(Info at freshmeat)

gtkSlash - Gtk+ based Slashdot headlines news ticker {x,GPL}

(Info at freshmeat)

Kget - KDE app to get files from the internet {x,GPL}

asScotch - The day's UserFriendly comic strip in your AfterStep rootmenu {x,GPL}

(Info at freshmeat)

asTequila - The AfterStep Resource Page (TARP) headlines in your AfterStep rootmenu {x,GPL}

(Info at freshmeat)

Squid - High performance Web proxy cache {GPL}

squid-2.4.1 - Post-Harvest_cached WWW proxy cache and accelerator (At NetBSD packages collection)
(Info at freshmeat)

w3mir - HTTP copying and mirroring program {Artistic}

w3mir-1.0.10 - All-purpose HTTP copying and mirroring tool (At FreeBSD Ports)
(Info at freshmeat)

WWWOFFLE - Simple proxy server with special features for use with dial-up internet links {GPL}

wwwoffle-2.7b - A caching proxy server for HTTP and FTP designed for dial-up hosts (At FreeBSD Ports)
wwwoffle-2.5e.tgz - WWW OFFLine Explorer (At OpenBSD 2.8_packages i386)
wwwoffle-2.5e.tgz - WWW OFFLine Explorer (At OpenBSD 2.8_packages sparc)
wwwoffle-2.6c - WWW proxy with support for offline browsing (At NetBSD packages collection)
(Info at freshmeat)

freshmeat newsletter to HTML converter - procmail filter to convert freshmeat email newsletter to HTML {Artistic}

(Info at freshmeat)

webcrawl {PD}

webcrawl-1.10 - Download web sites without user interaction by following links (At FreeBSD Ports)
(Info at freshmeat)

ECLiPt-Mirror - Full-featured mirroring script {GPL}

pavuk - Webgrabber with an optional Xt or GTK GUI {GPL}

(Info at freshmeat)

snarf - Command-line URL retrieval tool with some unique features. {GPL}

snarf-7.0 - Another small command-line URL (http/ftp/gopher/finger) fetcher (At FreeBSD Ports)
(Info at freshmeat)

ticker - Configurable text scroller, with slashdot and freshmeat modules {GPL}

(Info at freshmeat)

curl - Tiny command line client for getting data from a URL {GPL}

curl-7.9.6 - Non-interactive tool to get files from FTP, GOPHER, HTTP(S) servers (At FreeBSD Ports)
curl-6.5.2.tgz - get files from FTP, GOPHER, HTTP or HTTPS servers (At OpenBSD 2.7_packages i386)
curl-7.3-kerberos.tgz - get files from FTP, GOPHER, HTTP or HTTPS servers (At OpenBSD 2.8_packages i386)
curl-7.3.tgz - get files from FTP, GOPHER, HTTP or HTTPS servers (At OpenBSD 2.8_packages i386)
curl-6.5.2.tgz - get files from FTP, GOPHER, HTTP or HTTPS servers (At OpenBSD 2.7_packages sparc)
curl-7.3-kerberos.tgz - get files from FTP, GOPHER, HTTP or HTTPS servers (At OpenBSD 2.8_packages m68k)
curl-7.3.tgz - get files from FTP, GOPHER, HTTP or HTTPS servers (At OpenBSD 2.8_packages m68k)
curl-7.3-kerberos.tgz - get files from FTP, GOPHER, HTTP or HTTPS servers (At OpenBSD 2.8_packages sparc)
curl-7.3.tgz - get files from FTP, GOPHER, HTTP or HTTPS servers (At OpenBSD 2.8_packages sparc)
curl-7.7.1 - client that groks URLs (At NetBSD packages collection)
(Info at freshmeat)
- Curl is a tool for transferring files with URL syntax, supporting FTP, HTTP, HTTPS, GOPHER, TELNET, DICT, FILE and LDAP. Curl supports HTTP POST, HTTP PUT, FTP uploading, HTTPS certificates, HTTP form based upload, proxies, cookies, user+password authen (At Sourceforge)

swebget - Prints a webpage to stdout {GPL}

(Info at freshmeat)

GNU Wget - Network utility to retrieve files from the World Wide Web {GPL}

(Info at freshmeat)

PathFinder - A personal web search engine {GPL}

(Info at freshmeat)

HTTPGate - A Filtering HTTP Gateway {GPL}

(Info at freshmeat)

Muffin - Filtering proxy server for the World Wide Web written entirely in Java {GPL}

(Info at freshmeat)
muffin - Muffin is a World Wide Web filtering system written entirely in Java that can filter any HTTP data sent and received by your web browser. (At Sourceforge)

tinyproxy - A small, lightweight, easy-to-configure HTTP proxy. {GPL}

tinyproxy-1.4.3 - A small, efficient HTTP proxy server (At FreeBSD Ports)

Internet Junkbuster - Blocks unwanted banner ads and protects your privacy {GPL}

(Info at freshmeat)

Kticker - News ticker widget that downloads news headlines and displays them periodically {x,GPL}

(Info at freshmeat)

urlredir - URL redirector for use with the squid proxy server {GPL}

(Info at freshmeat)

DailyUpdate - Grabs dynamic information from the internet and integrates it into your webpage {GPL}

(Info at freshmeat)

Web User Interface - Builds a list of all available personal homepages. {GPL}

(Info at freshmeat)

CGIProxy - Anonymizing, filter-bypassing HTTP proxy in a CGI script (in Perl) {OpenSource}

(Info at freshmeat)

Get Right - HTTP resume for failed transfers. {GPL}

(Info at freshmeat)

Web Tree Scanner - A program to visualize the tree of a WWW server and check the links [X] {GPL}

nss - Netscape Startup Script. Script to handle Netscape launches. [X] {freely distributable}

Slashdot Reader - Slashdot Reader written in Pike/GTK. [X] {PD}

httptunnel-3.3 - Tunnel a tcp/ip connection through a http/tcp/ip connection

At FreeBSD Ports
httptunnel-3.0.tgz - HTTP tunneling utility (At OpenBSD 2.7_packages i386)
httptunnel-3.0.3.tgz - HTTP tunneling utility (At OpenBSD 2.8_packages i386)
httptunnel-3.0.tgz - HTTP tunneling utility (At OpenBSD 2.7_packages sparc)
httptunnel-3.0.3.tgz - HTTP tunneling utility (At OpenBSD 2.8_packages m68k)
httptunnel-3.0.3.tgz - HTTP tunneling utility (At OpenBSD 2.8_packages sparc)
httptunnel - Creates a two-way data tunnel through an HTTP proxy

RabbIT - Mutating, caching webproxy to speed up surfing over slow links {freely distributable}

(Info at freshmeat)

World Engine - Java Search Engine Front End

fresh-split - Perl scripts for splitting freshmeat news

Submitwolf Pro 4.02 By Trellian

asGin - Linux Today headlines in your AfterStep root menu [X] {GPL}

urlmon - URL monitoring and report tool {GPL}

Cyberscrub Professional Edition 1.5

Net Nanny 4.0

LinkBot Personal Edition 6.0


Libraries and Components

p5-WWW-Search-AltaVista-2.05 - Perl WWW::Search class for searching AltaVista

At FreeBSD Ports

libstocks-0.5.0 - A C library which can be used to fetch stocks quotes

At FreeBSD Ports

p5-HTTP-GHTTP-1.06 - Perl interface to the gnome ghttp library

At FreeBSD Ports

ruby-rss-0.9.1 - Ruby library for parsing, creating, downloading, and caching RSS

At FreeBSD Ports

HTTP::Status - Processes status codes sent over HTTP, e.g. "403 Forbidden", "404 Not Found", or "402 Payment Required". Part of the libwww bundle. [Perl] {oss}

At CPAN

LWP::RobotUA - Create your own Web robot. Part of the libwww bundle. [Perl] {oss}

At CPAN

WWW::Robot - A traversal engine for your Web robot. [Perl] {oss}

At CPAN

WWW::RobotRules - Nice Web robots, as they scour the Net for treasure, heed a robots.txt file if they find one. Information about the Robot standard can be found at http://info.webcrawler.com/mak/projects/robots/norobots.html. [Perl] {oss}

At CPAN

ARS - A Web client for Remedy's ARS system. Useful only if you're already using ARSPerl. [Perl] {oss}

At CPAN


Related Subjects (default selections)

WWW Servers - Respond to HTTP requests

WWW authoring - Creating HTML, CGI

WWW Browsers - User interface for accessing the WWW

Up to: World Wide Web - HTTP, HTML, standards, browsers, transfer utilities, servers, et al.

External Categories

freshmeat.net : Topic : Internet : WWW/HTTP : Site Management : Link Checking

Www - Web utilities (browsers, HTTP servers, etc.).

Computers : Programming : Agents :

(Metalab at UNC) /pub/linux/apps/www/indexing/ - indexing and search tools for the Web

(Metalab at UNC) /pub/linux/apps/www/mirroring/ - mirroring and batch retrieval




RocketAware.com is a service of Mib Software
Copyright 2002, Forrest J. Cavalier III. All Rights Reserved.