Sunday, January 24, 2010

Grabbing, Testing Web Links

Once more, I'm interrupting a series of blog posts on complexity to point to a recently constructed web page that shows how to grab web pages and test web links using Perl, Python, or Ruby.

Perl, Python, and Ruby each have their own external modules for HTTP transactions, and each language's module has its own peculiar syntax. Still, the basic operation is the same: your script initiates an HTTP request for a web file at a specific network address (the URL, or Uniform Resource Locator). The server sends back a response whose status code tells you whether the page is available, which is all that testing a link requires. With a little effort, you can modify the provided scripts to collect and examine a large number of web pages. With a little more effort, you can write your own spider software that searches for web addresses, iteratively collecting information from the links within web pages.
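To make the request/response cycle concrete, here is a minimal sketch in Python using only the standard library (`urllib.request` and `html.parser`). It is not one of the scripts from the article; the function names `check_link`, `fetch_page`, and `extract_links` are hypothetical. It assumes a HEAD request is enough to test a link (some servers reject HEAD, in which case a GET would be needed), and it shows the first step toward a spider: pulling the links out of a downloaded page.

```python
import urllib.error
import urllib.parse
import urllib.request
from html.parser import HTMLParser


def check_link(url, timeout=10):
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        # HEAD asks for the headers only, which is enough to test a link.
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except (urllib.error.URLError, ValueError, OSError):
        # DNS failure, refused connection, timeout, or malformed URL.
        return None


def fetch_page(url, timeout=10):
    """Download url and return its body decoded as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links such as "/about" against the page URL.
                    self.links.append(urllib.parse.urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the list of absolute link URLs found in an HTML string."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links
```

In use, `extract_links(fetch_page(url), url)` lists the links on a page, and calling `check_link` on each one reports its status code (200 for a live page, 404 for a missing one, None if no server answered). Feeding the extracted links back into `fetch_page` is the iterative step that turns this into a simple spider.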

The article is at:

