Thursday, January 28, 2010

Scripts for fetching and testing web pages

Web pages are files (usually in HTML format) that reside on servers that accept HTTP requests from clients connected to the Internet. Browsers are software applications that send HTTP requests and display the received web pages. Using Perl, Python, or Ruby, you can automate HTTP requests. For each language, the easiest way to make an HTTP request is to use a module that comes bundled as a standard component of the language.

I've written very simple scripts, in Perl, Python, and Ruby, for fetching web files. The scripts, and an explanation of how they work, are available at:

http://www.julesberman.info/factoids/url_get.htm


Perl, Python, and Ruby each use their own module for HTTP transactions, and each module has its own peculiar syntax. Still, the basic operation is the same: your script initiates an HTTP request for a web file at a specific network address (the URL, or Uniform Resource Locator), and a response is received. If the page can be retrieved, it is printed to the monitor. Otherwise, the response contains some information indicating why the page could not be retrieved.
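As a rough illustration of that request-and-response cycle, here is a minimal Python 3 sketch. It is not the script posted at the address above; it assumes Python 3's bundled urllib.request module, and the function name fetch_page and the 10-second timeout are my own choices for the example.

```python
import urllib.error
import urllib.request


def fetch_page(url, timeout=10):
    """Return (True, page_text) on success, or (False, reason) on failure.

    fetch_page is a hypothetical helper name used only for this sketch.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # Use the charset declared by the server, falling back to UTF-8.
            charset = response.headers.get_content_charset() or "utf-8"
            return True, response.read().decode(charset, errors="replace")
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (404, 500, ...).
        return False, f"HTTP error {err.code}: {err.reason}"
    except urllib.error.URLError as err:
        # No usable answer: unknown host, refused connection, and so on.
        return False, f"Could not reach server: {err.reason}"
    except ValueError as err:
        # The string was not a well-formed URL.
        return False, f"Bad URL: {err}"
```

Calling fetch_page with a URL and printing the second item of the returned pair shows either the page itself or the reason it could not be retrieved.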

With a little effort, you can use these basic scripts to collect and examine a large number of web pages. With a little more effort, you can write your own spider software that searches for web addresses within web pages, then follows those addresses to collect information from each linked page in turn.
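The first step of a spider, finding the web addresses inside a page, can be sketched in Python 3 with the bundled html.parser and urllib.parse modules. This is only one possible approach, and the names LinkCollector and extract_links are invented for the example. It looks only at the href attribute of anchor tags and resolves relative addresses against the address of the page they came from.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag seen (hypothetical helper class)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative addresses into absolute ones.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html_text, base_url):
    """Return the absolute URLs linked from html_text, in page order."""
    collector = LinkCollector(base_url)
    collector.feed(html_text)
    return collector.links
```

A spider would fetch a page, pass its text to extract_links, and repeat the process for each address returned, keeping a record of addresses already visited so that it does not loop forever.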

© 2010 Jules J. Berman

key words: testing link, ruby programming, perl programming, python programming, bioinformatics, valid web page, web page is available, good http request, valid http request, testing if web page exists, testing web links, jules berman, jules j berman, Ph.D., M.D.