Sunday, January 24, 2010

Grabbing, Testing Web Links

Once more, I'm interrupting a series of blog posts on complexity to point to a recently constructed web page showing how to grab web pages and test web links using Perl, Python, or Ruby.

Perl, Python, and Ruby each have their own modules for HTTP transactions, and each module has its own peculiar syntax. Still, the basic operation is the same: your script sends an HTTP request for a web file at a specific network address (the URL, or Uniform Resource Locator). The response tells you whether the page is available, which is equivalent to testing a link. With a little effort, you can modify the provided scripts to collect and examine a large number of web pages. With a little more effort, you can write your own spider software that searches for web addresses, iteratively collecting information from the links within web pages.
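To make the request-and-response idea concrete, here is a minimal Python sketch. It is not the script from the linked article; it assumes Python 3 and uses only the standard library. The names check_link, LinkCollector and extract_links are hypothetical, chosen for this illustration. The first function tests a single link, and the second pulls the link addresses out of a page, which is the basic step a spider would repeat.

```python
from html.parser import HTMLParser
from urllib.error import HTTPError, URLError
from urllib.parse import urljoin
from urllib.request import Request, urlopen


def check_link(url, timeout=10):
    """Return the HTTP status code for url, or None if no response arrived.

    Hypothetical helper for illustration. It sends a HEAD request, which
    asks for headers only; some servers refuse HEAD, in which case a GET
    request would be the fallback.
    """
    try:
        request = Request(url, method="HEAD")
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as error:
        # The server answered, but with an error status such as 404.
        return error.code
    except (URLError, OSError, ValueError):
        # No usable answer: bad address, unreachable host, or timeout.
        return None


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Turn relative addresses into absolute URLs.
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the absolute URLs of all links found in an HTML string."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

A status code of 200 from check_link means the page is available; 404 means the server responded but the page is missing; None means no response was received at all. A simple spider would fetch a page, pass its text to extract_links, and then call check_link (or fetch again) on each address returned, keeping a record of addresses already visited so that it does not loop.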

The article is at:

© 2010 Jules J. Berman

key words: html, http, hypertext transfer protocol, testing link, ruby programming, perl programming, python programming, bioinformatics, valid web page, web page is available, good http request, valid http request testing if web page exists, testing web links, jules berman Ph.D., M.D.