Tuesday, December 30, 2008

Why I write scripts in Perl, Ruby and Python

Readers of this blog know that I often include equivalent scripts in Perl, Ruby and Python for the biomedical informatics projects that I post.

It must be tedious to encounter a blog that devotes long stretches of space to scripts written in three different languages. I can imagine the groans coming from non-programmers when they see a long post full of code, when all they're interested in is the biomedical problem described in the first paragraph of the post.

I have reasons for including these scripts:

1. I want to reach all of the people who are interested in the biomedical problems discussed in my posts. Most of these problems are solved with informatics techniques. The techniques often require a little bit of programming knowledge. Most biomedical programming is done with Perl, Ruby or Python. These are free, open-source and cross-platform languages with active user communities and with abundant instructional material on the internet. If I want to attract the maximum number of readers, I've got to include solutions in all three languages.

2. I want to show readers that powerful scripts that solve important biomedical problems can be written in a few lines of code, regardless of the language used. Perl, Ruby and Python can all be used to write equivalent programs. For length, clarity, and speed of execution, there really isn't much difference among the major scripting languages. Biomedical scripts tend to use a few favorite operations (open a file, parse the file line by line, extract something from each line, do some sort of transformation on the extracted data, write the results to another file). These operations can be learned in a few hours.

3. I want to emphasize in all of my blog posts that the difficult part of any informatics project is developing your question (i.e., asking a smart, important, and solvable question) and understanding the substance and limitations of the available data sources. Writing scripts is the easiest and most enjoyable part of the exercise.
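
As a minimal sketch of the pattern described in point 2 (open a file, parse it line by line, extract something, transform it, write the results to another file), here is one way it might look in Python. The function name `count_terms`, the file paths, and the extraction pattern (any capitalized word) are hypothetical stand-ins chosen for illustration; a real project would substitute whatever it actually needs to pull from its data.

```python
import re
from collections import Counter

# Hypothetical extraction rule: any capitalized word. A real script would
# replace this with a pattern suited to its own data source.
TERM_PATTERN = re.compile(r"\b[A-Z][a-z]+\b")


def count_terms(in_path, out_path):
    """Count matching terms in in_path and write a tally to out_path."""
    counts = Counter()

    # Open the input file and parse it line by line.
    with open(in_path, encoding="utf-8") as infile:
        for line in infile:
            # Extract the terms, then transform them (lowercase) and tally.
            for term in TERM_PATTERN.findall(line):
                counts[term.lower()] += 1

    # Write the results to another file, one "term<TAB>count" per line,
    # sorted alphabetically by term.
    with open(out_path, "w", encoding="utf-8") as outfile:
        for term, n in sorted(counts.items()):
            outfile.write(f"{term}\t{n}\n")

    return counts
```

Calling `count_terms("input.txt", "output.txt")` (both file names are placeholders) would read the first file and create the second. The equivalent Perl or Ruby script would follow the same five steps and be of similar length.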

- © 2008 Jules J. Berman