Friday, June 15, 2007

Ruby Programming

I've recently written a book, "Ruby Programming for Medicine and Biology." The Table of Contents is available at the Jones and Bartlett Web Site.

In my opinion, it's important that biomedical informaticians become self-sufficient and less reliant on vendor-supplied applications. Writing your own programs is an empowering experience: it lets you develop and try new ideas, something that is often not feasible with commercial software.

For a long time, I've been an advocate of Perl (see my book).

Perl is very good when you want to do imperative programming (sometimes called procedural programming). In imperative programming, the program is the implementation of an algorithm in the syntax of the programming language: each line of the program is a command that executes one step of the algorithm. Procedural programming is virtually the same. The only difference is that, in procedural programming, a step in the algorithm may call a subroutine (i.e., another, separately defined algorithm). You can think of procedural programs as imperative programs with subroutines. This is what Perl does very well. Because Perl syntax is easy to learn, and because the built-in Perl commands and the available Perl modules provide most of the functionality that anyone would need in the biomedical field, Perl has become a very popular language among bioinformaticians.
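To make the distinction concrete, here is a minimal sketch. It is written in Ruby rather than Perl so that all the examples in this post use one language. The DNA sequence and the gc_fraction name are made up for illustration.

```ruby
# Imperative style: each line is one step of the algorithm.
sequence = "ATGCGCTA"              # hypothetical example input
gc_count = 0
sequence.each_char do |base|
  gc_count += 1 if base == "G" || base == "C"
end
imperative_result = gc_count.to_f / sequence.length

# Procedural style: the same steps, packaged as a subroutine
# that any other step of a larger program can call.
def gc_fraction(sequence)
  return 0.0 if sequence.empty?     # avoid dividing by zero
  gc_count = sequence.count("GC")   # String#count tallies characters in the set "GC"
  gc_count.to_f / sequence.length
end

procedural_result = gc_fraction("ATGCGCTA")

puts imperative_result
puts procedural_result
```

Both halves compute the same number. The second half simply gives the algorithm a name so it can be reused.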

The problem with Perl is that it is not well suited to modeling and integrating biomedical classifications and ontologies. This last jargon-heavy sentence deserves a little explanation, but you probably don't need the standard essay on the data-intensive aspects of modern biomedicine. Suffice it to say that when you have lots and lots of complex data, you need some way to simplify the data and to relate one kind of data to other kinds of data. The best way to simplify data is with classifications or ontologies that can annotate data in a manner that everyone can understand and exchange. When you talk about classifications and ontologies, you're talking about data objects, object (instance) methods, class methods, inheritance, metadata descriptions, specifications, and so on. These are the things that object-oriented languages provide.
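A small sketch may help show what these terms look like in code. The class names below (Organism, Animal, Mammal) are a toy classification invented for this example, not a real ontology.

```ruby
# Toy classification; the class names are illustrative only.
class Organism
  attr_reader :name

  def initialize(name)
    @name = name
  end

  # Class method: a description that belongs to the class itself.
  def self.description
    "any living thing"
  end

  # Instance method: reports where this particular object sits
  # in the classification, from its own class up to Organism.
  def lineage
    self.class.ancestors
        .take_while { |k| k != Object }
        .select { |k| k.is_a?(Class) }
        .map(&:name)
  end
end

class Animal < Organism
  def self.description
    "an organism that ingests other organisms"
  end
end

# Mammal defines nothing of its own; everything is inherited.
class Mammal < Animal
end

mouse = Mammal.new("Mus musculus")   # a data object (instance)
puts mouse.name
puts mouse.lineage.inspect
puts Mammal.description
```

Here mouse is a data object, lineage is an instance method, description is a class method, and Mammal gets all of its behavior by inheritance from its ancestors.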

Ruby is a great object-oriented language because it is free and open source, has a very simple and logical syntax, and gracefully models existing biomedical classifications and ontologies. I tried using object-oriented Perl for my work with classifications and ontologies, but it just was not a good fit. I dabbled in Python (an excellent object-oriented programming language that has many of the features I was seeking), but it lacked a few things that I wanted.

Let's not get into an endless argument over Ruby vs. Python vs. Java. Let me just say that Python is fine (I won't get into my peeves regarding Java), but I chose Ruby because 1) its syntax is beautiful and simple, and I had no trouble learning the language; 2) it enforces single inheritance, meaning that every class has exactly one parent class (which greatly simplifies the language and fits well with the biological classifications that I work with); and 3) it fits the so-called open world paradigm for evaluating assertions, because a method can return true, false or nil (unknown) rather than being forced into a true/false dichotomy. I really need Ruby's "nil".
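Here is a minimal sketch of the true/false/nil idea. The species lists and the mammal? and describe methods are hypothetical, made up for this example.

```ruby
# Hypothetical knowledge base: only what has been asserted is recorded.
KNOWN_MAMMALS     = ["mouse", "human"].freeze
KNOWN_NON_MAMMALS = ["frog"].freeze

# Returns true or false when an assertion exists,
# and nil when nothing has been asserted either way.
def mammal?(species)
  return true  if KNOWN_MAMMALS.include?(species)
  return false if KNOWN_NON_MAMMALS.include?(species)
  nil
end

# Keeps the three answers distinct instead of folding nil into false.
def describe(species)
  case mammal?(species)
  when true  then "#{species}: mammal"
  when false then "#{species}: not a mammal"
  when nil   then "#{species}: unknown"
  end
end

puts describe("mouse")
puts describe("frog")
puts describe("platypus")

# Single inheritance: every class reports exactly one parent.
puts Integer.superclass
```

One caution: Ruby treats nil as false inside an if test, so code that needs to tell "unknown" apart from "no" should check for nil explicitly (with nil? or a case statement, as above).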

When do you use Perl, and when do you use Ruby? I use Perl whenever I want to create simple utility scripts (transforming one file into another file of a different structure, running a single algorithm on an input, and so on). In the past, most of my work was this sort of thing. I don't use Ruby to create short utilities because Ruby is slower than Perl. In my experience, a Ruby script takes about twice as long as a Perl script to run the same algorithm. Object-oriented languages that resolve methods at run time tend to pay a cost of this kind: when a message is sent to an object, the interpreter must search the object's class hierarchy to find the matching method.
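The lookup described above can be sketched in a few lines. The Specimen classes and the defining_class helper are invented for illustration. This is a conceptual model of the search, not a description of how any particular Ruby interpreter implements or optimizes it.

```ruby
class Specimen
  def label
    "specimen"
  end
end

class TissueSpecimen < Specimen
end

class BiopsySpecimen < TissueSpecimen
end

# The ordered chain of classes and modules that is searched
# when a message is sent to a BiopsySpecimen object.
puts BiopsySpecimen.ancestors.inspect

# Hypothetical helper: walk the chain and return the first entry
# that defines the method itself (rather than inheriting it).
def defining_class(klass, method_name)
  klass.ancestors.find { |k| k.instance_methods(false).include?(method_name) }
end

puts defining_class(BiopsySpecimen, :label)

# Ruby can report the same answer directly.
puts BiopsySpecimen.instance_method(:label).owner
```

Sending label to a BiopsySpecimen finds nothing in BiopsySpecimen or TissueSpecimen, so the search continues until it reaches Specimen. The deeper the hierarchy, the longer that walk can be.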

I use Ruby for modeling biomedical domains. In practice, this means that if I'm working with RDF, ontologies, classifications, objects, or object libraries, I use Ruby.

Some of you may have heard of Ruby on Rails (RoR). This is a web application framework, written in Ruby, for creating simple, quick, elegant, object-oriented Web applications. It is wildly popular at the moment. It's just one more perk of learning Ruby.


-Jules Berman