Friday, July 23, 2010

New book coming in September

If you've noticed a decline in my blogging of late, it's because I've been putting the finishing touches on my new book, Methods in Medical Informatics: Fundamentals of Healthcare Programming in Perl, Python, and Ruby, which will be published in September in the CRC Press series in Mathematical and Computational Biology.

This book is a departure from my earlier language-specific works. The new book stresses descriptions of medical informatics algorithms, all presented in the following format:

1) Each method has a background discussion, explaining how the method is used by healthcare professionals.

2) This is followed by a narrative description of each of the steps of the algorithm.

3) Equivalent implementations of the algorithm are provided in Perl, Python and Ruby. Scripts draw data from freely available datasets (OMIM, SEER, CDC, Census Bureau, MeSH, PubMed, and Taxonomy).

4) Then comes an analysis of the output of the algorithm.

At the end of every chapter, a list of exercises provides students with short projects that they can develop in the language of their own choice.

There are about 70 algorithms or methods provided in the book, and with a few exceptions (some of the introductory-level methods), these algorithms have not appeared in any of my prior publications.

The unique offering of this book is that students can master the fundamental methods and algorithms of medical informatics, in their preferred programming language. Instructors can concentrate on teaching useful algorithms, without wasting class time on language-specific programming instruction.

The book is divided into four parts: Part I. Building blocks of medical informatics; Part II. Free, publicly available medical data resources; Part III. The common tasks of medical informatics; and Part IV. Medical discovery.

The purpose of Part I, Building blocks of medical informatics, is to introduce the basic computational subroutines that will be used in more complex scripts later in the book. Everyone who reads the book is expected to have a basic understanding of either Perl, Python, or Ruby. Readers can skip over the parts written for the other languages. Over the years, students will find it valuable to re-read the text, this time paying attention to the languages they ignored in the first pass.

In Part II, Free, publicly available medical data resources, students are shown how to use the U.S. government biomedical datasets. The chapters in Part II explain the intended uses of these datasets, how the datasets are organized, and how students can retrieve and analyze the data.

Part III, The common tasks of medical informatics, covers some of the computational methods of biomedical informatics, including autocoding, data scrubbing, and data de-identification.

Part IV, Medical discovery, provides examples of the kinds of questions that biomedical scientists can ask and answer with public data and open source programming languages. Students will learn techniques for combining heterogeneous data sources, measuring and visualizing trends in complex data, and testing new hypotheses. The majority of research and development projects that use biomedical data can be developed using the methods described in Part IV.

This book is written specifically as a textbook for courses in medical informatics. Instructors should appreciate this book, because it frees them to teach medical informatics, without wasting time teaching basic programming skills. The included scripts are for the students, who need code samples, from a familiar language, that they can use throughout their careers. The instructors will teach the algorithms (not the language implementations). This strategy allows instructors to teach a big subject (computational medical informatics), in a small amount of time (one academic course).

The book is designed to eliminate the inequities that result when an instructor imposes his choice [of programming language] on all of his students. Students who are trained in an alternate language will resent taking a course in an unfamiliar language. These students will be at a disadvantage compared to other students who happen to be trained in the course-book language. When a professor selects a computer language, he's usually got to start the course with the basics of the selected language, and this instruction can take weeks of time away from the subject of the course [in this case, medical informatics]. This is just lame.

Creative Writing courses do not waste time teaching people about pens and pencils. Informatics courses should not waste time teaching programming skills. When a medical informatics instructor tells everyone in the class to use Python, it's equivalent to a writing instructor telling everyone in the class to use a Papermate pen. A good course in medical informatics should expect students to have programming skills, but should not be focused on any particular programming language. Instructors should focus on the tasks, issues, and questions that comprise the core activities of professionals in the field. By providing a discussion of the importance and utility of each algorithm and method, instructors can teach the fundamentals of medical informatics, without wasting time teaching programming.

All of the data used in this book are free and publicly available. Most of the data come from U.S. government sources, providing hundreds of gigabytes of high quality, curated biomedical data to a global community of scientists, healthcare experts, clinicians, nurses, and students. Every student should become familiar with these data sources, and understand their medical value. This book provides instructions for downloading all of the data sources discussed.

One of the most surprising features of medical informatics is that scripts capable of parsing large data sets, very quickly, can be quite short. By focusing on methods, not full-scale applications, scripts are kept short, concise, and easy to learn. In many cases, the scripts that implement an algorithm are actually shorter than the step-by-step description that precedes the implementation. Students will find that most of their future data analysis efforts can be accomplished by modifying and combining the scripts provided with this book.
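To give a rough feel for how short such a script can be, here is a minimal Python sketch. It is not taken from the book; the function name, the search terms, and the in-memory sample text (standing in for a large flat file) are all hypothetical, chosen only for illustration. It reads its input one line at a time, so memory use stays small no matter how large the file is.

```python
from collections import Counter
import io

def count_terms(lines, terms):
    """Count how many lines mention each term, reading one line at a time."""
    wanted = [t.lower() for t in terms]
    counts = Counter()
    for line in lines:            # a file object yields its lines lazily
        lowered = line.lower()    # case-insensitive matching
        for term in wanted:
            if term in lowered:
                counts[term] += 1
    return counts

# Hypothetical sample text standing in for a large flat file
sample = io.StringIO(
    "Adenocarcinoma of colon\n"
    "normal colon mucosa\n"
    "ADENOCARCINOMA of lung\n"
)
result = count_terms(sample, ["adenocarcinoma", "colon"])
print(result)
```

With a real file, the same function could be called as `count_terms(open(path), terms)`, where `path` points to whatever dataset has been downloaded.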

-© 2010 Jules J. Berman

