Monday, December 15, 2008

CDC Mortality Data: 6

This is the sixth in a series of posts on the CDC's (Centers for Disease Control and Prevention) public use mortality data sets.

Yesterday, we showed how to create a dictionary of ICD code/term pairs that could be used to assign disease terms to the death certificate record codes occurring in the CDC Mortality data sets. This morning I prepared a web page that contains the output data from yesterday's blog post that could not fit on the blog page.

I also mentioned yesterday that I would explain how the CDC data could be used in mashup projects. So today we'll begin a series of posts that explain how mashup technology can integrate the CDC mortality data sets and help test biomedical hypotheses.

Data mashups combine and integrate different data sources to produce a graphical representation of data that could not be achieved with any single available data source. Many people apply the term "mashup" to Web-based applications that employ two or more web services, or that use two or more web-based applications whose web-accessible APIs (Application Programming Interfaces) permit their data to be integrated into a derivative application. Because I am a biomedical information specialist, I apply "mashup" to any application that integrates available biomedical data sources to answer questions with a graphic output (with or without Web involvement).

The classic medical mashup was done by Dr. John Snow, in London, in 1854. Wikimedia has an excellent essay on the subject. The story goes that a major outbreak of cholera occurred in late August and early September of 1854, in the Soho district of London. By the end of the outbreak, 616 people had died.

At the time, nobody understood the biological cause of cholera. At the height of the outbreak, Dr. Snow conducted a rapid, interview-based survey of the locations where new cases of cholera occurred, producing a case-density map (hand-drawn by the doctor himself).



This map is now in the public domain. A higher-resolution version of the map is available from Wikimedia.

Examination of the map revealed that the epidemic expanded from a water source, the Broad Street pump. The pump was quickly shut. Dr. Snow's historic mashup is sometimes credited with ending the cholera epidemic and heralding a new age in scientific biomedical investigation.

To create a map mashup, we will need a data source that lists occurrences of disease and the localities in which they occur; a data source that provides the latitude and longitude of localities; and a map whose East, West, North, and South boundaries have known latitudes and longitudes. We will also need a programming language that can transform data to graphics and transfer graphics to a map. We'll use Ruby because I like the Ruby interface to ImageMagick, but Perl or Python would work equally well.
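The reason the map's boundaries need known latitudes and longitudes is that every locality has to be converted to a pixel position before anything can be drawn on the map. Here is a minimal sketch of that one step, assuming that latitude and longitude scale linearly across the image (a simple equirectangular layout, which will not hold for every map projection). The MapBounds structure, the latlon_to_pixel method, and the boundary and image-size values are all hypothetical, chosen only for illustration; they are not taken from the CDC files or from any particular map.

```ruby
# Hypothetical container for a map's geographic edges (in degrees)
# and its image dimensions (in pixels).
MapBounds = Struct.new(:north, :south, :west, :east, :width, :height)

# Convert a latitude/longitude pair to [x, y] pixel coordinates.
# Assumes a linear relationship between degrees and pixels.
# Pixel (0, 0) is the top-left (northwest) corner of the image.
def latlon_to_pixel(lat, lon, map)
  x = (lon - map.west).to_f / (map.east - map.west) * map.width
  y = (map.north - lat).to_f / (map.north - map.south) * map.height
  [x.round, y.round]
end

# Illustrative values only: a made-up rectangle and image size.
SAMPLE_MAP = MapBounds.new(50.0, 24.0, -125.0, -66.0, 1180, 520)

# The geographic center of the rectangle lands at the center of the image.
p latlon_to_pixel(37.0, -95.5, SAMPLE_MAP)   # => [590, 260]
```

Once each locality has a pixel position, a graphics library can place a marker there; which drawing calls to use depends on the library and is left out of this sketch.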

Much more importantly, we will need a question or hypothesis whose solution requires a mashup. Much of computational medicine can be described as a solution in search of a question. We have many ways of analyzing data, but we often lack important questions. In the next several posts, we will show how the CDC mortality data files can be used to test medical hypotheses. Through examples, we will introduce concepts and tools used in mashups, and we will end this series with several mashups of increasing complexity.

If you are new to this blog, you might want to read the prior five posts in the series, in order.

As I remind readers in almost every blog post, if you want to do your own creative data mining, you will need to learn a little about computer programming.

For Perl and Ruby programmers, methods and scripts for using a wide range of publicly available biomedical databases are described in detail in my prior books:

Perl Programming for Medicine and Biology

Ruby Programming for Medicine and Biology

An overview of the many uses of biomedical information is available in my book,
Biomedical Informatics.

More information on cancer is available in my recently published book, Neoplasms: Principles of Development and Diversity.

© 2008 Jules Berman

As with all of my scripts, lists, web sites, and blog entries, the following disclaimer applies. This material is provided by its creator, Jules J. Berman, "as is", without warranty of any kind, expressed or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. In no event shall the author or copyright holder be liable for any claim, damages or other liability, whether in an action of contract, tort or otherwise, arising from, out of or in connection with the material or the use or other dealings in the material.
