Friday, December 19, 2008

CDC Mortality Data: 11th of 11 posts

This is the eleventh and final post in a series on the CDC mortality data sets. If you've been following this series, you've seen how easy it is to parse through a year's worth of de-identified death certificate data contained in one of the CDC public use mortality files.

We've been using the 1999 mortality file, which contains about 2.3 million records. Each record may list up to 20 diseases, representing the underlying and proximate causes of death and any significant additional conditions that the certifying doctor deems noteworthy.

How many diagnoses are typically listed on a death certificate? About 3. Many certificates list only a single condition.
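Here is a minimal sketch of how one might check that figure. It assumes the same record layout used by the script later in this post: the listed conditions sit in the 140 characters starting at offset 161 and are separated by spaces. The subroutine names are hypothetical, and the mortality file is named on the command line rather than hard-coded.

```perl
use strict;
use warnings;

# Count the conditions in one fixed-width record. The offsets (161, 140)
# are copied from the script in this post and are assumed to hold for
# the 1999 file only.
sub count_conditions {
    my ($record) = @_;
    return 0 if length($record) <= 161;    # too short to hold any codes
    my @conditions = split(" ", substr($record, 161, 140));
    return scalar(@conditions);
}

# Tally how many records list 0, 1, 2, ... conditions.
sub tally_counts {
    my ($fh) = @_;
    my %histogram;
    while (my $record = <$fh>) {
        $histogram{ count_conditions($record) }++;
    }
    return %histogram;
}

# Mean number of conditions per record, computed from the tally.
sub mean_conditions {
    my (%histogram) = @_;
    my ($records, $conditions) = (0, 0);
    while (my ($count, $freq) = each %histogram) {
        $records    += $freq;
        $conditions += $count * $freq;
    }
    return $records ? $conditions / $records : 0;
}

# Run against a real file only when one is named on the command line.
if (@ARGV) {
    open(my $fh, "<", $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
    my %histogram = tally_counts($fh);
    close($fh);
    printf("%2d conditions: %d certificates\n", $_, $histogram{$_})
        for sort { $a <=> $b } keys %histogram;
    printf("mean: %.3f\n", mean_conditions(%histogram));
}
```

The tally shows both the mean and how many certificates list only one condition, which the mean alone would hide.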

It's easy to rank the average number of conditions listed on the certificates, by state.

The lowest-ranking state is AR (Arkansas), with an average of 2.442 conditions listed on each certificate. Next in line is Louisiana, with 2.479 conditions listed. Arizona follows with 2.501.

2.442 AR
2.479 LA
2.501 AZ
2.531 AL
2.554 MT
2.567 MA
2.579 NV
2.603 OK
2.603 VA
2.609 KY
2.621 IL
2.631 IN
2.632 WI
2.634 NM
2.649 OR
2.652 FL
2.663 MI
2.667 SD
2.678 MN
2.690 NJ
2.714 UT
2.768 AK
2.774 PA
2.781 MS
2.789 KS
2.795 MO
2.796 ID
2.800 WY
2.802 GA
2.824 SC
2.831 IA
2.855 ME
2.875 CO
2.875 TX
2.879 WA
2.880 NC
2.883 TN
2.903 DE
2.909 NH
2.921 NE
2.935 DC
2.949 NY
2.955 MD
2.956 CT
3.083 ND
3.102 WV
3.125 VT
3.138 OH
3.195 RI
3.316 HI
3.363 CA

The highest-ranking state is California, with 3.363 conditions listed on each certificate. Next to the top is Hawaii, with 3.316 conditions.

Here is the Perl script that produced the data.

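#!/usr/bin/perl
use strict;
use warnings;

# Map each two-digit CDC state number to its state abbreviation.
my %statehash;
open(my $state_fh, "<", "cdc_states.txt") or die "Cannot open cdc_states.txt: $!";
while (my $line = <$state_fh>) {
    next unless $line =~ /^([0-9]{2})/;
    my $state_code = $1;
    next unless $line =~ / +([A-Z]{2}) *$/;
    $statehash{$state_code} = $1;
}
close($state_fh);

# Tally the certificates and the listed conditions for each state.
my (%state_total, %state_eager);
open(my $icd_fh, "<", "Mort99us.dat") or die "Cannot open Mort99us.dat: $!";   #the CDC mortality file
while (my $line = <$icd_fh>) {
    my $code = substr($line,20,2);                       #CDC state number
    my $state = $statehash{$code};
    next unless defined $state;                          #skip codes missing from the map
    my @conditions = split(" ", substr($line,161,140));  #the listed conditions
    $state_total{$state}++;
    $state_eager{$state} += scalar(@conditions);
}
close($icd_fh);

# Average number of conditions per certificate, sorted from lowest to highest.
my @list_array;
while (my ($key, $value) = each(%state_total)) {
    my $goodness = substr(($state_eager{$key} / $value),0,5);
    push(@list_array, "$goodness $key");
}
print join("\n", (sort(@list_array))), "\n";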

What is a "lazy" death certificate? I would think that a lazy death certificate is one that contains the absolutely minimal number of conditions required to certify death (i.e., "1"). Let's rank the states by the fraction of death certificates, registered in the state, that contain only one listed condition for the cause of death (by tweaking the first Perl script).
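The tweak might look something like the sketch below. It assumes the same field offsets as the first script (state number at offset 20, conditions in the 140 characters starting at offset 161). The subroutine name is hypothetical, and to keep the example self-contained it reports the two-digit CDC state number rather than the abbreviation; the state map from the first script could be applied to the keys afterward.

```perl
use strict;
use warnings;

# For each state number, return the fraction of records that list
# exactly one condition. Offsets are copied from the first script.
sub single_condition_fraction {
    my ($fh) = @_;
    my (%total, %single);
    while (my $record = <$fh>) {
        next if length($record) <= 161;    # too short to hold any codes
        my $code = substr($record, 20, 2);
        my @conditions = split(" ", substr($record, 161, 140));
        $total{$code}++;
        $single{$code}++ if @conditions == 1;
    }
    my %fraction;
    for my $code (keys %total) {
        $fraction{$code} = ($single{$code} || 0) / $total{$code};
    }
    return %fraction;
}

# Run against a real file only when one is named on the command line.
if (@ARGV) {
    open(my $fh, "<", $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
    my %fraction = single_condition_fraction($fh);
    close($fh);
    # Highest fraction first, matching the order of the list in this post.
    for my $code (sort { $fraction{$b} <=> $fraction{$a} } keys %fraction) {
        printf("%.3f %s\n", $fraction{$code}, $code);
    }
}
```

Note that printf rounds to three decimal places, whereas the first script truncates with substr, so the last digit may differ slightly from the list that follows.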

0.323 AL
0.304 MT
0.303 AR
0.291 KY
0.290 IN
0.288 LA
0.285 MN
0.277 VA
0.274 WI
0.270 SD
0.267 MI
0.267 IL
0.258 PA
0.255 OK
0.255 MA
0.252 OR
0.249 NM
0.249 NJ
0.245 MO
0.244 AZ
0.242 ID
0.241 ME
0.241 FL
0.239 AK
0.238 UT
0.238 KS
0.234 WA
0.233 IA
0.229 DE
0.228 WY
0.225 SC
0.222 TN
0.222 CO
0.221 NC
0.220 TX
0.219 NV
0.217 DC
0.214 MS
0.214 MD
0.200 GA
0.199 NH
0.196 OH
0.192 WV
0.190 ND
0.185 NE
0.180 RI
0.177 VT
0.176 CT
0.171 HI
0.129 NY
0.119 CA

Alabama has the worst performance, with nearly one third of death certificates having only 1 listed condition. California, once more, has the best performance of all the states, with one condition reported in only about 12% of certificates (i.e., about 88% of certificates have more than one condition reported).

Just about every death involves multiple underlying causes of death leading to a proximate cause of death. The number of conditions listed on a death certificate is, in most cases, a matter of personal effort on the part of the certifying doctor.

As we discussed in an earlier post in this series, it can be quite difficult to produce an accurate death certificate. Nonetheless, much of what we know about human disease and the causes of human mortality comes from examination of death certificates. Death certificates have profound importance to the family of the deceased. Doctors should be trained to provide complete and accurate entries for "causes of death" and "other significant conditions" on death certificates.

As I remind readers in almost every blog post, if you want to do your own creative data mining, you will need to learn a little about computer programming.

For Perl and Ruby programmers, methods and scripts for using a wide range of publicly available biomedical databases are described in detail in my prior books:

Perl Programming for Medicine and Biology

Ruby Programming for Medicine and Biology

An overview of the many uses of biomedical information is available in my book,
Biomedical Informatics.

More information on cancer is available in my recently published book, Neoplasms: Principles of Development and Diversity.

© 2008 Jules Berman

As with all of my scripts, lists, web sites, and blog entries, the following disclaimer applies. This material is provided by its creator, Jules J. Berman, "as is", without warranty of any kind, expressed or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. In no event shall the author or copyright holder be liable for any claim, damages or other liability, whether in an action of contract, tort or otherwise, arising from, out of or in connection with the material or the use or other dealings.
