Monday, December 15, 2008

CDC Mortality Data: 7

This is the seventh in a series of posts on the CDC's (Centers for Disease Control and Prevention) public use mortality data sets. Yesterday, we introduced the concept of data mashups. The next several blogs will describe mashup techniques. For today's blog, let's focus on the kinds of biological questions that can be approached with the CDC data. You can't design a credible mashup until you've acquired some understanding of the potential value of the mashed-up data.

Alpha-1 antitrypsin disease is a prototypical serpin disease (a disease due to deficiencies or abnormalities in the synthesis of serine proteinase inhibitors). People with this disorder are homozygous for mutations in the alpha-1 antitrypsin gene. The full-blown disease is characterized by cirrhosis and emphysema. The pathogenesis of this disease is somewhat complex, because there are a variety of different possible mutations of the gene, and the clinical manifestations vary somewhat with the mutation type. The cirrhosis is apparently due to the intracellular accumulation of abnormal alpha-1 antitrypsin molecules within hepatocytes, and the emphysema is apparently the result of the destructive effects of inflammation-induced intrapulmonary proteinases, unopposed by antitrypsin.

As is the case in most rare recessive genetic disorders, heterozygous mutations in the alpha-1 antitrypsin gene are found as common gene variants in the general population.

If a double-dose (homozygous) of an altered gene causes disease, what is the effect of a single (heterozygous) gene variant? Gene variations may be responsible for differences in the pathogenesis of disease among members of the apparently healthy public. About 15% of smokers develop COPD (chronic obstructive pulmonary disease) or emphysema. Why does one smoker develop COPD, while another smoker escapes pulmonary toxicity? Might the difference be accounted for by gene variations, and might a key gene be the alpha-1 antitrypsin gene?

A number of researchers have provided data indicating that heterozygous carriers of alpha-1 antitrypsin mutations are at increased risk for developing emphysema (Lieberman 1969 and Stevens 1971).

Lieberman, J.: Heterozygous and homozygous alpha-1-antitrypsin deficiency in patients with pulmonary emphysema. New Eng. J. Med. 281: 279-284, 1969.

Stevens, P. M.; Hnilica, V.; Johnson, P. C.; Bell, R. L.: Pathophysiology of hereditary emphysema. Ann. Intern. Med. 74: 672-680, 1971.

Population studies indicate that the African American population has much lower levels of alpha-1 antitrypsin disease gene variants than whites, the most prevalent mutations occurring in people with European ancestry (DeCroo 1991, Hutchison 1998).

DeCroo, S.; Kamboh, M. I.; Ferrell, R. E.: Population genetics of alpha-1-antitrypsin polymorphism in US whites, US blacks and African blacks. Hum. Hered. 41: 215-221, 1991.

Hutchison, D. C. S.: Alpha-1-antitrypsin deficiency in Europe: geographical distribution of Pi types S and Z. Resp. Med. 92: 367-377, 1998.

We hypothesize that if alpha-1 antitrypsin disease mutations play a significant contributory role in the pathogenesis of emphysema in the general population, we can expect to see fewer emphysema cases in African-Americans (who are unlikely to be heterozygous for alpha-1 antitrypsin disease mutations) than in the White population. We can test this hypothesis by determining the percentage of African-Americans who die, in the U.S., with emphysema, and comparing that number with the percentage of White Americans who die with emphysema.

Here's the Perl script:

# Tally deaths by race, and deaths with emphysema/COPD, in the 1999 CDC mortality file
open (ICD, "Mort99us.dat") || die "Cannot open Mort99us.dat: $!";
while ($line = <ICD>)
  {
  $count++;                                # every record in the file
  $codesection = substr($line,161,140);    # part of the record holding the ICD codes
  $race = substr($line,59,2);              # race code, bytes 60 and 61
  $whitecount++ if ($race eq "01");
  $blackcount++ if ($race eq "02");
  if ($codesection =~ /J4[34]/)            # J43 or J44 anywhere in the code section
    {
    $whiteemp++ if ($race eq "01");
    $blackemp++ if ($race eq "02");
    }
  }
close ICD;
# Percentage of deaths, within each group, that carry an emphysema/COPD code
$whiteempfrac = 100 * ($whiteemp / $whitecount);
$blackempfrac = 100 * ($blackemp / $blackcount);
print "Total records in file is $count\n";
print "Total African-Americans in file is $blackcount\n";
print "Total Whites in file is $whitecount\n";
print "Total African-Americans with emphysema $blackemp\n";
print "Total Whites with emphysema is $whiteemp\n";
print "Percent African-Americans with emphysema is ";
print substr($blackempfrac,0,4) . "\n";    # keep the first four characters only
print "Percent Whites with emphysema is ";
print substr($whiteempfrac,0,4) . "\n";

Here is the output from the script:

Total records in file is 2394872
Total African-Americans in file is 285276
Total Whites in file is 2064169
Total African-Americans with emphysema 15190
Total Whites with emphysema is 222996
Percent African-Americans with emphysema is 5.32
Percent Whites with emphysema is 10.8

The Perl script parses through the CDC mortality data for 1999.

Race is assigned a two digit code, 01 for White and 02 for Black, at bytes 60 and 61 of each record. The race code is pulled with the Perl statement:

$race = substr($line,59,2);
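
Perl's substr counts character offsets from zero, which is why bytes 60 and 61 of the record are read at offset 59 with a length of 2. Here is a minimal sketch of that arithmetic. The record is made up (filler characters around a race code), and the helper name field_at is hypothetical, introduced only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: pull a field given its 1-based starting byte and its
# length, converting to the 0-based offset that substr expects.
sub field_at {
    my ($record, $start_byte, $length) = @_;
    return substr($record, $start_byte - 1, $length);
}

# Synthetic record: 59 filler characters, the race code "02", more filler.
my $line = ("." x 59) . "02" . ("." x 20);

my $race = field_at($line, 60, 2);    # equivalent to substr($line,59,2)
print "race code: $race\n";           # prints "race code: 02"
```

The same conversion applies to any other fixed-width field: subtract one from the documented starting byte.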

Emphysema and COPD are covered by the ICD codes that begin with J4, followed by 3 or 4 (J43 and J44). Cases coded as emphysema or COPD are matched with the following Perl condition:

if ($codesection =~ /J4[34]/)
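
To see what the pattern accepts and rejects, here is a small sketch that applies the same condition to a few made-up code sections. The sample strings, their spacing, and the helper name has_emphysema_or_copd are assumptions for illustration. They are not taken from the CDC file.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the same condition used in the script.
sub has_emphysema_or_copd {
    my ($codesection) = @_;
    return $codesection =~ /J4[34]/ ? 1 : 0;
}

# Made-up code sections, written without decimal points.
my @samples = (
    "I251 J439 E119",    # includes J43.9 (emphysema, unspecified)
    "J449 I10",          # includes J44.9 (COPD, unspecified)
    "C349 J189",         # no J43 or J44 code
    "J459",              # J45.9 is asthma, so it should not match
);

foreach my $codesection (@samples) {
    my $result = has_emphysema_or_copd($codesection) ? "match" : "no match";
    print "$codesection => $result\n";
}
```

Because the pattern is not anchored, it matches J43 or J44 wherever those characters occur in the code section, which is the behavior the script relies on.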

The Perl script examines nearly 2.4 million death records in the CDC data set and informs us that the percentage of African-American deaths coded with emphysema or COPD is about half the percentage for the White population (5.32% versus 10.8%). This observation is consistent with our hypothesis that the alpha-1 antitrypsin gene variant increases the risk of emphysema in the general population.

Does this observation prove the hypothesis? Absolutely not. The same observation could be explained by many different hypotheses. But we have shown, with a large number of cases (nearly a quarter million emphysema/COPD cases), that African-Americans die with emphysema or COPD proportionally less often than Whites.
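
The comparison can be condensed into a single ratio of the two percentages, computed from the counts in the script output above. The sketch below also attaches an approximate 95% interval, assuming the standard large-sample (log-based) interval for a ratio of two independent proportions is appropriate here. The interval reflects sampling variability only and says nothing about the competing explanations.

```perl
use strict;
use warnings;

# Counts copied from the script output above.
my ($blackemp, $blackcount) = (15190, 285276);
my ($whiteemp, $whitecount) = (222996, 2064169);

# Fraction of deaths in each group coded with emphysema or COPD.
my $blackfrac = $blackemp / $blackcount;
my $whitefrac = $whiteemp / $whitecount;
my $ratio     = $blackfrac / $whitefrac;

# Large-sample standard error of log(ratio) for two independent proportions.
my $se   = sqrt(1/$blackemp - 1/$blackcount + 1/$whiteemp - 1/$whitecount);
my $low  = $ratio * exp(-1.96 * $se);
my $high = $ratio * exp( 1.96 * $se);

printf "Ratio of percentages: %.2f\n", $ratio;
printf "Approximate 95%% interval: %.3f to %.3f\n", $low, $high;
```

With counts this large the interval is narrow, so the ratio of about one half is not a product of small numbers.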

This is the kind of analysis that uses existing CDC mortality data sets to develop and test a hypothesis. In the next few blogs, as we start to use CDC data in mashup applications, we will be developing hypotheses that relate our available data with information that has a graphic representation (such as a map, or a physical drawing of a chromosome, or an anatomic picture).

© 2008 Jules Berman

As with all of my scripts, lists, web sites, and blog entries, the following disclaimer applies. This material is provided by its creator, Jules J. Berman, "as is", without warranty of any kind, expressed or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. In no event shall the author or copyright holder be liable for any claim, damages or other liability, whether in an action of contract, tort or otherwise, arising from, out of or in connection with the material or the use or other dealings.