Perl, Ruby and Python scripts for CDC public use mortality file parsing

I have been interested in knowing whether sickle cell incidence is decreasing in the U.S. population. Despite PubMed and web searches, I have not been able to find a single data source on the subject. In a prior post, I referred to one of my Perl scripts that parsed the CDC public use mortality records (about 5 gigabytes of raw data). The results suggested that the incidence of sickle cell anemia in the U.S. may be increasing.

For those interested, here are Perl, Ruby, and Python versions of the script. Each one parses the CDC public use mortality files for 1996, 1999, 2002, and 2004 and reports, for each year, the number of death certificate records that list sickle cell disease (records whose code section contains the ICD-10 code D57, sickle-cell disorders) and the rate per 100,000 records.
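All three scripts share two steps: slice a fixed-width code section out of each record and search it for "D57", then divide the match count by the record count. Here is a minimal Python sketch of just those steps, written as functions so the logic can be checked on a few lines before running it over gigabytes of data. The function names and the toy records are hypothetical illustrations, not part of the CDC files or of the original scripts.

```python
import re

# Same pattern as the scripts: ICD-10 "D57" (sickle-cell disorders), any case
SICKLE = re.compile("D57", re.IGNORECASE)


def count_sickle_records(lines, offset, width=140):
    """Return (records mentioning D57, total records) for an iterable of lines.

    `offset` is the zero-based column where the code section starts;
    the scripts use 448, 161, 162 and 164 for the four years.
    """
    counter = 0
    popcount = 0
    for line in lines:
        popcount += 1
        # Slicing past the end of a short line yields "", which never matches
        if SICKLE.search(line[offset:offset + width]):
            counter += 1
    return counter, popcount


def rate_per_100k(counter, popcount):
    """Records mentioning sickle cell per 100,000 records (0.0 if no records)."""
    return 100000 * counter / popcount if popcount else 0.0


# Hypothetical toy records with the code section starting at column 10
toy = ["x" * 10 + "I10 D571  ", "x" * 10 + "I219      ", "x" * 10 + "d570 J189 "]
counts = count_sickle_records(toy, offset=10)
print(counts, round(rate_per_100k(*counts), 1))
```

Passing an open file handle instead of the toy list streams the real file one line at a time, the same way the scripts do.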

Please refer to the prior post for discussion of the data.

Please refer to my document describing uses for the CDC public use mortality files for general instructions on acquiring and analyzing the de-identified U.S. death certificate data.

#!/usr/bin/perl
use strict;
use warnings;
# Column where the 140-character code section starts in each year's file
my @filearray = qw(mort96us.dat mort99us.dat mort02us.dat mort04us.dat);
my %offset = ('mort96us.dat' => 448, 'mort99us.dat' => 161,
              'mort02us.dat' => 162, 'mort04us.dat' => 164);
foreach my $file (@filearray)
  {
  open(my $icd, '<', $file) or die "Cannot open $file: $!";
  my $popcount = 0;
  my $counter = 0;
  # Reading in the while condition stops cleanly at end of file
  while (my $line = <$icd>)
    {
    $popcount++;
    # Count the record, but skip the search if it is too short to reach the code section
    next if length($line) <= $offset{$file};
    my $codesection = substr($line, $offset{$file}, 140);
    $counter++ if ($codesection =~ /D57/i);
    }
  close $icd;
  next if ($popcount == 0);
  my $rate = sprintf("%.2f", 100000 * $counter / $popcount);
  print "\n\nRecords listing sickle cell is $counter in $file file";
  print "\nSickle cell rate per 100,000 records is $rate in $file file";
  }
print "\n";
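Each script hard-codes the column where the code section starts (448, 161, 162 and 164), and a wrong offset fails silently: the script still runs and simply reports fewer matches. A rough sanity check, sketched here in Python as my own addition, is to sample the first records and see how often the chosen slice contains something shaped like an ICD-10 code. The pattern (an uppercase letter followed by two digits) and the function name are assumptions for illustration; a low fraction suggests a wrong offset, while a high fraction is encouraging but not proof.

```python
import re
from itertools import islice

# Assumption: codes look like "D570" or "I10", i.e. an uppercase letter
# followed by at least two digits
ICD10_LIKE = re.compile(r"[A-Z][0-9]{2}")


def offset_hit_fraction(lines, offset, width=140, sample=1000):
    """Fraction of the first `sample` lines whose slice holds an ICD-10-like code."""
    checked = 0
    hits = 0
    for line in islice(lines, sample):
        checked += 1
        if ICD10_LIKE.search(line[offset:offset + width]):
            hits += 1
    return hits / checked if checked else 0.0


# Hypothetical records: filler columns, then a code section at column 20
toy = ["0" * 20 + "A419 I10  ", "0" * 20 + "          "]
print(offset_hit_fraction(toy, 20))
print(offset_hit_fraction(toy, 0, width=20))
```

For example, passing an open file handle for mort99us.dat with offset 161 checks the 1999 layout against the first 1,000 records.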

#!/usr/bin/ruby
# Column where the 140-character code section starts in each year's file
offsets = { "mort96us.dat" => 448, "mort99us.dat" => 161,
            "mort02us.dat" => 162, "mort04us.dat" => 164 }
offsets.each do |file, offset|
  counter = 0
  popcount = 0
  # The block form of File.open closes the file when the block ends
  File.open(file, "r") do |text|
    text.each_line do |line|
      popcount += 1
      codesection = line[offset, 140]   # nil if the line is too short
      counter += 1 if codesection =~ /D57/i
    end
  end
  next if popcount.zero?
  rate = format("%.2f", counter.to_f / popcount * 100000)
  puts "\nRecords listing sickle cell is #{counter} in #{file} file"
  puts "Sickle cell rate per 100,000 records is #{rate} in #{file} file"
end

#!/usr/bin/env python3
import re

# re.IGNORECASE matches the /D57/i searches in the Perl and Ruby versions
sickle_match = re.compile('D57', re.IGNORECASE)
# Column where the 140-character code section starts in each year's file
offsets = {"mort96us.dat": 448, "mort99us.dat": 161,
           "mort02us.dat": 162, "mort04us.dat": 164}
for file, offset in offsets.items():
    popcount = 0
    counter = 0
    with open(file, "r") as intext:
        for line in intext:
            popcount += 1
            codesection = line[offset:offset + 140]
            if sickle_match.search(codesection):
                counter += 1
    if popcount == 0:
        continue
    rate = "%.2f" % (100000 * counter / popcount)
    print('\n\nRecords listing sickle cell is ' + str(counter) + ' in ' + file + ' file')
    print('\nSickle cell rate per 100,000 records is ' + rate + ' in ' + file + ' file')
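The scripts print one rate per year but do not say whether a difference between two years is larger than chance alone would produce. As a hedged addition that is not part of the original scripts, a pooled two-proportion z-test is one standard way to compare two years' proportions of records mentioning D57. These are proportions of death records, not of the living population, and the test assumes independent records; the counts in the example call are hypothetical, not results from the CDC files.

```python
from math import sqrt


def two_proportion_z(count1, total1, count2, total2):
    """Pooled two-proportion z statistic comparing year 2 against year 1.

    Positive values mean the second year's proportion is higher.
    """
    p1 = count1 / total1
    p2 = count2 / total2
    # Pooled proportion under the null hypothesis that both years are equal
    pooled = (count1 + count2) / (total1 + total2)
    se = sqrt(pooled * (1 - pooled) * (1 / total1 + 1 / total2))
    if se == 0:
        return 0.0  # both proportions are 0 (or both 1), so there is no difference
    return (p2 - p1) / se


# Hypothetical counts: 50 and 80 matching records out of 100,000 each
z = two_proportion_z(50, 100000, 80, 100000)
print(round(z, 2))
```

As a rule of thumb, |z| above about 1.96 corresponds to a two-sided p-value below 0.05.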

