Thursday, November 20, 2008

Using SEER Public-Use Data: 7

I have been writing a series of blogs on SEER, the U.S. National Cancer Institute's Surveillance, Epidemiology and End Results program. SEER is an amazing resource for information on the cancers that occur in the U.S. One of the products of SEER is the Public Use dataset, which contains de-identified records on over 3.5 million cancers that have occurred between 1973 and 2005.

When you have 3.5 million cancer cases to study, you can draw certain types of inferences that could not possibly be made with the data accumulated at a single medical institution.

Yesterday, we used the SEER data to show that cervical in situ carcinomas, precancers that precede the development of invasive cancers of the cervix, occur in populations with an average age younger than the average age of occurrence of the advanced lesions (just as we expected).

Today, we'll look at the neoplasms that occur in the blood and bone marrow. Do the precancers of blood occur at a younger age than the cancers that develop from those precancers (as we might expect)?

Here are the SEER listings for neoplasms of the blood and bone marrow. The left-hand column is the average age of occurrence of each neoplasm. The middle column is the number of cases in the SEER collection (neoplasms with fewer than 20 SEER cases were considered un-informative and were omitted from the list). The column on the right is the ICD-O term.

Age Cases   Neoplasm name
--------------------------------------------------------------
015 0000057 langerhans cell histiocytosis, disseminated
021 0000827 precursor b-cell lymphoblastic leukemia
023 0009220 precursor cell lymphoblastic leukemia, nos
024 0000117 precursor t-cell lymphoblastic leukemia
042 0000117 burkitt cell leukemia
043 0000024 burkitt lymphoma, nos
045 0000233 burkitt's tumor
047 0001140 acute promyelocytic leukemia
048 0000257 megakaryocytic leukemia
048 0000057 ac. myelomonocytic leuk. w abn. mar. eosinophils
050 0000034 acute biphenotypic leukemia
050 0000085 acute myeloid leukemia, t(8;21)(q22;q22)
052 0000248 chronic myelogenous leukemia, bcr/abl positive
053 0000053 hypereosinophilic syndrome
054 0000121 malignant mastocytosis
055 0000032 megakaryocytic myelosis
055 0000031 mature t-cell lymphoma, nos
058 0002146 hairy cell leukemia
058 0001770 acute monocytic leukemia
059 0000030 hodgkin's disease nos
059 0000447 acute myeloid leukemia without maturation
059 0000022 acute myeloid leukemia, 11q23 abnormalities
060 0000115 myeloid sarcoma
060 0000614 acute myeloid leukemia with maturation
060 0000113 adult t-cell leukemia/lymphoma (htlv-1 pos.)
061 0018129 acute myeloid leukemia
061 0010279 chronic myeloid leukemia
061 0002006 acute myelomonocytic leukemia
061 0000324 acute myeloid leukemia, minimal differentiation
061 0000027 malignant lymphoma, mixed lymphocytic-histiocytic, nodular
062 0000086 therapy-related myelodysplastic syndrome, nos
063 0000032 hemangiosarcoma
063 0000119 plasma cell tumor, malignant
063 0000054 ml, large b-cell, diffuse, immunoblastic, nos
064 0001275 polycythemia vera
064 0000428 ml, large b-cell, diffuse
064 0000087 acute panmyelosis with myelofibrosis
065 0003731 acute leukemia nos
065 0000200 plasma cell leukemia
065 0000998 essential thrombocythemia
065 0000482 plasmacytoma, extramedullary
066 0000815 erythroleukemia
066 0000036 malignant lymphoma, nodular nos
066 0000036 ml, mixed sm. and lg. cell, diffuse
066 0000042 prolymphocytic leukemia, t-cell type
066 0000162 splenic marginal zone b-cell lymphoma
066 0000026 malignant lymphoma, follicular center cell, cleaved, follicular
067 0001014 lymphoid leukemia nos
067 0000225 malignant lymphoma, non hodgkin's type
067 0000327 acute myeloid leuk. with multilineage dysplasia
068 0000148 ml, lymphoplasmacytic
068 0000132 marginal zone b-cell lymphoma, nos
068 0000032 prolymphocytic leukemia, b-cell type
069 0036377 multiple myeloma
069 0001870 myeloid leukemia nos
069 0000094 mantle cell lymphoma
069 0000314 malignant lymphoma nos
069 0000362 myelosclerosis with myeloid metaplasia
069 0000586 chronic myeloproliferative disease, nos
070 0030307 chronic lymphoid leukemia
070 0000301 prolymphocytic leukemia, nos
071 0000276 ml, small b lymphocytic, nos
071 0001582 waldenstrom macroglobulinemia
071 0000263 refractory cytopenia with multilineage dysplasia
071 0000080 refract. anemia with excess blasts in transformation
072 0002580 leukemia nos
072 0000763 refractory anemia with excess blasts
073 0000820 refractory anemia
074 0002422 myelodysplastic syndrome, nos
074 0000655 refractory anemia with sideroblasts
074 0001798 chronic myelomonocytic leukemia, nos
074 0000089 myelodysplastic syndr. with 5q deletion syndrome
----------------------------------------------------------------
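The tabulation described above (average age and case count per neoplasm term, omitting terms with fewer than 20 cases) can be sketched in a few lines of Python. This is only an illustration, not the Perl script used to build the list. It assumes the SEER records have already been parsed into (term, age) pairs; the function names, the toy data, the rounding rule and the tie-breaking sort order are all assumptions made for the example.

```python
from collections import defaultdict

def summarize_by_term(records, min_cases=20):
    """Return (mean_age, case_count, term) rows sorted by mean age.

    `records` is an iterable of (term, age_in_years) pairs. How those
    pairs are extracted from the SEER files is outside this sketch.
    """
    totals = defaultdict(lambda: [0, 0])  # term -> [sum of ages, case count]
    for term, age in records:
        entry = totals[term]
        entry[0] += age
        entry[1] += 1

    rows = []
    for term, (age_sum, count) in totals.items():
        if count < min_cases:
            continue  # too few cases to be informative
        # Assumption: Python's built-in round(); the rounding used for
        # the list in this post is not stated.
        rows.append((round(age_sum / count), count, term))

    # Assumption: ties in age are broken alphabetically by term.
    rows.sort(key=lambda row: (row[0], row[2]))
    return rows

def format_row(mean_age, count, term):
    """Format one row in the zero-padded layout used in the list above."""
    return f"{mean_age:03d} {count:07d} {term}"

# Toy data, invented for illustration only (not SEER records).
toy_records = [("refractory anemia", 70 + (i % 7)) for i in range(21)]
toy_records += [("rare term", 40)] * 3  # dropped: fewer than 20 cases

for row in summarize_by_term(toy_records):
    print(format_row(*row))
```

Keeping only a running sum and a count for each term means the records can be read one at a time, so the full file never has to sit in memory.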

The precancerous lesions of the blood cells are the myelodysplasias (previously called preleukemias). They include the refractory anemias and chronic myelomonocytic leukemia (not to be confused with chronic myeloid leukemia). These lesions sometimes progress to acute myelogenous leukemia.

Here are the average ages of development of the myelodysplasias:

Age Cases   Neoplasm name
--------------------------------------------------------------
071 0000263 refractory cytopenia with multilineage dysplasia
071 0000080 refract. anemia with excess blasts in transformation
072 0000763 refractory anemia with excess blasts
073 0000820 refractory anemia
074 0002422 myelodysplastic syndrome, nos
074 0000655 refractory anemia with sideroblasts
074 0001798 chronic myelomonocytic leukemia, nos
074 0000089 myelodysplastic syndr. with 5q deletion syndrome
--------------------------------------------------------------

All of the myelodysplasias cluster at the upper end of ages for blood neoplasms (70+ years old). This is far older than the average age of occurrence of the acute leukemias (into which the myelodysplasias develop).

Age Cases   Neoplasm name
--------------------------------------------------------------
050 0000085 acute myeloid leukemia, t(8;21)(q22;q22)
058 0001770 acute monocytic leukemia
059 0000447 acute myeloid leukemia without maturation
059 0000022 acute myeloid leukemia, 11q23 abnormalities
060 0000614 acute myeloid leukemia with maturation
061 0018129 acute myeloid leukemia
061 0002006 acute myelomonocytic leukemia
061 0000324 acute myeloid leukemia, minimal differentiation
065 0003731 acute leukemia nos
067 0000327 acute myeloid leuk. with multilineage dysplasia
--------------------------------------------------------------

How can a precursor lesion occur in a population that is older than the population in which the developed cancer occurs?

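The answer is simple. Most acute leukemias do not develop from the myelodysplasias. When we look at the case counts in column two, we see at a glance that the acute leukemias are much more numerous than the myelodysplasias: the acute leukemias listed above total 27,455 cases, against 6,890 for the myelodysplasias.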
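The comparison can be checked directly from the two short lists above. The following Python sketch parses the rows as they appear in this post, sums the case counts, and computes a case-weighted mean age for each group. The helper names are made up for the example, and because the listed ages are already rounded to whole years, the weighted means are only approximate.

```python
MYELODYSPLASIAS = """\
071 0000263 refractory cytopenia with multilineage dysplasia
071 0000080 refract. anemia with excess blasts in transformation
072 0000763 refractory anemia with excess blasts
073 0000820 refractory anemia
074 0002422 myelodysplastic syndrome, nos
074 0000655 refractory anemia with sideroblasts
074 0001798 chronic myelomonocytic leukemia, nos
074 0000089 myelodysplastic syndr. with 5q deletion syndrome
"""

ACUTE_LEUKEMIAS = """\
050 0000085 acute myeloid leukemia, t(8;21)(q22;q22)
058 0001770 acute monocytic leukemia
059 0000447 acute myeloid leukemia without maturation
059 0000022 acute myeloid leukemia, 11q23 abnormalities
060 0000614 acute myeloid leukemia with maturation
061 0018129 acute myeloid leukemia
061 0002006 acute myelomonocytic leukemia
061 0000324 acute myeloid leukemia, minimal differentiation
065 0003731 acute leukemia nos
067 0000327 acute myeloid leuk. with multilineage dysplasia
"""

def parse_rows(text):
    """Split each 'age cases name' line into (age, cases, name)."""
    rows = []
    for line in text.strip().splitlines():
        age, cases, name = line.split(maxsplit=2)
        rows.append((int(age), int(cases), name))
    return rows

def total_cases(rows):
    """Sum the case counts (the middle column)."""
    return sum(cases for _, cases, _ in rows)

def weighted_mean_age(rows):
    """Mean of the listed ages, weighted by the number of cases."""
    return sum(age * cases for age, cases, _ in rows) / total_cases(rows)

mds = parse_rows(MYELODYSPLASIAS)
acute = parse_rows(ACUTE_LEUKEMIAS)

print("myelodysplasia cases:", total_cases(mds))    # 6890
print("acute leukemia cases:", total_cases(acute))  # 27455
print("approx. mean ages:",
      round(weighted_mean_age(mds), 1),
      round(weighted_mean_age(acute), 1))
```

The counts are simply summed as listed. No attempt is made to decide whether any of the ICD-O terms overlap.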

The pathway of myelodysplasia to acute leukemia is the exception, not the rule, and we would need to find some other precursor lesion to account for the bulk of acute myeloid leukemias.

This is an example of how to use the SEER data to examine and test existing hypotheses and to develop new hypotheses. It took under a minute to generate the table, using a Perl script that parsed through 3.5 million SEER records.

In a prior blog, I discussed some of the simple Perl routines used in the SEER-data parsing algorithms, and these are available from my web site.

If you want to do creative data mining, you will need to learn a little computer programming.

For Perl and Ruby programmers, methods and scripts for using SEER and other publicly available biomedical databases are described in detail in my prior books:

Perl Programming for Medicine and Biology

Ruby Programming for Medicine and Biology

An overview of the many uses of biomedical information is available in my book,
Biomedical Informatics.

More information on cancer is available in my recently published book, Neoplasms.

© 2008 Jules Berman

key words: neoplasms, cancer, neoplasia, precancer, tumor, tumour, tumors, tumours, neoplasm, carcinogenesis, carcinogens, tumor genetics, myelodysplastic syndromes, IEN, pre-cancer, preneoplastic lesions, preneoplasia

As specified in the SEER Data Agreement, the citation for the SEER data is as follows:

"Surveillance, Epidemiology, and End Results (SEER) Program (www.seer.cancer.gov) Limited-Use Data (1973-2005), National Cancer Institute, DCCPS, Surveillance Research Program, Cancer Statistics Branch, released April 2008, based on the November 2007 submission."

As with all of my scripts, lists, web sites, and blog entries, the following disclaimer applies. This material is provided by its creator, Jules J. Berman, "as is", without warranty of any kind, expressed or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. In no event shall the author or copyright holder be liable for any claim, damages or other liability, whether in an action of contract, tort or otherwise, arising from, out of or in connection with the material or the use or other dealings in the material.
