Friday, November 14, 2008

Using SEER Public Use Data: 1

SEER is the U.S. National Cancer Institute's Surveillance, Epidemiology, and End Results program. It is an amazing resource for information about the cancers that occur in the U.S. One of the products of SEER is the Public Use dataset, which contains de-identified records on over 3.5 million cancers that occurred between 1973 and 2005.

When you have 3.5 million cancer cases to study, you can draw certain types of inferences that could not possibly be made with the data accumulated at a single medical institution.

I thought I would write a series of blog posts, spread over the next few weeks or months, showing how the SEER dataset can be analyzed, the kinds of hypotheses and discoveries that can come from studying the database, and the kinds of things you can do when you combine SEER data with data from other publicly available resources.

Each SEER record is a cancer case, described by a series of 258 (mostly) numbers in byte-assigned positions, which are defined in a data dictionary document. Once you have the byte locations for the data dictionary entries, you can easily write a short script (I like to use Perl, Ruby, or Python) that extracts and compiles the data any way you wish.
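As a minimal sketch of that kind of script, the Python below slices named fields out of fixed-width records. The field names and byte positions in `FIELDS` are hypothetical placeholders, not the real SEER layout; the actual positions have to come from the data dictionary that accompanies the files.

```python
# Hypothetical layout: (field name, start byte, end byte), 1-based and inclusive,
# the way a data dictionary typically lists positions. These are NOT the real
# SEER positions; replace them with the values from the data dictionary.
FIELDS = [
    ("age_at_diagnosis", 1, 3),
    ("histology_code", 4, 7),
]


def parse_record(line, fields=FIELDS):
    """Return a dict mapping each field name to its stripped text value."""
    return {name: line[start - 1:end].strip() for name, start, end in fields}


def read_records(path, fields=FIELDS):
    """Yield one parsed dict per non-empty line of a fixed-width text file."""
    with open(path, encoding="ascii", errors="replace") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line:
                yield parse_record(line, fields)


if __name__ == "__main__":
    # Two made-up records in the hypothetical layout above.
    for sample in ["0619440", "0048500"]:
        print(parse_record(sample))
```

Keeping the layout in one table means that when a new data dictionary moves a field, only `FIELDS` has to change.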

For example, the following list compiles every diagnosis that occurs at least 10 times in the data set, sorted by the age of the person at the time of diagnosis. Each row gives the age, the number of cases, the cumulative fraction of cases accounted for, and the diagnosis (truncated for reasons of space).

Age    Number    Cumulative  Name of
(yrs)  of cases  fraction    neoplasm
-------------------------------------------------
000 0000041 0.000 retinoblastoma, differentiated
001 0000641 0.000 retinoblastoma nos
002 0000050 0.000 infantile fibrosarcoma
002 0000016 0.000 juvenile myelomonocytic leukemi
002 0000021 0.000 atypical teratoid/rhabdoid tumo
002 0000043 0.000 retinoblastoma, undifferentiate
004 0000303 0.000 hepatoblastoma
004 0001762 0.000 neuroblastoma nos
004 0000012 0.000 medulloepithelioma nos
005 0001466 0.001 nephroblastoma nos
007 0000308 0.001 ganglioneuroblastoma
011 0000013 0.001 subependymal giant cell astrocy
012 0000010 0.001 pancreatoblastoma
013 0001163 0.001 medulloblastoma nos
013 0000018 0.001 neurofibromatosis nos
014 0000756 0.001 embryonal rhabdomyosarcoma
014 0000084 0.001 langerhans cell histiocytosis,
016 0000060 0.001 choroid plexus papilloma, malig
017 0001545 0.002 pilocytic astrocytoma (c71._) 9
018 0000031 0.002 embryonal sarcoma
018 0000337 0.002 alveolar rhabdomyosarcoma
019 0001088 0.002 ewing's sarcoma
019 0000100 0.002 desmoplastic medulloblastoma
019 0000507 0.002 primitive neuroectodermal tumor
020 0000584 0.003 endodermal sinus tumor
020 0000055 0.003 clear cell sarcoma of kidney
021 0000069 0.003 ganglioglioma
021 0000827 0.003 precursor b-cell lymphoblastic
022 0000139 0.003 ependymoma, anaplastic type
022 0000025 0.003 dysembryoplastic neuroepithelia
022 0000049 0.003 precursor b-cell lymphoblastic
023 0001242 0.003 teratoma, malignant nos
023 0000021 0.003 choroid plexus papilloma nos
023 0009222 0.006 precursor cell lymphoblastic le
024 0000098 0.006 pineoblastoma
024 0000090 0.006 malignant rhabdoid tumor
024 0000117 0.006 precursor t-cell lymphoblastic
025 0000011 0.006 periosteal osteosarcoma
025 0000023 0.006 mixed type rhabdomyosarcoma
025 0000093 0.006 pleomorphic xanthoastrocytoma
026 0000103 0.006 alveolar soft part sarcoma
026 0000019 0.006 chondroblastoma, malignant
026 0000053 0.006 telangiectatic osteosarcoma
027 0000011 0.006 spindle cell rhabdomyosarcoma
027 0000039 0.006 desmoplastic small round cell t
028 0000690 0.006 germinoma
028 0001717 0.007 teratocarcinoma
028 0000118 0.007 parosteal osteosarcoma
028 0000227 0.007 chondroblastic osteosarcoma
028 0000095 0.007 germ cell tumor, nonseminomatou
028 0000556 0.007 choriocarcinoma combined with t
029 0000023 0.007 ganglioglioma, anaplastic
029 0000019 0.007 intratubular malignant germ cel
029 0000021 0.007 malignant placental site tropho
030 0002388 0.008 mixed germ cell tumor
030 0003184 0.009 embryonal carcinoma nos
030 0000115 0.009 precursor t-cell lymphoblastic
031 0000966 0.009 dysgerminoma
031 0000832 0.009 choriocarcinoma
031 0000021 0.009 central neurocytoma
032 0000016 0.009 lipoma nos
032 0000082 0.009 neuroepithelioma nos
032 0000027 0.009 adamantinomatous craniopharyngi
032 0000019 0.009 malignant teratoma, undifferent
033 0001848 0.010 osteosarcoma nos
033 0000114 0.010 fibroblastic osteosarcoma
033 0000030 0.010 adamantinoma of long bones
033 0000156 0.010 synovial sarcoma, biphasic type
033 0011933 0.013 hodgkin lymphoma, nodular scler
034 0001388 0.014 ependymoma nos
034 0000149 0.014 peripheral neuroectodermal tumo
035 0000051 0.014 prolactinoma
035 0000094 0.014 protoplasmic astrocytoma
035 0000963 0.014 precursor cell lymphoblastic ly
036 0000435 0.014 hodgkin lymphoma, nod. scler.,
037 0009849 0.017 seminoma nos
037 0000087 0.017 small cell sarcoma
037 0000010 0.017 acidophil carcinoma
037 0000555 0.017 synovial sarcoma nos
037 0001609 0.017 burkitt lymphoma, nos
037 0000045 0.017 mediastinal large b-cell lympho
037 0000178 0.017 synovial sarcoma, spindle cell
037 0000118 0.018 giant cell tumor of bone, malig
037 0000388 0.018 sq. cell carcinoma, lg. cell, n
038 0000219 0.018 astroblastoma
038 0000053 0.018 craniopharyngioma
038 0057571 0.034 carcinoma in situ nos
038 0000016 0.034 androblastoma, malignant
038 0000627 0.034 seminoma, anaplastic type
038 0000106 0.034 mesenchymal chondrosarcoma
038 0063370 0.052 squamous cell carcinoma in situ
038 0000416 0.052 hodgkin lymphoma, nod. scler.,
038 0000548 0.052 hodgkin lymphoma, nod. scler.,
039 0000019 0.052 neurofibroma nos
039 0000026 0.052 sertoli cell carcinoma
039 0000096 0.052 hepatocellular carcinoma, fibro
039 0000021 0.052 hepatosplenic gamma-delta cell
039 0000481 0.052 hodgkin lymph., nodular lymphoc
039 0000024 0.052 mpnst with rhabdomyoblastic dif
040 0001012 0.053 mixed glioma
040 0000038 0.053 cavernous hemangioma
040 0000070 0.053 myxopapillary ependymoma
040 0000040 0.053 pigmented dermatofibrosarcoma p
040 0000123 0.053 clear cell sarcoma of tendons a
040 0000040 0.053 sertoli-leydig cell tumor, poor
041 0014107 0.057 kaposi's sarcoma
041 0002297 0.057 oligodendroglioma nos
041 0000014 0.057 spongioblastoma polare
041 0000010 0.057 papillary carcinoma, oxyphilic
041 0024527 0.064 squamous intraepithelial neopla
042 0000024 0.064 pulmonary blastoma
042 0000118 0.064 burkitt cell leukemia
042 0000764 0.065 fibrillary astrocytoma
042 0003634 0.066 dermatofibrosarcoma nos
042 0000282 0.066 epithelioid cell sarcoma
042 0000861 0.066 hodgkin's disease, lymphocytic
043 0000036 0.066 hemangioma nos
043 0000505 0.066 rhabdomyosarcoma nos
043 0000019 0.066 solid pseudopapillary carcinoma
044 0000015 0.066 papillary meningioma
044 0000013 0.066 papillary meningioma 9538/3
044 0000017 0.066 juxtacortical chondrosarcoma
044 0000115 0.066 adenocarcinoma, endocervical ty
044 0003873 0.067 squamous cell carcinoma, microi
044 0000079 0.067 papillary cystadenoma, borderli
045 0000086 0.067 hemangioblastoma
045 0000037 0.067 oligodendroblastoma
045 0000016 0.067 struma ovarii, malignant
045 0000029 0.067 undifferentiated sarcoma
045 0000010 0.067 clear cell chondrosarcoma
045 0000011 0.067 carotid body tumor, malignant
045 0000091 0.067 papillary carcinoma, encapsulat
045 0000035 0.067 extra-adrenal paraganglioma, ma
045 0000482 0.067 sq. cell carcinoma, keratinizin
046 0000285 0.068 burkitt's tumor
046 0003337 0.069 glioma, malignant
046 0000024 0.069 periosteal fibrosarcoma
046 0004414 0.070 hodgkin's disease, mixed cellul
046 0010260 0.073 papillary and follicular adenoc
046 0000215 0.073 follicular carcinoma, minimally
047 0008335 0.075 astrocytoma nos
047 0000690 0.075 neurofibrosarcoma
047 0000047 0.075 leydig cell tumor, malignant
047 0001140 0.076 acute promyelocytic leukemia
047 0000163 0.076 malignant melanoma in giant pig
047 0000517 0.076 follicular adenocarcinoma, well
047 0000111 0.076 papillary mucinous cystadenoma,
047 0000084 0.076 squamous cell carcinoma in situ
048 0003078 0.077 hodgkin's disease nos
048 0000257 0.077 megakaryocytic leukemia
048 0000039 0.077 ovarian stromal tumor, mal.
048 0002320 0.077 astrocytoma, anaplastic type
048 0000500 0.078 oligodendroglioma, anaplastic t
048 0000194 0.078 endometrial stromal sarcoma, lo
048 0000104 0.078 epithelioid hemangioendotheliom
048 0000057 0.078 ac. myelomonocytic leuk. w abn.
048 0001603 0.078 serous papillary cystic tumor o
049 0000033 0.078 subependymal glioma
049 0000179 0.078 mesenchymoma, malignant
049 0000461 0.078 gemistocytic astrocytoma
049 0000012 0.078 primary effusion lymphoma
049 0000196 0.078 pheochromocytoma, malignant
049 0000159 0.078 nonencapsulated sclerosing carc
049 0000034 0.078 dermoid cyst with malignant tra
049 0001041 0.079 serous cystadenoma, borderline
049 0001486 0.079 mucinous cystic tumor of border
050 0002700 0.080 bowen's disease
050 0000036 0.080 hodgkin's granuloma
050 0000012 0.080 chromophobe adenoma
050 0001225 0.080 pituitary adenoma, nos
050 0000722 0.080 neurilemmoma, malignant
050 0000172 0.081 giant cell glioblastoma
050 0022382 0.087 papillary carcinoma nos
050 0000089 0.087 ameloblastoma, malignant
050 0000516 0.087 papillary microcarcinoma
050 0000034 0.087 acute biphenotypic leukemia
050 0004050 0.088 follicular adenocarcinoma nos
050 0000017 0.088 mixed medullary-papillary carci
050 0001004 0.088 medullary carcinoma with amyloi
050 0000085 0.088 acute myeloid leukemia, t(8;21)
050 0000030 0.088 mucinous cystadenocarcinoma, no
050 0000029 0.088 mucinous adenocarcinoma, endoce
050 0000141 0.089 follicular adenocarcinoma, trab
051 0001932 0.089 chondrosarcoma nos
051 0000116 0.089 gastrinoma, malignant
051 0000115 0.089 round cell liposarcoma
051 0000117 0.089 paraganglioma, malignant
051 0000761 0.089 adrenal cortical carcinoma
051 0000884 0.090 lymphoepithelial carcinoma
051 0000358 0.090 hemangiopericytoma, malignant
051 0004547 0.091 superficial spreading melanoma,
051 0000335 0.091 medullary carcinoma with lympho
051 0000026 0.091 malignant giant cell tumor of s
051 0000179 0.091 adenocarcinoma in adenomatous p
052 0000043 0.091 myosarcoma
052 0000153 0.091 adenoma nos
052 0000978 0.091 neurilemmoma nos
052 0001258 0.092 fibrosarcoma nos
052 0000032 0.092 histiocytic sarcoma
052 0000191 0.092 myxoid chondrosarcoma
052 0000302 0.092 esthesioneuroblastoma
052 0000110 0.092 precancerous melanosis nos
052 0040929 0.104 superficial spreading melanoma
052 0000159 0.104 hemangioendothelioma, malignant
052 0000934 0.104 cystosarcoma phyllodes, maligna
052 0000031 0.104 malignant melanoma in precancer
052 0000248 0.104 chronic myelogenous leukemia, b
052 0000058 0.104 hodgkin's disease, lymphocytic
053 0000097 0.104 neoplasm, benign
053 0000265 0.104 fibromyxosarcoma
053 0001167 0.104 myxoid liposarcoma
053 0000046 0.104 fascial fibrosarcoma
053 0000027 0.104 blue nevus, malignant
053 0000128 0.104 spermatocytic seminoma
053 0007534 0.107 medullary carcinoma nos
053 0000053 0.107 hypereosinophilic syndrome
053 0000042 0.107 odontogenic tumor, malignant
053 0000780 0.107 granulosa cell tumor, malignant
053 0000350 0.107 malignant melanoma in junctiona
054 0000032 0.107 neuroma nos
054 0000025 0.107 balloon cell melanoma
054 0000135 0.107 malignant mastocytosis
054 0000142 0.107 mesonephroma, malignant
054 0003757 0.108 mucoepidermoid carcinoma
054 0010414 0.111 lobular carcinoma in situ
054 0000029 0.111 thymoma, type b2, malignant
054 0000117 0.111 nk/t-cell lymphoma, nasal and n
054 0000890 0.111 anaplastic large cell lymphoma,
055 0000614 0.111 chordoma
055 0000023 0.111 angiomyosarcoma
055 0000111 0.111 fibrous meningioma
055 0000959 0.112 thymoma, malignant
055 0000328 0.112 adenocarcinoid tumor
055 0000057 0.112 chromophobe carcinoma
055 0000032 0.112 megakaryocytic myelosis
055 0000903 0.112 endometrial stromal sarcoma
055 0000120 0.112 atypical medullary carcinoma
055 0000265 0.112 mucocarcinoid tumor, malignant
055 0000070 0.112 juvenile carcinoma of the breas
055 0000029 0.112 adenocarcinoma in situ in famil
056 0000042 0.112 gliomatosis cerebri
056 0006891 0.114 comedocarcinoma nos
056 0000028 0.114 theca cell carcinoma
056 0000096 0.114 myxoid leiomyosarcoma
056 0048315 0.128 malignant melanoma nos
056 0000030 0.128 angiomatous meningioma
056 0000144 0.128 transitional meningioma
056 0000045 0.128 thymoma, type ab, malignant
056 0000140 0.128 pleomorphic rhabdomyosarcoma
056 0000412 0.128 malignant melanoma, regressing
056 0003428 0.129 mucinous cystadenocarcinoma nos
056 0000086 0.129 papillary carcinoma, columnar c
056 0000485 0.129 papillary mucinous cystadenocar
056 0002754 0.130 intraductal and lobular in situ
056 0000626 0.130 hodgkin's disease, lymphocytic
056 0000039 0.130 endometrioid adenocarcinoma, se
056 0000147 0.130 primary cutan. cd30+ t-cell lym
057 0000089 0.130 myxosarcoma
057 0000403 0.130 adenosarcoma
057 0021999 0.137 melanoma in situ
057 0000855 0.137 islet cell carcinoma
057 0000145 0.137 mixed type liposarcoma
057 0003285 0.138 inflammatory carcinoma
057 0000066 0.138 thymoma, type b1, malignant
057 0000047 0.138 spindle cell melanoma, type a
057 0000026 0.138 subcutaneous panniculitis-like
058 0000011 0.138 vipoma
058 0000202 0.138 myeloid sarcoma
058 0002147 0.138 hairy cell leukemia
058 0000191 0.139 stromal sarcoma, nos
058 0000038 0.139 insulinoma, malignant
058 0000032 0.139 glucagonoma, malignant
058 0006684 0.140 adenocarcinoma in situ
058 0001435 0.141 oxyphilic adenocarcinoma
058 0001770 0.141 acute monocytic leukemia
058 0000054 0.141 malignant myoepithelioma
058 0003319 0.142 adenoid cystic carcinoma
058 0000079 0.142 thymoma, type b3, malignant
058 0000477 0.142 spindle cell melanoma, type b
058 0000229 0.142 meningotheliomatous meningioma
058 0000234 0.143 malignant tumor, small cell typ
058 0009679 0.145 comedocarcinoma, noninfiltratin
058 0000207 0.145 cyst-associated renal cell carc
058 0000843 0.146 intraductal micropapillary carc
058 0004480 0.147 ml, large b-cell, diffuse, immu
058 0008089 0.149 squamous cell carcinoma, large
059 0003124 0.150 sarcoma nos
059 0008070 0.152 leiomyosarcoma nos
059 0000114 0.152 atypical meningioma
059 0000082 0.152 thymic carcinoma, nos
059 0000016 0.152 aggressive nk-cell leukemia
059 0002839 0.153 cribriform carcinoma in situ
059 0027965 0.161 papillary adenocarcinoma nos
059 0000018 0.161 malignant eccrine spiradenoma
059 0001301 0.161 papillary cystadenocarcinoma no
059 0000050 0.161 solitary fibrous tumor, maligna
059 0000091 0.161 polymorphous low grade adenocar
059 0004086 0.163 adenocarcinoma with squamous me
059 0000448 0.163 acute myeloid leukemia without
059 0035630 0.173 intraductal carcinoma, noninfil
059 0000022 0.173 acute myeloid leukemia, 11q23 a
059 0000020 0.173 mixed islet cell and exocrine a
059 0004558 0.174 infiltr. duct mixed with other
060 0008688 0.176 nodular melanoma
060 0000030 0.176 insular carcinoma
060 0003377 0.177 mycosis fungoides
060 0000764 0.178 amelanotic melanoma
060 0001025 0.178 spindle cell sarcoma
060 0000113 0.178 queyrat's erythroplasia
060 0000401 0.178 epithelioid cell melanoma
060 0016028 0.183 carcinoid tumor, malignant
060 0000486 0.183 epithelioid leiomyosarcoma
060 0001267 0.183 mature t-cell lymphoma, nos
060 0001045 0.183 cutaneous t-cell lymphoma, nos
060 0000020 0.183 carcinosarcoma, embryonal type
060 0001669 0.184 duct carcinoma in situ, solid t
060 0001021 0.184 liposarcoma, well differentiate
060 0000014 0.184 granulosa cell-theca cell tumor
060 0000614 0.184 acute myeloid leukemia with mat
060 0000617 0.184 renal cell carcinoma, chromopho
060 0000166 0.185 carcinoid tumor, argentaffin, m
060 0000126 0.185 adult t-cell leukemia/lymphoma
060 0000299 0.185 squamous cell carcinoma, small
060 0008938 0.187 malignant lymphoma, follicular
061 0019907 0.193 glioblastoma nos
061 0000906 0.193 meningioma, malignant
061 0000015 0.193 epithelioma, malignant
061 0026705 0.201 endometrioid carcinoma
061 0018131 0.206 acute myeloid leukemia
061 0010280 0.209 chronic myeloid leukemia
061 0000026 0.209 polygonal cell carcinoma
061 0000057 0.209 collecting duct carcinoma
061 0000303 0.209 metaplastic carcinoma, nos
061 0000382 0.209 mixed tumor, malignant nos
061 0333623 0.303 infiltrating duct carcinoma
061 0002006 0.303 acute myelomonocytic leukemia
061 0020557 0.309 clear cell adenocarcinoma nos
061 0000885 0.309 infiltrating ductular carcinoma
061 0000165 0.309 carcinoma in pleomorphic adenom
061 0000661 0.310 mixed epithel. & spindle cell m
061 0000295 0.310 glioblastoma with sarcomatous c
061 0000042 0.310 adenocarcinoma with apocrine me
061 0021695 0.316 infiltrating duct and lobular c
061 0000324 0.316 acute myeloid leukemia, minimal
061 0000100 0.316 papillary squamous cell carcino
061 0000012 0.316 endometrioid adenocarcinoma, ci
061 0000074 0.316 hodgkin's disease, lymphocytic
061 0002217 0.317 paget's disease and infiltratin
062 0001323 0.317 liposarcoma nos
062 0006337 0.319 tubular adenocarcinoma
062 0000014 0.319 heavy chain disease, nos
062 0000077 0.319 atypical carcinoid tumor
062 0000494 0.319 skin appendage carcinoma
062 0001092 0.319 mixed cell adenocarcinoma
062 0001281 0.320 spindle cell melanoma nos
062 0004682 0.321 serous cystadenocarcinoma nos
062 0005812 0.323 fibrous histiocytoma, malignant
062 0001263 0.323 gastrointestinal stromal sarcom
062 0000106 0.323 malignant tumor, giant cell typ
062 0000493 0.323 basaloid squamous cell carcinom
062 0000258 0.323 renal cell carcinoma, sarcomato
062 0000118 0.323 epithelial-myoepithelial carcin
062 0000884 0.323 acral lentiginous melanoma, mal
062 0015063 0.328 papillary serous cystadenocarci
062 0000039 0.328 endometrioid adenofibroma, mali
062 0012432 0.331 malignant lymphoma, non hodgkin
062 0000037 0.331 adenocarcinoma with spindle cel
062 0000086 0.331 therapy-related myelodysplastic
062 0002593 0.332 infiltr. duct mixed with other
062 0003162 0.333 noninfiltrating intraductal pap
062 0005812 0.334 malignant lymphoma, mixed lymph
062 0003414 0.335 malignant lymphoma, follicular
063 0001353 0.336 hemangiosarcoma
063 0000023 0.336 hodgkin's sarcoma
063 0000024 0.336 meningiomatosis nos
063 0000239 0.336 solid carcinoma nos
063 0000135 0.336 composite carcinoid
063 0001142 0.336 cystadenocarcinoma nos
063 0000023 0.336 schneiderian carcinoma
063 0011193 0.339 adenosquamous carcinoma
063 0000084 0.339 psammomatous meningioma
063 0037088 0.350 ml, large b-cell, diffuse
063 0000080 0.350 basal cell adenocarcinoma
063 0000016 0.350 intestinal t-cell lymphoma
063 0000287 0.350 dedifferentiated liposarcoma
063 0000939 0.350 plasmacytoma, extramedullary
063 0000869 0.350 plasma cell tumor, malignant
063 0003345 0.351 malignant lymphoma, nodular nos
063 0001358 0.352 paget disease and intraductal c
063 0002708 0.353 serous surface papillary carcin
063 0000020 0.353 squamous cell carcinoma, clear
063 0000012 0.353 basal cell carcinoma, fibroepit
063 0000334 0.353 giant cell sarcoma (except of b
063 0023109 0.359 squamous cell carcinoma, kerati
063 0000472 0.359 infiltrating lobular mixed with
063 0000277 0.359 combined hepatocellular carcino
064 0001275 0.360 polycythemia vera
064 0000041 0.360 lymphangiosarcoma
064 0017866 0.365 oat cell carcinoma
064 0037543 0.375 renal cell carcinoma
064 0001065 0.376 giant cell carcinoma
064 0011520 0.379 acinar cell carcinoma
064 0001358 0.379 cloacogenic carcinoma
064 0034336 0.389 lobular carcinoma nos
064 0013199 0.393 malignant lymphoma nos
064 0000947 0.393 granular cell carcinoma
064 0000405 0.393 pleomorphic liposarcoma
064 0000723 0.393 apocrine adenocarcinoma
064 0003164 0.394 scirrhous adenocarcinoma
064 0005106 0.396 neuroendocrine carcinoma
064 0000363 0.396 sweat gland adenocarcinoma
064 0247826 0.465 squamous cell carcinoma nos
064 0019957 0.471 hepatocellular carcinoma nos
064 0000757 0.471 papillary squamous cell carcino
064 0010254 0.474 carcinoma, undifferentiated typ
064 0000087 0.474 acute panmyelosis with myelofib
064 0000248 0.474 giant cell and spindle cell car
064 0000083 0.474 small cell carcinoma, fusiform
064 0000151 0.474 adenocarcinoma in mult. adenoma
064 0000018 0.474 atypical chronic myeloid leuk.,
064 0000026 0.474 adenocarcinoma with cartilagino
065 0002723 0.475 meningioma nos
065 0003731 0.476 acute leukemia nos
065 0000880 0.476 cribriform carcinoma
065 0000200 0.477 plasma cell leukemia
065 0000514 0.477 pleomorphic carcinoma
065 0000052 0.477 eccrine adenocarcinoma
065 0029592 0.485 large cell carcinoma nos
065 0000123 0.485 brenner tumor, malignant
065 0000998 0.485 essential thrombocythemia
065 0000016 0.485 ceruminous adenocarcinoma
065 0010967 0.488 signet ring cell carcinoma
065 0000125 0.488 nodular hidradenoma, malignant
065 0004446 0.490 carcinoma, anaplastic type nos
065 0000086 0.490 adenoid squamous cell carcinoma
065 0000116 0.490 sclerosing sweat duct carcinoma
065 0000856 0.490 adenocarcinoma with mixed subty
065 0003638 0.491 marginal zone b-cell lymphoma,
065 0004319 0.492 ml, mixed sm. and lg. cell, dif
065 0000183 0.492 superficial spreading adenocarc
065 0000054 0.492 hepatocellular carcinoma, clear
065 0000020 0.492 composite hodgkin and non-hodgk
065 0000105 0.492 adenocarcinoma with neuroendocr
065 0001094 0.493 intraductal papillary adenocarc
066 0000815 0.493 erythroleukemia
066 0001654 0.493 linitis plastica
066 0000461 0.493 basaloid carcinoma
066 0002890 0.494 mantle cell lymphoma
066 0001270 0.495 ml, lymphoplasmacytic
066 0000787 0.495 spindle cell carcinoma
066 0001267 0.495 carcinoma, diffuse type
066 0000390 0.495 alveolar adenocarcinoma
066 0000618 0.496 paget's disease, mammary
066 0050173 0.510 small cell carcinoma nos
066 0000624 0.510 papillary carcinoma in situ
066 0000783 0.510 desmoplastic melanoma, malignan
066 0011881 0.513 bronchiolo-alveolar adenocarcin
066 0000241 0.513 angioimmunoblastic t-cell lymph
066 0000403 0.514 large cell neuroendocrine carci
066 0000315 0.514 malignant tumor, fusiform cell
066 0000042 0.514 prolymphocytic leukemia, t-cell
066 0001669 0.514 small cell carcinoma, intermedi
066 0000050 0.514 adenocarc. in situ in mult. ade
066 0000127 0.514 neoplasm, uncertain whether ben
066 0000032 0.514 intraductal papillary-mucinous
067 0000114 0.514 sezary's disease
067 0001015 0.515 lymphoid leukemia nos
067 0000779 0.515 mesodermal mixed tumor
067 0000184 0.515 trabecular adenocarcinoma
067 0000514 0.515 intracystic carcinoma, nos
067 0009635 0.518 ml, small b lymphocytic, nos
067 0000557 0.518 combined small cell carcinoma
067 0025451 0.525 mucin-producing adenocarcinoma
067 0000032 0.525 dedifferentiated chondrosarcoma
067 0000016 0.525 immunoproliferative disease, no
067 0001263 0.525 epithelioid mesothelioma, malig
067 0000179 0.525 splenic marginal zone b-cell ly
067 0000212 0.525 bronchiolo-alveolar carcinoma,
067 0000584 0.526 squamous cell carcinoma, spindl
067 0008163 0.528 adenocarcinoma in situ in adeno
067 0000165 0.528 bronchiolo-alveolar carcinoma,
067 0005831 0.530 adenocarcinoma in situ in tubul
067 0000327 0.530 acute myeloid leuk. with multil
067 0000026 0.530 bronch.-alv. carc., mixed mucin
067 0000038 0.530 intraductal papillary-mucinous
068 0000054 0.530 carcinoma simplex
068 0002264 0.530 carcinosarcoma nos
068 1021940 0.818 adenocarcinoma nos
068 0003088 0.819 mullerian mixed tumor
068 0000758 0.819 villous adenocarcinoma
068 0045778 0.832 mucinous adenocarcinoma
068 0004897 0.834 mesothelioma, malignant
068 0001673 0.834 verrucous carcinoma nos
068 0014878 0.838 non-small cell carcinoma
068 0000032 0.838 thymoma, type a, malignant
068 0000295 0.838 pseudosarcomatous carcinoma
068 0000056 0.838 granular cell tumor, malignant
068 0018652 0.844 hutchinson's melanotic freckle
068 0020880 0.849 adenocarcinoma in adenomatous p
068 0000032 0.849 prolymphocytic leukemia, b-cell
068 0096537 0.877 papillary transitional cell car
068 0016555 0.881 adenocarcinoma in tubulovillous
069 0036429 0.892 multiple myeloma
069 0004610 0.893 cholangiocarcinoma
069 0001872 0.893 myeloid leukemia nos
069 0000156 0.893 basosquamous carcinoma
069 0000046 0.893 eccrine poroma, malignant
069 0000010 0.893 clear cell adenocarcinofibroma
069 0000476 0.894 fibrous mesothelioma, malignant
069 0016754 0.898 adenocarcinoma in villous adeno
069 0000940 0.899 transitional cell carcinoma in
069 0000419 0.899 noninfiltrating intracystic car
069 0000362 0.899 myelosclerosis with myeloid met
069 0000586 0.899 chronic myeloproliferative dise
069 0004672 0.900 adenocarcinoma in situ in villo
069 0002438 0.901 papillary trans. cell carcinoma
069 0007272 0.903 malignant melanoma in hutchinso
070 0003575 0.904 tumor cells, malignant
070 0030328 0.913 chronic lymphoid leukemia
070 0000301 0.913 prolymphocytic leukemia, nos
070 0050243 0.927 transitional cell carcinoma nos
070 0000049 0.927 osteosarcoma in paget's disease
071 0001585 0.927 waldenstrom macroglobulinemia
071 0000122 0.927 transitional cell carcinoma, sp
071 0000263 0.927 refractory cytopenia with multi
071 0000080 0.927 refract. anemia with excess bla
071 0000811 0.928 paget's disease, extramammary (
072 0002581 0.928 leukemia nos
072 0182616 0.980 carcinoma nos
072 0000408 0.980 klatskin tumor
072 0000012 0.980 hepatoid adenocarcinoma
072 0000796 0.980 sebaceous adenocarcinoma
072 0000786 0.980 basal cell carcinoma nos
072 0000012 0.980 adenoid basal cell carcinoma
072 0002766 0.981 adenocarcinoma, intestinal type
072 0000015 0.981 multicentric basal cell carcino
072 0000763 0.981 refractory anemia with excess b
072 0000261 0.981 mesothelioma, biphasic type, ma
072 0000028 0.981 transitional cell carcinoma, mi
073 0000820 0.982 refractory anemia
073 0000036 0.982 basal cell carcinoma, nodular
074 0001716 0.982 merkel cell carcinoma
074 0002422 0.983 myelodysplastic syndrome, nos
074 0000655 0.983 refractory anemia with siderobl
074 0001798 0.984 chronic myelomonocytic leukemia
074 0000089 0.984 myelodysplastic syndr. with 5q
076 0056558 1.000 neoplasm, malignant
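A list in this shape could be compiled along the following lines. This is only a sketch under stated assumptions: the age shown for each diagnosis is taken here to be the mean age at diagnosis truncated to whole years, and the cumulative fraction is computed over the listed diagnoses only. Neither choice is confirmed by this post, and `compile_table` and its parameters are illustrative names.

```python
from collections import defaultdict


def compile_table(cases, min_count=10, name_width=31):
    """Build listing rows from (diagnosis_name, age_in_years) pairs, one per case.

    ASSUMPTIONS (not confirmed by the post): the age shown per diagnosis is the
    mean age truncated to whole years, and the cumulative fraction is taken
    over the diagnoses that pass the min_count filter.
    """
    ages = defaultdict(list)
    for name, age in cases:
        ages[name].append(age)

    # One (age, count, name) tuple per diagnosis that is common enough to list.
    rows = [
        (sum(a) // len(a), len(a), name)
        for name, a in ages.items()
        if len(a) >= min_count
    ]
    rows.sort()  # by age first; ties fall back to count, then name

    total = sum(count for _, count, _ in rows)
    lines, running = [], 0
    for age, count, name in rows:
        running += count
        lines.append(
            f"{age:03d} {count:07d} {running / total:.3f} {name[:name_width]}"
        )
    return lines


if __name__ == "__main__":
    # Made-up cases, for illustration only.
    demo = [("tumor a", 2)] * 10 + [("tumor b", 60)] * 30 + [("rare", 5)] * 3
    print("\n".join(compile_table(demo)))
```

The zero-padded columns keep every row the same width, so the output can itself be read back by byte position.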

As specified in the Limited-Use Data Agreement, the citation for the SEER data
is as follows:

Surveillance, Epidemiology, and End Results (SEER) Program
(www.seer.cancer.gov) Limited-Use Data (1973-2005), National Cancer
Institute, DCCPS, Surveillance Research Program, Cancer Statistics
Branch, released April 2008, based on the November 2007 submission.

Once we have the columned data, we can easily produce a graphic that represents the salient features we would like to emphasize.
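One rough way to go from the columned rows to a picture is sketched below. It parses rows in the four-column layout of the list above, totals the cases listed at each age, and prints a crude text bar chart. The function names and the scaling are illustrative choices, not the method used for this post; the per-age totals could equally be handed to any plotting library.

```python
from collections import Counter


def parse_row(row):
    """Split 'AGE COUNT FRACTION NAME...' into (age, count, fraction, name)."""
    age, count, fraction, name = row.split(maxsplit=3)
    return int(age), int(count), float(fraction), name


def cases_by_age(rows):
    """Total the case counts of all diagnoses listed at each age."""
    totals = Counter()
    for row in rows:
        age, count, _, _ = parse_row(row)
        totals[age] += count
    return totals


def text_bars(totals, width=50):
    """Render one line per age, with a bar scaled to the largest total."""
    largest = max(totals.values())
    lines = []
    for age in sorted(totals):
        bar = "#" * round(width * totals[age] / largest)
        lines.append(f"{age:03d} {totals[age]:>8d} {bar}")
    return lines


if __name__ == "__main__":
    sample = [
        "019 0001088 0.002 ewing's sarcoma",
        "019 0000100 0.002 desmoplastic medulloblastoma",
        "020 0000584 0.003 endodermal sinus tumor",
    ]
    print("\n".join(text_bars(cases_by_age(sample))))
```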


In the next few posts, I'll explain the clinical significance of this little demo project, and I'll describe the free, open source techniques that I used to extract and compile the data. Afterwards, I'll show you how we can drill into the data to refine the questions we ask, so that we can draw conclusions that stand up to critical inspection.

For Perl and Ruby programmers, methods and scripts for using SEER and other publicly available biomedical databases are described in detail in my prior books:

Perl Programming for Medicine and Biology

Ruby Programming for Medicine and Biology

- © 2008 Jules Berman

As with all of my scripts, lists, web sites, and blog entries, the following disclaimer applies. This material is provided by its creator, Jules J. Berman, "as is", without warranty of any kind, expressed or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. In no event shall the author or copyright holder be liable for any claim, damages or other liability, whether in an action of contract, tort or otherwise, arising from, out of or in connection with the material or the use or other dealings in the material.
