Sunday, November 23, 2008

Using SEER Public-Use Data: 9

I have been writing a series of blogs on SEER, the U.S. National Cancer Institute's Surveillance, Epidemiology and End Results program. SEER is an amazing resource for information on the cancers that occur in the U.S. One of the products of SEER is the Public Use dataset, which contains de-identified records on over 3.5 million cancers that have occurred between 1973 and 2005.

When you have 3.5 million cancer cases to study, you can draw certain types of inferences that could not possibly be made with the data accumulated at a single medical institution.

Today, we'll look at the neoplasms that occur in the related anatomic sites: pleura, peritoneum, retro-peritoneum, and pelvis.

Here are the SEER listings. The left-hand column is the number of occurrences. The second column is the average age of occurrence. The column on the right is the ICD-O term. Neoplasms with fewer than 20 SEER cases were considered uninformative and were omitted from the lists.

SEER captures malignant neoplasms. In the SEER data set, we see the following distribution of cases by anatomic site:

PLEURA = 6,138 cases
PERITONEUM = 3,067 cases
PELVIS = 470 cases

The pleura is the mesothelial-lined cavity of the chest, surrounding and covering the heart and the lungs. The peritoneum is the mesothelial-lined cavity of the abdomen, surrounding and covering all or part of the intestines and other organs of the abdomen (e.g., liver, spleen, pancreas).

The pleura accounts for many more occurrences of malignant tumors than does the peritoneum. Only a few different tumors account for the vast majority of malignant neoplasms arising in the pleura.

Count Age Neoplasm name
0024 068 sarcoma nos
0038 071 ml, large b-cell, diffuse
0110 070 neoplasm, malignant
0236 073 mesothelioma, biphasic type, malignant
0433 069 fibrous mesothelioma, malignant
1079 068 epithelioid mesothelioma, malignant
4027 069 mesothelioma, malignant
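
As a rough check on the claim that a few tumors dominate the pleura, the listing above can be tallied in a few lines. This is only a minimal Python sketch (not the Perl used to parse the SEER files); the rows are copied from the listing, the total of 6,138 is the pleura count given earlier, and the names `PLEURA_LISTING` and `parse_listing` are purely illustrative.

```python
# Rows copied from the pleura listing above: count, average age, ICD-O term.
PLEURA_LISTING = """\
0024 068 sarcoma nos
0038 071 ml, large b-cell, diffuse
0110 070 neoplasm, malignant
0236 073 mesothelioma, biphasic type, malignant
0433 069 fibrous mesothelioma, malignant
1079 068 epithelioid mesothelioma, malignant
4027 069 mesothelioma, malignant
"""

PLEURA_TOTAL = 6138  # pleura case count quoted earlier in this post


def parse_listing(text):
    """Turn 'count age term' lines into (count, age, term) tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        count, age, term = line.split(maxsplit=2)
        rows.append((int(count), int(age), term))
    return rows


rows = parse_listing(PLEURA_LISTING)
# Add up every row whose ICD-O term mentions mesothelioma.
mesothelioma_cases = sum(count for count, _, term in rows if "mesothelioma" in term)
share = mesothelioma_cases / PLEURA_TOTAL
print(f"{mesothelioma_cases} of {PLEURA_TOTAL} pleural cases ({share:.1%}) are mesotheliomas")
```

The four mesothelioma rows add up to 5,775 of the 6,138 pleural cases, or roughly 94%.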

The peritoneum, with about half as many cancer occurrences as the pleura, has more types of tumors that occur in significant numbers (20 or more), including the tumors that arise from the surface of the ovaries (e.g. papillary serous cystadenocarcinoma).

TOTAL = 3067
Count Age Neoplasm name
022 062 liposarcoma nos
023 062 gastrointestinal stromal sarcoma
031 067 sarcoma nos
032 064 carcinoid tumor, malignant
038 085 endometrioid carcinoma
045 061 mucinous adenocarcinoma
046 066 fibrous histiocytoma, malignant
047 067 mullerian mixed tumor
053 066 neoplasm, malignant
073 072 carcinoma nos
073 063 ml, large b-cell, diffuse
097 068 papillary adenocarcinoma nos
127 063 leiomyosarcoma nos
134 062 epithelioid mesothelioma, malignant
180 065 serous cystadenocarcinoma nos
245 067 adenocarcinoma nos
359 066 serous surface papillary carcinoma
451 067 papillary serous cystadenocarcinoma
551 062 mesothelioma, malignant
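
The listed rows do not add up to the full 3,067 peritoneal cases, presumably because the histologies with fewer than 20 cases were left out. A small Python sketch makes the gap explicit; the counts are copied from the first column above, and the variable names are illustrative only.

```python
# Counts copied from the first column of the peritoneum listing above.
PERITONEUM_COUNTS = [22, 23, 31, 32, 38, 45, 46, 47, 53, 73,
                     73, 97, 127, 134, 180, 245, 359, 451, 551]
PERITONEUM_TOTAL = 3067  # TOTAL line of the listing

listed = sum(PERITONEUM_COUNTS)
# Remainder: cases not covered by any listed row (assumed to be the
# histologies that fell under the 20-case cutoff).
omitted = PERITONEUM_TOTAL - listed
print(f"listed: {listed}, not listed: {omitted}, "
      f"listed share: {listed / PERITONEUM_TOTAL:.1%}")
```

The 19 listed histologies cover 2,627 cases, which leaves 440 cases spread across the rarer tumor types.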

The retroperitoneum is the collection of tissues that lie between the peritoneal lining and the posterior wall of the abdomen. The retroperitoneum is often referred to as the retroperitoneal space. This is not the best term, as it calls to mind a body cavity (space), perhaps lined by mesothelium, and this is not the case. The retroperitoneum is mostly fat, connective tissue, and organs. Fully retroperitoneal organs, such as the kidneys and attached adrenals, can drop a little bit along the potential space of their surrounding fascia, but that's about the closest thing to a space that the retroperitoneum can offer. Organs of the abdomen that are slapped tightly against the posterior wall of the peritoneum are, technically, retroperitoneal (such as the ascending and descending colon, and the rectum). Organs or parts of organs that dangle in the abdomen (such as the transverse colon) are fully peritoneal.

For the purposes of collecting data on retroperitoneal neoplasms, the tumors that arise from identifiable organs (e.g., kidney, head of pancreas, adrenals, rectum) are assigned to those organs in the SEER dataset, and NOT to the retroperitoneum. This leaves, for the most part, soft tissue tumors, muscle tumors and nerve tumors arising in the retroperitoneum. There is a great variety of these tumors, even when we restrict our list to those tumors that occur with a frequency of 20 or greater.

Count Age Neoplasm name
020 049 rhabdomyosarcoma nos
020 020 endodermal sinus tumor
022 021 teratoma, malignant nos
022 065 epithelioid leiomyosarcoma
023 010 embryonal rhabdomyosarcoma
024 069 mesothelioma, malignant
025 054 mesenchymoma, malignant
027 057 hemangiopericytoma, malignant
030 028 embryonal carcinoma nos
034 064 malignant lymphoma nos
035 048 neurofibrosarcoma
038 058 mixed type liposarcoma
040 066 malignant lymphoma, non hodgkin's type
043 045 neurilemmoma, malignant
045 044 seminoma nos
051 012 ganglioneuroblastoma
055 058 fibrosarcoma nos
060 066 pleomorphic liposarcoma
075 063 spindle cell sarcoma
111 063 dedifferentiated liposarcoma
127 067 neoplasm, malignant
140 062 myxoid liposarcoma
142 064 ml, large b-cell, diffuse
222 060 liposarcoma, well differentiated type
223 062 sarcoma nos
255 004 neuroblastoma nos
298 063 liposarcoma nos
311 063 fibrous histiocytoma, malignant
622 062 leiomyosarcoma nos

The pelvis is a commonly used anatomic term that creates much confusion. It is sometimes described as the bowl-like invagination in the lower abdomen, or it may be described as the structures that support the lower abdomen, or it may be described as the set of bones that create the framework of the bowl-like invagination.

It is very difficult to assign neoplasms to the pelvis, because tumors arising in this area can best be assigned to the bones in which they are found, or to the peritoneum, or to the retroperitoneum. The difficulty of assigning neoplasms to the pelvis becomes apparent when we see that of the approximately 3.5 million cases in the SEER dataset, there are only 470 cases assigned to the pelvis, and of these cases, most seem to arise, more specifically, from the peritoneum (papillary serous cystadenocarcinoma) or the uterine cervix (squamous cell carcinoma), or the intestines (adenocarcinoma).

TOTAL = 470
Count Age Neoplasm name
023 069 papillary serous cystadenocarcinoma
025 069 squamous cell carcinoma nos
050 075 carcinoma nos
061 066 adenocarcinoma nos
126 078 neoplasm, malignant

There is a tautologic remark that pathologists use. "Common tumors occur commonly, and uncommon tumors occur uncommonly." This means that a pathologist should be cautious when assigning a diagnosis that rarely occurs at the location where the tumor has arisen.

To know which tumors commonly occur, at what sites, at what ages, and in what ethnic populations, it is very useful to have a large collection of neoplasms to study, and to have a good understanding of how frequently each neoplasm arises at a given site. The SEER data set permits such determinations.
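
To make the idea concrete, here is a minimal Python sketch of how a listing like the ones above could be tallied. It is not the Perl code used for this series, and it does not read the real SEER file layout: the `records` list, its field order (site, age at diagnosis, ICD-O term), and the function name `tally_site` are all made-up stand-ins. Only the 20-case cutoff comes from the text above.

```python
from collections import defaultdict

MIN_CASES = 20  # cutoff used for the listings in this post


def tally_site(records, site, min_cases=MIN_CASES):
    """Return (count, mean age, term) rows for one site, rarest first.

    `records` is an iterable of hypothetical (site, age, term) tuples.
    """
    ages_by_term = defaultdict(list)
    for rec_site, age, term in records:
        if rec_site == site:
            ages_by_term[term].append(age)

    rows = [
        (len(ages), round(sum(ages) / len(ages)), term)
        for term, ages in ages_by_term.items()
        if len(ages) >= min_cases  # drop terms with too few cases
    ]
    return sorted(rows)  # ascending by count, like the listings above


# Toy data, invented purely for illustration (these are not SEER records).
records = (
    [("pleura", 69, "mesothelioma, malignant")] * 25
    + [("pleura", 70, "sarcoma nos")] * 3
    + [("peritoneum", 62, "mesothelioma, malignant")] * 21
)

for count, mean_age, term in tally_site(records, "pleura"):
    print(f"{count:04d} {mean_age:03d} {term}")
```

With the toy data, only the pleural mesothelioma row survives the cutoff; the three sarcoma records fall below 20 cases and are dropped, just as rare histologies were dropped from the listings in this post.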

In a prior blog, I discussed some of the simple Perl routines used in the SEER-data parsing algorithms, and these are available from my web site.

If you want to do creative data mining, you will need to learn a little computer programming.

For Perl and Ruby programmers, methods and scripts for using SEER and other publicly available biomedical databases are described in detail in my prior books:

Perl Programming for Medicine and Biology

Ruby Programming for Medicine and Biology

An overview of the many uses of biomedical information is available in my book,
Biomedical Informatics.

More information on cancer is available in my recently published book, Neoplasms.

© 2008 Jules Berman

key words: neoplasms, cancer, neoplasia, precancer, tumor, tumour, tumors, tumours, neoplasm, carcinogenesis, carcinogens, tumor genetics, myelodysplastic syndromes, IEN, pre-cancer, preneoplastic lesions, preneoplasia

As specified in the SEER Data Agreement, the citation for the SEER data is as follows:

"Surveillance, Epidemiology, and End Results (SEER) Program Limited-Use Data (1973-2005), National Cancer Institute, DCCPS, Surveillance Research Program, Cancer Statistics Branch, released April 2008, based on the November 2007 submission."

The image of the retroperitoneum was taken from a reproduction of a Gray's Anatomy image, provided by Wikipedia. The copyright on Gray's Anatomy has expired, and the image is in the public domain.

As with all of my scripts, lists, web sites, and blog entries, the following disclaimer applies. This material is provided by its creator, Jules J. Berman, "as is", without warranty of any kind, expressed or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. In no event shall the author or copyright holder be liable for any claim, damages or other liability, whether in an action of contract, tort or otherwise, arising from, out of or in connection with the material or the use or other dealings in the material.

In June, 2014, my book, entitled Rare Diseases and Orphan Drugs: Keys to Understanding and Treating the Common Diseases was published by Elsevier. The book builds the argument that our best chance of curing the common diseases will come from studying and curing the rare diseases.

I urge you to read more about my book. There's a generous preview of the book at the Google Books site. If you like the book, please ask your librarian to purchase a copy for your library or reading room.