Monday, November 17, 2008

Using SEER Public Use Data: 4

This post continues a series on the SEER Public Use Data files.

In a prior post, I showed how to do a global search through the SEER public use neoplasm incidence files. The search produced a list of every neoplastic entity captured in the approximately 3.5 million SEER cases, categorized by ethnicity, average age of occurrence, number of occurrences, and other data culled from the master data sets.
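The core of that global search is a tally: for each neoplasm term, count the cases, accumulate ages for an average, and break the counts down by ethnicity. The sketch below assumes each case has already been reduced to a hypothetical `(term, ethnicity, age)` tuple. Real SEER records are fixed-width text lines whose field positions are defined in the SEER data dictionary, and that parsing step is not shown. The function name `summarize` and the toy records are illustrative only.

```python
from collections import defaultdict


def summarize(records):
    """Tally cases per neoplasm term.

    Each record is a (term, ethnicity, age) tuple. This layout is a
    hypothetical simplification of a parsed SEER case record.
    """
    counts = defaultdict(int)                          # term -> total cases
    age_sums = defaultdict(int)                        # term -> sum of ages
    by_group = defaultdict(lambda: defaultdict(int))   # term -> ethnicity -> cases
    for term, ethnicity, age in records:
        counts[term] += 1
        age_sums[term] += age
        by_group[term][ethnicity] += 1

    summary = {}
    for term in counts:
        summary[term] = {
            "cases": counts[term],
            "average_age": age_sums[term] / counts[term],
            "by_ethnicity": dict(by_group[term]),
        }
    return summary


# Hypothetical toy records for illustration, not SEER data.
records = [
    ("seminoma nos", "white hispanic", 30),
    ("seminoma nos", "african-american", 40),
    ("germinoma", "white hispanic", 20),
]
result = summarize(records)
print(result["seminoma nos"])
```

A single pass with dictionaries keeps memory proportional to the number of distinct terms rather than the number of cases. That matters when the input is millions of records.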

Reviewing the output, I found that the Hispanic and white populations had more occurrences of germ cell tumors (an uncommon type of tumor) than the African-American population.

I went to the SEER site and used SEER's public query engine to see if this observation could be verified.

The SEER query site is:

All queries begin here. I searched for tumors of the testis in males, comparing white Hispanics with African-Americans. A simple interface permits these selections.

The SEER interface produces a list of your input parameters.

The same interface produces a bar chart of your results:

If I am interested in germ cell tumors, you may wonder why I queried tumors of the testis. The reason is that the SEER interface does not let me query specific types of germ cell tumors or any specific type of testicular tumor. I know that most testicular tumors are germ cell tumors, so I settled for a site-based query. If there were a difference in the incidence of germ cell tumors between the Hispanic and the African-American populations, it should show up in that query.

And that's what happened. The SEER output demonstrated that white Hispanics had a much higher incidence of testicular tumors (and, presumably, testicular germ cell tumors) than the African-American population.

To find the ratio for specific tumor types, I needed to do a little more work. A simple Perl script produced the following list:

2.031 0173 026 germinoma
2.756 0005 038 intratubular malignant germ cells *
2.756 0005 025 malignant teratoma, undifferentiated type *
3.409 0104 019 teratoma, malignant nos
3.478 1125 035 seminoma nos
4.452 0054 036 seminoma, anaplastic type
4.757 0157 028 teratocarcinoma
5.053 0060 027 choriocarcinoma combined with teratoma
5.168 0018 050 spermatocytic seminoma *
5.523 0299 028 embryonal carcinoma nos
5.548 0363 028 mixed germ cell tumor
6.389 0028 027 germ cell tumor, nonseminomatous

* Entries marked with an asterisk have too few cases (second column) to be statistically meaningful.

The first column is the ratio of cases per total population among white Hispanics, divided by the same ratio among African-Americans. The second column is the number of cases, the third column is the average age of the cases, and the final column is the ICD-O term for the neoplasm.
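The first column is a rate ratio. In symbols, with cases and population counts for each group, it is (cases_A / pop_A) / (cases_B / pop_B). The sketch below computes it and formats a row in the same layout as the list above: ratio to three decimals, case count padded to four digits, average age padded to three digits, then the term. The function names and the example counts and populations are hypothetical, and the population denominators are assumed to be supplied by the caller.

```python
def rate_ratio(cases_a, pop_a, cases_b, pop_b):
    """Cases per population in group A divided by the same rate in group B.

    Computed as (cases_a * pop_b) / (cases_b * pop_a), which is algebraically
    equal to (cases_a / pop_a) / (cases_b / pop_b) but multiplies integers first.
    """
    if cases_b == 0 or pop_a <= 0 or pop_b <= 0:
        raise ValueError("rate ratio is undefined for these inputs")
    return (cases_a * pop_b) / (cases_b * pop_a)


def format_row(ratio, cases, mean_age, term):
    """Format one output line: ratio, zero-padded count, zero-padded age, term."""
    return f"{ratio:.3f} {cases:04d} {round(mean_age):03d} {term}"


# Hypothetical inputs: 30 cases among 1,000,000 people in group A,
# 10 cases among 1,000,000 people in group B.
r = rate_ratio(30, 1_000_000, 10, 1_000_000)
print(format_row(r, 40, 35, "example term"))
```

Because the ratio divides one rate by another, a value above 1 means group A has more cases per capita than group B, independent of the two groups' population sizes.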

Every value in column 1 is greater than 1. White Hispanic males have a higher case ratio than African-American males for every type of germ cell tumor listed (regardless of site).
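The observation that every ratio exceeds 1 can be checked directly against the list. The sketch below parses the twelve lines exactly as published, confirms that each ratio is above 1, and flags entries with few cases. The post does not state the cutoff behind its asterisks. `MIN_CASES = 20` is a hypothetical threshold, chosen because it happens to flag the same three entries that carry asterisks.

```python
TABLE = """\
2.031 0173 026 germinoma
2.756 0005 038 intratubular malignant germ cells *
2.756 0005 025 malignant teratoma, undifferentiated type *
3.409 0104 019 teratoma, malignant nos
3.478 1125 035 seminoma nos
4.452 0054 036 seminoma, anaplastic type
4.757 0157 028 teratocarcinoma
5.053 0060 027 choriocarcinoma combined with teratoma
5.168 0018 050 spermatocytic seminoma *
5.523 0299 028 embryonal carcinoma nos
5.548 0363 028 mixed germ cell tumor
6.389 0028 027 germ cell tumor, nonseminomatous"""

MIN_CASES = 20  # hypothetical cutoff; the post does not give one


def parse(table):
    """Split each line into (ratio, cases, average age, term)."""
    rows = []
    for line in table.splitlines():
        ratio, cases, age, term = line.split(" ", 3)
        # Drop the trailing " *" marker used for low-count entries.
        rows.append((float(ratio), int(cases), int(age), term.rstrip(" *")))
    return rows


rows = parse(TABLE)
all_above_one = all(ratio > 1 for ratio, _, _, _ in rows)
sparse = [term for _, cases, _, term in rows if cases < MIN_CASES]
print(all_above_one)
print(sparse)
```

The check on ratios uses every row, including the sparse ones. Excluding the three low-count entries does not change the conclusion, because the remaining nine ratios also exceed 1.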

This tells us two things. First, the germ cell tumors are related to each other by more than histogenesis (cell of origin). They must have a relationship that extends to causation and development. Second, the relatively high occurrence of germ cell tumors in white Hispanics is not a fluke confined to one cancer at one particular site. It is a consistent phenomenon that extends to several different related tumors and their histologic variants.

I should stress that germ cell tumors are rare, even in the Hispanic population. We are discussing relative rates of uncommon tumors among different ethnicities. An individual's risk of developing a germ cell tumor is low, regardless of ethnicity.

For Perl and Ruby programmers, methods and scripts for using SEER and other publicly available biomedical databases are described in detail in my prior books:

Perl Programming for Medicine and Biology

Ruby Programming for Medicine and Biology

More information on cancer is available in my recently published book, Neoplasms: Principles of Development and Diversity.

- © 2008 Jules Berman

key words: neoplasms, cancer, neoplasia, precancer, tumor, tumour, tumors, tumours, neoplasm, carcinogenesis, carcinogens, tumor genetics

As requested in the SEER Public-Use Agreement, the SEER citation is included here:

"Surveillance, Epidemiology, and End Results (SEER) Program ( Limited-Use Data (1973-2005), National Cancer Institute, DCCPS, Surveillance Research Program, Cancer Statistics Branch, released April 2008, based on the November 2007 submission."

As with all of my scripts, lists, web sites, and blog entries, the following disclaimer applies. This material is provided by its creator, Jules J. Berman, "as is", without warranty of any kind, expressed or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. In no event shall the author or copyright holder be liable for any claim, damages or other liability, whether in an action of contract, tort or otherwise, arising from, out of or in connection with the material or the use or other dealings in the material.
