Friday, December 26, 2008

Cancer occurrence by age: distributions and schematics

I just posted on my web site a PDF document that compiles age-of-occurrence data for the cancers included in the SEER public-use data records (about 3.5 million records, covering over 700 kinds of cancer, collected from 1973 to 2005).

For each cancer, I binned the occurrences into 5-year age intervals, beginning with ages 0-5 and ending with ages 95 and above.
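The binning step can be sketched in a few lines of Python. This is a minimal illustration, not the code used to build the document. It assumes half-open intervals (0-4, 5-9, ..., 90-94) plus one open-ended 95+ bin, which yields 20 bins in total. It also assumes a hypothetical, simplified input of (cancer name, age at diagnosis) pairs rather than raw SEER record layouts.

```python
NUM_BINS = 20  # 19 five-year bins (0-4 ... 90-94) plus one open-ended 95+ bin


def age_bin(age: int) -> int:
    """Return the index (0..19) of the 5-year interval containing `age`.

    Assumption: intervals are half-open, so age 5 falls in bin 1 (5-9).
    Every age of 95 or more lands in the last, open-ended bin.
    """
    if age < 0:
        raise ValueError("age must be non-negative")
    return min(age // 5, NUM_BINS - 1)


def bin_ages(records):
    """Count occurrences per cancer per age bin.

    `records` is an iterable of (cancer_name, age_at_diagnosis) pairs,
    a hypothetical stand-in for parsed SEER records.
    Returns {cancer_name: [count_bin0, ..., count_bin19]}.
    """
    counts = {}
    for name, age in records:
        row = counts.setdefault(name, [0] * NUM_BINS)
        row[age_bin(age)] += 1
    return counts


if __name__ == "__main__":
    sample = [("Hypothetical tumor A", 3), ("Hypothetical tumor A", 67),
              ("Hypothetical tumor A", 99)]
    for name, row in bin_ages(sample).items():
        # One line per cancer: the name followed by 20 interval counts
        print(name, *row)
```

Clamping with `min` keeps all very old patients in the final 95+ bin instead of creating extra bins.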

Specifically, the name of each cancer is followed by 20 numbers, one for each sequential age interval:


In the document, a schematic representation follows each raw distribution.
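The post does not describe how the schematics are drawn. One plausible rendering is a text bar for each interval, scaled to the most populated bin. The sketch below is an assumption for illustration only. The function name `schematic` and its `width` and `mark` parameters are hypothetical.

```python
def schematic(row, width=40, mark="*"):
    """Render a binned age distribution as text bars scaled to the peak bin.

    Assumes the last bin is open-ended (e.g. 95+) and all others span 5 years.
    """
    peak = max(row)
    labels = [f"{5 * i}-{5 * i + 4}" for i in range(len(row) - 1)]
    labels.append(f"{5 * (len(row) - 1)}+")
    lines = []
    for label, count in zip(labels, row):
        # Bar length is proportional to this bin's share of the peak count
        bar = mark * round(width * count / peak) if peak else ""
        lines.append(f"{label:>6} {bar}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(schematic([0] * 10 + [5, 10, 20, 40, 60, 80, 70, 50, 30, 10], width=20))
```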

The document gives pathologists a guideline for the expected age distribution of each cancer. A good pathologist should be very careful when assigning a diagnosis that does not "fit" the typical age profile of the cancer.
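A simple way to ask whether a diagnosis "fits" is to find the share of all recorded cases that fall in the patient's age interval. This is a minimal sketch, assuming a 20-bin distribution like those in the document. The 1% cutoff is an arbitrary, hypothetical value chosen for illustration. It is not a clinical rule and not the author's method.

```python
NUM_BINS = 20  # 5-year bins from 0-4 through 90-94, plus a 95+ bin


def age_fraction(row, age):
    """Fraction of all recorded occurrences in the bin containing `age`."""
    total = sum(row)
    if total == 0:
        raise ValueError("distribution has no occurrences")
    return row[min(age // 5, NUM_BINS - 1)] / total


def fits_age_profile(row, age, threshold=0.01):
    """True if the patient's age bin holds at least `threshold` of all cases.

    The default threshold (1%) is a hypothetical illustrative cutoff.
    """
    return age_fraction(row, age) >= threshold


if __name__ == "__main__":
    # Hypothetical distribution: no cases before age 50
    adult_onset = [0] * 10 + [50] * 10
    print(fits_age_profile(adult_onset, 20))  # an age-20 diagnosis looks atypical
```

A result of `False` does not rule out the diagnosis. It only signals that the case deserves a second look.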

Epidemiologists may benefit from having a single source that indicates the likelihood of each specific type of cancer in different age populations.

By reviewing all of the distributions at once, researchers may develop new questions and hypotheses that piecemeal observation would not reveal.
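As one example of looking at all of the distributions together, the cancers can be ordered by the age interval where each one peaks. Cancers that cluster at the same peak can then be compared side by side. This is a hypothetical sketch. It assumes the distributions are held as a dictionary that maps each cancer name to its list of bin counts, and the function names are illustrative.

```python
def peak_bin(row):
    """Index of the most populated bin (the earliest one, if there are ties)."""
    return max(range(len(row)), key=row.__getitem__)


def rank_by_peak_age(distributions):
    """Sort cancer names by peak age bin, breaking ties alphabetically."""
    return sorted(distributions,
                  key=lambda name: (peak_bin(distributions[name]), name))


if __name__ == "__main__":
    demo = {
        "Hypothetical adult tumor": [0] * 12 + [5, 9, 7, 3, 1, 0, 0, 0],
        "Hypothetical childhood tumor": [8, 4, 1] + [0] * 17,
    }
    print(rank_by_peak_age(demo))
```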

The document is available at:

In the next few blog posts, I'll provide excerpts from the document and explain how it can be used for clinical and research purposes.

-© 2008 Jules Berman
My book, Principles of Big Data: Preparing, Sharing, and Analyzing Complex Information was published in 2013 by Morgan Kaufmann.

I urge you to explore my book. Google Books has prepared a generous preview of the book contents.

tags: big data, metadata, data preparation, data analytics, data repurposing, datamining, data mining, epidemiology