In my January 2 blog, I introduced the subject of bimodal cancers. These are cancers that have two peaks in occurrence, by age. In that blog, I included images of the type of age-distribution graphs seen with bimodal and multimodal cancers.
Examples of recognized bimodal cancers are Hodgkin lymphoma (which has two peaks in occurrence: in young adults and in middle-aged adults), and Kaposi's sarcoma (which has two peaks in occurrence: in young people, with AIDS, and in older men, unassociated with AIDS).
The shape of the curve of cancer occurrences, by age, for the different types of cancer, is a fascinating puzzle. If we understand why some cancer curves are bimodal, we can enhance our knowledge of carcinogenesis (the developmental process of cancer) and tumor diagnosis (the features that identify a cancer and that separate a particular type of cancer from all other types of cancer). We can also learn a lot about the meaning of the data that we collect on cancers, and the ways that this data can be analyzed. Most importantly, the insights gained can save lives, by uncovering preventable cancers, and by finding new classes and subclasses of cancer that may benefit from innovative cancer treatments.
Here are the possible causes of cancer multimodality (multiple peaks in a graph of cancer occurrences by age):
1. Multiple environmental causes targeting different ages
2. Multiple genetic causes with different latencies
3. Multiple diseases classified under one name
4. Faulty or insufficient data
5. Combinations of 1, 2, 3, and 4
We see examples of all of these possibilities in the SEER data (from the National Cancer Institute's Surveillance, Epidemiology, and End Results program) and in previously published studies of specific tumors.
We know that specific exposures to a site-specific carcinogen can create spikes in the occurrence of cancers in a particular subpopulation. For example, high-school boys who play baseball sometimes chew tobacco. It helps them maintain focus on their game, and it gives them something to do when they're sitting in the batter's cage. They typically have a favorite spot in their mouths, between the cheek and the gum, where they stick their "chaw". This is the most likely spot for cancer to occur. Cancers caused by chewing tobacco may occur in teen-agers and young adults. A specific type of high-risk behavior, such as tobacco chewing, can create an early peak in incidence for a tumor that normally occurs in a much older age group.
Some tumors have genetic and non-genetic (sporadic) causes. The best-studied example is probably retinoblastoma. Some people are born with mutations that predispose them to develop retinoblastoma. These people typically develop tumors at a very early age. Those who develop retinoblastoma without the inborn genetic mutation [who acquire mutations later in life] typically develop retinoblastoma at a later age.
We also see multimodal distributions when we mistakenly call several different kinds of cancer by the same name. For example, lung cancer in young persons may have a specific mutation that distinguishes it from lung cancer occurring in an older population (midline carcinoma of children and young adults has a characteristic gene rearrangement involving the NUT gene). This cancer is separable from bronchogenic carcinoma of the lung, which occurs in older persons. It may turn out that lung cancer of the young responds to a different treatment than lung cancers caused by smoking.
Finally, we must consider that it is possible that the multimodal curves are simply an artifact produced by the way we collect and analyze data. If the pathologists who rendered the diagnoses, used in the SEER data set, were wrong (i.e., rendered misdiagnoses), we would expect multimodality on that basis (representing the different tumors included under a category that should have included only one kind of cancer).
This actually happens. The best example is malignant fibrous histiocytoma. Current thinking is that this diagnostic entity has been used as a grab-bag diagnosis for sarcomas that do not fit well into any particular category. There is substantial evidence that many cases of malignant fibrous histiocytoma would have been better diagnosed as leiomyosarcomas or liposarcomas or fibrosarcomas, or as one of a host of rare sarcomas, each with its own characteristic age distribution. By blending these different tumors under a single name, you also blend the age distributions of the reported population.
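The blending effect is easy to see in a small simulation. The Python sketch below is only an illustration under stated assumptions: the two "tumors" are hypothetical, their ages at diagnosis are assumed to be normally distributed, and the means, standard deviations, and case counts are invented for the example. None of these numbers come from SEER.

```python
import random
from collections import Counter

def simulate_ages(mean, sd, n, rng):
    """Draw n ages from a normal distribution, clipped to 0-99 years."""
    return [min(99, max(0, round(rng.gauss(mean, sd)))) for _ in range(n)]

def counts_by_decade(ages):
    """Count cases in 10-year age bins (0-9, 10-19, ..., 90-99)."""
    tally = Counter(age // 10 for age in ages)
    return [tally.get(decade, 0) for decade in range(10)]

def count_peaks(counts):
    """Count bins that are strictly higher than both of their neighbors."""
    padded = [0] + list(counts) + [0]
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    )

rng = random.Random(0)  # fixed seed so the run is repeatable

# Two hypothetical tumors, each with a single-peaked age distribution.
tumor_a = simulate_ages(mean=25, sd=5, n=5000, rng=rng)  # assumed early-onset
tumor_b = simulate_ages(mean=65, sd=5, n=5000, rng=rng)  # assumed late-onset

# Reporting both tumors under one name pools their cases.
blended = counts_by_decade(tumor_a + tumor_b)

print("tumor A peaks:", count_peaks(counts_by_decade(tumor_a)))
print("tumor B peaks:", count_peaks(counts_by_decade(tumor_b)))
print("blended peaks:", count_peaks(blended))
```

Each simulated tumor has one peak on its own, and the pooled counts have two. Real sarcoma subtypes will overlap far more than these well-separated toy populations, so the blended curve may show shoulders rather than clean peaks.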
I prepared a document of bimodal cancer distributions (raw data, normalized data, and graphs). In this document, data on each tumor of a given name was collected, without pre-stratifying tumors based on gender, ethnicity, or anatomic site. Had we done so, we might have found that what we thought was a single tumor actually contained several different tumors (e.g., medullary carcinoma of breast and medullary carcinoma of thyroid). The artifactual aggregation of different tumors under a single name, which results from ignoring well-known distinguishing demographic or anatomic factors, is a potential source of confusion. In later blogs, we'll see some simple ways of eliminating obvious sources of error from our analyses of bimodal populations.
Whitley and Ball have discussed a number of reasons, related to the collection of data, for multimodal peaks.
Elise Whitley, Jonathan Ball. Statistics review 1: Presenting and summarising data. Crit Care. 6:66-71, 2002.
"a (bimodal) distribution with two peaks may actually be a combination of two uni-modal distributions (such as hormone levels in men and women). Alternatively, a (multimodal) distribution with multiple peaks may be due to digit preference (rounding observations up or down) during data collection, where peaks appear at round numbers, for example peaks in systolic blood pressure at 90, 100, 110, 120 mmHg, and so on."
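Digit preference can also be shown with a short simulation. In the Python sketch below, the population of systolic pressures is assumed to be unimodal (normal, with an invented mean of 120 mmHg and standard deviation of 15), and half of the readings are assumed to be rounded to the nearest 10 by the observer. These parameters are made up for illustration and are not taken from Whitley and Ball.

```python
import random
from collections import Counter

def record_pressure(true_value, rounds_to_ten):
    """Return the value an observer writes down for one reading."""
    if rounds_to_ten:
        return int(round(true_value / 10.0)) * 10  # digit preference
    return int(round(true_value))                  # nearest 1 mmHg

rng = random.Random(1)  # fixed seed so the run is repeatable
recorded = []
for _ in range(20000):
    true_value = rng.gauss(120, 15)   # hypothetical unimodal population
    rounds = rng.random() < 0.5       # assumed: half the readings are rounded
    recorded.append(record_pressure(true_value, rounds))

tally = Counter(recorded)
for value in (100, 105, 110, 115, 120, 125, 130):
    print(value, tally[value])
```

The counts at 100, 110, 120, and 130 stand well above the counts at 105, 115, and 125, even though the underlying population has a single peak. The spikes are produced entirely by the recording habit.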
Despite these considerations, there are many reasons to believe that many of the bimodal distributions found in the SEER data sets reveal true biological features of the cancer populations.
Reasons why the SEER bimodal graphs are non-artifactual
1. Multimodal peaks are rare among cancers. Of the more than 650 cancers collected in the complete file of cancer occurrences by age, only a couple dozen show multimodality. If there were a consistent error in the way that data were collected, would you not expect to see the same error in the majority of cancer distributions?
2. The SEER data reproduces multimodal peaks in the same cancers for which multimodal peaks have been established from other data sources. For example, the SEER data shows bimodal peaks for Hodgkin lymphoma, Kaposi sarcoma, and secretory carcinoma of the breast.
3. The SEER data provides very large numbers of cases for many of the cancers for which bimodal peaks are found. The shape of the curves cannot be attributed to sparse data, in these cases.
4. As we will see in future blog posts, when we examine the standard deviations of the bimodal peaks, and their modes, statistical analysis rejects the null hypothesis (that the observations can be accounted for by a single population).
5. We will also see that there is internal consistency of the observation of multimodality within the SEER data. In some cases, data is collected, within SEER, on a single tumor, under different names (for example the borderline tumors of the ovary are listed under several closely related terms, as are craniopharyngiomas). In these cases, multimodality is preserved among the same type of cancer, even when the data is collected under different terms.
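As a rough illustration of how the positions and spreads of two peaks can be compared (point 4 above), here is a minimal Python sketch of Ashman's D, a rule-of-thumb separation measure for a mixture of two normal distributions. It is not a formal hypothesis test, and it is not necessarily the analysis used in the later posts. It assumes each peak is roughly normal, so that the mean and the mode of a peak coincide, and the peak values below are hypothetical.

```python
import math

def ashman_d(mean1, sd1, mean2, sd2):
    """Ashman's D: separation of two roughly normal peaks.

    Values above 2 are conventionally read as a clean separation.
    """
    return math.sqrt(2) * abs(mean1 - mean2) / math.sqrt(sd1 ** 2 + sd2 ** 2)

# Hypothetical peaks: ages at diagnosis centered at 25 and 65 years.
well_separated = ashman_d(25, 8, 65, 10)

# Hypothetical peaks only 5 years apart, with the same 10-year spread.
overlapping = ashman_d(50, 10, 55, 10)

print("well separated:", round(well_separated, 2))
print("overlapping:", round(overlapping, 2))
```

The first pair of peaks scores well above 2, and the second pair scores well below it, which matches the intuition that two peaks must be far apart, relative to their spreads, before they can be told apart.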
The persistent message is that multimodality in a cancer distribution is a puzzle worth investigating.
-© 2009 Jules J. Berman
tags: orphan disease, orphan drugs, rare disease, disease genetics, genetics, bimodal, epidemiology, neoplasms, seer, pathogenesis, subsets of disease