Thursday, July 17, 2014

Pareto's Principle and Long-Tailed Distribution Curves

In June 2014, my book, Rare Diseases and Orphan Drugs: Keys to Understanding and Treating the Common Diseases, was published by Elsevier. The book builds the argument that our best chance of curing the common diseases will come from studying and curing the rare diseases.

The book has an extensive glossary that explains the meaning and relevance of medical terms appearing throughout the chapters. The glossary can be read as a stand-alone document. Here is an example of one term, "Pareto's Principle", excerpted from the glossary.
Pareto’s principle - Also known as the 80/20 rule, Pareto’s principle holds that a small number of items account for the vast majority of observations. For example, a small number of rich people account for the majority of wealth. Just two countries, India plus China, account for 37% of the world population. Within most countries, a small number of provinces or geographic areas contain the majority of the population of a country (e.g., east and west coastlines of the U.S.). A small number of books, compared with the total number of published books, account for the majority of book sales.

Likewise, a small number of diseases account for the bulk of human morbidity and mortality. For example, two common types of cancer, basal cell carcinoma of skin and squamous cell carcinoma of skin, account for about 1 million new cases of cancer each year in the U.S. This is approximately the sum total for all other types of cancer combined. We see a similar phenomenon when we count causes of death. About 2.6 million people die each year in the U.S. [98]. The top two causes of death account for 1,171,652 deaths (596,339 deaths from heart disease and 575,313 deaths from cancer [99]), or about 45% of all U.S. deaths. All of the remaining deaths are accounted for by more than 7000 conditions.
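The arithmetic behind the 45% figure can be checked with a few lines of Python. This is only a sketch that uses the counts quoted above; the variable names are my own, and the total of 2.6 million deaths is the rounded figure from the text, so the resulting share is approximate.

```python
# Counts quoted in the glossary entry above.
heart_disease_deaths = 596_339
cancer_deaths = 575_313

# Rounded annual total of U.S. deaths ("about 2.6 million"), so the share is approximate.
total_us_deaths = 2_600_000

top_two_deaths = heart_disease_deaths + cancer_deaths
top_two_share = top_two_deaths / total_us_deaths

print(f"Deaths from the top two causes: {top_two_deaths:,}")
print(f"Share of all U.S. deaths: {top_two_share:.1%}")
```

Two causes out of more than 7000 conditions account for nearly half of the total, which is the pattern Pareto's principle describes.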

Sets of data that follow Pareto’s principle are often said to follow a Zipf distribution, or a power law distribution. These types of distributions are poorly summarized by standard statistical descriptors because they do not produce a symmetric bell-shaped curve. Simple measurements such as average and standard deviation have virtually no practical meaning when applied to Zipf distributions. Furthermore, the Gaussian distribution does not apply, and none of the statistical inferences built upon an assumption of a Gaussian distribution will hold on data sets that follow Pareto’s principle.
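To make the point about averages concrete, here is a minimal Python sketch of an idealized Zipf distribution. It is not drawn from the book: the function names are hypothetical, and the choice of 1000 items with an exponent of 1 is an assumption made purely for illustration. The item of rank k is given a frequency proportional to 1/k, and the sketch then asks how much of the total the top 20% of items hold, and how far the mean sits from the median.

```python
from statistics import mean, median


def zipf_frequencies(n_items, exponent=1.0):
    """Idealized Zipf frequencies: the item of rank k gets 1 / k**exponent."""
    return [1.0 / rank ** exponent for rank in range(1, n_items + 1)]


def top_share(frequencies, fraction):
    """Fraction of the grand total held by the largest `fraction` of items."""
    ordered = sorted(frequencies, reverse=True)
    n_top = max(1, int(len(ordered) * fraction))
    return sum(ordered[:n_top]) / sum(ordered)


# Assumed for illustration only: 1000 items, exponent 1.
freqs = zipf_frequencies(1000)

share_top_20 = top_share(freqs, 0.20)
mean_to_median = mean(freqs) / median(freqs)

print(f"Share held by the top 20% of items: {share_top_20:.1%}")
print(f"Mean is {mean_to_median:.1f} times the median")
```

Under these assumptions the top fifth of the items holds a little under 80% of the total, and the mean is several times larger than the median, so the "average" item is not a typical item. A symmetric bell-shaped distribution would show a mean and median that nearly coincide.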

I urge you to read more about my book. There's a good preview of the book at the Google Books site. If you like the book, please ask your librarian to purchase a copy for your library or reading room.

- Jules J. Berman, Ph.D., M.D.

tags: 80/20 rule, common disease, data analysis, glossary, orphan disease, orphan drugs, rare disease, statistics