Monday, March 21, 2016


Blog readers can use the discount code COMP315 at checkout for a 30% discount.

On March 17, 2016, my book Data Simplification: Taming Information with Open Source Tools was published by Morgan Kaufmann, an imprint of Elsevier. [The Elsevier site indicates that the book is still on preorder, but you can ignore that.] This past month, I've posted on topics relevant to data simplification. Beginning tomorrow, I'll be moving on to new subjects for this blog, but I wanted to make one additional comment for anyone who might be on the fence about buying this book.

Most large data projects are total failures [1-21]. Furthermore, in my humble opinion, most data projects that are deemed successes at the time of completion are actually failures of a kind, because the data collected during the project was abandoned when the project ended. Data shouldn't die. Data should be prepared in a manner that permits anyone (not just the people who planned the project) to confirm the conclusions, reanalyze the data, merge the data with other data sources, and repurpose the data for future projects. To do so, the data must be prepared in a manner that is comprehensible and simplified. My book provides open source tools for creating data that can be used and repurposed by generations of data scientists.

Enough said! Tomorrow, we move on.

- Jules Berman

key words: computer science, data analysis, data repurposing, data simplification, data science, information science, simplifying data, taming data, jules j berman


[1] Kappelman LA, McKeeman R, Zhang L. Early warning signs of IT project failure: the dominant dozen. Information Systems Management 23:31-36, 2006.

[2] Arquilla J. The Pentagon's biggest boondoggles. The New York Times (Opinion Pages) March 12, 2011.

[3] Lohr S. Lessons From Britain's Health Information Technology Fiasco. The New York Times Sept. 27, 2011.

[4] Dismantling the NHS national programme for IT. Department of Health Media Centre Press Release. September 22, 2011. Available from: viewed June 12, 2012.

[5] Whittaker Z. UK's delayed national health IT programme officially scrapped. ZDNet September 22, 2011.

[6] Lohr S. Google to end health records service after it fails to attract users. The New York Times Jun 24, 2011.

[7] An assessment of the impact of the NCI cancer Biomedical Informatics Grid (caBIG). Report of the Board of Scientific Advisors Ad Hoc Working Group, National Cancer Institute, March, 2011.

[8] Heeks R, Mundy D, Salazar A. Why health care information systems succeed or fail. Institute for Development Policy and Management, University of Manchester, June 1999. Available from:, viewed July 12, 2012.

[9] Brooks FP. No silver bullet: essence and accidents of software engineering. Computer 20:10-19, 1987.

[10] Unreliable research: Trouble at the lab. The Economist October 19, 2013.

[11] Kolata G. Cancer fight: unclear tests for new drug. The New York Times April 19, 2010.

[12] Ioannidis JP. Why most published research findings are false. PLoS Med 2:e124, 2005.

[13] Baker M. Reproducibility crisis: Blame it on the antibodies. Nature 521:274-276, 2015.

[14] Naik G. Scientists' Elusive Goal: Reproducing Study Results. Wall Street Journal December 2, 2011.

[15] Innovation or Stagnation: Challenge and Opportunity on the Critical Path to New Medical Products. U.S. Department of Health and Human Services, Food and Drug Administration, 2004.

[16] Hurley D. Why Are So Few Blockbuster Drugs Invented Today? The New York Times November 13, 2014.

[17] Ioannidis JP. Microarrays and molecular research: noise discovery? The Lancet 365:454-455, 2005.

[18] Vlasic B. Toyota's slow awakening to a deadly problem. The New York Times, February 1, 2010.

[19] Lanier J. The complexity ceiling. In: Brockman J, ed. The next fifty years: science in the first half of the twenty-first century. Vintage, New York, pp 216-229, 2002.

[20] Labos C. It Ain't Necessarily So: Why Much of the Medical Literature Is Wrong. Medscape News and Perspectives. September 9, 2014.

[21] Gilbert E, Strohminger N. We found only one-third of published psychology research is reliable - now what? The Conversation. August 27, 2015. Available at:, viewed on August 27, 2015.
