Friday, January 29, 2016

Misinterpretation of Results: The Most Pervasive Error in Data Science

The most common source of scientific error is post-analytic, arising from the interpretation of data (1), (2), (3), (4), (5), (6). Pre-analytic errors and analytic errors, though common, are much less frequently encountered than interpretation errors. Virtually every journal article contains, hidden in the introduction and discussion sections, some distortion of fact or misleading assertion. Scientists cannot be objective about their own work. As humans, we tend to interpret observations to reinforce our beliefs and prejudices and to advance our agendas.

One of the most common strategies whereby scientists distort their own results is to contrive self-serving conclusions, a process called message framing (7). In message framing, a scientist draws his or her preferred conclusion, omitting from the discussion any pertinent findings that might diminish or discredit that conclusion. The common practice of message framing is conducted on a subconscious, or at least a sub-rational, level. A scientist is not apt to read articles whose conclusions contradict his or her own hypotheses and will not cite dissenting works. Furthermore, if a paradigm is held in high esteem by a majority of the scientists in a field, then works that contradict the paradigm are not likely to pass peer review. Hence, it is difficult for contrary articles to be published in scientific journals. In any case, the message delivered in a journal article is almost always framed in a manner that promotes the author's interpretation.

It must be noted that throughout human history, no scientist has ever gotten into any serious trouble for misinterpreting results. Scientific misconduct comes, as a rule, from the purposeful production of bad data, whether through falsification, through fabrication, or through the refusal to remove and retract data that is known to be false, plagiarized, or otherwise invalid. In the U.S., allegations of research misconduct are investigated by the Office of Research Integrity (ORI). Funding agencies in other countries have similar watchdog institutions. The ORI makes its findings a matter of public record (8). Of 150 cases investigated between 1993 and 1997, all but one case had an alleged component of data falsification, fabrication, or plagiarism (9). In 2007, of the 28 investigated cases, 100% involved allegations of falsification, fabrication, or both (10). No cases of misconduct based on data misinterpretation were prosecuted (11).

Post-analytic misinterpretation of data is hard-wired into the human psyche. Agencies tasked with ensuring scientific integrity have never seriously confronted the problem of data misinterpretation. Why would they? You can't fight human nature.

In December 2010, amidst much fanfare, NASA scientists announced that a new form of life had been found on Earth, a microorganism that thrived in the high concentrations of arsenic prevalent in Mono Lake, California. The microorganism was reported to incorporate arsenic into its DNA, instead of the phosphorus used by all other known terrestrial organisms. Thus, the newfound organism appeared to synthesize a previously unknown type of genetic material (12). NASA's associate administrator for the Science Mission Directorate, at the time, wrote, "The definition of life has just expanded." (13) The Director of the NASA Astrobiology Institute at the agency's Ames Research Center in Moffett Field, California, wrote, "Until now a life form using arsenic as a building block was only theoretical, but now we know such life exists in Mono Lake." (13)

Heady stuff! Soon thereafter, other scientists tried but failed to confirm the earlier findings (14). It seems that the new life form was just another old life form, and the arsenic was a hard-to-wash cellular contaminant (11). The best scientists on the planet cannot resist the lure of a scientific interpretation that promotes their own agenda.

The first analysis of data is usually wrong and irreproducible. Erroneous results and misleading conclusions are regularly published by some of the finest laboratories in the most prestigious institutions in the world (15), (16), (17), (18), (19), (20), (21), (22), (23), (24), (25), (26), (27). Every scientific study must be verified and validated, and the most effective way to ensure that verification and validation take place is to release your data for public review.


