Sunday, June 2, 2013

Big Data Versus Massive Data

This post is based on a topic covered in Big Data: Preparing, Sharing, and Analyzing Complex Information, by Jules J. Berman.

In yesterday's blog, we discussed the differences between Big Data and small data. Today, I wanted to briefly discuss the differences between Big Data and massive data.

Big Data is defined by the three v's: 

1. Volume - large amounts of data;

2. Variety - the data comes in different forms, including traditional databases, images, documents, and complex records;

3. Velocity - the content of the data is constantly changing, through the absorption of complementary data collections, through the introduction of previously archived data or legacy collections, and from streamed data arriving from multiple sources. 

It is important to distinguish Big Data from "lotsa data" or "massive data". In a Big Data resource, all three v's must apply. It is the size, complexity, and restlessness of Big Data resources that account for the methods by which these resources are designed, operated, and analyzed.

The term "massive data" or "lotsa data" is often applied to enormous collections of simple-format records. Massive datasets typically resemble enormous spreadsheets (2-dimensional tables of columns and rows), mathematically equivalent to an immense matrix. For scientific purposes, it is sometimes necessary to analyze all of the data in a matrix, all at once. The analysis of enormous matrices is computationally intensive, and may require the resources of a supercomputer.
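To make the "massive data" case concrete, here is a minimal, hypothetical sketch in which an entire matrix is held and analyzed as a single object. The dimensions and computations are illustrative assumptions, not examples from the book; a genuinely massive matrix might not fit in memory at all, which is why supercomputer resources are sometimes needed.

```python
import numpy as np

# Massive data as a single homogeneous matrix: every row has the same
# simple format, and the analysis touches all cells at once.
# (Toy dimensions chosen for illustration only.)
rng = np.random.default_rng(seed=0)
matrix = rng.normal(size=(1000, 50))  # rows = records, columns = measurements

# Whole-matrix computations: nothing is filtered or extracted first.
column_means = matrix.mean(axis=0)          # one mean per measurement
covariance = np.cov(matrix, rowvar=False)   # 50 x 50 covariance matrix

print(covariance.shape)
```

The point of the sketch is that the unit of analysis is the matrix itself; scaling the row count up by orders of magnitude changes the hardware required, but not the shape of the computation.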

Big Data resources are not equivalent to a large spreadsheet, and a Big Data resource is seldom analyzed in its totality.  Big Data analysis is a multi-step process whereby data is extracted, filtered, and transformed, with analysis often proceeding in a piecemeal, sometimes recursive, fashion.  
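By contrast, the multi-step Big Data workflow described above can be sketched as a chain of small, piecemeal stages. This is a hypothetical illustration: the record fields, the filter threshold, and the transformation are all invented for the example, and a real resource would draw on far messier, heterogeneous sources.

```python
# Big Data analysis sketch: extract, filter, and transform records one at
# a time (via generators), rather than loading the resource in its totality.
# All field names and criteria below are illustrative assumptions.

def extract(records):
    """Pull only the fields of interest from each heterogeneous record."""
    for record in records:
        if "value" in record:  # skip records lacking the measurement
            yield {"id": record.get("id"), "value": record["value"]}

def filter_records(records, threshold):
    """Keep only records that meet a criterion."""
    for record in records:
        if record["value"] >= threshold:
            yield record

def transform(records):
    """Rescale the extracted values for downstream analysis."""
    for record in records:
        yield {**record, "value": record["value"] / 100.0}

# Heterogeneous source: images, free text, and structured fields mixed.
source = [
    {"id": 1, "value": 250, "image": b"..."},
    {"id": 2, "note": "no value field"},
    {"id": 3, "value": 40},
]

result = list(transform(filter_records(extract(source), threshold=50)))
print(result)  # [{'id': 1, 'value': 2.5}]
```

Because each stage consumes and yields one record at a time, the pipeline never needs the whole resource in memory, and stages can be re-run recursively on intermediate results, matching the piecemeal character of Big Data analysis described in the post.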

If you read Big Data: Preparing, Sharing, and Analyzing Complex Information you will find that the gulf between massive data and Big Data is profound; the two subjects can seldom be discussed productively within the same venue.

- Jules Berman

key words: Big Data, lotsa data, massive data, data analysis, data analyses, large-scale data, Big Science, simple data, little science, little data, small data, data preparation, data analyst
