Wednesday, January 20, 2016

TRIPLES: THE BASIC UNIT OF MEANING IN DATA SCIENCE

Data, by itself, has no meaning. It is the job of the data scientist to assign meaning to data, and this is done with data objects, triples, and classifications. Our most familiar data constructions (e.g., spreadsheets, relational databases, flat-file records) convey meaning through triples and data objects; we just don’t perceive them as such.
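To see how a familiar spreadsheet already encodes triples, here is a minimal Python sketch. It assumes, hypothetically, that each row carries an identifier column (called `name` here) naming the data object the row describes; every other column header then serves as metadata, and each cell serves as data. The column names, values, and the `rows_to_triples` helper are illustrative only.

```python
import csv
import io

# A tiny illustrative spreadsheet, written as CSV text. The "name" column is
# assumed (hypothetically) to identify the data object each row describes.
SPREADSHEET = """name,blood glucose level,eye color
Jules Berman,85,brown
Mary Smith,90,blue
"""

def rows_to_triples(text, id_column="name"):
    """Unpack each row into (object, metadata, data) triples.

    The identifier column supplies the object; every other column header
    supplies the metadata, and the cell under it supplies the data.
    """
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        obj = row[id_column]
        for metadata, data in row.items():
            if metadata != id_column:
                triples.append((obj, metadata, data))
    return triples

for triple in rows_to_triples(SPREADSHEET):
    print(triple)
```

Each row of two data columns yields two triples, so the spreadsheet's meaning survives intact once it is unpacked into object, metadata, and data.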

The three conditions for a meaningful assertion are:
1. There is a specific data object about which the statement is made.
2. There is data that pertains to the specified object.
3. There is metadata that describes the data.
Simply put, assertions have meaning whenever a pair of metadata and data (the descriptor for the data and the data itself) is assigned to a specific object. In the informatics field, assertions come in the form of so-called triples, consisting of the object, then the metadata, and then the data.
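The three conditions can be checked mechanically. Below is a minimal Python sketch in which a triple is a named tuple of three strings, in the order object, metadata, data. The names `Triple` and `is_meaningful` are hypothetical, and "present" is simplified to "non-blank": the check confirms that each part exists, not that it is correct.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    """An assertion, in object, metadata, data order."""
    obj: str       # the specific data object the statement is about
    metadata: str  # the descriptor for the data
    data: str      # the data itself

def is_meaningful(t: Triple) -> bool:
    """Return True only if all three parts are present (non-blank)."""
    return all(part.strip() for part in (t.obj, t.metadata, t.data))

print(is_meaningful(Triple("Jules Berman", "blood glucose level", "85")))  # True
print(is_meaningful(Triple("", "blood glucose level", "85")))              # False: no object
```

A bare value such as "85" with no object and no metadata would fail the check, which mirrors the point that data alone has no meaning.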

Here are some examples of triples, as they might occur in a medical dataset:
"Jules Berman" "blood glucose level" "85"
"Mary Smith" "blood glucose level" "90"
"Samuel Rice" "blood glucose level" "200"
"Jules Berman" "eye color" "brown"
"Mary Smith" "eye color" "blue"
"Samuel Rice" "eye color" "green"
Here are a few triples, as they might occur in a haberdasher’s dataset:
"Juan Valdez" "hat size" "8"
"Jules Berman" "hat size" "9"
"Homer Simpson" "hat size" "9"
"Homer Simpson" "hat_type" "bowler"
We can combine the triples from a medical dataset and a haberdasher’s dataset that apply to a common object:
"Jules Berman" "blood glucose level" "85"
"Jules Berman" "eye color" "brown"
"Jules Berman" "hat size" "9"
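The combination above amounts to pooling the triples from both datasets and keeping those whose object matches. A minimal Python sketch, with the example triples copied verbatim and a hypothetical helper named `triples_about`:

```python
# Triples copied from the two example datasets above.
medical = [
    ("Jules Berman", "blood glucose level", "85"),
    ("Mary Smith", "blood glucose level", "90"),
    ("Samuel Rice", "blood glucose level", "200"),
    ("Jules Berman", "eye color", "brown"),
    ("Mary Smith", "eye color", "blue"),
    ("Samuel Rice", "eye color", "green"),
]
haberdasher = [
    ("Juan Valdez", "hat size", "8"),
    ("Jules Berman", "hat size", "9"),
    ("Homer Simpson", "hat size", "9"),
    ("Homer Simpson", "hat_type", "bowler"),
]

def triples_about(obj, *datasets):
    """Collect every triple, from any dataset, whose object matches obj."""
    return [t for dataset in datasets for t in dataset if t[0] == obj]

for t in triples_about("Jules Berman", medical, haberdasher):
    print(t)
```

No column mapping or schema translation is needed: because each triple carries its own object and metadata, the two datasets can be pooled as they stand.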
Triples can port their meaning between different databases because they bind described data to an object. The portability of triples permits us to integrate heterogeneous data, and it facilitates the design of software agents. Data integration involves merging related data objects across diverse datasets. As it happens, if data supports introspection and data is organized as meaningful assertions (i.e., as identified triples), then data integration is implicit (i.e., an intrinsic property of the data). In essence, data integration is awarded to data scientists who apply data simplification techniques.
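One way to picture integration as an intrinsic property of the data is to fold every triple, from every source, into data objects keyed by their identifier. The Python sketch below is a minimal illustration under stated assumptions: the person's name stands in for the object identifier, as in the examples; "introspection" is modeled simplistically as asking an object which metadata it carries; and a later dataset silently overwrites an earlier value for the same object and metadata. The `integrate` helper is hypothetical.

```python
from collections import defaultdict

def integrate(*datasets):
    """Merge triples from any number of datasets into data objects.

    Each object becomes a dict mapping metadata to data. Triples about the
    same object land in the same dict simply because they share the same
    object identifier. Assumption: a later value overwrites an earlier one.
    """
    objects = defaultdict(dict)
    for dataset in datasets:
        for obj, metadata, data in dataset:
            objects[obj][metadata] = data
    return dict(objects)

medical = [("Jules Berman", "blood glucose level", "85"),
           ("Jules Berman", "eye color", "brown"),
           ("Mary Smith", "eye color", "blue")]
haberdasher = [("Jules Berman", "hat size", "9"),
               ("Homer Simpson", "hat size", "9")]

merged = integrate(medical, haberdasher)
# Introspection (simplified): ask an integrated object which descriptors it carries.
print(sorted(merged["Jules Berman"]))
# ['blood glucose level', 'eye color', 'hat size']
```

The merge step contains no dataset-specific logic; it works for any collection of triples, which is the sense in which identified triples make integration implicit.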

- Jules Berman (copyrighted material)

key words: triple, meaning, informatics, computer science, data integration, heterogeneous data, jules j berman