Monday, January 25, 2016

A Data Science Definition of Semantics

Semantics is the study of meaning (Greek root, semantikos, signficant meaning). In the context of data science, semantics is the technique of creating meaningful assertions about data objects. A meaningful assertion, as used here, is a triple consisting of an identified data object, a data value, and a descriptor for the data value. In practical terms, semantics involves making assertions about data objects (ie, making triples), combining assertions about data objects (ie, merging triples), and assigning data objects to classes; hence relating triples to other triples. As a word of warning, few informaticians would define semantics in these terms, but most definitions for semantics are functionally equivalent to the definition offered here.

Most language is unstructured and meaningless. Consider the assertion: Sam is tired. This is an adequately structured sentence with a subject verb and object. But what is the meaning of the sentence? There are a lot of people named Sam. Which Sam is being referred to in this sentence? What does it mean to say that Sam is tired? Is "tiredness" a constitutive property of Sam, or does it only apply to specific moments in Sam's life? If so, for what moment in time is the assertion, "Sam is tired" actually true? To a computer, meaning comes from assertions that have a specific, identified subject associated with some sensible piece of fully described data (metadata coupled with the data it describes).

- Jules Berman (copyrighted material)

key words: meaning, data science, informatics, triple, semantics, jules j berman