Tuesday, April 3, 2007

More Introduction to RDF

As discussed in an earlier post, RDF (Resource Description Framework) is a formal method for describing specified data objects with paired metadata and data.

It is important to understand that in informatics, assertions only have meaning when a pair of metadata and data (the descriptor for the data and the data itself) is assigned to a specific subject.

The "triples" that form the basis of RDF specifications have three parts, always in this order: a specified subject, then metadata, then data.

Examples of triples that might be found in a medical dataset:

“Jules Berman” “blood glucose level” “85”
“Mary Smith” “blood glucose level” “90”
“Samuel Rice” “blood glucose level” “200”
“Jules Berman” “eye color” “brown”
“Mary Smith” “eye color” “blue”
“Samuel Rice” “eye color” “green”

Some triples found in a haberdasher's dataset:

“Juan Valdez” “hat size” “8”
“Jules Berman” “hat size” “9”
“Homer Simpson” “hat size” “9”
“Homer Simpson” “hat_type” “bowler”

Triples collected from both datasets whose subject is "Jules Berman":

“Jules Berman” “blood glucose level” “85”
“Jules Berman” “eye color” “brown”
“Jules Berman” “hat size” “9”

This is a simple example of data integration over heterogeneous datasets!
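
The selection above can be sketched in a few lines of Python. This is only an illustration, not RDF syntax: it assumes each triple is stored as a plain (subject, metadata, data) tuple, and the names `medical`, `haberdasher`, and `triples_about` are hypothetical, chosen for this example.

```python
# Each triple is a (subject, metadata, data) tuple, in the order used in this post.
# Storing triples as Python tuples is an assumption made for illustration only.
medical = [
    ("Jules Berman", "blood glucose level", "85"),
    ("Mary Smith", "blood glucose level", "90"),
    ("Samuel Rice", "blood glucose level", "200"),
    ("Jules Berman", "eye color", "brown"),
    ("Mary Smith", "eye color", "blue"),
    ("Samuel Rice", "eye color", "green"),
]

haberdasher = [
    ("Juan Valdez", "hat size", "8"),
    ("Jules Berman", "hat size", "9"),
    ("Homer Simpson", "hat size", "9"),
    ("Homer Simpson", "hat_type", "bowler"),
]


def triples_about(subject, *datasets):
    """Return every triple, from any of the datasets, whose subject matches."""
    return [t for dataset in datasets for t in dataset if t[0] == subject]


for triple in triples_about("Jules Berman", medical, haberdasher):
    print(triple)
```

The function never needs to know which dataset a triple came from, or what metadata terms each dataset uses. The subject carried inside every triple is enough to pull the assertions together.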

Triples can port their meaning between different databases because they bind described data to a specified subject. This supports the integration of heterogeneous data and facilitates the design of software agents. A software agent, as used here, is a program that can interrogate multiple RDF documents on the web and initiate its own actions based on inferences drawn from the triples it retrieves.

RDF is a syntax for writing computer-parsable triples. For RDF to serve as a general method for describing data objects, we need to answer the following four questions:

1. How does the triple convey the unique identity of its subject? In the triple “Jules Berman” “blood glucose level” “85”, the name "Jules Berman" is not unique and may apply to several different people.

2. How do we convey the meaning of metadata terms? Perhaps one person's definition of a metadata term is different from another person's. For example, is "hat size" the diameter of the hat, or the distance from ear to ear on the person who is intended to wear the hat, or a digit selected from a pre-defined scale?

3. How can we constrain the values described by metadata to a specific datatype? Can a person have an eye color of 8? Can a person have an eye color of "chartreuse"?

4. How can we indicate that a unique object is a member of a class and can be described by metadata shared by all the members of a class?
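
To make the first and third questions concrete, here is a minimal Python sketch of what goes wrong when triples are nothing more than three strings. It does not show how RDF answers the questions. The second and third triples are hypothetical values invented for this illustration, as are the function names `is_well_formed` and `conflicting_values`.

```python
from collections import defaultdict

triples = [
    ("Jules Berman", "blood glucose level", "85"),   # from the medical dataset
    ("Jules Berman", "blood glucose level", "140"),  # hypothetical: another person with the same name
    ("Mary Smith", "eye color", "8"),                # hypothetical: a value that makes no sense
]


def is_well_formed(triple):
    """A bare triple only has to be three strings; nothing checks its meaning."""
    return len(triple) == 3 and all(isinstance(part, str) for part in triple)


def conflicting_values(triples):
    """Group data by (subject, metadata) and keep the pairs with more than one value."""
    seen = defaultdict(set)
    for subject, metadata, data in triples:
        seen[(subject, metadata)].add(data)
    return {key: values for key, values in seen.items() if len(values) > 1}


# Every triple passes the structural check, including the eye color of "8".
print(all(is_well_formed(t) for t in triples))

# The two "Jules Berman" readings collide, and nothing in the triples
# tells us whether they describe one person or two.
print(conflicting_values(triples))
```

Without a unique subject identifier and a declared datatype, a program has no grounds for rejecting either of the hypothetical triples.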

In subsequent blog posts, we'll examine how RDF provides answers to these four questions.

-Jules Berman