Sunday, January 17, 2016

Data Specifications, and how they differ from Data Standards

A specification is a formal method for describing objects (physical objects, such as nuts and bolts, or symbolic objects, such as numbers or concepts expressed as text). In general, specifications do not require the inclusion of specific items of information (ie, they do not impose restrictions on the content that is included in or excluded from documents), and they do not impose any order of appearance on the data contained in the document (ie, you can mix up and rearrange specified objects, if you like). Specifications are not generally certified by a standards organization. They are generally produced by special interest organizations, and the legitimacy of a specification depends on its popularity. Examples of specifications are RDF (Resource Description Framework), produced by the W3C (World Wide Web Consortium), and TCP/IP (Transmission Control Protocol/Internet Protocol), maintained by the Internet Engineering Task Force. The most widely adopted specifications are simple and easily implemented.
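
The point about rearranging specified objects can be sketched in a few lines of Python. This is only an illustration, assuming a made-up XML description of a bolt; the element names (bolt, length_mm, thread) are hypothetical and do not come from any real specification. Two documents that list the same objects in a different order describe the same data.

```python
import xml.etree.ElementTree as ET

# Two hypothetical descriptions of the same bolt; only the element order differs.
DOC_A = "<bolt><length_mm>40</length_mm><thread>M8</thread></bolt>"
DOC_B = "<bolt><thread>M8</thread><length_mm>40</length_mm></bolt>"


def described_data(xml_text):
    """Return the (tag, text) pairs of the root's children as an unordered set."""
    root = ET.fromstring(xml_text)
    return {(child.tag, child.text) for child in root}


# Comparing sets ignores the order in which the elements appeared.
same = described_data(DOC_A) == described_data(DOC_B)
print(same)
```

Comparing sets rather than lists is the design choice that matters here: it treats order of appearance as irrelevant, which is what the specification permits.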

Data standards, in general, tell you what must be included in a conforming document, and, in most cases, dictate the format of the final document. In many instances, standards bar the inclusion of any data that is not included in the standard (eg, you should not include astronomical data in a standard clinical X-ray report). Specifications, by contrast, simply provide a formal way of describing the data that you choose to include in your document. XML and RDF (which is commonly written in an XML syntax and adds a layer of semantics) are examples of specifications. They both tell you how data should be represented, but neither tells you what data to include, or how your document or data set should appear. Files that comply with a standard are rigidly organized and can be easily parsed and manipulated by software specifically designed to adhere to the standard. Files that comply with a specification are typically self-describing documents that contain within themselves all the information necessary for a human or a computer to derive meaning from the file contents. In theory, files that comply with a specification can be parsed and manipulated by generalized software designed to parse the markup language of the specification (eg, XML, RDF) and to organize the data into data structures defined within the file.
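
The contrast can be sketched in Python. This is a rough illustration under stated assumptions, not a model of any real standard: the field names in REQUIRED_FIELDS and the observation document are invented for the example. The first function plays the role of software written for one standard, rejecting a record with missing or extra fields. The second is generalized software that knows only XML and builds its data structure from whatever the file describes.

```python
import xml.etree.ElementTree as ET

# Hypothetical "standard": a fixed set of required fields, and nothing else allowed.
REQUIRED_FIELDS = {"patient_id", "exam_date", "findings"}


def conforms_to_standard(record):
    """A record conforms only if its fields are exactly the required ones."""
    return set(record) == REQUIRED_FIELDS


def parse_self_describing(xml_text):
    """Generic parser: understands XML, but nothing about the subject matter."""
    def to_data(element):
        children = list(element)
        if not children:
            return element.text  # leaf element: keep its text
        # Nested element: build a dict keyed by child tag
        # (repeated sibling tags would overwrite each other in this sketch).
        return {child.tag: to_data(child) for child in children}

    root = ET.fromstring(xml_text)
    return {root.tag: to_data(root)}


xray = {"patient_id": "P-001", "exam_date": "2016-01-17", "findings": "normal"}
xray_plus_astronomy = dict(xray, star_magnitude="4.2")  # extra, non-standard data

DOC = (
    "<observation><object>star</object>"
    "<position><ra>10.5</ra><dec>-3.2</dec></position></observation>"
)
parsed = parse_self_describing(DOC)
```

Here conforms_to_standard(xray) is true and conforms_to_standard(xray_plus_astronomy) is false, while parse_self_describing accepts any well-formed XML and derives its structure from the tags found in the file.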

- Jules Berman (copyrighted material)

key words: standard, xml, rdf, resource description framework, jules j berman