Sunday, January 31, 2016

Decoding Mayan glyphs: using data science to discover a lost civilization

"It is an amazement, how the voice of a person long dead can speak to you off a page as a living presence." - Garrison Keillor

On the Yucatan peninsula, concentrated within a geographic area that today encompasses the southeastern tip of Mexico, Belize, and Guatemala, a great civilization flourished. The Mayan civilization seems to have begun about 2000 BCE, reaching its peak in the so-called classic period (250 - 900 AD). Abruptly, about 900 AD, the great Mayan cities were abandoned, and the Mayan civilization entered a period of decline. Soon after the Spanish colonization of the peninsula, in the 16th century, the Mayans were subjected to a deliberate effort to erase any trace of their heritage. This effort was led by a Spanish priest named Diego de Landa Calderon (1524-1579). Landa's acts against Mayan culture included:

1. The destruction of all Mayan books and literature (only a few books survived immolation).

2. The conversion of Mayans to Catholicism, in which school-children were forced to learn Roman script and Arabic numerals.

3. The importation of the Spanish Inquisition, accounting for the deaths of many Mayans who preferred their own culture over that of Landa's.

By the dawn of the 20th century, the great achievements of the Mayan civilization were forgotten, its cities and temples were thoroughly overgrown by jungle, its books had been destroyed, and no humans on the planet could decipher the enduring stone glyph tablets strewn through the Yucatan peninsula.

In the late twentieth century, after several centuries of effort by generations of archeologists and epigraphers, the Maya glyphs were successfully decoded. The decoding of the Mayan glyphs, and the resulting discovery of the history and achievements of the Mayan civilization during its classic period, is perhaps the most exciting legacy data project ever undertaken. The story of the resurrection and translation of the Mayan glyphs leaves us with many lessons that apply to modern-day data repurposing projects.

Maya stucco glyphs displayed in the museum at Palenque, Mexico.
Image source: Wikipedia, public domain, where there is an excellent discussion of Mayan script.

Lesson 1. Success follows multiple breakthroughs, sometimes occurring over great lengths of time.

The timetable for the Mayan glyph project extends over more than four centuries, from 1566 to 1981.

1566 - Landa, the same man largely responsible for the destruction of the Mayan culture and language, wrote a manuscript in which he attempted, with the help of local Mayans, to record a one-to-one correspondence between the Roman alphabet and a supposed Mayan alphabet. Landa had assumed that Mayan writing was alphabetic, like Spanish writing. As it happens, the Mayan script is logophonetic, with some symbols corresponding to syllables and other symbols corresponding to words and concepts. For centuries, the so-called Mayan alphabet only added to the general confusion. Eventually, Landa's notes were used, with the few surviving Mayan codices, to crack the Mayan code.

1832 - Constantine Rafinesque decoded the Mayan number system.

1880 - Ernst Forstemann, working from an office in Dresden, Germany, had access to the Dresden Codex, one of the few surviving Mayan manuscripts. Using Rafinesque's techniques to decode the numbers that appeared in the Dresden Codex, Forstemann deduced how the Mayans recorded the passage of time, and how they used numbers to predict astronomic events with great accuracy.

1952 - Yuri Knorozov, working alone in Russia, deduced how individual glyph symbols were used as syllables.

1958 - Tatiana Proskouriakoff, using Knorozov's syllabic approach to glyph interpretation, convincingly made the first short translations from stelae (i.e., standing stone monuments), and proved that they told the life stories of Mayan kings.

1973 - Thirty Mayanists from various scientific disciplines convened at Palenque, a Mayan site, and, through a team effort, deciphered the dynastic history of six kings.

1981 - David Stuart showed that different pictorial symbols could represent the same syllable, so long as the words they depicted began with the same sound. This would be analogous to pictures of a ball, a balance, and a banner all serving as interchangeable forms of the sound "ba".

Following Stuart's 1981 breakthrough, the Mayan code was essentially broken.
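
The substitution principle in the 1981 entry amounts to a many-to-one mapping: several pictorial signs collapse to a single syllable. Here is a minimal Python sketch of that idea, built from the ball, balance, and banner analogy in the text rather than from real glyphs. The sign names, the `SIGN_TO_SYLLABLE` table, the extra "ki" entries, and the `transliterate` helper are all hypothetical, introduced only for illustration.

```python
# Hypothetical sign inventory built from the English analogy in the text;
# these are not real Maya glyph names.
SIGN_TO_SYLLABLE = {
    "ball": "ba",
    "balance": "ba",
    "banner": "ba",
    "kite": "ki",
    "king": "ki",
}


def transliterate(signs):
    """Map each pictorial sign to its syllable; unknown signs become '?'."""
    return [SIGN_TO_SYLLABLE.get(sign, "?") for sign in signs]


# Two inscriptions drawn with different signs read as the same syllables.
first = transliterate(["ball", "kite"])
second = transliterate(["banner", "king"])
print(first, second, first == second)
```

Once the variants are normalized this way, inscriptions that look different on the stone can be compared as the same sequence of sounds, which is what made cross-site comparison of texts tractable.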

Lesson 2. Contributions come from individuals working in isolation and individuals working as a team.

"My feeling is that as far as creativity is concerned, isolation is required." - Isaac Asimov (1)

As social animals, we tend to believe in the supremacy of teamwork. We often marginalize the contributions of individuals who work in isolation. Objective review of most large, successful projects reveals that important contributions come both from individuals working in isolation and from teams working to accomplish goals that could not be achieved through the efforts of any one individual. The task of decoding the Mayan glyphs was assisted by two key individuals, each working in isolation, thousands of miles from Mexico: Ernst Forstemann, in Germany, and Yuri Knorozov, in Moscow. It is difficult to imagine how the Mayan project could have succeeded without the contributions of these two loners. The remainder of the project was accomplished within a community of scientists who cleared the long-forgotten Mayan cities, recovered glyphs, compared the findings at the different sites, and eventually reconstructed the language. Throughout this book, we will examine legacy projects that succeeded due to the combined efforts of teams and of isolated individuals.

Lesson 3. Project contributors come from many different disciplines.

The team of 30 experts that convened in Palenque in 1973 was composed of archeologists, epigraphers, linguists, anthropologists, historians, astronomers, and ecologists.

Lesson 4. Progress was delayed due to influential naysayers.

After the Mayan numbering system had been decoded, and after it was shown that the Mayans were careful recorders of time and astronomic events, linguists turned their attention to the fascinating legacy of the non-numeric glyphs. Try as they might, Mayanists of the mid-20th century could make no sense of the non-numeric symbols. Eric Thompson (1898 - 1975) stood as the premier Mayanist authority from the 1930s through the 1960s. After trying, and failing, to decipher the non-numeric glyphs, he concluded that these glyphs represented mystic, ornate symbols, not language. The non-numeric glyphs, in his opinion, could not be deciphered because they had no linguistic meaning. Thompson was venerated to such an extent that, throughout his long tenure of influence, all progress in the area of glyph translation was suspended. When Thompson's influence finally waned, a new group of Mayanists came forward to crack the code.

Lesson 5. Ancient legacy data conformed to modern annotation practices.

The original data had a set of properties that were conducive to repurposing: unique, identified objects (e.g., name of king and name of city), with a time-stamp on all entries (implying the existence of a sophisticated calendar and time-keeping methods). The data was encoded in a sophisticated number system that included the concept of zero, and was annotated with metadata (i.e., descriptions of the quantitative data; see Glossary item, Metadata).
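
The properties listed above (a unique identifier, a time-stamp, and metadata describing each entry) can be sketched as a small record type. The Python below is an illustration, not a reconstruction of any actual inscription: the `GlyphRecord` class, its field names, and the sample entries are hypothetical placeholders. The sketch also assumes the commonly described Long Count notation, which this text does not spell out, in which five positions count units of 144,000, 7,200, 360, 20, and 1 days, and in which zero is a valid digit.

```python
from dataclasses import dataclass, field

# Days per Long Count position, highest first (assumed standard values):
# b'ak'tun, k'atun, tun, winal, k'in.
LONG_COUNT_DAYS = (144_000, 7_200, 360, 20, 1)


def long_count_to_days(date: str) -> int:
    """Convert a dotted Long Count string such as '9.12.11.5.18' to a day count."""
    digits = [int(part) for part in date.split(".")]
    if len(digits) != len(LONG_COUNT_DAYS):
        raise ValueError(f"expected 5 positions, got {len(digits)}")
    return sum(d * unit for d, unit in zip(digits, LONG_COUNT_DAYS))


@dataclass
class GlyphRecord:
    """One annotated entry: who, where, when, plus descriptive metadata."""
    ruler: str       # unique identifier for the person (placeholder)
    city: str        # unique identifier for the place (placeholder)
    long_count: str  # time-stamp, kept as written
    metadata: dict = field(default_factory=dict)

    @property
    def day_number(self) -> int:
        """Time-stamp expressed as a single sortable day count."""
        return long_count_to_days(self.long_count)


# Hypothetical sample entries; identifiers and events are placeholders.
records = [
    GlyphRecord("ruler_A", "city_X", "9.12.11.5.18", {"event": "death"}),
    GlyphRecord("ruler_A", "city_X", "9.9.2.4.8", {"event": "accession"}),
]

# Time-stamps make the entries sortable into a chronology.
for rec in sorted(records, key=lambda r: r.day_number):
    print(rec.day_number, rec.ruler, rec.metadata["event"])
```

Keeping the time-stamp as written, and deriving the day count from it, mirrors a common practice in repurposing projects: the original value is preserved and the computed value can always be regenerated.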

Lesson 6. Legacy data is often highly accurate data.

Old data is often accurate data, if it is recorded at the time and place that events transpired. Records of crops, numbers of sacrifices, numbers of slaves traded, are the most objective data that we are likely to encounter. In the particular example of the astronomical data included in the Dresden Codex, Mayan astronomers accurately predicted eclipses, measuring decade-long intervals within an accuracy of several minutes.

Lesson 7. Legacy data is necessary for following trends.

There is a tendency to be dismissive of archeologic data, due to the superabundance of more recently acquired data (See Glossary item, Data archeology). A practical way to think about the value of archeological data is that if the total amount of historical data is relatively small, the absolute value of each piece of such data is high. For example, 1990 records on temperature and precipitation may not exhibit the level of detail contained in present-day meteorological files, but the 1990 files may represent the only reliable source of climate data for the era, and it may be impossible to predict long-term climate trends without historical data. Without the availability of old data to establish baseline measurements and trends, the analysis of new data is impeded. Hence, every bit of old data has amplified importance for today's data scientists. The classic empire of the Mayans came to an abrupt ending, about 900 AD. We do not understand the reason for the collapse of Mayan civilization, but untapped clues residing in the Mayan glyphs may reveal disturbing ancient trends that presage a future catastrophe.

Lesson 8. Data worth recording is data worth saving.

Landa destroyed the Mayan libraries in 1562. The few remaining literary works of the ancient Mayans can be translated, but the vast bulk of Mayan literature is a lost legacy. Any one of those disparaged books would be a priceless treasure today.

Book burnings are a time-honored tradition enjoyed the world over by religious zealots (2). Some of the greatest books in history have been burned to a crisp. The first recorded, but least successful, book burning in history occurred around 612 B.C. and involved the library of Ashurbanipal (668 - 627 B.C.), king of the neo-Assyrian empire. Among the texts contained in the library was the Gilgamesh epic, written in about 2500 B.C. Marauders set fire to the palace and the library, with limited effect. Many of the greatest works were written on cuneiform tablets. The fire baked the clay tablets, preserving them to the present day.

The Library of Alexandria was the most famous library of the ancient world. As a repository of truth and knowledge, it was a popular target. At least four major assaults punctuated the library's incendiary past: Julius Caesar in the Alexandrian War (48 B.C.), Aurelian's Palmyrene campaign (273 A.D.), the decree of Theophilus (391 A.D.) and the Muslim conquest (642 A.D.). We do not know the number of books held in the Library, but when the Alexandria library was sacked, the books are said to have provided sufficient fuel to heat the Roman baths for six months.

Book burning never goes out of style. As recently as 1992, during the siege of Sarajevo, the National Library was enthusiastically burned to the ground. Thousands of irreplaceable books were destroyed in the literary equivalent of genocide.

[1] Asimov I. Isaac Asimov Mulls "How Do People Get New Ideas?" MIT Technology Review October 20, 2014.

[2] Berman JJ. Machiavelli's Laboratory. Amazon Digital Services, Inc., 2010.

- Jules Berman (copyrighted material)

key words: mayans, maya, data science, data repurposing, data reanalysis, cryptography, cryptology, data analysis, decoding, legacy data, old data, data archeology, jules j berman