Restoring Ancient Texts with AI

This DeepMind project uses a neural network, natural language processing and machine learning to help historians piece together fragments of ancient Greek decrees from 420 BC.

By Julian Smith

By Julian Smith May 26, 2022

Whether it’s written on scrolls or papyri or inscribed into pottery or stone, the written accounts of ancient cultures are one of the best ways to understand how people once thought about themselves and their world.

But many of these records have been so damaged by time that they’re close to illegible. On top of that, they have often been moved far from where they were created — and it can be difficult to figure out when they were made since the artifacts are often too delicate for techniques like radiocarbon dating.

Traditionally the task of deciphering ancient texts has fallen to specialists called epigraphists, who draw on past experience and comparable examples to fill in the blanks. A new collaboration between Alphabet’s DeepMind and classical scholars promises to make this task faster and more accurate, opening wider windows into the past.

The team created a deep neural network called Ithaca, named after the Greek island that the hero Odysseus struggled to return to in Homer’s epic Odyssey. The tool was trained on a digitized dataset of 78,608 inscriptions in ancient Greek, dating between the seventh century BC and the fifth century AD.

Ancient Greek is a highly inflected language — meaning word forms can change depending on how they’re used in sentences — and have many varieties of dialects. 

“It was this linguistic complexity that made us interested as it poses an excellent case study for Natural Language Processing and Machine Learning methods,” said DeepMind’s Yannis Assael, who along with Thea Sommerschield authored a paper published in Nature in March 2022. 

The first model of its kind, Ithaca was trained to restore fragmented texts and tease out when and where they were created, all at the same time. It uses pattern recognition to predict missing words, processing the text as characters and words simultaneously. Every small prediction limits the options for subsequent ones, like a puzzle-solver eliminating letters one by one in Wordle — just with many possible answers.

The branches of the decision tree result in multiple solutions that the model rates by confidence level. It also creates a ranked list of 84 possible regions and distribution of 10-year date intervals between 800 BC and AD 800. All of this happens in seconds, compared to the hours that human experts need.

Related

Tech Watching Over Egypt’s Valley of the Kings

“It is observing patterns and learning those patterns at a greater scale and a greater speed than a human could do, and therefore achieving more than a human could,” says Jonathan Prag of Merton College, who collaborated on the project.

In tests, Ithaca restored the fragmented Greek texts with 62% accuracy. When historians incorporated the results into their predictions, it increased their accuracy from 25% to 72%. Ithaca scored 71% on location predictions and dated texts to within 30 years accuracy, compared to human experts’ average of 144 years.

Ithaca has already been put to practical use helping settle a dispute over a group of ancient Athenian decrees. Originally the decrees were thought to have been created before 446 BC, based on specific letterforms that changed around that date. But the dates of many of the decrees seemed to conflict with the accounts of the Athenian historian Thucydides, leading some researchers to propose the decrees had been created around 420 BC.

Sure enough, Ithaca predicted a date around 421 BC. 

“Although it might seem like a small difference, this date shift has significant implications for our understanding of the political history of Classical Athens,” said DeepMind’s Sommerschield.

“We believe this is just the start for the development of tools for exploring the potential for collaboration between machine learning and the humanities,” Assael said. 

The team is working on training versions of Ithaca on other ancient writing systems, including Hebrew and Mayan. They have made the code open source and created a free interactive version online.

“We’re really excited to see the new directions Ithaca will take,” said Assael. “Ancient Greece plays an instrumental role in our understanding of the Mediterranean world, but it’s still only one part of a vast global picture of civilizations.”

Editor’s note: Learn more about DeepMind’s PYTHIA, the first ancient text restoration model that recovers missing characters from a damaged text input using deep neural networks. 

Julian Smith is a contributing writer. He is the executive editor Atellan Media and author of Aloha Rodeo and Smokejumper published by HarperCollins. He writes about green tech, sustainability, adventure, culture and history. 

© 2022 Nutanix, Inc. All rights reserved. For additional legal information, please go here.

Related Articles