Machine learning techniques are providing new tools that could help archaeologists understand the past, particularly when it comes to deciphering ancient texts. The latest example is an AI model created by Alphabet subsidiary DeepMind that not only helps restore text missing from ancient Greek inscriptions but also suggests when the text was written (within a 30-year period) and where it may have originated.
“Inscriptions are really important because they are direct sources of evidence … written directly by ancient people themselves,” Thea Sommerschield, a historian and machine learning expert who helped create the model, told journalists in a press briefing.
Due to their age, these texts are often damaged, making restoration a rewarding challenge. And because they are often inscribed on inorganic material like stone or metal, methods like radiocarbon dating can’t be used to find out when they were written. “To solve these tasks, epigraphers look for textual and contextual parallels in similar inscriptions,” said Sommerschield, who was co-lead on the work alongside DeepMind staff research scientist Yannis Assael. “However, it’s really difficult for a human to harness all existing, relevant data and to discover underlying patterns.”
That’s where machine learning can help.
The new software, named Ithaca, is trained on a dataset of 78,608 ancient Greek inscriptions, each labeled with metadata describing where and when it was written (to the best of historians’ knowledge). Like all machine learning systems, Ithaca looks for patterns in this information, encodes them in complex mathematical models, and uses these inferences to suggest text, date, and origins.
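As a rough illustration of what “looking for patterns” to suggest missing text can mean, here is a toy character-level context model that ranks candidate letters for a single gap by how often they appear between the same neighbours in a training corpus. This is a deliberately simple stand-in, not Ithaca’s architecture (the paper describes a deep neural network); the corpus is a hypothetical handful of transliterated strings, and the `?` gap marker is an assumption of this sketch.

```python
from collections import Counter

def build_context_counts(corpus, k=2):
    """Count which character appears between each (left k chars, right k chars) context."""
    counts = {}
    for text in corpus:
        for i in range(k, len(text) - k):
            context = (text[i - k:i], text[i + 1:i + 1 + k])
            counts.setdefault(context, Counter())[text[i]] += 1
    return counts

def restore_gap(damaged, counts, k=2, top_n=3):
    """Return up to top_n candidate letters for the single '?' in `damaged`.

    Gaps too close to either edge have a shorter context and simply find no match.
    """
    i = damaged.index("?")
    context = (damaged[max(0, i - k):i], damaged[i + 1:i + 1 + k])
    candidates = counts.get(context, Counter())
    return [ch for ch, _ in candidates.most_common(top_n)]

# Hypothetical training strings (transliterated, lower-case), for illustration only.
corpus = ["demos", "demosion", "theos", "theoi", "demou"]
counts = build_context_counts(corpus)
print(restore_gap("de?os", counts))
```

Here the gap in `de?os` is filled with `m`, because that letter is the only one seen between `de` and `os` in the toy corpus. A real model generalizes far beyond exact-context lookups like this one.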
In a paper published in Nature that describes Ithaca, the scientists who created the model say it is 62 percent accurate when restoring letters in damaged texts. It can attribute an inscription’s geographic origin to one of 84 regions of the ancient world with 71 percent accuracy and can date a text to within, on average, 30 years of its known year of writing.
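To put the reported figures in perspective, the short calculation below derives the implied error rate for letter restoration and compares the 71 percent region accuracy with what uniform random guessing over 84 regions would achieve. The uniform-chance baseline is an assumption of this sketch, used only for scale; the paper’s own baselines may differ.

```python
def error_rate(accuracy):
    """Fraction of predictions that are wrong, given an accuracy in [0, 1]."""
    return 1.0 - accuracy

def uniform_chance(num_classes):
    """Accuracy of guessing uniformly at random among num_classes options."""
    return 1.0 / num_classes

restoration_accuracy = 0.62  # reported letter-restoration accuracy
region_accuracy = 0.71       # reported accuracy over 84 regions
num_regions = 84

print(f"restoration error rate: {error_rate(restoration_accuracy):.0%}")
print(f"chance-level region accuracy: {uniform_chance(num_regions):.1%}")
print(f"region accuracy vs chance: {region_accuracy / uniform_chance(num_regions):.0f}x")
```

With these inputs it reports an error rate of 38%, a chance baseline of about 1.2%, and roughly a 60-fold improvement over chance for region attribution.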
These are promising statistics, but it’s important to remember that Ithaca is not capable of operating independently of human expertise. Its suggestions are ultimately based on data collected by traditional archaeological methods, and its creators are positioning it as simply another tool in a wider set of forensic methods, rather than a fully automated AI historian. “Ithaca was designed as a complementary tool to aid historians,” said Sommerschield.
Eleanor Dickey, a professor of classics at the University of Reading who specializes in ancient Greek and Latin sociolinguistics, told The Verge that Ithaca was an “exciting development that may improve our knowledge of the ancient world.” But she added that a 62 percent accuracy for restoring lost text was not reassuringly high (“when people rely on it they will need to keep in mind that it is wrong about one third of the time”) and that she was not sure how the software would fit into existing academic methodologies.
For example, DeepMind highlighted tests that showed the model helped improve the accuracy of historians restoring missing text in ancient inscriptions from 25 percent to 72 percent. But Dickey notes that those being tested were students, not professional epigraphers. She says that AI models may be broadly accessible, but that doesn’t mean they can or should replace the small cadre of specialized academics who decipher texts.
“It is not yet clear to what extent use of this tool by genuinely qualified editors would result in an improvement in the editions generally available — but it will be interesting to find out,” said Dickey. She added that she was looking forward to trying the Ithaca model out for herself. The software, along with its open-source code, is available online for anyone to test.
Ithaca and its predecessor (named Pythia and released in 2019) have already been used to inform recent archaeological debates, including helping date inscriptions discovered in the Acropolis of Athens. However, the true potential of the software has yet to be seen.
Sommerschield stresses that the real value of Ithaca may be in its flexibility. Although it was trained on ancient Greek inscriptions, it could be easily configured to work with other ancient scripts. “Ithaca’s architecture makes it really applicable to any ancient language, not just Latin, but Mayan, cuneiform; really any written medium — papyri, manuscripts,” she said. “There’s a lot of opportunities.”
One of the key challenges of deep reinforcement learning models — the kind of AI systems that have mastered Go, StarCraft 2, and other games — is their inability to generalize their capabilities beyond their training domain. This limit makes it very hard to apply these systems to real-world settings, where situations are much more complicated and unpredictable than the environments where AI models are trained.
But scientists at AI research lab DeepMind claim to have taken the “first steps to train an agent capable of playing many different games without needing human interaction data,” according to a blog post about their new “open-ended learning” initiative. Their new project includes a 3D environment with realistic dynamics and deep reinforcement learning agents that can learn to solve a wide range of challenges.
The new system, according to DeepMind’s AI researchers, is an “important step toward creating more general agents with the flexibility to adapt rapidly within constantly changing environments.”
The paper’s findings show some impressive advances in applying reinforcement learning to complicated problems. But they are also a reminder of how far current systems are from achieving the kind of general intelligence capabilities that the AI community has been coveting for decades.
The brittleness of deep reinforcement learning
The key advantage of reinforcement learning is its ability to develop behavior by taking actions and getting feedback, similar to the way humans and animals learn by interacting with their environment. Some scientists describe reinforcement learning as “the first computational theory of intelligence.”
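The trial-and-feedback loop described here can be sketched with a minimal example: an epsilon-greedy agent learning action values on a toy multi-armed bandit. The reward probabilities, exploration rate, and step count are arbitrary assumptions chosen for illustration. Systems like AlphaGo combine this basic idea with deep neural networks and far more elaborate training.

```python
import random

def run_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Learn action-value estimates purely from sampled rewards (no labels)."""
    rng = random.Random(seed)
    n_actions = len(reward_probs)
    values = [0.0] * n_actions   # estimated mean reward per action
    counts = [0] * n_actions     # how often each action was tried
    for _ in range(steps):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: values[a])
        # The environment gives feedback: reward 1 with the action's hidden probability.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        # Incremental mean update of the action's value estimate.
        values[action] += (reward - values[action]) / counts[action]
    return values, counts

values, counts = run_bandit([0.2, 0.5, 0.8])
best = max(range(len(values)), key=lambda a: values[a])
print("estimated values:", [round(v, 2) for v in values])
print("most-chosen action:", counts.index(max(counts)), "best estimate:", best)
```

The agent is never told which action is best; it converges on the highest-paying one only by acting and observing rewards.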
The combination of reinforcement learning and deep neural networks, known as deep reinforcement learning, has been at the heart of many advances in AI, including DeepMind’s famous AlphaGo and AlphaStar models. In both cases, the AI systems were able to outmatch human world champions at their respective games.
But reinforcement learning systems are also notorious for their lack of flexibility. For example, a reinforcement learning model that can play StarCraft 2 at an expert level won’t be able to play a game with similar mechanics (e.g., Warcraft 3) at any level of competency. Even slight changes to the original game will considerably degrade the AI model’s performance.
“These agents are often constrained to play only the games they were trained for — whilst the exact instantiation of the game may vary (e.g. the layout, initial conditions, opponents) the goals the agents must satisfy remain the same between training and testing. Deviation from this can lead to catastrophic failure of the agent,” DeepMind’s researchers write in a paper that provides the full details on their open-ended learning.
Humans, on the other hand, are very good at transferring knowledge across domains.
The XLand environment
The goal of DeepMind’s new project was to create “an artificial agent whose behaviour generalises beyond the set of games it was trained on.”
To this end, the team created XLand, an engine that can generate 3D environments composed of static topology and moveable objects. The game engine simulates rigid-body physics and allows players to use the objects in various ways (e.g., create ramps, block paths, etc.).
XLand is a rich environment in which you can train agents on a virtually unlimited number of tasks. One of the main advantages of XLand is the capability to use programmatic rules to automatically generate a vast array of environments and challenges to train AI agents. This addresses one of the key challenges of machine learning systems, which often require vast amounts of manually curated training data.
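A toy version of rule-based procedural generation shows why this scales: a seeded random generator combines a handful of rules into a very large number of distinct worlds and goals without any hand-labelled data. The grid size, object shapes and colours, and the “near” goal predicate are hypothetical choices for this sketch, not XLand’s actual rules.

```python
import random
from dataclasses import dataclass

SHAPES = ["cube", "sphere", "pyramid"]
COLOURS = ["purple", "yellow", "black"]

@dataclass(frozen=True)
class World:
    heights: tuple  # static topology: floor height per grid cell (row-major)
    objects: tuple  # movable objects as (colour, shape, cell_index)
    goal: tuple     # hypothetical goal: ("near", object_a, object_b)

def generate_world(seed, size=4, n_objects=3):
    """Deterministically build one world and goal from an integer seed."""
    rng = random.Random(seed)
    heights = tuple(rng.randint(0, 2) for _ in range(size * size))
    objects = tuple(
        (rng.choice(COLOURS), rng.choice(SHAPES), rng.randrange(size * size))
        for _ in range(n_objects)
    )
    a, b = rng.sample(range(n_objects), 2)  # two different objects for the goal
    goal = ("near", objects[a][:2], objects[b][:2])
    return World(heights, objects, goal)

worlds = {generate_world(seed) for seed in range(1000)}
print(len(worlds), "distinct worlds from 1000 seeds")
```

Because each world is a pure function of its seed, any task can be regenerated exactly, and new ones cost only another integer.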
According to the blog post, the researchers created “billions of tasks in XLand, across varied games, worlds, and players.” The games range from very simple goals, such as finding objects, to more complex settings in which the AI agents must weigh the benefits and trade-offs of different rewards. Some of the games include cooperation or competition elements involving multiple agents.
Deep reinforcement learning
DeepMind uses deep reinforcement learning and a few clever tricks to create AI agents that can thrive in the XLand environment.
The reinforcement learning model of each agent receives a first-person view of the world, the agent’s physical state (e.g., whether it is holding an object), and its current goal. Each agent fine-tunes the parameters of its policy neural network to maximize its rewards on the current task. The neural network architecture contains an attention mechanism to ensure the agent can balance optimization across the subgoals required to accomplish the main goal.
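The attention mechanism mentioned here can be illustrated with plain scaled dot-product attention: a query derived from the agent’s state is compared against one key per subgoal, and the softmax weights decide how much each subgoal contributes to the summary the policy sees. The vectors are made-up numbers, and the function is a generic sketch of attention, not the architecture described in DeepMind’s paper.

```python
import math

def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention over a list of subgoal keys/values."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # Weighted sum of subgoal values, one output coordinate at a time.
    summary = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, summary

# Hypothetical 2-D embeddings: agent state (query) and two subgoals (keys/values).
state_query = [1.0, 0.0]
subgoal_keys = [[2.0, 0.0], [0.0, 2.0]]    # subgoal 0 aligns with the current state
subgoal_values = [[1.0, 0.0], [0.0, 1.0]]
weights, summary = attend(state_query, subgoal_keys, subgoal_values)
print([round(w, 3) for w in weights], [round(s, 3) for s in summary])
```

The subgoal whose key best matches the current state receives most of the weight, but the other subgoal still contributes, which is the kind of balancing the passage describes.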
Once the agent masters its current challenge, the computational task generator creates a new challenge for the agent. Each new task is generated according to the agent’s training history and in a way that helps spread the agent’s skills across a vast range of challenges.
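One way to generate tasks “according to the agent’s training history” is to keep only candidates the agent currently solves sometimes but not always, so training focuses on tasks that are neither trivial nor hopeless. The success-rate band, the `estimate_success` helper, and the task names are hypothetical. This is a common curriculum heuristic offered as an illustration, not DeepMind’s exact rule.

```python
import random

def estimate_success(history, task):
    """Hypothetical helper: empirical success rate of the agent on `task` so far."""
    outcomes = history.get(task, [])
    return sum(outcomes) / len(outcomes) if outcomes else None

def propose_tasks(candidates, history, low=0.1, high=0.9, k=3, seed=0):
    """Pick up to k candidate tasks whose estimated success lies in [low, high].

    Unseen tasks (no history yet) are kept so the agent keeps exploring.
    """
    rng = random.Random(seed)
    useful = []
    for task in candidates:
        p = estimate_success(history, task)
        if p is None or low <= p <= high:
            useful.append(task)
    rng.shuffle(useful)
    return useful[:k]

history = {
    "find_cube": [1, 1, 1, 1, 1],          # mastered: filtered out
    "stack_two": [1, 0, 0, 1, 0],          # partly solved: kept
    "hide_from_seeker": [0, 0, 0, 0, 0],   # currently hopeless: filtered out
}
candidates = ["find_cube", "stack_two", "hide_from_seeker", "make_ramp"]
print(sorted(propose_tasks(candidates, history)))
```

Mastered and hopeless tasks drop out, leaving the partly solved task and the unseen one as the next training candidates.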
DeepMind also used its vast computational resources (courtesy of its owner Alphabet Inc.) to train a large population of agents in parallel and transfer learned parameters across different agents to improve the general capabilities of the reinforcement learning systems.
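Transferring learned parameters across a population is often done in the style of population-based training: periodically, weaker agents copy the parameters of stronger ones and then perturb them slightly. The sketch below shows only that exploit-and-explore step on a toy population where each “agent” is just a list of numbers. The scoring function, population size, and perturbation scale are assumptions, and DeepMind’s actual transfer mechanism may differ.

```python
import random

def pbt_step(population, score, frac=0.25, noise=0.05, seed=0):
    """One exploit/explore step: bottom agents copy top agents, then perturb.

    population: list of parameter lists; score: function params -> float.
    Returns a new population list (inputs are not modified). In real
    population-based training, each agent would also train between steps.
    """
    rng = random.Random(seed)
    ranked = sorted(population, key=score, reverse=True)
    n_swap = max(1, int(len(ranked) * frac))
    top, bottom_start = ranked[:n_swap], len(ranked) - n_swap
    new_pop = [list(p) for p in ranked]
    for i in range(bottom_start, len(ranked)):
        donor = rng.choice(top)
        # Exploit: copy a strong agent's parameters; explore: add small noise.
        new_pop[i] = [w + rng.gauss(0.0, noise) for w in donor]
    return new_pop

# Toy objective: agents score higher the closer their parameters are to 1.0.
def score(params):
    return -sum((w - 1.0) ** 2 for w in params)

rng = random.Random(42)
population = [[rng.uniform(-2, 2) for _ in range(3)] for _ in range(8)]
best_before = max(map(score, population))
for step in range(20):
    population = pbt_step(population, score, seed=step)
print(round(best_before, 3), round(max(map(score, population)), 3))
```

Since the top agents are carried over unchanged, the best score never decreases, while the copied-and-perturbed agents search nearby for improvements.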
The performance of the reinforcement learning agents was evaluated based on their general ability to accomplish a wide range of tasks they had not been trained on. Some of the test tasks include well-known challenges such as “capture the flag” and “hide and seek.”
According to DeepMind, each agent played around 700,000 unique games in 4,000 unique worlds within XLand and went through 200 billion training steps across 3.4 million unique tasks (in the paper, the researchers write that 100 million steps are equivalent to approximately 30 minutes of training).
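Taking the paper’s conversion at face value (100 million steps for roughly 30 minutes of training), a quick calculation shows what 200 billion steps implies in wall-clock terms. This assumes the conversion rate holds constant across all of training, which the article does not state.

```python
def training_hours(total_steps, steps_per_chunk=100_000_000, minutes_per_chunk=30):
    """Convert a step count to hours using a fixed steps-per-30-minutes rate."""
    chunks = total_steps / steps_per_chunk
    return chunks * minutes_per_chunk / 60

hours = training_hours(200_000_000_000)
print(f"{hours:,.0f} hours, about {hours / 24:.1f} days")
```

Under that assumption, 200 billion steps works out to 1,000 hours, or about 41.7 days of training.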
“At this time, our agents have been able to participate in every procedurally generated evaluation task except for a handful that were impossible even for a human,” the AI researchers wrote. “And the results we’re seeing clearly exhibit general, zero-shot behaviour across the task space.”
Zero-shot machine learning models can solve problems that were not present in their training dataset. In a complicated space such as XLand, zero-shot learning might imply that the agents have obtained fundamental knowledge about their environment as opposed to memorizing sequences of image frames in specific tasks and environments.
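Evaluating zero-shot behaviour requires that test tasks be genuinely absent from training. A minimal sketch of that bookkeeping splits task identifiers into disjoint sets and scores a policy only on the held-out ones. The task identifiers, the 80/20 split, and the `policy_success` callable are hypothetical stand-ins.

```python
import random

def split_tasks(task_ids, test_fraction=0.2, seed=0):
    """Shuffle task ids and split them into disjoint train and held-out sets."""
    ids = list(task_ids)
    random.Random(seed).shuffle(ids)
    n_test = int(len(ids) * test_fraction)
    return set(ids[n_test:]), set(ids[:n_test])

def zero_shot_score(policy_success, held_out):
    """Mean success of a fixed (not fine-tuned) policy on tasks it never trained on."""
    return sum(policy_success(t) for t in held_out) / len(held_out)

train, test = split_tasks(range(100))
assert train.isdisjoint(test)  # no test task leaks into training
# Hypothetical policy that happens to succeed on even-numbered tasks.
print(len(train), len(test), zero_shot_score(lambda t: t % 2 == 0, test))
```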
The reinforcement learning agents further manifested signs of generalized learning when the researchers tried to adjust them for new tasks. According to their findings, 30 minutes of fine-tuning on new tasks was enough to produce an impressive improvement in a reinforcement learning agent trained with the new method. In contrast, an agent trained from scratch for the same amount of time would have near-zero performance on most tasks.
According to DeepMind, the reinforcement learning agents exhibit the emergence of “heuristic behavior” such as tool use, teamwork, and multi-step planning. If proven, this can be an important milestone. Deep learning systems are often criticized for learning statistical correlations instead of causal relations. If neural networks could develop high-level notions such as using objects to create ramps or cause occlusions, it could have a great impact on fields such as robotics and self-driving cars, where deep learning is currently struggling.
But those are big ifs, and DeepMind’s researchers are cautious about jumping to conclusions on their findings. “Given the nature of the environment, it is difficult to pinpoint intentionality — the behaviours we see often appear to be accidental, but still we see them occur consistently,” they wrote in their blog post.
But they are confident that their reinforcement learning agents “are aware of the basics of their bodies and the passage of time and that they understand the high-level structure of the games they encounter.”
Some of DeepMind’s top scientists published a paper recently in which they hypothesize that a single reward and reinforcement learning are enough to eventually reach artificial general intelligence (AGI). An intelligent agent with the right incentives can develop all kinds of capabilities such as perception and natural language understanding, the scientists believe.
Although DeepMind’s new approach still requires the training of reinforcement learning agents on multiple engineered rewards, it is in line with their general perspective of achieving AGI through reinforcement learning.
“What DeepMind shows with this paper is that a single RL agent can develop the intelligence to reach many goals, rather than just one,” Chris Nicholson, CEO of Pathmind, told TechTalks. “And the skills it learns in accomplishing one thing can generalize to other goals. That is very similar to how human intelligence is applied. For example, we learn to grab and manipulate objects, and that is the foundation of accomplishing goals that range from pounding a hammer to making your bed.”
Nicholson also believes that other aspects of the paper’s findings hint at progress toward general intelligence. “Parents will recognize that open-ended exploration is precisely how their toddlers learn to move through the world. They take something out of a cupboard, and put it back in. They invent their own small goals—which may seem meaningless to adults — and they master them,” he said. “DeepMind is programmatically setting goals for its agents within this world, and those agents are learning how to master them one by one.”
The reinforcement learning agents have also shown signs of developing embodied intelligence in their own virtual world, Nicholson said, like the kind humans have. “This is one more indication that the rich and malleable environment that people learn to move through and manipulate is conducive to the emergence of general intelligence, and that the biological and physical analogies of intelligence can guide further work in AI,” he said.
Sathyanaraya Raghavachary, associate professor of computer science at the University of Southern California, is a bit more skeptical of the claims made in DeepMind’s paper, especially the conclusions on proprioception, awareness of time, and high-level understanding of goals and environments.
“Even we humans are not fully aware of our bodies, let alone those VR agents,” Raghavachary said in comments to TechTalks, adding that perception of the body requires an integrated brain that is co-designed for suitable body awareness and situatedness in space. “Same with the passage of time — that too would require a brain that has memory of the past, and a sense for time in relation to that past. What they (paper authors) might mean relates to the agents’ tracking progressive changes in the environment resulting from their actions (e.g., as a result of moving a purple pyramid), state changes which the underlying physics simulator would generate.”
Raghavachary also points out that if the agents could understand the high-level structure of their tasks, they would not need 200 billion steps of simulated training to reach optimal results.
“The underlying architecture lacks what it takes, to achieve these three things (body awareness, time passage, understanding high-level task structure) they point out in conclusion,” he said. “Overall, XLand is simply ‘more of the same.’”
The gap between simulation and the real world
In a nutshell, the paper shows that if you can create a complex enough environment, design the right reinforcement learning architecture, and expose your models to enough experience (and have a lot of money to spend on compute resources), you’ll be able to generalize to various kinds of tasks in the same environment. And this is basically how natural evolution has delivered human and animal intelligence.
In fact, DeepMind has already done something similar with AlphaZero, a reinforcement learning model that managed to master multiple two-player turn-based games. The XLand experiment has extended the same notion to a much greater level by adding the zero-shot learning element.
But while I think that the experience from the XLand-trained agents will ultimately be transferable to real-world applications such as robotics and self-driving cars, I don’t think it will be a breakthrough. You’ll still need to make compromises (such as creating artificial limits to reduce the complexity of the real world) or create artificial enhancements (such as imbuing the machine learning models with prior knowledge or extra sensors).
DeepMind’s reinforcement learning agents might have become the masters of the virtual XLand. But their simulated world doesn’t even have a fraction of the intricacies of the real world. That gap will remain a challenge for a long time.
Ben Dickson is a software engineer and the founder of TechTalks. He writes about technology, business, and politics.
DeepMind today detailed its latest efforts to create AI systems capable of completing a range of different, unique tasks. By designing a virtual environment called XLand, the Alphabet-backed lab says that it managed to train systems with the ability to succeed at problems and games including hide and seek, capture the flag, and finding objects, some of which they didn’t encounter during training.
The AI technique known as reinforcement learning has shown remarkable potential, enabling systems to learn to play games like chess, shogi, Go, and StarCraft II through a repetitive process of trial and error. But a lack of training data has been one of the major factors preventing systems trained with reinforcement learning from behaving generally enough to apply across diverse games. Without being able to train on a vast enough set of tasks, systems trained with reinforcement learning have been unable to adapt their learned behaviors to new tasks.
DeepMind designed XLand, which includes multiplayer games within consistent, “human-relatable” digital worlds, to address this. The simulated space allows for procedurally generated tasks, enabling systems to train on, and generate experience from, tasks that are created programmatically.
XLand offers billions of tasks across varied worlds and players. AI controls players in an environment meant to simulate the physical world, training on a number of cooperative and competitive games. Each player’s objective is to maximize rewards, and each game defines the individual rewards for the players.
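A game that “defines the individual rewards for the players” can be pictured as one goal per player evaluated against the shared world state, with competition arising when one player’s goal is the opposite of another’s. In this sketch, the predicate name, the hide-and-seek encoding, and the 0/1 reward are illustrative assumptions rather than XLand’s actual task format.

```python
def reward(goal, state):
    """Return 1.0 if the player's goal holds in the world state, else 0.0.

    goal: (predicate_name, negated) pair; state: dict of predicate -> bool.
    """
    name, negated = goal
    holds = state[name]
    return 1.0 if holds != negated else 0.0

# Hide and seek: the seeker wants to see the hider, the hider wants the opposite.
game = {
    "seeker": ("seeker_sees_hider", False),
    "hider": ("seeker_sees_hider", True),
}

for sees in (True, False):
    state = {"seeker_sees_hider": sees}
    rewards = {player: reward(goal, state) for player, goal in game.items()}
    print(sees, rewards)
```

Whatever the state, exactly one player is rewarded, which makes this game purely competitive. A cooperative game would instead give both players the same goal.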
“These complex, non-linear interactions create an ideal source of data to train on, since sometimes even small changes in the components of the environment can result in large changes in the challenges for the [systems],” DeepMind explains in a blog post.
XLand trains systems by dynamically generating tasks in response to the systems’ behavior. The systems’ task-generating functions evolve to match their relative performance and robustness, and the generations of systems bootstrap from each other — introducing ever-better players into the multiplayer environment.
DeepMind says that after training systems for five generations — 700,000 unique games in 4,000 worlds within XLand, with each system experiencing 200 billion training steps — they saw consistent improvements in both learning and performance. DeepMind found that the systems exhibited general behaviors such as experimentation, like changing the state of the world until they achieved a rewarding state. Moreover, they observed that the systems were aware of the basics of their bodies, the passage of time, and the high-level structure of the games they encountered.
With just 30 minutes of focused training on a newly presented, complex task, the systems could quickly adapt, whereas agents trained with reinforcement learning from scratch couldn’t learn the tasks at all. “DeepMind’s mission of solving intelligence to advance science and humanity led us to explore how we could overcome this limitation to create AI [systems] with more general and adaptive behaviour,” DeepMind said. “Instead of learning one game at a time, these [systems] would be able to react to completely new conditions and play a whole universe of games and tasks, including ones never seen before.”
VentureBeat’s mission is to be a digital town square for technical decision-makers to gain knowledge about transformative technology and transact.