Features of Evil AI: Unpredictability

[Image: Road_Rage.jpg]

In 1960, MIT mathematician Norbert Wiener wrote an article in the journal Science titled “Some Moral and Technical Consequences of Automation”. He prefaced the article with the following warning: “As machines learn they may develop unforeseen strategies at rates that baffle their programmers.” Since Wiener wrote this sentence, we have seen numerous examples of intelligent machines exhibiting surprising or unpredictable behavior. One recent example is AlphaGo, which in 2016 became the first AI program ever to defeat a world champion at the game of Go. Other players described the algorithm’s style as “alien” and “from an alternate dimension.”

Another recent example comes from OpenAI’s attempt to build AIs that play ‘hide and seek’ in a three-dimensional virtual environment. The agents used a technique called reinforcement learning to adapt their strategies to one another, and this led them to discover strategies that surprised the researchers. The hiders learned to build a fort to prevent the seekers from ever spotting them. The seekers then learned that they could jump onto a box and ‘surf’ it over into the fort. The hiders, in turn, learned to lock all the boxes inside the fort early in the game, depriving the seekers of such surfboards. Not even the programmers had imagined these possibilities ahead of time.
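Reinforcement learning is, at its core, a simple loop: an agent tries actions, observes rewards, and gradually updates its estimate of which actions pay off. The sketch below is a minimal, hypothetical example of tabular Q-learning in Python; it is not OpenAI’s multi-agent hide-and-seek system, and all names and parameters are illustrative. An agent learns, purely by trial and error, to walk to the end of a small corridor, and the resulting strategy is never written down explicitly by the programmer.

```python
import random
from collections import defaultdict

# Toy corridor world: the agent starts in cell 0 and must reach cell 5.
# A hypothetical illustration of reinforcement learning, not the
# multi-agent hide-and-seek system described above.

N_STATES = 6                            # cells 0..5; cell 5 is the goal
ACTIONS = [-1, +1]                      # step left or step right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # learning rate, discount, exploration rate

q = defaultdict(float)                  # Q-value table keyed by (state, action)

def choose_action(state):
    """Epsilon-greedy: usually exploit the best known action, sometimes explore."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])

for episode in range(500):
    state = 0
    while state != N_STATES - 1:
        action = choose_action(state)
        next_state = min(max(state + action, 0), N_STATES - 1)
        reward = 1.0 if next_state == N_STATES - 1 else -0.01  # small cost per step
        # Q-learning update: move the estimate toward the observed reward plus
        # the discounted value of the best action available in the next state.
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
        state = next_state

# The learned policy ("always step right") was never programmed explicitly;
# it emerged from trial, error and reward.
print([max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)])
```

In OpenAI’s experiment the same basic loop plays out at a much larger scale, with many agents adapting to one another, and that is where the surprising fort-building and box-surfing strategies emerged.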

This type of unexpected ‘emergent’ behavior is common when AI systems can adapt to a constantly changing environment: there are simply too many possible scenarios for programmers to foresee or enumerate. Even with extensive testing inside virtual environments, many AIs will surprise us once they come into contact with the real world, with its own unpredictability.

One might ask, though, whether any of this is new. After all, we are not always able to predict the behavior of other people, nor that of animals like dogs, cats or horses. There is a crucial difference, however. We have had thousands, perhaps millions, of years to build mental models of how humans and other animals behave. This makes their unpredictability somewhat bounded: a human may be a drunk stumbling across the street, an insecure bully harassing others, or a violent criminal committing murder. A dog may be aggressive, depressed, or in pain. But we have yet to build a mental picture, and a vocabulary, to describe the ways in which machines can misbehave. We still lack a diagnostic manual of mental disorders for computers.

References

  • Wiener, N. Some Moral and Technical Consequences of Automation. Science 131, 1355–1358 (1960).

  • Chan, D. The AI that has nothing to learn from humans. The Atlantic (2017).

  • Schulz, E. & Dayan, P. Computational Psychiatry for Computers. iScience 23, 101772 (2020).
