
When it comes to games like chess or Go, artificial intelligence (AI) programs have far surpassed the best players in the world. These “superhuman” AIs are unmatched competitors, but perhaps harder than competing against humans is collaborating with them. Can the same technology get along with people?
In a new study, MIT Lincoln Laboratory researchers sought to find out how well humans could play the cooperative card game Hanabi with an advanced AI model trained to excel at playing with teammates it had never met before. In single-blind experiments, participants played two series of the game: one with the AI agent as their teammate, and the other with a rule-based agent, a bot manually programmed to play in a predefined way.
The results surprised the researchers. Not only were the scores no better with the AI teammate than with the rule-based agent, but humans consistently hated playing with their AI teammate. They found it to be unpredictable, unreliable, and untrustworthy, and felt negatively even when the team scored well. A paper detailing this study has been accepted to the 2021 Conference on Neural Information Processing Systems (NeurIPS).
“It really highlights the nuanced distinction between creating AI that performs objectively well and creating AI that is subjectively trusted or preferred,” says Ross Allen, co-author of the paper and a researcher in the Artificial Intelligence Technology Group. “It may seem those things are so close that there’s not really daylight between them, but this study showed that those are actually two separate problems. We need to work on disentangling those.”
Humans hating their AI teammates could be a concern for researchers designing this technology to one day work with humans on real challenges, like defending against missiles or performing complex surgery. This dynamic, called teaming intelligence, is a next frontier in AI research, and it uses a particular kind of AI called reinforcement learning.
A reinforcement learning AI isn’t told which actions to take, but instead discovers which actions yield the most numerical “reward” by trying out scenarios again and again. It is this technology that has yielded the superhuman chess and Go players. Unlike rule-based algorithms, these AI aren’t programmed to follow “if/then” statements, because the possible outcomes of the human tasks they’re slated to tackle, like driving a car, are far too many to code.
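To make the idea concrete, here is a minimal sketch of tabular Q-learning, one of the simplest reinforcement learning algorithms. The environment interface (`reset`, `step`, `actions`) is a hypothetical placeholder in the spirit of common RL toolkits, not the agents used in this study:

```python
import random
from collections import defaultdict

def q_learning(env, episodes=1000, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Learn action values purely from numerical rewards, with no
    hand-coded if/then rules about which action is correct."""
    q = defaultdict(float)  # (state, action) -> estimated long-term reward

    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Occasionally try a random action; otherwise exploit the
            # best action found so far.
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: q[(state, a)])

            next_state, reward, done = env.step(action)

            # Nudge the estimate toward the observed reward plus the
            # discounted value of the best follow-up action.
            best_next = max(q[(next_state, a)] for a in env.actions)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q
```

Over many episodes, the agent’s value estimates converge toward whatever behavior maximizes the reward signal, whether or not that behavior looks sensible to a human observer.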
“Reinforcement learning is a much more general-purpose way of developing AI. If you can train it to learn how to play chess, that agent won’t necessarily go drive a car. But you can use the same algorithms to train a different agent to drive a car, given the right data,” Allen says. “The sky’s the limit in what it could, in theory, do.”
Bad hints, bad plays
Today, researchers are using Hanabi to test the performance of reinforcement learning models developed for collaboration, in much the same way that chess has served as a benchmark for testing competitive AI for decades.
The game of Hanabi is akin to a multiplayer form of solitaire. Players work together to stack cards of the same suit in order. However, players cannot view their own cards, only the cards that their teammates hold. Each player is strictly limited in what they can communicate to their teammates to get them to pick the best card from their own hand to stack next.
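To illustrate the hidden-information structure that makes Hanabi a useful collaboration benchmark, here is a simplified sketch; the card types and hint rules are abbreviated, and this is not the actual research environment:

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple

@dataclass(frozen=True)
class Card:
    suit: str  # e.g., "red" or "blue"
    rank: int  # 1-5; each suit must be stacked in ascending order

def visible_hands(hands: Dict[str, List[Card]], me: str) -> Dict[str, List[Card]]:
    """A player sees every hand except their own."""
    return {player: cards for player, cards in hands.items() if player != me}

def legal_hints(hands: Dict[str, List[Card]], me: str) -> Iterator[Tuple[str, str, object]]:
    """Communication is strictly limited: a hint may only point out a
    suit or a rank that actually appears in a teammate's hand."""
    for player, cards in visible_hands(hands, me).items():
        for card in cards:
            yield (player, "suit", card.suit)
            yield (player, "rank", card.rank)
```

Because no player can see their own hand, success depends entirely on how well teammates choose and interpret these limited hints.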
The Lincoln Laboratory researchers didn’t develop either the AI or the rule-based agents used in this experiment. Both agents represent the best in their fields for Hanabi performance. In fact, when the AI model was previously paired with an AI teammate it had never played with before, the team achieved the highest-ever score for Hanabi play between two unknown AI agents.
“That was an important result,” Allen says. “We thought, if these AI that have never met before can come together and play really well, then we should be able to bring humans that also know how to play really well together with the AI, and they’ll also do really well. That’s why we thought the AI team would objectively play better, and also why we thought humans would prefer it, because generally we’ll like something better if we do well.”
Neither of those expectations came true. Objectively, there was no statistical difference in the scores between the AI and the rule-based agent. Subjectively, all 29 participants reported in surveys a clear preference toward the rule-based teammate. The participants weren’t told which agent they were playing with for which games.
“One participant said that they were so stressed out at the bad play from the AI agent that they actually got a headache,” says Jaime Pena, a researcher in the AI Technology and Systems Group and an author on the paper. “Another said that they thought the rule-based agent was dumb but workable, whereas the AI agent showed that it understood the rules, but that its moves were not cohesive with what a team looks like. To them, it was giving bad hints, making bad plays.”
Inhuman creativity
This perception of the AI making “bad plays” links to surprising behavior researchers have observed previously in reinforcement learning work. For example, in 2016, when DeepMind’s AlphaGo first defeated one of the world’s best Go players, one of the most widely praised moves made by AlphaGo was move 37 in game 2, a move so unusual that human commentators thought it was a mistake. Later analysis revealed that the move was actually extremely well calculated, and was described as “genius.”
Such moves might be praised when an AI opponent performs them, but they are less likely to be celebrated in a team setting. The Lincoln Laboratory researchers found that strange or seemingly illogical moves were the worst offenders in breaking humans’ trust in their AI teammate in these closely coupled teams. Such moves not only diminished players’ perception of how well they and their AI teammate worked together, but also how much they wanted to work with the AI at all, especially when any potential payoff wasn’t immediately obvious.
“There was a lot of commentary about giving up, comments like ‘I hate working with this thing,’” adds Hosea Siu, also an author of the paper and a researcher in the Control and Autonomous Systems Engineering Group.
Participants who rated themselves as Hanabi experts, which the majority of players in this study did, more often gave up on the AI player. Siu finds this concerning for AI developers, because the main users of this technology will likely be domain experts.
“Let’s say you train up a super-smart AI guidance assistant for a missile defense scenario. You aren’t handing it off to a trainee; you’re handing it off to your experts on your ships who have been doing this for 25 years. So, if there is a strong expert bias against it in gaming scenarios, it’s likely going to show up in real-world ops,” he adds.
Squishy humans
The researchers note that the AI used in this study wasn’t developed for human preference. But that’s part of the problem: not many are. Like most collaborative AI models, this model was designed to score as high as possible, and its success has been benchmarked by its objective performance.
If researchers don’t focus on the question of subjective human preference, “then we won’t create AI that humans actually want to use,” Allen says. “It’s easier to work on AI that improves a very clean number. It’s much harder to work on AI that works in this mushier world of human preferences.”
Solving this harder problem is the goal of the MeRLin (Mission-Ready Reinforcement Learning) project, under which this experiment was funded by Lincoln Laboratory’s Technology Office, in collaboration with the U.S. Air Force Artificial Intelligence Accelerator and the MIT Department of Electrical Engineering and Computer Science. The project is studying what has kept collaborative AI technology from leaping out of the game space and into messier reality.
The researchers think that the ability for the AI to explain its actions will engender trust. This will be the focus of their work for the next year.
“You can imagine we rerun the experiment, but after the fact (and this is much easier said than done) the human could ask, ‘Why did you do that move, I didn’t understand it?’ If the AI could provide some insight into what they thought was going to happen based on their actions, then our hypothesis is that humans would say, ‘Oh, weird way of thinking about it, but I get it now,’ and they’d trust it. Our results would totally change, even though we didn’t change the underlying decision-making of the AI,” Allen says.
Like a huddle after a game, this kind of exchange is often what helps humans build camaraderie and cooperation as a team.
“Maybe it’s also a staffing bias. Most AI teams don’t have people who want to work on these squishy humans and their soft problems,” Siu adds, laughing. “It’s people who want to do math and optimization. And that’s the basis, but that’s not enough.”
Mastering a game like Hanabi between AI and humans could open up a universe of possibilities for teaming intelligence in the future. But until researchers can close the gap between how well an AI performs and how much a human likes it, the technology may well remain at machine versus human.
The findings were published on arXiv.