I wonder what will happen if I press this button? Algorithms armed with a sense of curiosity are teaching themselves to discover and solve problems they’ve never encountered before.
Faced with level one of Super Mario Bros, a curiosity-driven AI learned how to explore, avoid pits, and dodge and kill enemies. This might not sound impressive – algorithms have been thrashing humans at video games for a few years now – but this AI's skills were all learned thanks to an inbuilt desire to discover more about the game world.
Conventional AI algorithms are taught through positive reinforcement. They are rewarded for achieving some kind of external goal, like upping the score in a video game by one point. This encourages them to perform actions that increase their score – such as stomping on enemies in the case of Mario – and discourages them from performing actions that don’t increase the score, like falling into a pit.
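To make that mechanism concrete, here is a minimal sketch of reward-driven learning using tabular Q-learning on a toy platformer. Everything in it – the states, actions, and reward values – is invented for illustration; it is not the code behind any of the systems described here.

```python
import random
from collections import defaultdict

# Minimal sketch of extrinsic-reward Q-learning on a toy "platformer".
# States, actions, and rewards are hypothetical stand-ins for a game like
# Mario: the agent earns +1 for stomping an enemy and -1 for falling into a pit.

ACTIONS = ["run", "jump"]

def step(state, action):
    """Toy environment: returns (next_state, reward, done)."""
    if state == "near_enemy":
        return ("clear", 1.0, False) if action == "jump" else ("dead", -1.0, True)
    if state == "near_pit":
        return ("clear", 0.0, False) if action == "jump" else ("dead", -1.0, True)
    # "clear" leads randomly to the next hazard
    return (random.choice(["near_enemy", "near_pit"]), 0.0, False)

q = defaultdict(float)          # Q-values, keyed by (state, action)
alpha, gamma, epsilon = 0.1, 0.9, 0.2

for episode in range(500):
    state, done = "clear", False
    for _ in range(50):
        # Epsilon-greedy: mostly exploit the best known action, sometimes explore
        action = (random.choice(ACTIONS) if random.random() < epsilon
                  else max(ACTIONS, key=lambda a: q[(state, a)]))
        nxt, reward, done = step(state, action)
        # The external reward is the only learning signal here
        best_next = max(q[(nxt, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = nxt
        if done:
            break

print({k: round(v, 2) for k, v in q.items()})
```

After enough episodes, actions that led to the external reward score highly and actions that led to the pit do not – the agent learns only what the score tells it to learn.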
This type of approach, called reinforcement learning, was used to create AlphaGo, the Go-playing computer from Google DeepMind that beat Korean master Lee Sedol by four games to one last year. Over thousands of real and simulated games, the AlphaGo algorithm learned to pursue strategies that led to the ultimate reward: a win.
But the real world isn’t full of rewards, says Deepak Pathak, who led the study at the University of California, Berkeley. “Instead, humans have an innate curiosity which helps them learn,” he says, which may be why we are so good at mastering a wide range of skills without necessarily setting out to learn them.
So Pathak set out to give his own reinforcement learning algorithm a sense of curiosity to see whether that would be enough to let it learn a range of skills. Pathak’s algorithm experienced a reward when it increased its understanding of its environment, particularly the parts that directly affected it. So, rather than looking for a reward in the game world, the algorithm was rewarded for exploring and mastering skills that led to it discovering more about the world.
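A simplified sketch of how such an intrinsic reward can work, loosely modelled on the curiosity signal in Pathak's paper: the agent is rewarded in proportion to how badly a forward model predicts what happens next. This is a toy stand-in – the real system learns its feature encoding jointly with an inverse-dynamics model, so that only the parts of the world the agent can affect contribute to the reward – and the linear model and numbers below are invented for illustration.

```python
import numpy as np

# Simplified sketch of a curiosity ("intrinsic") reward in the spirit of
# Pathak et al.: the agent is rewarded in proportion to the prediction
# error of a forward model on the next state. The identity feature
# encoding here is a toy simplification of the paper's learned features.

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM = 4, 2

# Linear forward model: predicts next-state features from (features, action)
W = rng.normal(scale=0.1, size=(STATE_DIM + ACTION_DIM, STATE_DIM))
lr, eta = 0.05, 1.0  # learning rate; eta scales the intrinsic reward

def curiosity_step(phi_s, action, phi_next):
    """Update the forward model and return the intrinsic reward."""
    global W
    x = np.concatenate([phi_s, action])
    pred = x @ W
    error = pred - phi_next
    # Intrinsic reward: scaled squared prediction error ("surprise")
    r_intrinsic = eta * 0.5 * float(error @ error)
    # Train the forward model so familiar transitions stop being rewarding
    W -= lr * np.outer(x, error)
    return r_intrinsic

# Toy transitions: a repeated (predictable) one and a novel one
phi_a = rng.normal(size=STATE_DIM)
phi_b = rng.normal(size=STATE_DIM)
act = np.array([1.0, 0.0])
for t in range(200):
    r = curiosity_step(phi_a, act, phi_b)       # same transition, seen often
print("familiar transition reward:", round(r, 4))
print("novel transition reward:",
      round(curiosity_step(rng.normal(size=STATE_DIM), act,
                           rng.normal(size=STATE_DIM)), 4))
```

Because the model improves on transitions it sees often, familiar situations stop paying out, which pushes the agent towards the unfamiliar.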
This type of approach can speed up learning times and improve the efficiency of algorithms, says Max Jaderberg at Google’s AI company DeepMind. The company used a similar technique last year to teach an AI to explore a virtual maze. Its algorithm learned much more quickly than conventional reinforcement learning approaches. “Our agent is far quicker and requires a lot less experience from the world to train, making it much more data efficient,” he says.
Fast learner
Imbued with a sense of curiosity, Pathak's own AI learned to stomp on enemies and jump over pits in Mario, and to explore faraway rooms and walk down hallways in another game similar to Doom. It was also able to apply its newly acquired skills to later levels of Mario despite never having seen them before.
But curiosity could only take the algorithm so far in Mario. On average, it explored only 30 per cent of level one as it couldn’t find a way past a series of pits that could only be overcome through a sequence of more than 15 button presses. Rather than jump to its death, the AI learned to turn back on itself and stop when it reached that point.
The AI may have been flummoxed because it had no idea that there was more of the level to explore beyond the pits, says Pathak. Nor did it learn to take useful shortcuts consistently: shortcuts meant discovering less of the level, so they didn't satisfy its urge to explore.
Pathak is now working on seeing whether robotic arms can learn through curiosity to grasp new objects. "Instead of it acting randomly, you could use this to help it move meaningfully," he says. He also plans to see whether a similar algorithm could be used in household robots similar to the Roomba vacuum cleaner.
But Jaderberg isn’t so sure that this kind of algorithm is ready to be put to use just yet. “It’s too early to talk about real-world applications,” he says.
Journal reference: https://arxiv.org/abs/1705.05363