Today concludes the five ‘Go’ matches played by AlphaGo, an AI system built by DeepMind and South Korean champion, Lee Sedol. AlphaGo managed to win the series of games 4-1.
‘Go’ is a strategy-led board game in which two players aim to gather and surround the most territory on the board. The game is said to require a certain level of intuition and be considerably more complex than Chess. The first three games were won by AlphaGo with Sedol winning the fourth round, but still unable to claim back a victory.
What is DeepMind?
Google DeepMind is an artificial intelligence division within Google that was created after Google bought University College London spinout, DeepMind, for a reported £400 million in January 2014.
The division, which employs around 140 researchers at its lab in a new building at Kings Cross, London, is on a mission to solve general intelligence and make machines capable of learning things for themselves. It plans to do this by creating a set of powerful general-purpose learning algorithms that can be combined to make an AI system or “agent”.
These are systems that learn automatically. They’re not pre-programmed, they’re not handcrafted features. We try to provide a large set of raw information to our algorithms as possible so that the systems themselves can learn the very best representations in order to use those for action or classification or predictions.
The systems we design are inherently general. This means that the very same system should be able to operate across a wide range of tasks.
That’s why we’ve started as we have with the Atari games. We could have done lots of really interesting problems in narrow domains had we spent time specifically hacking our tools to fit the real world problems – that could have been very, very valuable.
Instead we’ve taken the principal approach of starting on tools that are inherently general.
AI has largely been about pre-programming tools for specific tasks: in these kinds of systems, the intelligence of the system lies mostly in the smart human who programmed all of the intelligence into the smart system and subsequently these are of course rigid and brittle and don’t really handle novelty very well or adapt to new settings and are fundamentally very limited as a result.
We characterise AGI [artificial general intelligence] as systems and tools which are flexible and adaptive and that learn.
We use the reinforcement learning architecture which is largely a design approach to characterise the way we develop our systems. This begins with an agent which has a goal or policy that governs the way it interacts with some environment. This environment could be a small physics domain, it could be a trading environment, it could be a real world robotics environment or it could be a Atari environment. The agent says it wants to take actions in this environment and it gets feedback from the environment in the form of observations and it uses these observations to update its policy of behaviour or its model of the world.
How does it work?
The technology behind DeepMind is complex to say the least but that didn’t stop Suleyman from trying to convey some of the fundamental deep learning principles that underpin it. The audience – a mixture of software engineers, AI specialists, startups, investors and media – seemed to follow.
You’ve probably heard quite a bit about deep learning. I’m going to give you a very quick high-level overview because this is really important to get intuition for how these systems work and what they basically do.
These are hierarchical based networks initially conceived back in the 80s but recently resuscitated by a bunch of really smart guys from Toronto and New York.
The basic intuition is that at one end we take the raw pixel data or the raw sensory stream data of things we would like to classify or recognise.
This seems to be a very effective way of learning to find structure in very large data sets. Right at the very output we’re able to impose on the network some requirement to produce some set of labels or classifications that we recognise and find useful as humans.
How is DeepMind being tested?
DeepMind found a suitably quirky way to test what its team of roughly 140 people have been busy building.
The intelligence of the DeepMind’s systems was put through its paces by an arcade gaming platform that dates back to the 1970s.
Suleyman demoed DeepMind playing one of them during his talk – space invaders. In his demo he illustrated how a DeepMind agent learns how to play the game with each go it takes.
We use the Atari test bed to develop and test and train all of our systems…or at least we have done so far.
There is somewhere on the magnitude of 100 different Atari games from the 70s and 80s.
The agents only get the raw pixel inputs and the score so this is something like 30,000 inputs per frame. They’re wired up to the action buttons but they’re not really told what the action buttons do so the agent has to discover what these new tools of the real world actually mean and how they can utilise value for the agent.
The goal that we give them is very simply to maximise score; it gets a 1 or a 0 when the score comes in, just as a human would.
Everything is learned completely from scratch – there’s absolutely zero pre-programmed knowledge so we don’t tell the agent these are Space Invaders or this is how you shoot. It’s really learnt from the raw pixel inputs.
For every set of inputs the agent is trying to assess which action is optimal given that set of inputs and it’s doing that repeatedly over time in order to optimise some longer term goal, which in Atari’s sense, is to optimise score. This is one agent with one set of parameters that plays all of the different games.
Live space invaders demo
An agent playing space invaders before training struggles to hide behind the orange obstacles, it’s firing fairly randomly. It seems to get killed all of the time and it doesn’t really know what to do in the environment.
After training, the agent learns to control the robot and barely loses any bullets. It aims for the space invaders that are right at the top because it finds those the most rewarding. It barely gets hit; it hides behind the obstacles; it can make really good predictive shots like the one on the mothership that came in at the top there.
As those of you know who have played this game, it sort of speeds up towards the end and so the agent has to do a little bit more planning and predicting than it had done previously so as you can see there’s a really good predictive shot right at the end there.
100 games vs 500 games
The agent doesn’t really know what the paddle does after 100 games, it sort of randomly moves it from one side to the other. Occasionally it accidentally hits the ball back and finds that to be a rewarding action. It learns that it should repeat that action in order to get reward.
After about 300 games it’s pretty good and it basically doesn’t really miss.
But then after about 500 games, really quite unexpectedly to our coders, the agent learns that the optimal strategy is to tunnel up the sides and then send them all around the back to get maximum score with minimum effort – this was obviously very impressive to us.
We’ve now achieved human performance in 49/57 games that we’ve tested on and this work was recently rewarded with a front cover of Nature for our paper that we submitted so we were very proud of that.
How is it being used across Google?
Google didn’t buy DeepMind for nothing. Indeed, it’s using certain DeepMind algorithms to make many of its best-known products and services smarter than they were previously.
Our deep learning tool has now been deployed in many environments, particularly across Google in many of our production systems.
In image recognition, it was famously used in 2012 to achieve very accurate recognition on around a million images with about 16 percent error rate. Very shortly after that it was reduced dramatically to about 6 percent and today we’re at about 5.5 percent. This is very much parable with the human level of ability and it’s now deployed in Google+ Image Search and elsewhere in Image Search across the company.
As you can see on Google Image Search on G+, you’re now able to type a word into the search box and it will recall images from your photographs that you’ve never actually hand labelled yourself.
We’ve also used it for text and scription. We use it to identify text on shopfronts and maybe alert people to a discount that’s available in a particular shop or what the menu says in a given restaurant. We do that with an extremely high level of accuracy today. It’s being used in Local Search and elsewhere across the company.
We also use the same core system across Google for speech recognition. It trains roughly in less than five days. In 2012 it delivered a 30 percent reduction in error rate against the existing old school system. This was the biggest single improvement in speech recognition in 20 years, again using the same very general deep learning system across all of these.
Across Google we use what we call Tool AI or Deep Learning Networks for fraud detection, spam detection, hand writing recognition, image search, speech recognition, Street View detection, translation.
Sixty handcrafted rule-based systems have now been replaced with deep learning based networks. This gives you a sense of the kind of generality, flexibility and adaptiveness of the kind of advances that have been made across the field and why Google was interested in DeepMind.
GoogleDeepmind, AlphaGo and Go: the discussion
Techworld editors discuss the implications of AlphaGo beating all the humans.
The number of scientists and world-famous entrepreneurs speaking out on the potential dangers of AI is increasing week-by-week, with renowned physicist Stephen Hawking and PayPal billionaire Elon Musk being two of the most outspoken anti-AI advocates.
The pair, along with several others including Bill Gates and Sky cofounder Jaan Tallinn, believe that machines will soon become more intelligent than humans, just as they do in recent Hollywood blockbuster Ex Machina.
Despite this, Google is keen to develop its AI algorithms as much as possible in order to improve its offerings and boost its profits.
Suleyman tried to put people’s minds at ease and explain the logic behind all the hype.
Over the last 18 months or so, AI breakthroughs have, I think, created a sense of anxiety or in some cases hype around the potential long term direction of the field.
This of course is not least induced by Elon [Musk] who recently Tweeted that we need to be super careful with AI because it’s “potentially more dangerous than nukes” and that’s obviously backed up by various publications including Nick Bostrom’s – all culminating in this kind of sense that AI has the potential to end all human kind.
If you didn’t really pay attention to the field and all you did was read, as I think the vast majority of people do, descriptions of the kind of work that we do on the web then you could be forgiven for believing that AI is actually about this. Whether it’s Terminator coming to blow us up or societies of AIs or mad scientists looking to create quite perverted women robots.
This narrative has somehow managed to dominate the entire landscape, which I think we find really quite remarkable.
It’s true that AI has in some sense really arrived. This isn’t just a summer. These are very concrete production breakthroughs that really do make a big different but it’s also sad how quickly we adapt to this new reality. We rarely take time to acknowledge the magic and the potential of these advances and the kind of good that they can bring. In some sense, the narrative has shifted from isn’t it terrible that AI has been such a failure to isn’t it terrible that AI has been such a success.
Just to address directly this question of existential risk. Our perspective on this is that it’s become a real distraction from the core ethics and safety issues and that it’s completely overshadowed the debate.
The way we think about AI is that it’ll be a hugely powerful tool that we control and direct whose capabilities we limit, just as we do with any other tool that we have in the world around us, whether they’re washing machines or tractors.