Google reveals AI system taking charge of its data centres

Mon, 20th Aug 2018


The steady march of artificial intelligence (AI) rolls on, with Google revealing its latest innovation for data center cooling and industrial control.

In 2016 Google and DeepMind collaborated to develop an AI-powered recommendation system aimed at improving the energy efficiency of Google's data centers.

Now Google has taken that same AI system further: instead of issuing recommendations for human operators to implement, it directly controls data center cooling itself, though still under expert supervision.

So how does it work?

“Every five minutes, our cloud-based AI pulls a snapshot of the data center cooling system from thousands of sensors and feeds it into our deep neural networks, which predict how different combinations of potential actions will affect future energy consumption,” Google's Amanda Gasparik and DeepMind's Chris Gamble and Jim Gao reported in a release.

“The AI system then identifies which actions will minimise the energy consumption while satisfying a robust set of safety constraints. Those actions are sent back to the data center, where the actions are verified by the local control system and then implemented.”
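The decision step described in the quote (score candidate actions, discard unsafe ones, pick the lowest predicted energy) can be sketched in a few lines of Python. This is a minimal illustration under loudly stated assumptions, not Google's or DeepMind's implementation: the `Action` type, the two control knobs, the candidate grid, the operator limits and the simple `predict_energy_kw` formula (a stand-in for the deep neural networks) are all hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional

# Hypothetical operator limits and plant data; the article gives no real values.
MIN_WATER_C, MAX_WATER_C = 7.0, 11.0
CHILLER_CAPACITY_KW = 400.0
CANDIDATE_TEMPS_C = [6.0, 8.0, 10.0, 12.0]
CANDIDATE_CHILLERS = [1, 2, 3, 4]


@dataclass(frozen=True)
class Action:
    """A hypothetical cooling action: chilled-water setpoint and number of running chillers."""
    water_temp_c: float
    chillers: int


def predict_energy_kw(snapshot: dict[str, float], action: Action) -> float:
    """Toy stand-in for the deep neural networks (formula invented for illustration).

    Colder water costs more to produce; spreading the load over more chillers
    reduces that cost, but each running chiller adds a fixed overhead.
    """
    lift = max(0.0, snapshot["outside_temp_c"] - action.water_temp_c)
    return snapshot["it_load_kw"] * 0.01 * lift / action.chillers ** 0.5 + 3.0 * action.chillers


def is_safe(snapshot: dict[str, float], action: Action) -> bool:
    """Hard constraints: setpoint within operator limits and enough capacity for the load."""
    return (MIN_WATER_C <= action.water_temp_c <= MAX_WATER_C
            and action.chillers * CHILLER_CAPACITY_KW >= snapshot["it_load_kw"])


def choose_action(snapshot: dict[str, float]) -> Optional[Action]:
    """One control step: keep the safe candidates and return the one with the lowest
    predicted energy, or None if nothing is safe (the local controller stays in charge)."""
    candidates = [Action(t, n) for t, n in product(CANDIDATE_TEMPS_C, CANDIDATE_CHILLERS)]
    safe = [a for a in candidates if is_safe(snapshot, a)]
    if not safe:
        return None
    return min(safe, key=lambda a: predict_energy_kw(snapshot, a))


# In the system described above, a step like this would run every five minutes on fresh sensor data.
snapshot = {"it_load_kw": 1000.0, "outside_temp_c": 25.0}
print(choose_action(snapshot))  # → Action(water_temp_c=10.0, chillers=4)
```

With these made-up numbers the sketch prefers four chillers over three, loosely echoing the operators' observation that spreading the cooling load across more equipment can save energy.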

The idea grew out of experience with the earlier recommendation system. Although Google's data center operators praised it for revealing new best practices (such as spreading the cooling load across more equipment rather than less), putting its recommendations into practice required too much operator effort and supervision.

“We wanted to achieve energy savings with less operator overhead. Automating the system enabled us to implement more granular actions at greater frequency, while making fewer mistakes,” says Google data center operator Dan Fuenffinger.

Google therefore built the new AI system to take over much of that manual implementation work.

Google operates thousands of servers, and it is mission critical that they all run reliably and efficiently. In light of this, the company says it designed the AI agents from the ground up with safety and reliability as the priority, using eight different mechanisms in an effort to guarantee reliable system behaviour.

For example, one simple measure Google has put in place is estimating uncertainty. There are billions of possible actions the system could take in the data centers, and for every one of them the AI agent estimates its confidence that the action is a good one. Actions with low confidence are eliminated from consideration.
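The article does not say how the agent computes its confidence. One common technique, used here purely as an assumption, is to query an ensemble of models and treat their disagreement as uncertainty. The sketch below uses three hypothetical hand-written models in place of trained networks, and the 1 kW spread threshold is an arbitrary illustrative choice.

```python
from statistics import mean, stdev
from typing import Callable, Sequence

# A model maps a candidate chilled-water setpoint (°C) to predicted energy use (kW).
Model = Callable[[float], float]

# Hypothetical ensemble: the members agree near 9 °C (well covered by training data)
# and diverge further away, where data is scarce.
ENSEMBLE: list[Model] = [
    lambda t: 100 - 2 * t,
    lambda t: 100 - 2 * t + 0.5 * (t - 9) ** 2,
    lambda t: 100 - 2 * t - 0.5 * (t - 9) ** 2,
]


def predict_with_uncertainty(models: Sequence[Model], action: float) -> tuple[float, float]:
    """Mean prediction and the ensemble's spread (sample standard deviation)."""
    preds = [m(action) for m in models]
    return mean(preds), stdev(preds)


def confident_actions(models: Sequence[Model], actions: Sequence[float],
                      max_std_kw: float) -> list[tuple[float, float]]:
    """Keep (action, predicted energy) pairs whose ensemble spread is within max_std_kw."""
    kept = []
    for action in actions:
        mu, sigma = predict_with_uncertainty(models, action)
        if sigma <= max_std_kw:  # low-confidence actions are dropped here
            kept.append((action, mu))
    return kept


kept = confident_actions(ENSEMBLE, [6.0, 8.0, 9.0, 10.0, 12.0], max_std_kw=1.0)
print(kept)                           # → [(8.0, 84.0), (9.0, 82.0), (10.0, 80.0)]
print(min(kept, key=lambda p: p[1]))  # → (10.0, 80.0)
```

A setpoint of 12 °C would look cheapest on the averaged prediction (76 kW), but the sketch discards it because the models disagree strongly there; only confident candidates reach the final choice.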

Another example is two-layer verification, whereby the optimal actions computed by the AI are vetted against an internal list of safety constraints defined by the data center operators. Furthermore, the operators always remain in control and can exit AI control mode at any time.
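Combined with the earlier description of the local control system verifying actions before implementing them, the safeguard can be sketched as two independent checks plus an operator switch. This is a hypothetical illustration only: `Setpoint`, `OPERATOR_CONSTRAINTS`, `LocalController`, `send` and the 1 °C maximum step are invented names and rules, not details from Google.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Setpoint:
    water_temp_c: float  # hypothetical chilled-water setpoint


# Layer 1: hypothetical operator-defined constraints, checked before an action is sent.
OPERATOR_CONSTRAINTS: list[Callable[[Setpoint], bool]] = [
    lambda s: 7.0 <= s.water_temp_c <= 11.0,
]


class LocalController:
    """Hypothetical on-site control system: the second verification layer and the override."""

    def __init__(self, current: Setpoint, max_step_c: float = 1.0):
        self.current = current
        self.max_step_c = max_step_c  # local rule: cap how far one update may move the setpoint
        self.ai_mode = True

    def exit_ai_mode(self) -> None:
        """Operators can take back control at any time; AI proposals are then ignored."""
        self.ai_mode = False

    def apply(self, proposed: Setpoint) -> Setpoint:
        """Layer 2: implement the proposal only if AI mode is on and the local check passes."""
        step = abs(proposed.water_temp_c - self.current.water_temp_c)
        if self.ai_mode and step <= self.max_step_c:
            self.current = proposed
        return self.current


def send(proposed: Setpoint, controller: LocalController) -> Setpoint:
    """Vet an AI-proposed setpoint in both layers and return the setpoint actually in force."""
    if not all(check(proposed) for check in OPERATOR_CONSTRAINTS):
        return controller.current  # rejected by layer 1, never sent
    return controller.apply(proposed)


ctrl = LocalController(Setpoint(8.0))
print(send(Setpoint(9.0), ctrl))   # passes both layers → Setpoint(water_temp_c=9.0)
print(send(Setpoint(12.0), ctrl))  # fails layer 1 → Setpoint(water_temp_c=9.0)
print(send(Setpoint(10.5), ctrl))  # step too large for layer 2 → Setpoint(water_temp_c=9.0)
ctrl.exit_ai_mode()
print(send(Setpoint(10.0), ctrl))  # operator override → Setpoint(water_temp_c=9.0)
```

Keeping the two checks in separate places means a bad proposal has to slip past both the cloud-side constraint list and the on-site rules before it changes anything.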

While the AI system is able to determine the data centers' actions, Google says it has purposefully limited the system's optimisation boundaries in order to prioritise safety and reliability.

After only a few months in operation, the system has already delivered consistent energy savings of around 30 percent on average. Google expects this figure to improve over time as the system gains access to more data and its boundaries are expanded while the technology matures.

“It was amazing to see the AI learn to take advantage of winter conditions and produce colder than normal water, which reduces the energy required for cooling within the data center. Rules don't get better over time, but AI does,” says Fuenffinger.

Google says it is excited about the technology and believes data centers are just the beginning, as the same AI system could be applied in several other industrial settings.