
Google reveals AI system taking charge of its data centres

20 Aug 18

The steady march of artificial intelligence (AI) rolls on with Google revealing its latest innovation supporting data centre cooling and industrial control.

In 2016, Google and DeepMind collaborated to develop an AI-powered recommendation system with the goal of improving the energy efficiency of Google’s data centres.

Now Google has taken this same AI system and enhanced it so that, rather than making recommendations for human operators to implement, it directly controls data centre cooling itself, albeit under expert supervision.

So how does it work?

“Every five minutes, our cloud-based AI pulls a snapshot of the data centre cooling system from thousands of sensors and feeds it into our deep neural networks, which predict how different combinations of potential actions will affect future energy consumption,” Google’s Amanda Gasparik and DeepMind’s Chris Gamble and Jim Gao reported in a release.

“The AI system then identifies which actions will minimise the energy consumption while satisfying a robust set of safety constraints. Those actions are sent back to the data centre, where the actions are verified by the local control system and then implemented.”
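The release does not describe the models or the optimiser in detail, so the following is only a minimal Python sketch of a single five-minute iteration, under stated assumptions: `Snapshot`, `Action`, `predict_energy_kw`, `predict_cold_aisle_c` and both safety rules are hypothetical, and a toy hand-written formula stands in for the deep neural networks. It scores every candidate action, discards any that break a safety constraint, and picks the one with the lowest predicted energy.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Iterable, Optional, Sequence


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical sensor summary; the real system reads thousands of sensors."""
    it_load_kw: float
    outside_temp_c: float


@dataclass(frozen=True)
class Action:
    """Hypothetical cooling action: a water setpoint plus a tower count."""
    chilled_water_temp_c: float
    cooling_towers_on: int


def predict_energy_kw(s: Snapshot, a: Action) -> float:
    # Toy stand-in for the deep neural networks: colder water costs more
    # chiller energy, and each running tower adds fixed fan power.
    chiller_kw = (s.it_load_kw / 100.0) * max(0.0, s.outside_temp_c - a.chilled_water_temp_c + 10.0)
    fans_kw = 4.0 * a.cooling_towers_on
    return chiller_kw + fans_kw


def predict_cold_aisle_c(s: Snapshot, a: Action) -> float:
    # Toy thermal model: warmer water and fewer towers mean warmer server inlets.
    return a.chilled_water_temp_c + s.it_load_kw / 500.0 - 0.5 * a.cooling_towers_on


Constraint = Callable[[Snapshot, Action], bool]

# Hypothetical safety rules, not Google's actual constraint set.
SAFETY_CONSTRAINTS: Sequence[Constraint] = (
    lambda s, a: predict_cold_aisle_c(s, a) <= 27.0,  # inlet temperature limit
    lambda s, a: a.cooling_towers_on >= 2,            # redundancy rule
)


def choose_action(
    s: Snapshot,
    candidates: Iterable[Action],
    constraints: Sequence[Constraint] = SAFETY_CONSTRAINTS,
) -> Optional[Action]:
    """Return the safe candidate with the lowest predicted energy, or None."""
    safe = [a for a in candidates if all(ok(s, a) for ok in constraints)]
    if not safe:
        return None  # nothing passes: leave the plant on its existing settings
    return min(safe, key=lambda a: predict_energy_kw(s, a))


if __name__ == "__main__":
    snapshot = Snapshot(it_load_kw=2000.0, outside_temp_c=18.0)
    candidates = [
        Action(t, n) for t, n in product([14.0, 16.0, 18.0, 20.0, 22.0, 24.0], [1, 2, 3, 4])
    ]
    print(choose_action(snapshot, candidates))
```

With these toy numbers the script prints `Action(chilled_water_temp_c=24.0, cooling_towers_on=2)`: the warmest setpoint in the candidate list, with the fewest towers the redundancy rule permits. Returning `None` when nothing passes leaves the decision to the existing controls rather than forcing an unsafe action.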

The idea grew out of lessons learned from the earlier recommendation system. Google’s data centre operators praised that system for revealing new best practices (such as spreading the cooling load across more equipment rather than less), but actually putting its recommendations into practice required too much operator effort and supervision.

“We wanted to achieve energy savings with less operator overhead. Automating the system enabled us to implement more granular actions at greater frequency, while making fewer mistakes,” says Google data centre operator Dan Fuenffinger.

Google therefore built the new system to take over some of that manual implementation work.

Google has thousands of servers, and it is mission critical that they all run reliably and efficiently. In light of this, the company says it designed the AI agents from the ground up with safety and reliability as the priority, using eight different mechanisms intended to ensure reliable system behaviour.

For example, one simple step Google has put in place is estimating uncertainty. There are billions of potential actions, and for each one the AI agent determines its confidence that it is a good action; actions with low confidence are eliminated from consideration.
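The article does not say how the agent computes its confidence. One common approach, assumed here purely for illustration, is to train an ensemble of models and treat their disagreement as uncertainty: where the models were fitted on plenty of data they agree, and where data is sparse they diverge. In this Python sketch, `make_member`, `confident_actions` and the `max_std_kw` threshold are hypothetical names, and the ensemble members are toy formulas rather than trained networks.

```python
import statistics
from typing import Callable, List, Sequence

# Hypothetical: each ensemble member maps a chilled-water setpoint (deg C)
# to predicted cooling energy (kW).
Model = Callable[[float], float]


def make_member(curvature: float) -> Model:
    """Toy ensemble member. Members agree exactly at 18 C (assumed to be
    well covered by training data) and diverge further away from it."""
    def predict(setpoint_c: float) -> float:
        return 100.0 - 2.0 * setpoint_c + curvature * (setpoint_c - 18.0) ** 2
    return predict


def confident_actions(
    setpoints: Sequence[float], ensemble: Sequence[Model], max_std_kw: float
) -> List[float]:
    """Keep only setpoints whose ensemble predictions agree closely.

    The population standard deviation across members serves as the
    uncertainty estimate; any setpoint above max_std_kw is discarded.
    """
    kept = []
    for sp in setpoints:
        predictions = [member(sp) for member in ensemble]
        if statistics.pstdev(predictions) <= max_std_kw:
            kept.append(sp)
    return kept


if __name__ == "__main__":
    ensemble = [make_member(k) for k in (0.0, 0.5, 1.0)]
    print(confident_actions([14.0, 16.0, 18.0, 20.0, 22.0, 24.0], ensemble, max_std_kw=2.0))
```

Run as a script, it prints `[16.0, 18.0, 20.0]`. Setpoints far from the well-observed region are dropped even though one ensemble member predicts they would use the least energy, which is the point of filtering on confidence rather than on the prediction alone.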

Another example is two-layer verification: optimal actions computed by the AI are first vetted against an internal list of safety constraints established by the data centre operators, and are then verified again by the local control system before being implemented. Furthermore, the operators are always in control and can exit AI control mode at any time.
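The two verification layers and the operator override can be sketched as follows. This is a hedged Python illustration, not Google's implementation: constraints are modelled as simple per-setting (min, max) ranges, and `within`, `LocalController`, `send_from_cloud` and the `ai_mode` flag are hypothetical names. An action is applied only if it passes the operator-defined list in the cloud and then the local controller's own limits, and nothing is applied once operators switch AI mode off.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Action = Dict[str, float]                # hypothetical: setting name -> value
Limits = Dict[str, Tuple[float, float]]  # hypothetical: setting name -> (min, max)


def within(action: Action, limits: Limits) -> bool:
    """True only if every setting is known and inside its allowed range."""
    return all(
        name in limits and limits[name][0] <= value <= limits[name][1]
        for name, value in action.items()
    )


@dataclass
class LocalController:
    limits: Limits        # second layer: the site's own limits
    ai_mode: bool = True  # operators can switch this off at any time
    applied: List[Action] = field(default_factory=list)

    def receive(self, action: Action) -> bool:
        if not self.ai_mode:
            return False  # operators have taken back control; ignore the AI
        if not within(action, self.limits):
            return False  # failed local verification
        self.applied.append(action)  # stand-in for actuating the plant
        return True


def send_from_cloud(action: Action, operator_constraints: Limits, plant: LocalController) -> bool:
    if not within(action, operator_constraints):
        return False  # first layer: rejected before leaving the cloud
    return plant.receive(action)


if __name__ == "__main__":
    operator_constraints = {"chilled_water_temp_c": (10.0, 24.0), "cooling_towers_on": (2, 6)}
    plant = LocalController({"chilled_water_temp_c": (12.0, 22.0), "cooling_towers_on": (2, 8)})
    print(send_from_cloud({"chilled_water_temp_c": 20.0, "cooling_towers_on": 3}, operator_constraints, plant))  # True
    print(send_from_cloud({"chilled_water_temp_c": 23.0, "cooling_towers_on": 3}, operator_constraints, plant))  # False
    plant.ai_mode = False
    print(send_from_cloud({"chilled_water_temp_c": 20.0, "cooling_towers_on": 3}, operator_constraints, plant))  # False
```

In the demo, the 23 °C setpoint clears the operator range (10 to 24 °C) but is still rejected by the tighter local range (12 to 22 °C), showing why the second layer matters. Unknown settings are rejected outright rather than passed through.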

While the AI system has the ability to determine the data centres’ actions, Google says it has purposefully limited the system’s optimisation boundaries in a bid to prioritise safety and reliability.

After a matter of months in operation, the system has already proven itself, delivering consistent energy savings of around 30 percent on average. Google expects this to improve over time as the system gains access to more data and its boundaries are expanded as the technology matures.

“It was amazing to see the AI learn to take advantage of winter conditions and produce colder than normal water, which reduces the energy required for cooling within the data centre. Rules don’t get better over time, but AI does,” says Fuenffinger.

Google says it is excited about the technology and sees data centres as just the beginning, as it believes the AI system can be implemented in several other industrial settings.
