Google’s scalable supercomputers now publicly available

08 May 2019

In what it says is a bid to accelerate the largest-scale machine learning (ML) applications deployed today, Google has opened up its supercomputers.

The global tech giant has created silicon chips called Tensor Processing Units (TPUs). When assembled into multi-rack ML supercomputers called Cloud TPU Pods, they can complete in minutes or hours ML workloads that previously took days or weeks on other systems.

Now, Google Cloud TPU v2 Pods and Cloud TPU v3 Pods are publicly available in beta to help ML researchers, engineers, and data scientists iterate faster and train more capable machine learning models.

“Google Cloud is committed to providing a full spectrum of ML accelerators, including both Cloud GPUs and Cloud TPUs. Cloud TPUs offer highly competitive performance and cost, often training cutting-edge deep learning models faster while delivering significant savings,” says Google Brain Team Cloud TPUs senior product manager Zak Stone.

The benefits for ML teams building complex models and training on large data sets, Stone says, include shorter time to insight, higher accuracy, frequent model updates, and rapid prototyping.

“While some custom silicon chips can only perform a single function, TPUs are fully programmable, which means that Cloud TPU Pods can accelerate a wide range of state-of-the-art ML workloads, including many of the most popular deep learning models,” says Stone.

“Cloud TPU customers see significant speed-ups in workloads spanning visual product search, financial modeling, energy production, and other areas. In a recent case study, Recursion Pharmaceuticals iteratively tests the viability of synthesized molecules to treat rare illnesses. What took over 24 hours to train on their on-prem cluster completed in only 15 minutes on a Cloud TPU Pod.”

According to Stone, a single Cloud TPU Pod can contain more than 1,000 individual TPU chips, which are connected by an ultra-fast, two-dimensional toroidal mesh network. The TPU software stack then uses this mesh network to let many racks of machines be programmed as a single, giant ML supercomputer via a variety of flexible, high-level APIs.
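To give a rough sense of what a two-dimensional toroidal mesh means, the Python sketch below models chips on a grid whose edges wrap around, so every chip has exactly four direct neighbours and the longest route is about half the grid in each direction. The 32 x 32 grid (1,024 chips) and the function names are illustrative assumptions, not details Google has given here, and the code says nothing about how the TPU software stack actually routes traffic.

```python
# Illustrative model of a 2D torus: a grid whose rows and columns wrap around.
# The 32 x 32 size is an assumption chosen to give "more than 1,000" chips.
WIDTH, HEIGHT = 32, 32


def torus_neighbors(x, y, width=WIDTH, height=HEIGHT):
    """Return the four directly connected chips for the chip at (x, y)."""
    return [
        ((x - 1) % width, y),   # left, wrapping past the edge
        ((x + 1) % width, y),   # right
        (x, (y - 1) % height),  # up
        (x, (y + 1) % height),  # down
    ]


def torus_hops(a, b, width=WIDTH, height=HEIGHT):
    """Return the minimum number of links between chips a and b."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    # On each axis, going "the other way round" may be shorter.
    return min(dx, width - dx) + min(dy, height - dy)


if __name__ == "__main__":
    print(torus_neighbors(0, 0))
    print(torus_hops((0, 0), (31, 31)))
```

In this model, opposite corners of the grid are only two links apart because of the wraparound, which is the property that distinguishes a torus from a plain mesh.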

“The latest-generation Cloud TPU v3 Pods are liquid-cooled for maximum performance, and each one delivers more than 100 petaFLOPs of computing power. In terms of raw mathematical operations per second, a Cloud TPU v3 Pod is comparable with a top 5 supercomputer worldwide (though it operates at lower numerical precision),” says Stone.

“It’s also possible to use smaller sections of Cloud TPU Pods called ‘slices.’ We often see ML teams develop their initial models on individual Cloud TPU devices (which are generally available) and then expand to progressively larger Cloud TPU Pod slices via both data parallelism and model parallelism to achieve greater training speed and model scale.”
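As a minimal illustration of the data parallelism Stone mentions, the Python sketch below splits one training batch across several workers, has each compute a gradient on its own shard, and averages the results before updating a shared weight. The one-parameter model, the function names and the plain-Python averaging step are all assumptions made for illustration; on real hardware the averaging would be a collective operation across devices, and this is not Cloud TPU code.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean squared error for the toy model y = w * x."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def data_parallel_step(w, xs, ys, num_workers, lr):
    """One training step with the batch split evenly across workers."""
    shard = len(xs) // num_workers  # assumes the batch divides evenly
    grads = [
        grad_mse(w, xs[i * shard:(i + 1) * shard], ys[i * shard:(i + 1) * shard])
        for i in range(num_workers)
    ]
    # Averaging per-worker gradients stands in for an all-reduce across devices.
    avg_grad = sum(grads) / num_workers
    return w - lr * avg_grad


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [2.0, 4.0, 6.0, 8.0]  # generated by y = 2 * x
    w = 0.0
    for _ in range(100):
        w = data_parallel_step(w, xs, ys, num_workers=2, lr=0.01)
    print(w)
```

With equal-sized shards, the averaged gradient matches the gradient over the whole batch, so adding workers changes how fast a step runs rather than what it computes. Model parallelism, which splits the model itself across devices, is not shown.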
