
GTC18 - NVIDIA ready to go all out on inferencing

28 Mar 2018

NVIDIA has announced a series of new technologies and partnerships that expand its potential inference market while lowering the cost of delivering deep learning-powered services.

In deep learning, inference is the stage at which a trained neural network is applied to new data to generate predictions — for example, recognising an object in a previously unseen image.
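At its simplest, inference is a forward pass through a network whose weights are already fixed by training. A minimal sketch (the weights and layer sizes here are made up for illustration):

```python
# A minimal sketch of neural-network inference: applying fixed,
# already-trained weights to a new input. All values are illustrative.

def relu(x):
    return max(0.0, x)

def dense(inputs, weights, bias):
    """One fully connected layer: weighted sums followed by ReLU."""
    return [relu(sum(i * w for i, w in zip(inputs, row)) + b)
            for row, b in zip(weights, bias)]

# "Trained" parameters (hypothetical values).
W1 = [[0.5, -0.2], [0.1, 0.8]]
b1 = [0.0, 0.1]

# Inference: a forward pass over a previously unseen input vector.
x = [1.0, 2.0]
hidden = dense(x, W1, b1)
print(hidden)
```

Production inference engines such as TensorRT perform exactly this kind of computation, but heavily optimised for throughput and latency on GPU hardware.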

"GPU acceleration for production deep learning inference enables even the largest neural networks to be run in real-time and at the lowest cost," says NVIDIA vice president and general manager of accelerated computing Ian Buck.

"With rapidly expanding support for more intelligent applications and frameworks, we can now improve the quality of deep learning and help reduce the cost for 30 million hyperscale servers."

TensorRT 4, the latest iteration of the inference optimiser, offers highly accurate INT8 and FP16 network execution and can be used to optimise, validate and deploy trained neural networks in hyperscale data centres.
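The INT8 execution mentioned above rests on quantisation: mapping 32-bit floating-point weights onto 8-bit integers via a scale factor, trading a small amount of precision for much cheaper arithmetic and memory traffic. A conceptual sketch of symmetric per-tensor quantisation (function names and values are illustrative, not TensorRT's API):

```python
# Conceptual sketch of symmetric INT8 quantisation, the kind of
# reduced-precision representation TensorRT's INT8 mode relies on.
# Names and values are illustrative only.

def quantize_int8(values):
    """Map FP32 values onto INT8 using a single per-tensor scale."""
    scale = max(abs(v) for v in values) / 127.0
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate FP32 values from the INT8 codes."""
    return [x * scale for x in q]

weights = [0.52, -1.27, 0.03, 0.9]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
```

Each dequantised value differs from the original by at most one quantisation step, which is why INT8 inference can remain "highly accurate" while running far faster than FP32.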

NVIDIA Tesla GPU-accelerated servers can replace several racks of CPU servers for deep learning inference applications and services, freeing up rack space and reducing energy and cooling requirements.

The company says that the new software delivers up to 190x faster deep learning inference compared with CPUs for common applications such as computer vision, neural machine translation, automatic speech recognition, speech synthesis and recommendation systems.

Google and NVIDIA engineers have also integrated TensorRT into TensorFlow 1.7, making it easier to run deep learning inference applications on GPUs.

“The TensorFlow team is collaborating very closely with NVIDIA to bring the best performance possible on NVIDIA GPUs to the deep learning community,” says Google engineering director Rajat Monga.

“TensorFlow's integration with TensorRT now delivers up to 8x higher inference throughput (compared to regular GPU execution within a low latency target) on NVIDIA deep learning platforms with Volta Tensor Core technology, enabling the highest performance for GPU inference within TensorFlow."

NVIDIA engineers have worked with Amazon, Facebook and Microsoft to ensure developers using ONNX-compatible frameworks such as Caffe2, Chainer, CNTK, MXNet and PyTorch can now deploy to NVIDIA deep learning platforms.

NVIDIA partnered with Microsoft to build GPU-accelerated tools to help developers incorporate more intelligent features in Windows applications.

GPU acceleration for Kubernetes was also announced, which will facilitate enterprise inference deployment on multi-cloud GPU clusters.
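With GPU support in place, a Kubernetes workload can request GPUs as a schedulable resource in its pod spec. A hypothetical example (the pod name and container image are illustrative; the `nvidia.com/gpu` resource name is exposed by NVIDIA's device plugin):

```yaml
# Hypothetical pod spec requesting one GPU for an inference container.
apiVersion: v1
kind: Pod
metadata:
  name: trt-inference        # illustrative name
spec:
  containers:
  - name: inference
    image: nvcr.io/nvidia/tensorrt:latest   # illustrative image tag
    resources:
      limits:
        nvidia.com/gpu: 1    # resource exposed by the NVIDIA device plugin
```

The scheduler then places the pod only on nodes that advertise free GPUs, which is what makes multi-cloud GPU cluster deployment practical.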

NVIDIA is contributing GPU enhancements to the open-source community to support the Kubernetes ecosystem.

In addition, MathWorks announced TensorRT integration with MATLAB.

Engineers and scientists can now automatically generate high-performance inference engines from MATLAB for Jetson, NVIDIA Drive and Tesla platforms.

TensorRT can also be deployed on NVIDIA Drive autonomous vehicles and NVIDIA Jetson embedded platforms.

Deep neural networks built in any framework can be trained on NVIDIA DGX systems in the data centre and then deployed to all types of devices for real-time inferencing at the edge.
