Opinion: Building for exascale - lessons learnt from the data centre
Article by DDN Storage vice president for product management, file systems and benchmarking, James Coomer
It’s often said that there are two types of forecasts – lucky or wrong. Predictions still hold that supercomputing systems will not reach exascale level (i.e. systems capable of a billion billion floating-point calculations per second, measured in exaFLOPS) for another five years or so. But that is not the case when you are talking about the readiness of storage systems that can support exascale. The kind of storage architectures that will support these environments are already here, and are being utilised in the high-end cloud and supercomputing world. Certainly, from a storage architecture point of view, we are well beyond supporting petascale (current-generation) systems.
Exascale is just for national labs…
…well, no. First, we need to define exascale. Literally, it refers to floating-point calculation rate, but more broadly it refers to the whole environment that supports such a large compute system, from the applications running atop the compute through to the storage that manages the data flow to and from it. The application of exascale is certainly not just for labs. Just like a space program, the benefits of research and investment in these massive-scale national supercomputers are felt well beyond the program itself. Although supercomputer use cases at exascale have been, and will continue to be, based in national labs, the impact of exascale will undoubtedly change the face of the wider High Performance Computing (HPC) sector and, beyond that, business analytics and machine learning.
From weather forecasting, medical research, cosmology, and quantum mechanics to machine learning and AI, exascale storage systems have an application. Simply put, any sector with massive amounts of data that needs to be analysed concurrently at extreme rates will benefit from exascale technology for years to come.
Exascale in the enterprise - is the compute letting down the storage?
Enterprise use cases for exascale-capable storage systems expose a lot of challenges across the board in algorithm design, network architecture, I/O paths, power consumption, reliability, and so on. One of the major areas of concern in the application of supercomputing, machine learning or analytics is the ability to perform a huge array of tasks simultaneously with minimal disturbance between them. Otherwise known as concurrency, this parallel execution is critical to success.
In contrast to previous major supercomputing milestones, exascale will not be reached by increasing CPU clock speeds, but rather through massive core counts enabled by many-core and GPU technologies. However, when you increase core count, applications must increase their thread count to take advantage of the hardware, and this in turn creates a concurrency-management problem that can be a real headache for enterprise datacentres and cloud providers, particularly when it comes to I/O and storage management.
Unlike the national labs, which typically manage one monolithic supercomputer often running a single “grand challenge” application at a time, enterprise data centres face general workloads that vary enormously, with massive thread counts and highly varied request patterns all stressing the storage system at once. So, what you really need is a new storage architecture that can cope with this explosion in concurrency across the board.
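One technique storage layers use to absorb this kind of concurrency is write aggregation: many tiny writes from many threads are buffered and reach the backing store only as a few large, aligned runs. The sketch below is a toy illustration of that idea under assumed sizes (a 4 KiB alignment, a 256 KiB flush threshold) – it is not any vendor’s actual implementation.

```python
import io
import threading

ALIGN = 4096  # assumed device alignment, in bytes (illustrative)

class AggregatingWriter:
    """Toy write-aggregation layer: small writes from many threads are
    buffered, and only large, aligned runs reach the backing store."""

    def __init__(self, backing, flush_threshold=64 * ALIGN):
        self.backing = backing              # any file-like object
        self.flush_threshold = flush_threshold
        self.buf = bytearray()
        self.lock = threading.Lock()        # one short critical section per write
        self.flushes = 0                    # count of large writes issued

    def write(self, data: bytes):
        with self.lock:
            self.buf += data
            if len(self.buf) >= self.flush_threshold:
                self._flush_locked()

    def _flush_locked(self):
        # pad the run out to the alignment boundary before writing
        pad = (-len(self.buf)) % ALIGN
        self.backing.write(bytes(self.buf) + b"\0" * pad)
        self.buf.clear()
        self.flushes += 1

    def close(self):
        with self.lock:
            if self.buf:
                self._flush_locked()

store = io.BytesIO()
w = AggregatingWriter(store)

def worker():
    for _ in range(1000):
        w.write(b"x" * 64)  # 64-byte records: hostile to disk, cheap to buffer

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
w.close()

# 8 threads x 1000 writes = 8000 tiny requests, but only a couple of
# large aligned writes ever reach the backing store.
print(w.flushes)  # prints 2
```

The point of the sketch is the ratio: thousands of application-level requests collapse into a handful of device-friendly ones, which is exactly the pressure-relief an exascale-era I/O layer has to provide.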
Traditionally, HPC applications have required a lot of attention from algorithm developers to ensure that I/O patterns match the specific performance characteristics of storage systems. Long bursts of ordered I/O from a well-matched number of threads are handled well by storage systems, but small, random, misaligned I/O from very large numbers of threads can be catastrophic for performance. As we move to exascale, every component of the architecture must do its part to address issues like these, allowing application developers to focus on other areas for optimisation and scaling.
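Why misaligned small I/O is so punishing can be shown with a little arithmetic: a device (or stripe) moves data in whole blocks, so an unaligned request pulls in full blocks at both ends. The helper below assumes a 4 KiB block size for illustration.

```python
BLOCK = 4096  # assumed device/stripe block size in bytes (illustrative)

def device_io_for_request(offset: int, length: int):
    """Map a logical request onto the aligned block I/Os the device
    actually performs. Unaligned requests touch whole blocks at both
    ends (read-modify-write in the case of writes)."""
    first = offset // BLOCK                  # first block touched
    last = (offset + length - 1) // BLOCK    # last block touched
    blocks = last - first + 1
    return blocks, blocks * BLOCK            # (block count, bytes moved)

# A well-aligned 1 MiB request moves exactly what was asked for:
print(device_io_for_request(0, 1024 * 1024))  # prints (256, 1048576)

# A 100-byte request straddling a block boundary moves two full
# blocks for 100 useful bytes – roughly 80x amplification:
print(device_io_for_request(4090, 100))       # prints (2, 8192)
```

Multiply that amplification by very large thread counts and the “catastrophic for performance” claim above follows directly.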
Changing I/O in the exascale generation
Data-at-scale algorithms are also changing as the workloads they handle transform: the heightened use of AI across enterprise sectors – in machine learning for self-driving cars, real-time feature recognition and analytics – introduces very different I/O patterns from those we are used to seeing in the supercomputing world. I/O is now characterised not by ideal, large, sequential access, but by a complex mixture of large, small, random, unaligned, high-concurrency I/O in read-heavy workloads, which requires storage to provide streaming performance, high IOPS and high concurrency support all at once.
The key to success with exascale storage will be deploying systems that can handle the stress of this new generation of many-core operation and of the new spectrum of applications with very diverse I/O behaviours.
The secrets behind an exascale storage architecture
HPC burst buffers certainly have their place in addressing this problem. Conceived to help supercomputers deal with the exascale issues of reliability and economically viable I/O, burst buffers were originally intended as an extreme-performance, flash-based area for compute nodes to write to.
We started addressing the challenges of exascale systems around five years ago by developing a sophisticated layer of software that manages I/O in a very different way. We wanted to bridge the chasm between the application and new, solid-state, ultra-low-latency storage devices to fundamentally address the sub-microsecond latencies that were emerging. And, unlike classic flash arrays, to do so at supercomputer (or cloud) scale. Furthermore, we wanted to support not just the limited supercomputer use cases, but instead create a system that could fundamentally do I/O better right across the board.
HPC burst buffers can make exascale I/O a reality today, and enable enterprises to run HPC jobs with much greater speed and efficiency by overcoming the performance limitations of spinning disk. By speeding up applications you can run more jobs faster and in parallel – all very well.
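The burst-buffer pattern itself is simple to sketch: the application writes into a fast tier and returns immediately, while a background drainer moves the data to the slow store off the hot path. The toy version below uses an in-memory queue as the “fast tier” and a dict as the “slow store” – purely illustrative stand-ins, not a real burst-buffer implementation.

```python
import queue
import threading

class BurstBuffer:
    """Toy burst buffer: absorb writes into a fast tier and drain them
    to the slow backing store on a background thread, so the application
    sees flash-tier latency rather than disk latency."""

    def __init__(self, backing_store: dict):
        self.backing = backing_store              # stand-in for slow storage
        self.q = queue.Queue()                    # stand-in for the fast tier
        self.drainer = threading.Thread(target=self._drain, daemon=True)
        self.drainer.start()

    def write(self, key, value):
        self.q.put((key, value))  # returns immediately: absorbed by fast tier

    def _drain(self):
        while True:
            item = self.q.get()
            if item is None:                      # sentinel: stop draining
                break
            key, value = item
            self.backing[key] = value             # slow write, off the hot path

    def flush_and_stop(self):
        self.q.put(None)
        self.drainer.join()

slow_store = {}
bb = BurstBuffer(slow_store)
for i in range(100):
    bb.write(f"ckpt/{i}", b"data")  # e.g. a checkpoint burst
bb.flush_and_stop()
print(len(slow_store))  # prints 100
```

The pay-off is that a bursty workload (a checkpoint, say) completes at the speed of the fast tier, and the slow tier absorbs the data at its own pace afterwards.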
But you can go a long way further by introducing a software-defined storage service that adds a new tier of transparent, extendable, non-volatile memory (NVM), cutting latency and delivering greater bandwidth and IOPS for the next generation of performance-hungry scientific, AI, analytics and big data applications.
This eliminates locking limitations and other filesystem bottlenecks while reducing storage hardware. When you have a very large dataset and a lot of compute, a system that is performant on paper can easily become gummed up by the internal mechanics of a (parallel) file system as it performs large numbers of filesystem operations and remote procedure calls (RPCs), a consequence of its indivisible concurrency mechanisms and deterministic data placement. You can then replace these traditional data and control paths with new, flash-era paths that expose the IOPS of the underlying media directly to the applications, removing those bottlenecks.
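One way such flash-era paths avoid centralised locking and metadata RPCs is deterministic placement: every client computes an object’s home location locally from its name, so no lookup or global lock sits on the data path. The sketch below shows the idea with hash-based sharding; the shard count, names and structures are all illustrative assumptions, not a description of any particular product.

```python
import threading
from hashlib import blake2b

NSHARDS = 16
shards = [dict() for _ in range(NSHARDS)]            # stand-ins for storage targets
shard_locks = [threading.Lock() for _ in range(NSHARDS)]

def shard_of(key: str) -> int:
    """Deterministic placement: every client computes the same home
    shard locally, with no metadata-server RPC on the data path."""
    return blake2b(key.encode(), digest_size=2).digest()[0] % NSHARDS

def put(key: str, value: bytes):
    s = shard_of(key)
    with shard_locks[s]:   # contention is confined to keys on one shard,
        shards[s][key] = value  # not a single global filesystem lock

def worker(tid: int):
    for i in range(1000):
        put(f"obj-{tid}-{i}", b"payload")

threads = [threading.Thread(target=worker, args=(t,)) for t in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sum(len(s) for s in shards))  # prints 8000
```

With eight threads writing concurrently, lock contention is spread across sixteen independent shards instead of being serialised through one coordinator – the same trade that lets a flash-era data path expose the media’s IOPS directly.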
The evolution continues…
The evolution in enterprise data-at-scale will continue at a significant pace. Most data-intensive organisations started off on NFS servers, then moved to scale-out NAS systems, and turned to parallel file systems for tougher workloads; these enterprises will now need to embrace the new generation of high-performance storage architectures to handle the explosion of data-intensive applications and take advantage of flash. This can be achieved, at massive scale, by applying the many lessons learned from building exascale storage systems and deploying the new generation of data platforms built for the flash era.