Thu, 26th Oct 2017

In this blog article, we'll examine how to address data performance by placing data caches/copies in proximity to the users, systems and applications that need them, or - alternatively - keeping the data local to where it was created for data sovereignty and compliance.

We'll also discuss solving multicloud data access by placing data at network intersection points to the clouds.

Increasingly, workloads are migrating to various clouds and to the digital edge - where commerce, population centers and digital ecosystems meet.

This strategy leverages local cloud compute and storage resources to gain proximity to users and the systems, processes and applications/analytics that require real-time access to the data in those workloads.

When centralized data is accessed over the WAN, unacceptable latency dramatically slows application and analytics performance, creating an unpleasant experience for users and hindering the ability to gain timely business insights for real-time decision-making.
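
To make the latency penalty concrete, here is a back-of-the-envelope sketch; the round-trip times and request count are illustrative assumptions, not measurements:

    # Illustrative comparison of a chatty application issuing sequential
    # round trips against centralized vs. edge-local data.
    WAN_RTT_S = 0.080   # assumed 80 ms round trip to a distant central data center
    EDGE_RTT_S = 0.002  # assumed 2 ms round trip to a local edge cache
    ROUND_TRIPS = 200   # assumed sequential queries per transaction

    wan_total = WAN_RTT_S * ROUND_TRIPS    # 16.0 seconds
    edge_total = EDGE_RTT_S * ROUND_TRIPS  # 0.4 seconds
    print(f"Central: {wan_total:.1f}s, edge: {edge_total:.1f}s per transaction")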

In addition, copying data into the cloud can be undesirable due to the high ingress/egress costs associated with storing, moving and retrieving that data, leaving many businesses at an impasse - and complicating cloud migration.
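
The cost side can be sketched just as roughly; the per-gigabyte rate below is an illustrative placeholder, since actual egress pricing varies by provider and tier:

    # Illustrative monthly egress cost for repeatedly retrieving a dataset.
    DATASET_GB = 500             # assumed working-set size
    RETRIEVALS_PER_MONTH = 30    # assumed one full retrieval per day
    EGRESS_USD_PER_GB = 0.09     # placeholder rate; check your provider

    monthly_cost = DATASET_GB * RETRIEVALS_PER_MONTH * EGRESS_USD_PER_GB
    print(f"Estimated monthly egress charge: ${monthly_cost:,.2f}")  # $1,350.00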

Today's digital businesses understand they must break through this impasse. They can do so by:

  • Moving data closer to the edge where business and customer engagement occurs
  • Placing data in compliant edge control points to meet changing regulations and address concerns around data leakage and breaches
  • Encrypting and masking stored data while moving numerous functions to SaaS and cloud providers (a minimal sketch follows this list)
  • Tracking renegade behavior (“shadow IT”) to minimize exposure to risk and reduce data sprawl
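
As a minimal sketch of the encryption and masking point above, the Python example below protects a record before it leaves the business's control; it assumes the third-party cryptography package, and the record layout is invented for illustration:

    from cryptography.fernet import Fernet

    # Generate a key the business retains; SaaS/cloud providers never see it.
    key = Fernet.generate_key()
    cipher = Fernet(key)

    record = b"customer=jane.doe@example.com;balance=1024"

    # Mask the sensitive field before sharing or logging the record.
    masked = record.replace(b"jane.doe@example.com", b"j***@example.com")
    print(masked.decode())

    # Encrypt the full record before handing it to an external store.
    token = cipher.encrypt(record)
    assert cipher.decrypt(token) == record  # round trip verified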

The forces compelling digital businesses to place more of their data at the edge are real; however, the transition requires careful design.

The perception is that moving large datasets out to the edge can create management and accountability problems that would not occur if the data were managed centrally.

However, today's IT infrastructures were not designed to manage massive amounts of remote data backhauled to centralized data centers. Since data is fast becoming a corporation's most valuable asset, its distribution requires a careful strategy. In the absence of one, edge data will fragment, become unmanageable and be at risk of exposure.

Distributing data to the edge, and in particular, to the cloud, poses other issues in addition to the management challenges stated above.

For example, most workloads have been architected and sized based on local I/O expectations. Stretching that I/O over distance demands higher-bandwidth, lower-latency data transport infrastructure.
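
A quick sizing sketch shows why transport bandwidth dominates this design; the dataset size and link speeds are illustrative assumptions:

    # Illustrative time to move a dataset over links of different speeds.
    DATASET_BITS = 10 * 10**12 * 8  # assumed 10 TB dataset, expressed in bits

    for gbps in (1, 10, 100):       # candidate link speeds in Gbit/s
        seconds = DATASET_BITS / (gbps * 10**9)
        print(f"{gbps:>3} Gbit/s: {seconds / 3600:.1f} hours")  # 22.2 / 2.2 / 0.2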

In addition, moving a workload to the cloud without its associated data is rarely feasible, and that data is mostly unclassified - no one has yet determined whether it is “allowed” to go to the cloud or not.

Moving it may require CIO approval, which throws a wrench into the speed of cloud migrations. Also, traditional IT infrastructures typically consist of tightly coupled storage services, including backups, snapshots and offsite replicas, which would need to carry over to datasets in the cloud for end-to-end hybrid IT coverage.

Finally, the cost of storing data on-premises and in the cloud is often misunderstood in planning discussions, which can prevent exploring innovative data management solutions.

Distributed data cache and edge placement

An Interconnection Oriented Architecture (IOA) approach prescribes placing distributed caches for your data in strategic control points at the edge for greater performance, security and control.

By providing direct, secure and proximate interconnection to data within local digital edge nodes, an IOA framework leverages low-latency, high-throughput, direct cloud connectivity for private data access and exchange.

Local data services are deployed in an edge node, where they can harness cloud connectivity for private access. Data can be used by applications from multiple clouds, partners or users without actually being stored in the cloud.
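
To illustrate the pattern, the sketch below reads an object from an S3-compatible store deployed in the edge node rather than in any one cloud; the endpoint, bucket and credentials are hypothetical placeholders, and boto3 is used only because such stores commonly speak the S3 API:

    import boto3

    # Hypothetical S3-compatible object store running in the edge node.
    edge = boto3.client(
        "s3",
        endpoint_url="https://edge-node.example.net:9000",  # placeholder endpoint
        aws_access_key_id="EDGE_KEY",                       # placeholder credentials
        aws_secret_access_key="EDGE_SECRET",
    )

    # Workloads in any connected cloud read the same local copy over direct,
    # private interconnection; the data itself is never stored in the cloud.
    obj = edge.get_object(Bucket="shared-cache", Key="datasets/orders.parquet")
    print(f"fetched {len(obj['Body'].read())} bytes from the edge cache")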

Typical use cases involve a mix of data types: images, media content and workload datasets.

Most data management and cloud providers offer tools for centralized management of this environment via a service API, which typically supports block data interfaces and, increasingly, an object interface/API.
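
The exact management API differs by vendor; the sketch below shows only the general shape of such an interaction, with an entirely invented endpoint, token and response schema that do not correspond to any specific product:

    import requests

    # Hypothetical vendor management API for the distributed cache environment.
    BASE = "https://mgmt.example.net/api/v1"            # placeholder endpoint
    HEADERS = {"Authorization": "Bearer PLACEHOLDER"}   # placeholder token

    # Query the health of each registered edge cache node.
    resp = requests.get(f"{BASE}/cache-nodes", headers=HEADERS, timeout=10)
    resp.raise_for_status()

    for node in resp.json():  # assumed to return a list of node records
        print(node["name"], node["status"])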

Taking steps to move data caching to the edge

The following design pattern steps outline how to successfully establish a distributed, data caching deployment at the edge:

  1. Deploy local, private storage in an edge node, and add a second instance (this could be a second edge node) for failover/recovery, if desired.
  2. Replicate/sync the edge nodes.
  3. Attach interfaces to any segmented networks the edge nodes will be servicing, including cloud access.
  4. Integrate with security boundary controls and inspection zones.
  5. Register with vendor management tools, and publish a self-service API.
  6. Configure policies, and integrate with policy management. Collect events and logs.
  7. Configure daily snapshots.
  8. Leverage the object interface to back up directly to the distributed repository (or directly to cloud storage); see the sketch after this list.
  9. Configure data to be pushed to the distributed repository or cloud, if one or the other is being used as a cache.
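
As one way to realize steps 7 and 8, the sketch below packages a daily snapshot and pushes it through the object interface; the paths, bucket and endpoint are placeholders, credentials are assumed to come from the environment, and scheduling (cron, systemd timers, etc.) is left out:

    import tarfile
    from datetime import datetime, timezone

    import boto3

    DATA_DIR = "/srv/edge-cache"  # placeholder path to the edge data set
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d")
    snapshot = f"/tmp/snapshot-{stamp}.tar.gz"

    # Step 7: capture a daily snapshot archive of the edge data set.
    with tarfile.open(snapshot, "w:gz") as tar:
        tar.add(DATA_DIR, arcname="edge-cache")

    # Step 8: back up via the object interface to the distributed repository
    # (or directly to cloud storage); the endpoint is a placeholder.
    s3 = boto3.client("s3", endpoint_url="https://repo.example.net:9000")
    s3.upload_file(snapshot, "backups", f"edge-cache/{stamp}.tar.gz")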

The value of localizing data at the edge

The gains that digital businesses realize by localizing data in a digital edge node are multifold. They include:

  • Localizing data removes the bulk of the latency, particularly when the data is placed at the intersection point to the cloud.
  • Running multicloud application workloads doesn't require moving data - you can access the data in the edge node over direct, secure, low-latency connectivity.
  • Added layers of data protection allow distributed data repositories to scale, without increasing risk.
  • Distributed data caches at the edge can migrate data between multiple clouds.
  • Data edge nodes can act as a data exchange, serving monetized data sets accessed/shared with partners across segmented networks.
  • Additional streaming data and real-time data cache tools can use the digital edge node as a backup store, de-staging lower-priority data.
  • Placing data in the control point forces those that wish to access it to traverse security services, minimizing data leakage and privacy breaches.

Article by Nicolas Roger, Equinix blog network