
Speak like a data center geek: Big data

18 Nov 2016

Big data is big for a lot of reasons. Some are literal (its massive datasets) and some are based on the promise of what it could one day deliver. For instance, IDC estimates a 44 zettabyte (that's 44 trillion gigabytes) digital universe by 2020, and the big data inside it offers a potentially huge trove of actionable, even mind-blowing, insights.

At Equinix, we’re into helping uncover all of it. But a first step is understanding some key big data definitions. That’s what our “How to Speak Like a Data Center Geek” series is for.

We’ll start basic on our first big data entry, since the list of definitions associated with big data is … big.

Big data

Too obvious? Well, we wanted to expand the big data definition a bit beyond what’s clear just by reading it – namely, it involves “big” amounts of “data.” A geek can do better. Here’s a solid definition from McKinsey: “Big data refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.” So maybe big data can also be accurately called “too big data?”

The 3Vs

In an influential 2001 report, analyst Doug Laney (then at META Group, later acquired by Gartner) laid out the defining dimensions of big data, and they all happen to begin with “V”:

Volume: This refers to the depth and breadth of the data that must be managed, and it is always growing. For instance, IBM says we create 2.5 quintillion bytes of data every day. That’s enough to fill 10 million Blu-ray discs.

Variety: This is the diversity of the types of data that make up big data datasets. It could be from video, audio, text, photos, etc., and proper analysis involves reconciling it all.

Velocity: The sheer and increasing speed with which data is acquired and used.

People have added or proposed more Vs over the years (value, veracity, variability), but it all starts with the 3Vs.
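To make the 3Vs concrete, here’s a minimal Python sketch (the records and their formats are invented for illustration) that tracks all three while ingesting a toy stream: total bytes for volume, distinct formats for variety, and ingest rate for velocity.

```python
# Illustrative only: a toy stream reader that tracks the 3Vs for
# incoming records. The records and their formats are made up.
import json
import time

records = [
    '{"type": "text", "body": "server temp nominal"}',
    '{"type": "photo", "bytes": 2048576}',
    '{"type": "video", "bytes": 104857600}',
]

total_bytes = 0       # Volume: how much data has been ingested
formats_seen = set()  # Variety: how many distinct data types appear
start = time.monotonic()

for raw in records:
    record = json.loads(raw)
    total_bytes += len(raw.encode("utf-8"))
    formats_seen.add(record["type"])

# Velocity: ingest rate (guard against a zero-length interval)
elapsed = max(time.monotonic() - start, 1e-9)
print(f"Volume:   {total_bytes} bytes ingested")
print(f"Variety:  {len(formats_seen)} formats: {sorted(formats_seen)}")
print(f"Velocity: {len(records) / elapsed:.0f} records/second")
```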

Structured data

This is data that has a defined length and format, such as numbers and dates, and is usually stored in a database. It accounts for about 20% of the data out there, and its structured nature makes it easier to access and organize. So it is potentially powerful and widely usable.
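As a quick illustration, here’s a minimal sketch using Python’s built-in SQLite module; the orders table and its rows are made up for the example. The point is that structured data fits a fixed schema of typed columns, which is exactly what makes it easy to store and query.

```python
# A minimal illustration of structured data: fixed columns with defined
# types, stored in a relational database (here, in-memory SQLite).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        order_id   INTEGER PRIMARY KEY,
        customer   TEXT NOT NULL,
        amount_usd REAL NOT NULL,
        placed_on  DATE NOT NULL
    )
""")
conn.execute(
    "INSERT INTO orders (customer, amount_usd, placed_on) VALUES (?, ?, ?)",
    ("Acme Corp", 1299.00, "2016-11-18"),
)
# Because every row shares the same schema, querying is straightforward.
for row in conn.execute("SELECT customer, amount_usd FROM orders"):
    print(row)
conn.close()
```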

Unstructured data

This type of data does not follow a predefined data model or fit into relational databases. Examples include video, the text of email messages and social media. This makes up the bulk of the big data universe and has huge potential, but also presents bigger challenges for those trying to organize and gain insight from it.
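For contrast, here’s a simple sketch of what handling unstructured data can look like; the email text and the keyword heuristic are invented for illustration. With no schema to query against, you fall back on parsing and heuristics (or, in real systems, natural language processing).

```python
# Unstructured data has no fixed schema; here, a raw email body.
# Extracting meaning requires parsing heuristics rather than a query.
email_body = """Hi team,
The Q3 numbers look strong. Video uploads grew fast and support
tickets mention 'latency' a lot -- worth a look before Friday.
"""

# A crude heuristic: flag messages that mention latency problems.
# (Real systems would use NLP; this is illustrative only.)
words = email_body.lower().split()
if "latency" in (w.strip("'.,") for w in words):
    print("Possible performance complaint -- route to ops for review")
```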

Analytics

DataInformed has a concise definition of analytics: “Using software-based algorithms and statistics to derive meaning from data.” But the reality is that big data analytics could have an entire Geek entry on its own (and maybe someday, it will). Here are a few subgroups of big data analytics: behavioral analytics, event analytics, location analytics, text analytics. The bottom line is that without good analytics, big data is akin to a mountainous pile of papers dumped on the floor of a 100-acre warehouse. Big data analytics makes big data make sense.
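As a tiny taste of one of those subgroups, here’s an illustrative Python snippet of text analytics; the support tickets are made up. It derives a small bit of meaning (the most common complaint terms) from raw text using nothing more than counting.

```python
# A toy example of "deriving meaning from data": basic text analytics
# that surfaces the most frequent terms in a batch of support tickets.
from collections import Counter

tickets = [
    "login page slow after update",
    "cannot login on mobile app",
    "mobile app crashes on login",
]

# Tokenize, skip very short words, and count term frequency.
terms = Counter(
    word
    for ticket in tickets
    for word in ticket.split()
    if len(word) > 3
)
print(terms.most_common(3))  # -> [('login', 3), ('mobile', 2), ('page', 1)]
```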

Article by Jim Poole, Equinix blog network
