
Speak like a data center geek: Big data

18 Nov 16

Big data is big for a lot of reasons. Some are literal (its massive datasets) and some are based on the promise of what it could one day deliver. For instance, IDC estimates that the digital universe will reach 44 zettabytes (44 trillion gigabytes) by 2020, and the big data inside it offers potentially huge amounts of actionable and mind-blowing insights.

At Equinix, we’re into helping uncover all of it. But a first step is understanding some key big data definitions. That’s what our “How to Speak Like a Data Center Geek” series is for.

We’ll start basic on our first big data entry, since the list of definitions associated with big data is … big.

Big data

Too obvious? Well, we wanted to expand the big data definition a bit beyond what’s clear just by reading it – namely, it involves “big” amounts of “data.” A geek can do better. Here’s a solid definition from McKinsey: “Big data refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.” So maybe big data can also be accurately called “too big data”?

The 3Vs

In an important 2001 report, Gartner analyst Doug Laney laid out the defining dimensions of big data, and they all happen to begin with “V”:

Volume: This refers to the depth and breadth of the data that must be managed, and it is always growing. For instance, IBM says we create 2.5 quintillion bytes of data every day. That’s enough to fill 10 million Blu-ray discs.

Variety: This is the diversity of the types of data that make up big data datasets. It could be from video, audio, text, photos, etc., and proper analysis involves reconciling it all.

Velocity: The sheer and increasing speed with which data is acquired and used.

People have added or proposed more Vs over the years (value, veracity, variability), but it all starts with the 3Vs.

Structured Data

This is data that has a defined length and format, such as numbers and dates, and is usually stored in a database. It accounts for about 20% of the data out there, and its structured nature makes it easier to access and organize. So it is potentially powerful and widely usable.

Unstructured Data

This type of data does not follow a predefined data model or fit into relational databases. Examples include video, the text of email messages and social media. This makes up the bulk of the big data universe and has huge potential, but also presents bigger challenges for those trying to organize and gain insight from it.
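To make the contrast concrete, here is a minimal Python sketch. The `Order` record, the sample orders and the email text are all invented for illustration and are not from any real dataset. It shows why structured data is easier to work with: fields with a fixed name and type can be filtered and summed directly, while free text has to be mined for the same facts first.

```python
import re
from dataclasses import dataclass
from datetime import date


@dataclass
class Order:
    """A structured record: every field has a fixed name and type."""
    order_id: int
    placed_on: date
    amount_usd: float


# Structured data (hypothetical sample) can be filtered and aggregated directly.
orders = [
    Order(1, date(2016, 11, 1), 120.00),
    Order(2, date(2016, 11, 3), 75.50),
    Order(3, date(2016, 11, 18), 42.25),
]
november_total = sum(o.amount_usd for o in orders if o.placed_on.month == 11)

# Unstructured data (here, a made-up email body) has no schema, so any
# fields of interest must be extracted first, for example with a pattern.
email_body = "Hi team, order 2 arrived late and order 3 is still missing."
mentioned_ids = [int(m) for m in re.findall(r"order (\d+)", email_body)]
```

The regular expression only works because this sample email happens to follow a predictable phrasing. Real unstructured data rarely does, which is part of why it is harder to organize.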

Analytics

DataInformed has a concise definition of analytics: “Using software-based algorithms and statistics to derive meaning from data.” But the reality is that big data analytics could have an entire Geek entry on its own (and maybe someday, it will). Here are a few subgroups of big data analytics: behavioral analytics, event analytics, location analytics, text analytics. The bottom line is that without good analytics, big data is akin to a mountainous pile of papers dumped on the floor of a 100-acre warehouse. Big data analytics makes big data make sense.
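As a toy illustration of text analytics, one of the subgroups listed above, the following Python sketch counts the most frequent terms in a handful of customer comments. The comments, the stop-word list and the `top_terms` helper are all hypothetical, and real text analytics works at a far larger scale with more sophisticated methods. Still, it shows the basic idea of using an algorithm and simple statistics to pull meaning out of raw text.

```python
import re
from collections import Counter

# Hypothetical sample of unstructured text (not real data).
comments = [
    "Love the new dashboard, the dashboard is fast",
    "Dashboard keeps crashing on login",
    "Login is slow but support was fast",
]

# A tiny, hand-picked stop-word list; an assumption made for this example.
STOP_WORDS = {"the", "is", "on", "but", "was", "new"}


def top_terms(texts, n=3):
    """Return the n most frequent words across texts, skipping stop words."""
    words = []
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        words.extend(w for w in tokens if w not in STOP_WORDS)
    return Counter(words).most_common(n)


top = top_terms(comments)
```

In this sample, "dashboard" comes out as the most common term, which hints at where customer attention is concentrated.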

Article by Jim Poole, Equinix blog network
