Lessons learned from running the world’s largest data centers

15 Jan 2018

While managing facility operations for large data centers certainly takes specialized skills in a range of disciplines, the more you do it, the better you get at it.

Given that Schneider Electric has more than 800 people managing facility operations for some 100 large data centers around the globe, it’s fair to say we’ve learned a great deal.

In fact, I recently viewed a webinar that a colleague of mine presented on the topic, “Lessons Learned from Running the World’s Largest Data Centers.” 

In this post, I’ll pass along at least a few of those lessons (and invite you to check out the webinar for the rest).

Most of the lessons we’ve learned fall into one of five general categories:

  • Competency
  • Standardization
  • Risk management
  • Tracking and reporting
  • Operation and maintenance costs


Competency

In terms of competency, the main issue is that most companies have expertise that lies in areas other than managing data centers, a topic we covered in this previous post.

That’s as it should be.

If you’re in, say, retail, healthcare or manufacturing, your expertise lies in those areas; the data center is merely a supporting function.

But it’s an issue if you want to run the data center using internal employees, because you don’t have a large workforce to pull from. I’ve been to conferences where entire panels have been dedicated to the issue of training millennials in data center operations. Universities are only now starting programs to address the issue.

As a result, we routinely see companies that have data center infrastructure management (DCIM) and other tools installed but aren’t using them to their full extent, because they simply don’t have the appropriate expertise.


Standardization

With respect to standardization, companies tend to run into trouble after mergers and acquisitions, or if they experience rapid growth.

They wind up with a series of data centers, with no common set of standards in terms of how to operate them.

Whether you’ve got two data centers or 20, you need to share learnings among all of them.

Schneider Electric’s standards and procedures are best in class in part because we are diligent about sharing what we learn at each of the 100 or so data centers we operate. We use those learnings to continually update our processes and procedures, so when a problem occurs we have sound emergency procedures in place to follow.

Those procedures should include back-out steps to follow if something unexpected happens after a data center change, so the issue doesn’t get worse.

Risk management

Such procedures are closely related to the risk management topic. One of the big lessons here is to have a full-system approach to data center management.

If you need to take a component out of service to perform maintenance, for example, you need to first understand the impact and dependencies of that component with respect to the rest of the data center.

Doing so requires a thorough understanding of the data center.

For any data center we manage, Schneider Electric likes to get in on the construction phase, or as close to it as possible.

That way we can gain a thorough understanding of the architectural drawings, piping, wiring and so forth – all of which is knowledge that helps mitigate the risk that goes into operating a data center.
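To make the dependency check above concrete, here is a minimal sketch of an impact analysis before taking a component out of service. The topology, the component names and the `impact_of_outage` function are all hypothetical illustrations, not a description of any Schneider Electric tool; a real site would pull this from its own drawings or DCIM data.

```python
# Hypothetical power topology: each key feeds the components in its list.
FEEDS = {
    "utility_feed": ["ups_a"],
    "generator": ["ups_b"],
    "ups_a": ["pdu_1", "pdu_2"],
    "ups_b": ["pdu_2"],
    "pdu_1": ["rack_1"],
    "pdu_2": ["rack_2"],
}


def impact_of_outage(component, feeds):
    """Return (lost, degraded) if `component` is taken out of service.

    lost:     components left with no remaining upstream source.
    degraded: components that directly lose one of several sources
              but still have at least one source available.
    """
    # Invert the map so each component knows what feeds it.
    parents = {}
    for source, children in feeds.items():
        for child in children:
            parents.setdefault(child, set()).add(source)

    down = {component}
    changed = True
    while changed:  # repeat until no further component goes down
        changed = False
        for node, sources in parents.items():
            if node not in down and sources <= down:
                down.add(node)
                changed = True

    degraded = {
        node
        for node, sources in parents.items()
        if node not in down and sources & down
    }
    return down - {component}, degraded


lost, degraded = impact_of_outage("ups_a", FEEDS)
print("Loses all supply:", sorted(lost))
print("Loses redundancy:", sorted(degraded))
```

In this made-up layout, taking `ups_a` offline drops `pdu_1` and `rack_1` entirely, while `pdu_2` keeps running on a single source. That second group is easy to miss without a full-system view.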

Tracking and reporting

Tracking and reporting is an area that gets overlooked far too often, which leads to wasted operational spending.

With proper tracking and reporting, you should be able to identify stranded IT capacity – that old rack of servers over in the corner, for example, that nobody is really sure still serves a purpose. (We’ve all seen those, right?) 

Reclaiming that capacity can help you stave off a data center expansion by getting more out of the space you’ve already got.
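As a rough illustration of the kind of report that surfaces that forgotten rack, the sketch below flags racks whose measured power draw is far below what has been allocated to them. The `RackReading` fields, the sample numbers and the 10 percent cutoff are assumptions chosen for the example; the right threshold depends on your own environment.

```python
from dataclasses import dataclass


@dataclass
class RackReading:
    name: str
    allocated_kw: float  # power reserved for the rack
    avg_draw_kw: float   # average measured draw over the reporting period


def stranded_capacity(readings, min_utilization=0.10):
    """Return (rack name, reclaimable kW) for racks below `min_utilization`."""
    flagged = []
    for reading in readings:
        if reading.allocated_kw <= 0:
            continue  # nothing allocated, so nothing to reclaim
        utilization = reading.avg_draw_kw / reading.allocated_kw
        if utilization < min_utilization:
            reclaimable = reading.allocated_kw - reading.avg_draw_kw
            flagged.append((reading.name, reclaimable))
    return flagged


# Hypothetical sample data.
READINGS = [
    RackReading("rack_a", allocated_kw=5.0, avg_draw_kw=3.5),
    RackReading("rack_corner", allocated_kw=4.0, avg_draw_kw=0.25),
    RackReading("rack_b", allocated_kw=6.0, avg_draw_kw=4.5),
]

for name, kw in stranded_capacity(READINGS):
    print(f"{name}: about {kw:.2f} kW could be reclaimed")
```

A flag like this is only a prompt to go and investigate; someone still has to confirm that the rack really serves no purpose before it is decommissioned.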

Operation and maintenance costs

Which leads to the final area, operation and maintenance costs.

We’ve learned plenty of lessons in how to keep these costs down, like using condition-based and predictive maintenance to replace components only when they really need it, as opposed to when some schedule says they do. 
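One simple way to picture the difference from calendar-based replacement is to watch a condition reading and estimate when it will cross a limit. The sketch below fits a straight line to monthly readings and projects the crossing point. The function name, the sample readings and the limit are hypothetical, and a linear trend is a deliberate simplification; real predictive maintenance typically uses richer models and vendor guidance.

```python
def months_until_threshold(readings, threshold):
    """Estimate months until a linear trend through `readings` reaches `threshold`.

    `readings` holds one value per month, oldest first.
    Returns 0.0 if the latest reading is already at or over the threshold,
    and None if there is too little data or no upward trend.
    """
    if not readings:
        return None
    if readings[-1] >= threshold:
        return 0.0
    n = len(readings)
    if n < 2:
        return None

    # Ordinary least-squares fit of reading against month index.
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(readings) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, readings)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    if slope <= 0:
        return None  # not degrading, so no projected crossing

    intercept = mean_y - slope * mean_x
    crossing_month = (threshold - intercept) / slope
    return max(0.0, crossing_month - (n - 1))


# Hypothetical example: a wear indicator rising by about one unit a month.
wear = [10.0, 11.0, 12.0, 13.0]
print(months_until_threshold(wear, threshold=16.0))
```

With a projection like this, a component that is degrading slowly can stay in service past its scheduled date, while one that is degrading quickly gets attention early.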

And if you effectively track your assets (see previous point), then you can start determining which ones require the most maintenance – and potentially save money by replacing them. 
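A minimal sketch of that comparison, assuming you can export maintenance work orders as (asset, cost) pairs and have a replacement quote per asset. The asset names, the costs and the rule of flagging an asset once maintenance spend reaches its replacement cost are all illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical work-order history: (asset, cost of the maintenance visit).
WORK_ORDERS = [
    ("crac_1", 1200.0),
    ("ups_a", 300.0),
    ("crac_1", 900.0),
    ("chiller_2", 450.0),
    ("crac_1", 1500.0),
]

# Hypothetical replacement quotes per asset.
REPLACEMENT_COST = {"crac_1": 3000.0, "ups_a": 20000.0, "chiller_2": 50000.0}


def replacement_candidates(work_orders, replacement_cost, ratio=1.0):
    """Return (asset, total spend) where spend >= ratio * replacement cost.

    Results are ordered from highest to lowest maintenance spend.
    Assets with no replacement quote are skipped.
    """
    totals = defaultdict(float)
    for asset, cost in work_orders:
        totals[asset] += cost

    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [
        (asset, spend)
        for asset, spend in ranked
        if asset in replacement_cost and spend >= ratio * replacement_cost[asset]
    ]


for asset, spend in replacement_candidates(WORK_ORDERS, REPLACEMENT_COST):
    print(f"{asset}: {spend:.0f} spent on maintenance, consider replacing")
```

Lowering `ratio` (say, to 0.5) surfaces assets earlier, before maintenance spend has fully caught up with the price of a new unit.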

Article by Anthony DeSpirito, Schneider Electric Data Center Blog 
