Article by Schneider Electric Data Center Software Solutions vice president Domenic Alcaro
A Ponemon report found human error accounts for nearly one quarter of all unplanned data centre downtime, and Gartner puts the cost of downtime at $300,000 per hour for the average company.
To reduce the amount of human error in data centre management, we would do well to learn lessons from what may seem an unlikely source: the U.S. Navy, and in particular nuclear submarines.
How nuclear subs relate to data centre skills
While a nuclear submarine may seem like a completely different beast from a data centre, the similarities in how the two should be managed are many and striking.
A nuclear sub contains a nuclear reactor plant, a steam plant, electrical and cooling plants, auxiliary systems and more – all stuffed into the back half of the sub.
You can imagine the complexity that goes into such a vessel, yet the Navy has succeeded in minimising human error in the environment by implementing detailed processes and policies – and ensuring they are consistently followed. In addition, multiple levels of system redundancy and interlocks exist, with a back-up system to the back-up system in many cases.
Still, whenever humans are involved, you can’t completely eliminate the possibility of human error.
In the Navy’s case, what it can do is put an intense focus on the people serving on board. It starts with a competitive selection process followed by 15 months of training before a sailor arrives on board. Once the sailor is on board, an intense training and qualification process continues indefinitely. Learning never stops.
Apply nuclear sub lessons to data centre jobs
Data centres today need to be operated with this same kind of mission-critical mentality, and thus data centre facilities managers should follow many of the same principles as the Navy.
It starts with hiring the right people. Schneider Electric makes no secret of the fact that it seeks out military veterans for its Data Centre Facility Operations group, the folks who run some of the world’s largest data centres.
We’ve found military veterans have the right background for success in data centre careers. They understand the importance of having well-documented processes and procedures, and following them religiously.
In the data centre, that means having standard operating procedures (SOPs) for everyday operations and methods of procedure (MOPs) for conducting maintenance routines.
Having an emergency operation procedure (EOP) that is easy to memorise and readily available is also priceless in a time of crisis.
Data centre personnel must know exactly how to stabilise a data centre should a generator fail to start or a breaker trip unexpectedly.
The U.S. Navy has formal training around the methodical sharing of information using status boards, change control processes, and documentation of all maintenance.
These are all sound practices for running any mission-critical facility, including a data centre.
Finally, data centre personnel, like the sailors on those nuclear subs, should always be learning.
Continuing education, through on-the-job training, formal schooling and periodic drills, is imperative to minimising human error and fostering continued process improvement.
That’s why Schneider Electric has a formal Critical Environment Technician (CET) training programme in place for the folks who run our customers’ data centres.
They learn data centre skills including how to effectively use advanced monitoring and management tools such as EcoStruxure IT to ensure data centre uptime.
The programme is also crucial for employee retention, which is a big issue in the data centre realm; so long as employees are learning, they tend to want to stay.