From the time I began directing data center operations for a multi-tenant colocation facility, I have tried to keep an eye on the little things. It is my job to make sure all of our policies and procedures are written down and followed.
I’ve seen every SOP for this building and know that they are living documents that can be changed if needed. I have tried my best to predict what could happen and mitigate it before it ever became an issue. There are plenty of sources on the internet that tell you about the big things: CMMS, DCIM, PM schedules, testing, commissioning, PCI compliance, and so on. Here, though, I want to bring to light a few smaller things that usually fly under the radar but can make or break your security reputation.
Data Center Rules: Give a Little, Take a Little
I will let you in on a secret: the more precise a regulation, the more chances someone has to circumvent it through a technicality. And people will do it for the most trivial of reasons. Here is one way to handle it:
- Have every customer and employee sign the DC rules to acknowledge that they have read them. Hold all employees and customers accountable, with reasonable consequences.
- Keep the rules short and sweet, to one page, so they are more likely to be remembered.
- And lastly, instead of listing all of the ways a person could try to work around a security checkpoint (and possibly giving them some ideas), state it simply: “No user shall attempt to access areas of the building to which they are unauthorized.” If someone needs access to additional areas of the building, it is authorized on a case-by-case basis.
Of course, at other times you do need to be more specific.
Also, be sure to keep up with technological and other changes; at times you will need to modify the rules slightly to loosen or tighten their scope. For example, two things that were not an issue when our rules were first written are Google Glass, which led us to rewrite our photography/video rule more broadly, and e-cigarettes, which led us to make our “Smoking” rule more precise.
A piece of advice from me to you: when fine-tuning the rules, always think like a hacker trying to get around you on a technicality.
Equipment Is and Will Be Prone to Failure
The more a piece of equipment is used, the more likely it is to fail.
Case in point: security appliances. In every data center I have worked in, as a customer or an operator, the electronic security system has failed at some point. It always seems to happen on your busiest day or in the middle of a sales tour. Fingerprint readers, hand imagers, vascular scanners, weight sensors, iris scanners, IR scanners… You name it; it will fail at some point. You or your customers still need access to the machines. What do you do? Think this through. What are your contingency failover plans? Do you have automatic failover with failure detection? What about a long-term outage while you wait on parts from a vendor? Better to think about it now than be sorry later.
It Is Okay to Be the “Bad Guy”
Sometimes you need to put your foot down and tell customers, or even your employers, “No.” Sure, they may override you, but it is better to maintain safety at the cost of convenience. Safety is a core premise that your empire is built upon: the physical safety of customers and employees, the safety of equipment that must run at all times, and the safety of network connectivity that must stay live. Every rule written, in every book you have, should support this goal, and you should be able to explain why.
If someone is trying to do something that goes against those policies and could harm your data center’s security, you need to say no tactfully and be ready to explain why if (when) they push back on your decision. For example, we have a “no unattended microwave use” rule, and some people think it is silly. But in a different environment, I saw what a 30-minute microwave burrito did to the air quality. If there is even the slightest chance of that smoke reaching the DC floor, boy oh boy, are you in deep…trouble.
Smoke and Dust ALWAYS Get To The DC Floor
Speaking of smoke: unless you are working in some sort of clean-room environment, smoke and dust will simply find a way. At phoenixNAP, when we took ownership of the building we converted into a data center, we gutted the back and started on the meet-me room (MMR). After the meet-me room was built, we decided to sandblast and seal the brick walls of the cross-connect room (CCR). Silt got everywhere in the building, even though the CCR was isolated, sealed off with plastic and tape, and, for the most part, just a brick room.
Office space, our newly installed MMR, the DC floor under construction: everything got coated with a fine layer of silt. Why is this important to remember? From the moment you go live, you are working on a production system that you cannot take offline. Any further construction, expansion, demolition, or even adding circuits is done in a hot environment. If you do not plan carefully, you may end up with dust triggering your VESDA system, a DC floor and cabinets to clean, under-floor cleaning, and possible SLA issues. So, if at all possible, lock down the remodeling before going live.
Get the Goodies: Don’t Forget The Rebates
Check with your local utility to find out what qualifies for rebates. Spending the time to work on rebates has paid off very well for us. It may seem boring, but it is well worth it. If you can, ask your utility rep whether they work with a partner who assists with rebates. Some rebates require you to notify the utility of your intent to purchase in order to qualify. Items we have received rebates for include the VFDs on our pumps, the efficiency of our UPS system, DC motors in our CRAHs, a switch to more efficient lighting, and a switch to more power-efficient servers.
All of this adds up, but in the beginning it is easy to get wrapped up in construction and installation. Plan your paperwork so that, if you need to, you can file your rebates across two separate fiscal years and avoid hitting the annual maximums.
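To make the fiscal-year point concrete, here is a minimal Python sketch. It assumes a single hypothetical annual cap per fiscal year and made-up rebate amounts; real utility programs set their own caps, filing windows, and eligibility rules. It uses a simple first-fit decreasing heuristic: the largest rebates are placed first, each into the earliest fiscal year that still has room under the cap, and anything that fits nowhere is flagged rather than silently dropped.

```python
from dataclasses import dataclass


@dataclass
class Rebate:
    item: str
    amount: int  # hypothetical rebate value in whole dollars


def split_across_years(rebates, annual_cap, years):
    """Assign rebates to fiscal years so that no year's total exceeds annual_cap.

    First-fit decreasing: sort rebates largest first, then put each one into
    the earliest year with enough headroom. Returns (schedule, totals, unassigned).
    """
    schedule = {year: [] for year in years}
    totals = {year: 0 for year in years}
    unassigned = []
    for rebate in sorted(rebates, key=lambda r: r.amount, reverse=True):
        for year in years:
            if totals[year] + rebate.amount <= annual_cap:
                schedule[year].append(rebate.item)
                totals[year] += rebate.amount
                break
        else:
            # No year had room under the cap: flag it instead of losing it.
            unassigned.append(rebate.item)
    return schedule, totals, unassigned


# Hypothetical amounts and cap, for illustration only.
rebates = [
    Rebate("VFDs on pumps", 40_000),
    Rebate("UPS efficiency", 30_000),
    Rebate("DC motors in CRAHs", 25_000),
    Rebate("Efficient lighting", 15_000),
    Rebate("Power-efficient servers", 20_000),
]
schedule, totals, unassigned = split_across_years(
    rebates, annual_cap=75_000, years=["FY1", "FY2"]
)
print(schedule)
print(totals)
print(unassigned)
```

With these made-up numbers the five rebates total $130,000, so filing them all in one fiscal year against a $75,000 cap would exceed it by $55,000; the split instead puts $70,000 in FY1 and $60,000 in FY2. First-fit decreasing is a heuristic, not a guaranteed optimum, and your utility rep can tell you the real caps and deadlines.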
Be Like Rosie
There was an old Jetsons episode where Rosie the robot goes on the fritz and starts over-tidying everything, repeating “A place for everything, and everything in its place!” It was funny to watch and, as a child, bordering on scary. However, when you are running a critical facility, you have to be more like Rosie.
Sit down with the veterans and the commissioning agents and ask about times they have seen sites dump their load because a weird chain of events caused something that was not in its proper place to hit a manual off switch. Everyone laughs, but these really are résumé-updating events, and there is no excuse for them. If you need it, make a place for it, and make sure it is there any time it is not actively in use. Your data center will look and be tidier, your staff will always know where their tools are, and you can work toward eliminating another source of human-error events.
The Fire Marshal Wins
Say you have sat down with your designers and security team, drawn on your experiences at other data centers, and decided to build something of a unicorn: that “perfect” security system. Unfortunately for you, the fire marshal has to sign off on it, and their priority (getting people out safely) is usually not the same as yours (keeping unauthorized people out). If you start the design process with personal safety first and server safety second, you can design a system that meets both sets of needs, with no scrambling over emergency open buttons and mantrap failures.
In the end, there is no way to be ready for every eventuality, and you will probably spend a lot of time planning for things that never actually happen. Still, talk to guys who have been around, ask about the little things, and see what you pick up.
This, of course, is not a comprehensive list, merely the start of a discussion. What unexpected experiences have you had in your environment? Is there ANYTHING you wish you had known when starting out? We would love to hear a data center story or two to add to our shared knowledge. Heck, we may even publish it!
Author: Tom Busha, Director of Data Center Operations at phoenixNAP