During a meeting last week I recalled an incident from early in my career, something that should never have happened.
I had occasion to visit our company data centre, newly built and opened, for a meeting (my life is really that exciting!). As I had not yet seen the data centre, I was invited over, with the carrot being a tour of the machine room. Being an inveterate lover of data centres and technology (yes…buttons and flashing lights, not my fault! I grew up in the 70s on a diet of 1960s sci-fi!) I could not refuse. This facility was not what one would call small or cheap. It had all the latest kit, from servers, switches and storage to the best in fire suppression as well as HVAC (Heating, Ventilation, Air Conditioning) and power. Fantastic structured cabling, managed server racks. Enough good stuff to make you want to move in, although the chillers would have been a bit problematic, given I rather like being warm.
So while waiting at the entrance to the actual machine room, looking through a rather large reinforced plate glass window and admiring the view, I noticed some movement out of the corner of my eye. As one does, waiting at the entrance of a sealed and access-controlled facility. At first I thought my eyes were acting up or that I had a particularly bad case of floaters. So I did what every sane person would do: I rubbed my eyes and tried to look for whatever it was that I had seen.
After a while I managed to track down this thing that had caught my attention. It was a flurry of white and grey and a bit of pink I suppose. Yes. It was a pigeon. In the data centre. Being rather taken aback I rubbed my eyes again and, yes, it was a pigeon. Flying. Loose. In the data centre.
Around the corner from where I was standing was the office of the centre director. I popped my head in and asked if they had a minute and any experience in animal control, being the humorous kind of guy that I am.
The sight of several engineers, managers and a director chasing down a pigeon, which turned out to be one of a breeding pair, did make my trip well worthwhile. The downside was that the tour was cancelled, but that was thoroughly understandable given the circumstances. It's not often one sees a pigeon in a data centre.
Eventually we discovered what had happened. It seems that during the site survey of the original building there were some ducting holes that ran from the room to the outside and had not been filled in. The building had originally been a secure telco exchange, so assumptions were made.
Is there a moral to this? Why yes. Yes there is.
It seems that the NSA has some data centre issues as well.
Chronic electrical surges at the massive new data-storage facility central to the National Security Agency’s spying operation have destroyed hundreds of thousands of dollars worth of machinery and delayed the center’s opening for a year, according to project documents and current and former officials.
There have been 10 meltdowns in the past 13 months that have prevented the NSA from using computers at its new Utah data-storage center, slated to be the spy agency’s largest, according to project documents reviewed by The Wall Street Journal.
To my mind, if you are spending millions or billions you should be following some kind of methodology rather than plugging everything in and hoping it all works. In my first example, assuming that a building is secure because of its past life is not a good thing. You need to approach the survey (in this case) with a clean slate and probe every aspect of the facility.
In the case of the NSA power issues, I find it hard to believe that the power is so flaky as to cause actual damage and what seems to be a huge risk to life and limb. A unique facility, with perhaps unique power needs, cannot be approached in a business-as-usual manner. There has to be a methodology to be followed. My first questions in any kind of root cause analysis would be: was the system's uniqueness recognised, to what level had the vendors been involved in the design, and, most importantly, had any testing been done?
Of course each outage we experience, or pigeon, is not a time for finger pointing but rather a chance to learn from those issues and ensure that such events do not happen again.
(Some circumstances regarding the pigeon incident have been changed to protect the innocent, and the pigeon…but the salient points are true – there was a pigeon, it was a new DC, it was expensive and the building was considered secure, and yes, the director did chase the bird.)