I’ve never seen an IT department that didn’t have some secrets. I’m not talking about the root password here. I’m talking about skeletons in the closet…the kind that make systems managers lose sleep.
If you’re a CEO, there’s a pretty good chance that your IT department knows something about your business that you don’t, and you should probably find out about it. I’d encourage you to go ask them yourself—your CIO or CTO might not know either. You could delegate this investigation to them, but the answers you get back may well be filtered by whatever bias they have.
Technical Debt and Risks
The goal of this conversation is to identify technical risks that could substantially impact your ability to do business. The “bad answers” to the questions that you ask will point to issues that, if left unaddressed, will eventually become problems that will need an urgent solution. Applying that solution, however, may not be quick and often won’t be inexpensive either.
As you have this conversation, when you find issues, don’t look for someone to blame. Technical debt is common to many companies and often accretes due to factors beyond the control of your IT staff. Common examples relate to legacy systems that can’t be maintained or test systems that were promoted to production to meet a sales requirement before they could be properly configured for use in the production environment. It's easy to argue for the use of a working system if one doesn't understand the implications of having to maintain it. It takes a very strong IT manager to refuse a request of that type.
Questions For Your IT Staff
Here are some of the questions you should consider asking your IT staff:
“Are there any systems that you are afraid to upgrade or make changes to?” – This is important because systems that fall into this category are often not well understood. They may be orphans that were set up by people who no longer work for the company and for which there is no practical documentation. Such systems may be a critical, though mysterious, part of a dataflow process that nobody can fully explain. If changes are considered risky, then rebuilding the system after a failure may be impossible. Changing the system may also set off a cascade of changes that need to be made in other areas. Eliminating or replacing the system may require making software changes to a product so that the system is no longer needed. If the answer to this question is “yes”, then a good follow-up question would be “What would happen if we lost this system?”
“Are there any systems whose failure would significantly impact our ability to do business?” followed by “How long would it take to recover from such a failure?” – I think the reasoning behind this question is fairly obvious. You want to understand whether any systems are single points of failure and how long it might take to replace them. Understanding your tolerance for such failures is the first step toward creating a mitigation plan. If you can’t withstand as much downtime as would be required to replace the system (including the time to order and receive parts, if necessary), then you might need to look at having redundant parts or a hot standby ready to go. (Virtualization may help solve some of that problem.)
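To make the comparison between recovery time and tolerable downtime concrete, here is a minimal sketch of the arithmetic. Everything in it is an assumption for illustration: the class name, the field names, the way recovery time is broken into three parts, and the example figures are all hypothetical, and your IT staff will know which components actually apply to a given system.

```python
from dataclasses import dataclass


@dataclass
class RecoveryEstimate:
    """Rough recovery estimate for one system (all fields are hypothetical)."""
    system: str
    parts_lead_time_hours: float     # time to order and receive replacement parts
    rebuild_hours: float             # time to reinstall and reconfigure
    data_restore_hours: float        # time to restore data from backups
    tolerable_downtime_hours: float  # how long the business can cope without it

    @property
    def recovery_hours(self) -> float:
        # Assumes the three steps happen one after another, not in parallel.
        return (self.parts_lead_time_hours
                + self.rebuild_hours
                + self.data_restore_hours)

    @property
    def shortfall_hours(self) -> float:
        # How far the estimated recovery time exceeds what the business tolerates.
        return max(0.0, self.recovery_hours - self.tolerable_downtime_hours)


def needs_mitigation(estimate: RecoveryEstimate) -> bool:
    """True when recovery would take longer than the business can tolerate."""
    return estimate.shortfall_hours > 0


# Illustrative numbers only: a three-day wait for parts dominates the estimate.
billing = RecoveryEstimate("billing-db", 72, 8, 4, tolerable_downtime_hours=24)
print(billing.system, billing.recovery_hours, needs_mitigation(billing))
```

A system with a shortfall is a candidate for spare parts on the shelf or a hot standby; one without a shortfall may be fine as it is.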
“What’s the state of our backup system?” and “When was the last time our ability to restore from backups was tested?” – Despite seeming like something that should be easy, backups are surprisingly hard to get right. A lot of attention to detail is required, and the nature of backup reports is such that it is easy for staff to become fatigued and fail to notice important problems. There are lots of scary statistics about how often backups fail and what the chances of a company staying in business are if it suffers a major data loss. (Spoiler: one often-quoted figure is that only 6% of such companies last more than 2 years.) A periodic test plan for ensuring that restores are working is essential.
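One simple form a restore test can take is to restore a backup into a scratch directory and compare it, file by file, against the data it was taken from. The sketch below is only an illustration of that idea, not a description of any particular backup product: the function names are hypothetical, and it assumes the source files have not changed since the backup ran (in practice you would compare against a snapshot or a checksum list recorded at backup time).

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir: str, restored_dir: str) -> list[str]:
    """List files that are missing from, or differ in, the restored copy."""
    source = Path(source_dir)
    restored = Path(restored_dir)
    problems = []
    for original in sorted(source.rglob("*")):
        if not original.is_file():
            continue  # only regular files are compared, not directories
        relative = original.relative_to(source)
        candidate = restored / relative
        if not candidate.is_file():
            problems.append(f"missing: {relative.as_posix()}")
        elif file_digest(original) != file_digest(candidate):
            problems.append(f"differs: {relative.as_posix()}")
    return problems
```

An empty list means every source file came back intact. Anything else is worth a conversation before the day you actually need the restore.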
“Do we have any significant security weaknesses?” – There are lots of things that you might hear when you ask this question. It could be network problems, out-of-date software that is vulnerable to attack, ex-employees who still have login privileges, or any number of other problems.
“If you were on vacation, would somebody else be able to fix any problems that might come up?” – Depending on the number of IT staff you have and the culture of the company, this might or might not be a problem. Well-documented systems and sufficient cross-training are vital to the ongoing well-being of an organization. But sometimes IT staff don’t have the time (or skill) to write good documentation. Additionally, they may feel very protective of their domains and not want to share their knowledge. A person who is indispensable because of their unique IT knowledge is just as much of a problem as a solitary system that can’t be replaced.
Finally, get input about how to fix the potential problems. Asking “What needs to happen to make these things better?” goes a long way toward making someone feel that their opinion counts. Their answer will almost certainly come with a cost. The resources needed often include more time, more staff, more experienced staff, more equipment, and access to services or software to keep things running smoothly.
It’s unlikely you’ll have the budget to take care of everything at once, but try to get a plan in place to address the big problems first and conquer the smaller ones over time. The mere fact that you had the conversation is likely to help the IT staff reset some of their priorities and start thinking about how to clean the skeletons out of the closet.
If you would like to talk about this more or need assistance in figuring out your own vulnerabilities, feel free to contact us.