
Expect the Unexpected: How MSPs Can Help Prepare for IT Outages


As disruptive as IT outages are, they are also, unfortunately, ubiquitous. Hardly a week goes by without news of yet another outage -- whether unintentional (as with the July 2024 CrowdStrike incident) or because of a ransomware or other cyberattack.

That doesn't mean outages have to catch us off guard. MSPs and MSSPs can play a key role in bolstering customers' incident readiness, making sure clients are as prepared as they can be to expect the unexpected, survive an IT outage or service disruption and come back stronger. The first step is paying attention to the basics.

Back to Basics

IT infrastructure and services have become as mundane as plumbing and electricity, Dmitry Sotnikov, chief product officer at Cayosoft, told ChannelE2E. That shift has enabled giant leaps in innovation and digital transformation, but it has also bred a certain complacency among organizations when it comes to compliance, risk and even cybersecurity.

"These mundane systems that you depend on, that become sort of like your dial tone, or your plumbing, they don't tend to receive the same attention anymore as more 'exciting' tech," Sotnikov said.

MSPs and MSSPs should also go beyond simply meeting compliance mandates, regularly testing the solutions they choose to make sure those solutions work as intended. "So often companies will just buy a generic backup solution that means they can check the compliance boxes for backup and recovery and move on. But there's a huge difference between having the solution, having your backups, tapes, files, whatever, and being able to actually recover and get back to business quickly if something happened. They are not training and testing to see how quickly they can recover," he said.

In a ransomware attack, for example, organizations often don't realize that when the FBI investigates, they may lose access to their systems entirely because of law enforcement actions, Sotnikov said. Attackers may also corrupt their backups and recovery systems.

It's also important to inventory critical systems and determine which are necessary for doing business. For a client serving retail customers, for example, the bare minimum would be the systems that let it keep transacting with those customers.

"Figure out what is your priority for your organization and market," Sotnikov said. "Out of all the applications and systems you have, which are business critical? If your customer is a hospital, what systems allow them to admit patients? And which systems enable those systems? Those are your tier zero, your systems that need to be back up and running, not in weeks, but within, ideally, minutes."
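Sotnikov's second question, "which systems enable those systems?", means tier zero is more than the headline applications: it also includes everything they depend on, directly or indirectly. The Python sketch below illustrates that idea with a simple dependency walk. It is illustrative only: the system names and the dependency map are hypothetical, and a real inventory would come from the client's own records rather than a hard-coded dictionary.

```python
from collections import deque

# Hypothetical dependency map for a hospital client: each system lists the
# systems it needs in order to run. Names are illustrative only.
DEPENDENCIES: dict[str, list[str]] = {
    "patient-admissions": ["ehr", "identity"],
    "ehr": ["database", "identity", "network"],
    "identity": ["network"],
    "database": ["storage", "network"],
    "storage": [],
    "network": [],
    "cafeteria-menu": ["network"],
}


def tier_zero(critical: list[str], deps: dict[str, list[str]]) -> set[str]:
    """Return the business-critical systems plus everything they depend on,
    directly or transitively, via a breadth-first walk of the dependency map."""
    seen: set[str] = set()
    queue = deque(critical)
    while queue:
        system = queue.popleft()
        if system in seen:
            continue  # already counted; also guards against dependency cycles
        seen.add(system)
        # Systems missing from the map are treated as having no dependencies.
        queue.extend(deps.get(system, []))
    return seen


if __name__ == "__main__":
    print(sorted(tier_zero(["patient-admissions"], DEPENDENCIES)))
```

In this made-up example, storage and the network land in tier zero because the admissions system relies on them indirectly, while the cafeteria menu system, which is not needed to admit patients, stays out.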

Once those systems are identified and you have processes in place to restore them within an acceptable time frame, you can start running exercises that simulate an outage and recovery. Practice often enough that recovery becomes second nature; to stay resilient, this shouldn't be a yearly exercise, Sotnikov said.

"This shouldn't be a one-time exercise that you just did once. You don't want to discover when a disaster strikes that okay, that system used to work a year ago, but not anymore. You want to make sure that you have those tests, those exercises that just happen as part of your regular ongoing processes," he said.

Holding Clients Accountable

MSPs and MSSPs can hold clients accountable for these exercises by making them part of customer contracts, said Brian Helwig, CEO of MSP360. A written agreement should set aside dedicated time for resilience work several times each contract year, he said.

"It should be contractual: maybe once per quarter, you agree, as a business, to dedicate two or three hours to going through, say, a little bit of password training, for example," Helwig said. "So you use something like [microlearning platform] 7taps, which is a partner of ours, to go through and remind your customers how to update and change their passwords, for instance, to make sure your security awareness is up to date. Then, maybe you do a failover to another system that operates for a bit that afternoon and designate a day and time when the MSP will put it back," Helwig explained. These are simple actions MSPs and MSSPs can take to truly become a customer's trusted partner.

But one major obstacle service providers often encounter is that customers and prospects don't see the value in preparing for a scenario they don't believe could ever happen to them, he said.

"It can be really hard to convince them of the value when they think it very rarely happens. Even after a major cybersecurity incident, if you're lucky, maybe 5% of companies listen to what happened and decide to make a change," he said.

But there are simple ways to handle this, Helwig said. Always offer resilience services; if customers decline, write into the contract that your rate will double when an outage occurs, and make sure they know it.

"It's a little bit of carrot and a little bit of stick, right? You can tell them, 'Hey, this is in your best interest,' and you can make that a part of your service, but you can't force them to accept it. But you can build in consequences while still keeping them safe," he said.

MSPs should also implement and follow deployment best practices for software updates, Helwig added; such practices could at least have helped identify and mitigate the CrowdStrike incident.

"Usually you'd have a staging environment for a day, even two days for a software update and then roll it out more widely. Change management best practices, like who has access to the systems, and also educating your customer about what happens if there's going to be a significant change in their infrastructure -- if you're doing that on Thursday, then Friday they may have some challenges," he said.
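The staging practice Helwig describes, letting an update soak on a small group of machines for a day or two before rolling it out more widely, can be expressed as a simple promotion gate. This Python sketch is a hypothetical illustration, not a feature of any particular RMM or patch-management product: the ring names, the 24-hour soak period and the 99% healthy-host threshold are assumptions chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical deployment rings, smallest blast radius first.
RINGS = ["staging", "pilot", "broad"]


@dataclass
class RingStatus:
    name: str
    deployed_at: datetime  # when the update landed on this ring
    healthy_hosts: int
    total_hosts: int


def next_ring(status: RingStatus, now: datetime,
              soak: timedelta = timedelta(hours=24),
              min_healthy: float = 0.99) -> Optional[str]:
    """Return the ring to promote the update to next, or None to hold
    (or when the update has already reached the last ring).

    Promotion requires the update to have soaked for at least `soak` on the
    current ring and at least `min_healthy` of that ring's hosts to be healthy."""
    if now - status.deployed_at < soak:
        return None  # still soaking on the current ring
    if status.total_hosts == 0 or status.healthy_hosts / status.total_hosts < min_healthy:
        return None  # health check failed: stop here and investigate
    idx = RINGS.index(status.name)
    return RINGS[idx + 1] if idx + 1 < len(RINGS) else None


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 9, 0)
    staging = RingStatus("staging", start, healthy_hosts=10, total_hosts=10)
    print(next_ring(staging, start + timedelta(hours=6)))   # None: still soaking
    print(next_ring(staging, start + timedelta(hours=30)))  # pilot
```

The key design choice in this sketch is that a failed health check holds the rollout rather than retrying or skipping ahead, so a faulty update stops at the staging ring instead of reaching every customer machine at once.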

Using an RMM (remote monitoring and management) platform can also help ensure customers' infrastructure is current and working as intended, and that security patches and dependencies are up to date. Object locking, which prevents backup copies from being modified or deleted for a set retention period, can help keep backups safe and uncorrupted, making it easier to roll back to a stable, secure version after a cyberattack or outage, Helwig said. Finally, keeping documentation up to date and passing along institutional knowledge are key ways to ensure resilience.

When it comes down to it, there's nothing too magical about resilience, Sotnikov said. It comes down to preparation, planning and practice.

"Now that ransomware is unfortunately part of our lives, and we are seeing more and more outages, this is the new normal, unfortunately," he said. "We have to be prepared as an industry to understand that resiliency is a business critical function."

Sharon Florentine

Sharon manages day-to-day content on ChannelE2E and serves as senior managing editor for CyberRisk Alliance’s Channel Brands. She also covers enterprise-class technology companies, strategic alliances and channel partner strategies. Sharon is a veteran tech journalist and editor with more than 25 years of experience in the industry, and has previously held key editorial, content and leadership positions at Techstrong Group, CIO.com, Ziff Davis Enterprise and CRN.