DISASTER RECOVERY FOR PEOS
Disasters are inevitable, and their timing is unpredictable. Preparing your company and employees before disaster strikes can make the difference between a catastrophe and an inconvenience. While no one wants to experience a business disruption, especially a technology-related one, there are many ways you could end up in that position. A 2018 BCI Survey Report said that the top disruptors are general IT outages, cyberattacks, and transport disruption, usually caused by natural or man-made disasters. For PEOs, creating a Disaster Recovery Plan (“DRP”) protects your company and the livelihood of all the companies and worksite employees you serve by providing a roadmap for how to recover and return your business to normal once a disaster strikes. A documented and tested DRP will help minimize your downtime, the impact to your revenue, and the impact on your customers.
If you need more reason to invest the time to create a DRP, look no further than the financial impact on your business. According to Gartner, IT downtime costs companies $5,600 per minute on average across the United States.
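To put that figure in perspective, the short Python sketch below multiplies the quoted per-minute average by a few outage lengths. The function name and the outage durations are illustrative assumptions; substitute your own per-minute estimate, since the quoted average will not match any individual PEO.

```python
# Illustrative only: the rate is the average quoted above, not your actual cost.
COST_PER_MINUTE_USD = 5_600


def downtime_cost(minutes: float, cost_per_minute: float = COST_PER_MINUTE_USD) -> float:
    """Return the estimated cost in USD of an outage lasting `minutes`."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * cost_per_minute


# Hypothetical outage lengths chosen for illustration.
for label, minutes in [("15 minutes", 15), ("1 hour", 60), ("8-hour workday", 480)]:
    print(f"{label}: ${downtime_cost(minutes):,.0f}")
# 15 minutes: $84,000
# 1 hour: $336,000
# 8-hour workday: $2,688,000
```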
DISASTER RECOVERY METRICS
The two most important metrics for disaster recovery are recovery time objective (RTO) and recovery point objective (RPO), as these concepts help companies understand the business impact of disruptions and develop plans to address them.
RTO refers to how long a particular service can be offline without a significant business impact. For PEOs, the ability to process payroll cannot be down for more than a few minutes because payroll is a core service of their business. In contrast, access to historical data reports could be down for a couple of hours without significant impact.
RPO refers to your target for the maximum amount of data that the business can tolerate losing, measured in time (i.e., starting from when the failure occurs and going back to your last working backup). Security changes and data backups are the usual topics when addressing vulnerabilities during disasters. The best way to prevent data loss is to back up critical information on tiered servers or in the cloud. The RPO determines how frequently this needs to be done for each asset or function. In other words, this metric tells you how outdated your data can afford to be when an unplanned incident occurs.
For example, you can lose marketing metrics that were generated over the last 24 hours without causing real damage to your core business services, but losing more than the last 5 minutes of banking transactions would have a large impact. In this example, you would set a longer RPO for the system that stores your marketing metrics and a relatively short RPO for the system that stores your banking transactions.
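One way to make RPO targets actionable is to compare the age of each system's most recent backup against its target. The Python sketch below does this for the two systems in the example above. The system names, the timestamps, and the `rpo_violations` helper are hypothetical, and a real check would read backup times from your backup tooling.

```python
from datetime import datetime, timedelta

# Hypothetical RPO targets, taken from the example above.
RPO_TARGETS = {
    "marketing_metrics": timedelta(hours=24),
    "banking_transactions": timedelta(minutes=5),
}


def rpo_violations(last_backup: dict[str, datetime], now: datetime) -> list[str]:
    """Return the systems whose newest backup is missing or older than its RPO."""
    return sorted(
        system
        for system, rpo in RPO_TARGETS.items()
        if system not in last_backup or now - last_backup[system] > rpo
    )


# Arbitrary timestamps for illustration.
now = datetime(2024, 1, 1, 12, 0)
last_backup = {
    "marketing_metrics": now - timedelta(hours=6),        # within the 24-hour RPO
    "banking_transactions": now - timedelta(minutes=12),  # exceeds the 5-minute RPO
}
print(rpo_violations(last_backup, now))  # ['banking_transactions']
```

A system with no recorded backup is treated as a violation, which errs on the side of raising an alert.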
To ensure that you protect the most financially impactful items in your environment, you should perform a business impact analysis (BIA) for your critical business processes. The goals of the BIA are to identify the business impact when key processes are not functional and to identify the technology or planning necessary to restore functionality. If you have in-house IT personnel, they should help lead this project. If not, it is best to work with your managed provider or a trusted consultant.
For most PEOs, the items that you should spend the most time analyzing are:
Internet connectivity – With most of your platforms being cloud-based rather than hosted on-premises, internet connectivity is one of the essential items.
Payroll platform – If your payroll platform is hosted, you need to have a conversation with your software vendor to ensure that they have a written DRP with documented RTO and RPO for each critical system.
Financial platforms – All financial platforms, including what you use to transmit ACH files to banks for money movement.
After you complete the BIAs for your critical processes, you should start working on risk assessments for those same processes to identify the conditions or situations that may lead to an outage.
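The output of a BIA can be captured in a simple structure and sorted so that the most time-sensitive, highest-impact processes are addressed first. The Python sketch below shows one possible approach. Every dollar figure and RTO value in it is a placeholder invented for illustration, and the `Process` class and `prioritize` function are hypothetical names, not part of any standard BIA method.

```python
from dataclasses import dataclass


@dataclass
class Process:
    name: str
    hourly_impact_usd: float  # placeholder estimate, not real data
    rto_minutes: int          # placeholder target


def prioritize(processes: list[Process]) -> list[Process]:
    """Order processes by shortest RTO first, then by highest hourly impact."""
    return sorted(processes, key=lambda p: (p.rto_minutes, -p.hourly_impact_usd))


# Placeholder values; replace them with the findings of your own BIA.
processes = [
    Process("Historical reports", 500, 120),
    Process("Payroll platform", 50_000, 5),
    Process("Internet connectivity", 50_000, 15),
    Process("ACH file transmission", 30_000, 15),
]

for rank, p in enumerate(prioritize(processes), start=1):
    print(f"{rank}. {p.name} (RTO {p.rto_minutes} min, ${p.hourly_impact_usd:,.0f}/hour)")
```

Sorting on RTO first reflects the idea that the services you can least afford to have offline deserve attention first; a different ordering may suit your business better.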
Creating a world-class infrastructure is a good first step. However, an excellent second step is to create redundancy. Identify all possible points of failure (electrical, air conditioning, network, and computer systems) and ask yourself what would happen if a computer system were to go down unexpectedly. Would your business stop, or do you have a backup plan? Creating redundancy will reduce the time it takes for your business to come back up after a disaster. Common types of redundant sites are described below.
Cold sites are infrastructural backups: office spaces with power, cooling, and communication systems. They do not house any hardware or have a network configured. In the case of primary system failure, the operational teams will need to migrate servers and set everything up from scratch. This is the least expensive option. However, it requires extra labor after the fact and may not meet the organization’s RTO requirements if there is a problem with executing the transfer.
A hot site is the exact copy of the primary data center setup. It has all the necessary hardware, software, and network configured. Data is backed up based on RPO goals. In case of outages, the operations connect to the hot site without delay and continue with minimal downtime. Since this requires a constantly functioning setup, this is the most expensive option. It is also the most effective.
A warm site houses the necessary hardware with some pre-installed software and network configuration. Only mission-critical assets are backed up at less frequent intervals. This is a good option for organizations with less critical data and higher RPOs. A cost-benefit analysis may be required to decide between hot and warm sites.
Cloud disaster recovery is like traditional disaster recovery, but with some special considerations that will largely depend on your infrastructure. For example, if you are already using a cloud service provider (e.g., Amazon Web Services, Google Cloud Platform, Microsoft Azure, or Oracle Cloud) for critical business functions, then you will need to account for the complexities of migrating data and services from those providers in the event of an extended outage.
With SaaS, the provider is supplying both the application and the infrastructure beneath the application as part of the service you are buying. Since the provider is responsible for maintaining the availability of the software, your disaster recovery focus will be on the business function being performed by the SaaS solution and the required data. For example, suppose you use a SaaS solution to execute ACH payments. In that case, your DRP should document the process for manually obtaining the correct data for ACH payments, your bank’s back-up payment process, and the steps you need to follow to switch over in an outage.
Another key consideration with SaaS providers is whether they have put in place the appropriate safeguards that will enable them to maintain the availability of the software you are buying. You should ask for applicable certifications which show a vendor’s compliance with recognized information security and data protection industry standards, such as ISO 27001 and SOC 2 Type 2. Ideally, you should request certifications and have these conversations before you begin a relationship with a vendor, but, if not, you should do it as part of this process.
Document your disaster recovery plan, simulate a real-world disaster, and test it several times a year to remove any kinks in your plan. If you take backups, restore them to understand how quickly your systems can be online. If you run your infrastructure in the cloud, work with your vendors to understand their service level agreements (SLAs) and document customer communication plans during downtime.
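A restore test is most useful when its duration is measured and compared with the RTO for that system. The Python sketch below times a restore step and reports whether it finished within the target. The `run_restore_drill` helper and the `fake_restore` stand-in are hypothetical; in a real drill the callable would invoke your actual restore procedure, and the RTO would come from your DRP.

```python
import time
from typing import Callable


def run_restore_drill(restore: Callable[[], None], rto_seconds: float) -> tuple[float, bool]:
    """Time a restore procedure and report whether it finished within the RTO."""
    start = time.monotonic()
    restore()
    elapsed = time.monotonic() - start
    return elapsed, elapsed <= rto_seconds


def fake_restore() -> None:
    # Stand-in for a real restore; sleeps briefly to simulate work.
    time.sleep(0.1)


# The 5-second RTO is an arbitrary value chosen for this demonstration.
elapsed, within_rto = run_restore_drill(fake_restore, rto_seconds=5.0)
print(f"Restore took {elapsed:.2f}s; within RTO: {within_rto}")
```

Recording the measured time after each drill gives you a history that shows whether recovery is getting faster or slower as your systems change.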
Things are stressful during a disaster, and every team member must play their part to prevent extended business downtime. Assign roles and responsibilities to individuals in advance so that everybody knows what to do when disaster strikes.
A well-documented, frequently tested, and up-to-date DRP will go a long way to minimize downtime and revenue impact.
For more information, please visit NAPEO’s cyber resources page at napeo.org/cyber.
Mulberri Sunnyvale, CA