Beating Ransomware with proper cloud architecture
Jason Woodman – Certified Kubernetes Architect, Principal Site Reliability Engineer
Intelletive founder & CTO
Ransomware is a form of malicious software, or malware, designed to deny access to a computer system or data until a ransom is paid. Attacks have increased sharply in both number and sophistication over the past several years; according to cybersecurity company Coveware, the average ransom paid in Q3 2020 was $233,817. In October 2020, the Federal Bureau of Investigation (FBI), the Cybersecurity and Infrastructure Security Agency, and the Department of Health and Human Services issued a joint warning that they had “credible information of an increased and imminent cybercrime threat to U.S. hospitals and healthcare providers.” Simply put, ransomware threatens the United States healthcare system. The threat is not limited to healthcare, however: any organization whose systems or data hold value can be a target, and more and more of them are.
In January 2021, the Cybersecurity and Infrastructure Security Agency (CISA) announced the “Reduce the Risk of Ransomware Campaign”, a focused, coordinated, and sustained effort to encourage public and private sector organizations to implement best practices, tools, and resources that can help them mitigate this cybersecurity risk and threat.
While there are many publications, checklists, and blueprints available to help plan your ransomware mitigation strategy, this article focuses on using the modern cloud, together with DevOps tools and techniques, to build a business computing environment that is both more secure and more recoverable.
Restore from Backup or Disaster Recovery?
Most organizations with critical systems or data back up their data periodically, and some also replicate infrastructure components such as VMs. For smaller organizations, or those less dependent on IT systems, a simple restore is usually an adequate and inexpensive way to recover from a ransomware attack. But what about larger, more IT- and data-dependent organizations? Ransomware has often been present in their data center or cloud ecosystem for months or even years before it is activated. By then, generations of backups are infected, so the backup system itself is compromised, and restoring data without knowing the date of the original infection simply re-infects the data center. Discarding the infected data instead means losing generations of transactions that must be reconstructed by hand. Either way the financial loss is severe, whether in ransom paid or in staff hours spent rebuilding. There must be a better way to recover.
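The dilemma can be made concrete with a small sketch. Assuming you have a list of backup timestamps and an estimate of when the malware first arrived (in practice often the hardest number to pin down), the newest safe restore point is the latest backup taken before that date, and everything after it is work that must be redone. The function names below are hypothetical and not part of any backup product.

```python
from datetime import datetime
from typing import Sequence


def pick_clean_restore_point(backups: Sequence[datetime], infected_since: datetime) -> datetime:
    """Return the newest backup taken strictly before the estimated infection date.

    Any backup taken on or after `infected_since` is assumed to carry the
    dormant ransomware, so restoring it would re-infect the environment.
    """
    clean = [b for b in backups if b < infected_since]
    if not clean:
        raise LookupError("no backup predates the estimated infection date")
    return max(clean)


def days_to_reconstruct(restore_point: datetime, now: datetime) -> float:
    """Days of transactions lost by rolling back to `restore_point`."""
    return (now - restore_point).total_seconds() / 86400


# Monthly backups from January through October 2020; infection estimated mid-June.
backups = [datetime(2020, month, 1) for month in range(1, 11)]
restore_point = pick_clean_restore_point(backups, infected_since=datetime(2020, 6, 15))
print(restore_point.date(), days_to_reconstruct(restore_point, datetime(2020, 10, 20)))
# prints: 2020-06-01 141.0
```

Even in this tidy example, a correct infection estimate still costs more than four months of transactions; an estimate that is too late silently restores the malware along with the data.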
Scenario: A medium-sized, multi-site bank is targeted by ransomware
A medium-sized national bank with 85 locations across the United States has all of its information systems centralized in a single co-location (colo) datacenter in the middle of the country. This architecture has served it well for years. Most of its internal services are web applications that every branch reaches through a secure VPN tunnel back to the colo. This design has proved to have many advantages over the years:
- All datastores are centralized in the colo, allowing for nightly backups of all databases and file shares.
- VPN tunnels provide strong encryption to protect data in transit between the branches and the colo.
- IT labor costs have been well under control for years, with a single solid team managing the colo and its operations.
- Hardware costs for branch locations are relatively low, leveraging off-the-shelf computers and laptops for banking staff.
- Adding new branch locations is both fast and affordable from an IT perspective with predictable costs.
- Compliance has been relatively easy to achieve because the same virtual machines have been maintained and groomed for years.
The Ransomware Attack
The President of the banking chain is enjoying her coffee early on a Tuesday morning, reviewing the purchase agreement for a new branch location, when a new email arrives in her inbox at 9:00 am. It states that she has 1 hour to send $50,000 to a particular PayPal account or all of her IT systems will be shut down. Not thinking much of it, she forwards the email to her CTO to see if there is any cause for concern. At 10:00 am, realizing an hour has passed, she checks her inbox again. A second email states that her systems have been shut down and that the only way to bring them back up is to purchase the “fix” for $50,000. A minute or so later her phone is ringing and text messages are rolling in: tellers and drive-thrus cannot serve customers because they cannot access their software systems. Customers are waiting patiently for now, but there is growing concern that their money is not safe and accessible.
The CTO calls and says his staff can log into most systems but cannot bring up the web services or access admin tools. He now knows this is a ransomware attack. He reassures the President that he can restore all data from backups within a few hours, although any new data from this morning will be lost. Facing a tough decision, the President authorizes him to go ahead with the restore and starts preparing communications to the branches on how to handle the data loss and how to explain the “hiccup” in services to their clients. The entire organization is in damage control mode, trying to minimize the losses and customer attrition from this incident.
By 1 pm all backups have been restored and systems are back online. The damage control campaign continues, reassuring customers that their money is safe and accessible. It carries on into the evening, and many employees work all night to catch up and rectify the 2 to 3 hours of data loss that was sustained.
By Friday morning things seem to be back to normal. The board of directors has been informed that the incident has caused direct losses of tens of thousands of dollars, with the total likely to run well into six figures once customer attrition over the coming weeks and months is accounted for. All in all, this has been a huge financial setback for the organization, but its long-standing business has been salvaged.
Plans are put in place to increase the frequency of backups and improve data restoration systems and processes. The CTO and his staff begin researching vendors and solutions to implement these new policies.
The Second Attack
On Monday morning of the following week, this time at 11 am, the President receives another email that says: “Restoring your data will not save you, we have control of your datacenter. The price is now $200,000. We urge you to comply immediately, upon receipt of payment you will receive another email with the key to unlock your systems. Please act quickly and within the hour, your business depends on it.”
This time the President calls the CTO immediately. Both are frantic: the new backup and restore systems and processes have not yet been selected, let alone implemented. They decide to escalate the situation to the board of directors. Board members are contacted and unanimously agree that the organization cannot afford another outage, this time with even more data loss. While $200,000 is a very large number to them, they decide to pay the ransom because it seems smaller than the losses another outage would cause. The President sends the money to the PayPal account, and shortly afterward she receives an email congratulating her on her quick and decisive actions to save her company.
The Third Attack
Weeks have passed since the second attack, and the CTO and his staff have implemented new systems that let them back up data every hour and perform a full restore within 2 hours. Although the ransomware attacks have taken a heavy financial toll on the organization, they have managed to keep the company afloat and limit customer attrition.
At 10 am on a seemingly normal Tuesday morning, the President again starts receiving calls from the branches that all systems are down, and the CTO confirms that systems are inaccessible. At first this looks like one of the service hiccups seen from time to time, and IT staff begin working on the issue, until another email reaches the President. It states, “We are delighted to see your continued success and ability to recover from your IT challenges last month. Please make your monthly payment of $50,000 and we will send you another key to unlock your systems. Do not bother restoring your data from backups, we have taken the precaution of encrypting those as well for your safety.”
Again, the payment is made in desperation. The board is furious, and the CTO and his staff are out of answers. The question on everyone’s mind: is there a better solution?
Cloud Native Computing – A Solution to Ransomware
The repeated ransomware attacks against the bank are made possible by its traditional infrastructure. Simple backups have not protected it from repeated attacks, or from the payments to the hijackers that followed. Migrating its infrastructure to cloud native solutions can eliminate the leverage these attacks rely on. Let’s outline what that might look like for the bank.
A new perspective – Cloud Native Computing
Cloud native technologies enable a shift in how we think about IT infrastructure. The popular analogy is to treat the servers and devices that make up an enterprise’s infrastructure as cattle rather than pets: we manage them as a herd instead of individually, embracing the constant change and ephemeral nature of cloud native computing.
Traditional IT – they have “Pets”
Servers and devices that are treated as indispensable, unique systems that can never go down. They are manually configured and maintained, often given names, and tended by operators with intimate knowledge of their configuration and quirks. They are part of our daily lives, and when they are purring, they make us happy and comfortable. They’ve become pets.
Cloud Native – they have “Cattle”
Servers and devices that are built using automated tools, engineered from the ground up with the expectation that they can and will fail. These systems are designed with concepts such as auto-scaling, self-healing, external configuration management, and automated security and deployments. They don’t have names or nuance aside from maybe a serial number or unique identifier, nor is there any attachment to them. Constant change is not only tolerated but embraced. A server or device can be killed as easily as it is set up.
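The “replace, don’t repair” idea behind self-healing can be illustrated with a toy model. This is not the Kubernetes API; `Instance` and `reconcile` are hypothetical names for a much simplified version of the loop that controllers such as the Kubernetes ReplicaSet controller run continuously: compare the desired state with the actual state and close the gap.

```python
import uuid
from dataclasses import dataclass, field
from typing import List


@dataclass
class Instance:
    # Cattle carry only a generated identifier, never a hand-picked name.
    id: str = field(default_factory=lambda: uuid.uuid4().hex[:8])
    healthy: bool = True


def reconcile(desired: int, running: List[Instance]) -> List[Instance]:
    """One pass of a toy self-healing loop.

    Unhealthy instances are dropped rather than repaired, surplus healthy
    instances are trimmed, and identical new instances are created until
    the fleet matches the desired count.
    """
    fleet = [i for i in running if i.healthy][:desired]
    while len(fleet) < desired:
        fleet.append(Instance())
    return fleet


fleet = [Instance() for _ in range(3)]
fleet[1].healthy = False          # one server fails
fleet = reconcile(3, fleet)       # it is replaced, not nursed back to health
print(len(fleet), all(i.healthy for i in fleet))  # prints: 3 True
```

Nothing in the loop cares which instance failed or what it was called; that indifference is exactly what makes an entire environment disposable.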
Cloud infrastructure can quickly be stood up and torn down, making it a perfect host for automated security and infrastructure. The underlying hardware and its configuration are exposed via REST APIs, allowing for automated pipelines and Infrastructure as Code. We can treat our infrastructure just as we treat the software we develop: code is stored in a repository under version control, enabling change management and externalized configuration management, and changes are subject to code reviews, automated testing, and automated deployment.
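As one example of reviewing infrastructure changes the way we review software changes, a pipeline step can inspect a Terraform plan before it is applied. Terraform can render a saved plan as JSON with `terraform show -json`, listing each resource under `resource_changes` with its planned `actions`. The sketch below is a hypothetical review gate, not a standard tool: it flags every resource the plan would delete or replace so that such changes require explicit human approval.

```python
import json
from typing import List


def destructive_changes(plan_json: str) -> List[str]:
    """List resources that a Terraform plan would delete or replace.

    `plan_json` is the text produced by `terraform show -json <planfile>`.
    A replacement appears as the action pair ["delete", "create"] or
    ["create", "delete"], so checking for "delete" catches both.
    """
    plan = json.loads(plan_json)
    return [
        rc["address"]
        for rc in plan.get("resource_changes", [])
        if "delete" in rc["change"]["actions"]
    ]


def review_gate(plan_json: str) -> int:
    """Return a process exit code: 1 (fail the stage) if anything is destroyed."""
    flagged = destructive_changes(plan_json)
    for address in flagged:
        print(f"needs approval: {address}")
    return 1 if flagged else 0
```

In a pipeline, the plan might be produced with `terraform plan -out=tfplan` followed by `terraform show -json tfplan`, and the stage would stop whenever `review_gate` returns 1.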
It would be hard to talk about cloud native technologies without talking about Kubernetes. We have written many blog articles about the benefits and value of Kubernetes, but here we want to focus on the ephemeral nature of software workloads running in Kubernetes. The software is containerized and immutable, which forces all persistent data into separate, isolated data stores.
Ransomware Response with Kubernetes in the Cloud
Let’s look at how the bank might respond to a ransomware attack if its infrastructure is viewed as cattle and deployed on Kubernetes in the cloud.
The Attack – In a Cloud Native Environment
The President of the bank is enjoying her morning coffee when the dreaded email arrives: her systems are compromised, and she can pay for the decryption key to recover them. She forwards the email to the CTO, who now has a decision to make. Do we restore from backup, or do we declare a disaster and execute the disaster recovery plan? Knowing that his infrastructure is ephemeral, created with code, and deployed with automated pipelines, he knows he can stand up completely new infrastructure in under an hour, leaving the attackers with no leverage over the organization.
The CTO declares the attack a disaster and recovery begins immediately; he assures the President they will have completely new infrastructure within the hour. IT staff stand up a new Azure subscription that shares nothing with the existing subscription or with the servers, virtual networks, and datastores running in it. Passing the new subscription ID into their automated Terraform pipeline, within minutes they have an exact replica of their compromised infrastructure, except that this one is ransomware-free. Software pipelines are kicked off, deploying all workloads to the Kubernetes cluster while datastores are restored from offsite backups. Because the workloads are rebuilt from code and container images rather than restored from server snapshots, only the data itself comes from backup. Within minutes, automated tests confirm that the infrastructure is not only functional and ready for production load, but also secure. IT staff point the DNS entries at the new infrastructure in the new subscription, and they are live. The compromised subscription is immediately destroyed, and the ransomware attackers no longer have any presence in the environment. There is no reason to attack this organization again: a ransom will never be paid.
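Because every step in this recovery is automated, the whole sequence can be expressed as an ordered runbook. The Python sketch below assumes each step (provisioning with Terraform, deploying workloads, restoring datastores, running validation tests, switching DNS, destroying the compromised subscription) is wrapped in a function that returns True on success. The step names, the helpers, and the `subscription_id` Terraform variable are hypothetical; only the `terraform apply` flags `-auto-approve` and `-var` are standard Terraform CLI.

```python
import subprocess
from typing import Callable, List, Tuple

# A step is a human-readable name plus a function that returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_command(cmd: List[str]) -> bool:
    """Run an external command and report whether it exited cleanly."""
    return subprocess.run(cmd).returncode == 0


def terraform_apply_cmd(subscription_id: str) -> List[str]:
    # Assumes the Terraform configuration declares a `subscription_id` variable.
    return ["terraform", "apply", "-auto-approve", "-var", f"subscription_id={subscription_id}"]


def run_recovery(steps: List[Step]) -> List[str]:
    """Run recovery steps strictly in order and stop at the first failure.

    Returns the names of the steps that completed. If destroying the old
    subscription is placed last, it only runs after every earlier step
    (including validation and the DNS cutover) has succeeded.
    """
    completed: List[str] = []
    for name, action in steps:
        if not action():
            print(f"recovery halted at: {name}")
            break
        completed.append(name)
    return completed
```

For example, the provisioning step might be `("provision", lambda: run_command(terraform_apply_cmd(new_subscription_id)))`, with the remaining steps wrapping the team’s existing pipeline jobs. In practice these would more likely be CI/CD stages than one script; the point of the sketch is the fail-fast ordering that keeps the old environment around until the new one is proven.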
Taking Control: a defensive posture
Ransomware is a very real threat to modern-day IT teams, and a ransomware attack will always hurt the business it hits. Any business, healthcare facility, school, university, or even municipal government can be, or already has been, attacked by ransomware. The risk goes beyond the dollar amount demanded as ransom: a breach of data privacy also exposes the victim organization to potentially millions of dollars in litigation and compensation costs. That’s the second wave of victimization.
But all is not lost; there is a better way. By using cloud infrastructure, tooling, and Kubernetes, together with a skilled DevOps team to design and deploy this new operations environment, any organization can greatly reduce or even eliminate the threat of ransomware. Moving your operational environment to Kubernetes will not only address the risk of a ransomware attack but also introduce you to the many benefits of a cloud native environment.