Chaos Engineering for Red Teams - Part 2
How chaos engineering can be used to strengthen the provisioning process of red team infrastructure.
In the last post, I covered what chaos engineering is and how it can be leveraged to build operational resiliency during red team engagements.
Chaos engineering isn’t just useful for testing how infrastructure holds up during live operations—it can also add a lot of value to the provisioning process. For red teams operating at a high maturity level, provisioning isn’t just a matter of spinning up a few servers and installing tools. It involves creating infrastructure that is automated, reliable, and flexible enough to support different types of operations, often under tight time constraints. Introducing chaos engineering into this process can help identify and eliminate failure points before they impact real-world engagements.
Follow my journey of 100 Days of Red Team on WhatsApp, Telegram or Discord.
One key challenge is that provisioning processes often assume ideal conditions. What happens if the cloud region you’re deploying to suddenly becomes unavailable? What if your scripts fail midway through installing a required dependency, or a software module gets updated and breaks compatibility with the rest of your tooling? These are common issues, and chaos engineering helps you prepare for them by intentionally injecting failures and observing how the system recovers or doesn’t.
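One way to make "intentionally injecting failures" concrete is to wrap individual provisioning steps in a chaos harness and observe whether the automation recovers. Here is a minimal, deterministic sketch in Python; the step name (`install_dependency`) and the retry policy are hypothetical, not part of any specific framework:

```python
class ChaosError(RuntimeError):
    """Stand-in for a real infrastructure failure (network drop, 500 from a provider, etc.)."""

def chaos(step, failures=2):
    """Wrap a provisioning step so its first `failures` invocations fail."""
    state = {"calls": 0}
    def wrapped(*args, **kwargs):
        state["calls"] += 1
        if state["calls"] <= failures:
            raise ChaosError(f"injected failure in {step.__name__} (call {state['calls']})")
        return step(*args, **kwargs)
    return wrapped

def run_with_retries(step, attempts=3):
    """Observe recovery behaviour: retry a few times, then surface the failure."""
    for attempt in range(1, attempts + 1):
        try:
            return step()
        except ChaosError as exc:
            print(f"attempt {attempt} failed: {exc}")
    raise RuntimeError("step never recovered")

def install_dependency():  # hypothetical provisioning step
    return "dependency installed"

# Chaos experiment: the step fails twice mid-provisioning, then succeeds.
result = run_with_retries(chaos(install_dependency, failures=2))
print(result)
```

If your real pipeline has no retry layer at all, this experiment fails immediately, which is exactly the kind of gap you want to find before an engagement rather than during one.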
Even though red team infrastructure might be tailored per engagement, the underlying provisioning logic often shares common building blocks - deploying virtual machines or containers, installing agents, configuring redirectors, setting up C2 frameworks, and deploying support services like DNS or logging. These components can be tested independently through chaos tests to validate that they work reliably even when things go wrong.
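As an example of testing one building block in isolation, the sketch below treats "install the C2 agent" as a step that verifies what it downloaded before installing it, then runs a chaos test that serves a corrupted package. The artifact bytes, checksum, and function names are all illustrative assumptions:

```python
import hashlib

# Hypothetical known-good artifact; in practice this hash would be pinned in config.
KNOWN_GOOD_SHA256 = hashlib.sha256(b"agent-v1.0").hexdigest()

def install_agent(fetch):
    """Provisioning building block: fetch the agent and verify integrity before install."""
    blob = fetch()
    if hashlib.sha256(blob).hexdigest() != KNOWN_GOOD_SHA256:
        raise ValueError("agent package failed integrity check, aborting install")
    return "agent installed"

def healthy_fetch():
    return b"agent-v1.0"

def corrupted_fetch():
    # Chaos injection: simulate a truncated or tampered download.
    return b"agent-v1.0-TRUNCATED"

# Happy path still works.
assert install_agent(healthy_fetch) == "agent installed"

# Chaos test: the step must fail loudly rather than install a broken binary.
try:
    install_agent(corrupted_fetch)
    raise AssertionError("corrupted package was accepted")
except ValueError:
    print("chaos test passed: corruption detected before install")
```

Because each building block is exercised independently, a failure points directly at the broken component instead of surfacing as a vague "provisioning didn't finish" later on.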
Here are some practical examples of what can be chaos engineered during provisioning:
- Cloud provider region outage - Does the provisioning process automatically fail over to another region or provider if the default region is unavailable?
- Installation failures - What happens when installation scripts are intentionally corrupted or required packages are removed? Does the automation detect the problem and recover?
- Dependency updates - How does the provisioning process handle newer versions of third-party modules or libraries? Does an update break any functionality? What happens if it does?
- Decommissioned tools or libraries - What happens when a required library or module has been decommissioned? Is there a fallback strategy?
- License issues - How does the process handle scenarios where paid tools have expired or licenses have been exhausted? Does the process detect this early and provide alternatives or warnings?
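The first scenario above, a region outage, can be chaos-tested with a small failover sketch. The region names and the `deploy_redirector` step are placeholders for whatever your pipeline actually deploys:

```python
REGIONS = ["us-east-1", "eu-west-1", "ap-southeast-1"]  # illustrative region names

class RegionUnavailable(RuntimeError):
    pass

def deploy_redirector(region, outages):
    """Hypothetical deploy step; `outages` simulates provider-side failures."""
    if region in outages:
        raise RegionUnavailable(f"{region} is unavailable")
    return f"redirector up in {region}"

def provision_with_failover(regions, outages=frozenset()):
    """Try each region in order; give up only when every region is down."""
    errors = []
    for region in regions:
        try:
            return deploy_redirector(region, outages)
        except RegionUnavailable as exc:
            errors.append(str(exc))
    raise RuntimeError(f"all regions unavailable: {errors}")

# Chaos experiment: take the default region offline and confirm failover.
result = provision_with_failover(REGIONS, outages={"us-east-1"})
print(result)  # -> redirector up in eu-west-1
```

The same pattern extends to the other scenarios: simulate the missing package, the decommissioned module, or the exhausted license, and assert that the pipeline either recovers or fails with a clear, early error.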
The value here lies in predictability and robustness. You're not chaos engineering for every possible engagement; you're building trust in your provisioning pipeline. You want confidence that, regardless of cloud provider, time zone, or last-minute operational pivot, the system can bring up reliable infrastructure.
It’s fair to ask whether this is overkill. For red teams just getting started, it might be. But for teams operating at a Level 4 or higher on the Red Team Capability Maturity Model—those with repeatable, automated workflows and a need for high reliability—this kind of testing can significantly reduce downtime and increase readiness.
Red Team Notes
- Chaos engineering can improve the provisioning process of red team infrastructure by simulating failures like cloud region outages, broken dependencies, software updates, and license issues.
- By validating the provisioning pipeline under stress, red teams—especially those at Level 4 or higher on the Capability Maturity Model—can build more robust, adaptive, and reliable infrastructure, reducing the risk of deployment failures during critical operations.