Chaos Engineering for Red Teams - Part 1
How chaos engineering can be leveraged to strengthen red team infrastructure by simulating common disruptions.
Chaos engineering is a practice that originated in the world of reliability and system operations. It involves intentionally introducing failures or unpredictable conditions into a system to test how it behaves and whether it recovers gracefully. The goal is to uncover weaknesses before they lead to real problems. While chaos engineering is most commonly associated with Site Reliability Engineering (SRE), it can also be a powerful tool for red teams aiming to build resilient and adaptive infrastructure.
When applied to red team operations, chaos engineering can be approached from two main angles. First, it can be used to test the resilience of infrastructure during live operations—how well red team infrastructure performs when faced with unexpected disruptions during an engagement. Second, it can be used to evaluate the resilience and adaptability of the provisioning process—how quickly and reliably infrastructure can be spun up or migrated in response to changes like shifting cloud regions or providers. A third, blended perspective combines both, such as when a redirector fails mid-operation and the system dynamically replaces it to maintain continuity. In this first part, we’ll focus on the first angle: testing operational resilience of the red team infrastructure.
Follow my journey of 100 Days of Red Team on WhatsApp, Telegram or Discord.
One key consideration when designing chaos engineering experiments for the operational resilience of red team infrastructure is that every target environment is different. Each engagement involves a unique combination of security tools, network architecture, and behavioral patterns. To keep the chaos tests widely applicable, design them to be environment-agnostic: focus on disruption types that are common across environments rather than on environment-specific conditions.
Here are concrete examples of operational chaos tests:
Drop outbound beacon traffic for 5–10 minutes - Simulate temporary network loss. Observe if implants resume communication once the channel is restored.
Block a commonly used port (e.g., 443 or 8080) - Does the beacon fall back to alternate ports or protocols? This requires multiple listeners of the same type running on different ports, and fallback logic implemented in the beacon payload.
Corrupt or remove a staged payload - Emulate detection and quarantine by security tools. Can staged payloads self-heal or rotate in response?
Kill the implant process on a test endpoint - Test whether the infrastructure can detect the lost beacon and trigger a replacement deployment or a secondary access channel. This may require manual intervention by the red team operator.
Inject DNS resolution failures for C2 domains - Mimic DNS sinkholing or tampering. Do fallback domains kick in?
Throttle bandwidth or add latency between beacon and C2 - Is the communication stable over a degraded connection?
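The port-block and DNS-failure tests above can be rehearsed on paper before touching real infrastructure. The following is a minimal in-memory sketch, not a real beacon or C2 implementation: `SimulatedNetwork`, `pick_channel`, and the `.example` domain names are all hypothetical and exist only to illustrate the fallback behavior being tested.

```python
from dataclasses import dataclass, field


@dataclass
class SimulatedNetwork:
    """In-memory stand-in for the path between a beacon and its listeners."""
    blocked_ports: set = field(default_factory=set)
    failing_domains: set = field(default_factory=set)

    def can_reach(self, domain: str, port: int) -> bool:
        # A channel works only if the name resolves and the port is open.
        return domain not in self.failing_domains and port not in self.blocked_ports


def pick_channel(network, channels):
    """Return the first reachable (domain, port) pair, or None if all fail."""
    for domain, port in channels:
        if network.can_reach(domain, port):
            return (domain, port)
    return None


# Hypothetical fallback order; the names and ports are illustrative only.
CHANNELS = [
    ("c2-primary.example", 443),
    ("c2-primary.example", 8080),
    ("c2-fallback.example", 443),
]

network = SimulatedNetwork()
baseline = pick_channel(network, CHANNELS)

# Chaos test: block a commonly used port.
network.blocked_ports.add(443)
after_port_block = pick_channel(network, CHANNELS)

# Chaos test: restore the port, then fail DNS resolution for the primary domain.
network.blocked_ports.clear()
network.failing_domains.add("c2-primary.example")
after_dns_failure = pick_channel(network, CHANNELS)

print(baseline, after_port_block, after_dns_failure)
```

In this model the selection moves from the primary channel to port 8080 when 443 is blocked, and to the fallback domain when the primary name stops resolving. If `pick_channel` returned `None` in either case, the fallback list would have a gap worth fixing before an engagement.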
Each of these chaos tests is independent of the tools or defensive technologies used in the environment. The key is to test for resilience behaviors rather than specific threat responses. Can the system recover from lost connectivity? Can it reroute communication after a port is blocked? Does it detect and respond to traffic anomalies?
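To make "can the system recover from lost connectivity?" measurable, it helps to define the pass condition up front. Here is a rough sketch that replays check-ins against a timeline with a single outage window. It assumes a beacon with a fixed check-in interval and no jitter or backoff, and `simulate_outage` and its parameters are hypothetical names chosen for this illustration.

```python
def simulate_outage(checkin_interval_s, outage_start_s, outage_duration_s, total_s):
    """Replay fixed-interval check-ins against a timeline with one outage.

    Returns (missed_checkins, recovery_delay_s). recovery_delay_s is the time
    between the end of the outage and the next successful check-in, or None
    if no check-in succeeds after the outage within total_s.
    """
    outage_end_s = outage_start_s + outage_duration_s
    missed = 0
    recovery_delay_s = None
    for t in range(0, total_s + 1, checkin_interval_s):
        if outage_start_s <= t < outage_end_s:
            # The check-in falls inside the outage window and is lost.
            missed += 1
        elif t >= outage_end_s and recovery_delay_s is None:
            # First successful check-in after the channel is restored.
            recovery_delay_s = t - outage_end_s
    return missed, recovery_delay_s


# Assumed scenario: 60 s check-ins, a 7.5 minute outage starting at t=300 s.
missed, recovery_delay_s = simulate_outage(60, 300, 450, 1200)

# Pass condition: the beacon is heard from again within one check-in interval.
recovered = recovery_delay_s is not None and recovery_delay_s <= 60

print(f"missed={missed} recovery_delay_s={recovery_delay_s} recovered={recovered}")
```

With these numbers the run reports 8 missed check-ins and a 30 second recovery delay. In a real experiment the same two values would come from listener logs rather than a simulation, but the pass condition stays the same regardless of the target environment.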
Red Team Notes
- Chaos engineering is a method to test the operational resilience of a system.
- It can be used to test the resilience of red team infrastructure by simulating generic disruptions like beacon drops, port blocks, or DNS issues.
- Chaos tests should be environment-agnostic, focusing on common failure scenarios instead of target-specific details.