Operationalizing Prompt Injection and AI Jailbreaks

Learn how red teams can abuse AI vulnerabilities like prompt injection and jailbreak in real-world enterprise environments.

Dec 29, 2025

Prompt injection has become a practical risk in enterprise environments. While much of the public discourse focuses on tricking consumer chatbots into saying prohibited words, the real risk for organizations lies in how Large Language Models (LLMs) are integrated into internal workflows. This trust creates a new class of security issues. Instead of injecting code, attackers inject instructions. If those instructions influence decisions or actions, the impact can be serious.

To understand the attack surface, we must look at the typical enterprise architecture. An LLM often sits between a user interface and a backend database or internal API, acting as a reasoning engine. It takes a system prompt, which defines its rules and capabilities, and concatenates it with user input. This combined text is processed to generate a response or trigger an action. The vulnerability arises because LLMs cannot inherently distinguish between instructions (the system prompt) and data (the user input). If an attacker can craft input that the model interprets as a new instruction, they can override the system's intended logic. This is functionally similar to SQL injection, where data masquerades as code to alter a query.

Follow my journey of 100 Days of Red Team on WhatsApp, Telegram or Discord.

From a red team perspective, it is useful to distinguish between prompt injection and jailbreaking, although they often overlap. Jailbreaking typically refers to bypassing the safety guardrails trained into the model itself, such as forcing it to generate hate speech or dangerous instructions. Prompt injection, however, targets the application logic surrounding the model. It involves subverting the specific instructions given to the model by the developers. For enterprise assessments, prompt injection is generally the higher-value target because it leads to business logic bypasses, whereas jailbreaking often results only in reputational damage or policy violations.

High-impact scenarios appear when these models are connected to real workflows. Internal assistants are a common example. These assistants may summarize incidents, answer employee questions, or explain alerts. If an attacker can insert instructions into user input or referenced data, they may influence how the assistant behaves. This can lead to disclosure of internal information or misleading guidance. In some cases, the assistant may reveal parts of its internal logic or system instructions.

The real power of prompt injection comes from chaining it with other vulnerabilities or tool capabilities. Many enterprise LLMs are now equipped with "tools" or plugins that allow them to query databases, send emails, or execute code. If a model has access to a Python interpreter to perform data analysis, a successful injection can convince the model to execute arbitrary Python code. Similarly, if the model can query a customer database, an injection can trick it into dumping records that the current user should not have access to. The LLM effectively becomes a confused deputy, performing privileged actions on behalf of the attacker.

Testing for these vulnerabilities requires a methodical approach. A red team engagement should start by mapping all entry points where user data feeds into the LLM. This includes direct chat inputs, but also indirect sources like file uploads or scraped web content. Testers should attempt to define the system prompt by asking the model to reveal its instructions. Once the constraints are known, the goal is to craft inputs that delimit the original instructions and introduce new high-priority commands. Techniques often involve using formatting cues, such as special delimiters or markdown, to confuse the model's parsing structure.

To sharpen your skills, I highly recommend practicing on purpose-built CTF platforms like Lakera’s Gandalf or the Prompt Airlines challenge.

Follow my journey of 100 Days of Red Team on WhatsApp, Telegram or Discord.

Neural Foundry

Dec 30

Excellent framing of the confused deputy problem. The comparison to SQL injection is dead on, but whats trickier in practice is that most devs still treat user input validation as a solved problem until they realize delimiter tricks arent caught by traditional WAF rules. I've seen entreprise teams burn weeks trying to sandbox prompts only to discover the LLM itself doesnt respect context boudnaries the way code does.

100 Days of Red Team

Discussion about this post

Ready for more?