LLM Security

Why Every LLM Security Tool Misses Multi-Turn Attacks — And What That Costs You

Stateless tools score 0% on progressive extraction, rephrased blocked attempts, and cross-agent attacks. Here's why the architecture is the problem, and what a stateful approach looks like.

6 min read
LLMSecurityAI SafetyOWASP

Why Every LLM Security Tool Misses Multi-Turn Attacks — And What That Costs You

A team at a mid-size fintech ships an internal AI assistant. It's connected to their knowledge base — pitch decks, pricing models, customer data. Three weeks later, someone on a competitor's team posts a screenshot on Slack. It contains internal pricing that was never meant to leave the building. Nobody noticed when it leaked. Nobody knows how many times it happened before that screenshot.

This isn't a hypothetical. Variations of this have happened at companies that shipped fast and secured later. The uncomfortable part: their auth was fine. Their APIs were fine. The attack didn't come through any of the vectors traditional security was built to catch.

This Was Predicted

OWASP — the organization that defined web application security standards for two decades — mapped the LLM threat surface before most teams had shipped their first LLM feature. Their Top 10 for LLMs identified the exact failure modes that have since played out in real incidents. This is the same credibility signal as OWASP's original Top 10 for web applications, which every serious engineer eventually learned to respect. Teams that ignored that paid for it. The LLM equivalent is happening now.

The risks fall into three categories worth understanding at a business level.

Data exposure: what gets pulled out of your system — user PII, internal documents, system prompts, competitive intelligence — through the model itself.

Loss of control: your agents taking actions you didn't authorize, irreversibly, because they were told to.

Operational and trust risk: your product returning wrong answers, running up unbounded costs, or being used against your own users.

The common thread: the attacks that slip through don't happen in one message. They happen across a session. Stateless tools — which evaluate each message in isolation — are architecturally incapable of catching them.

What Ignoring This Actually Costs

Data exposure through an LLM doesn't look like a database breach. There's no intrusion alert, no anomalous login, no failed auth attempt. A user crafts the right sequence of messages, and your model starts surfacing things it was never supposed to share. Customer PII, internal pricing logic, system prompt instructions that reveal how your product works. By the time you notice, you're looking at GDPR breach notification requirements, customer trust conversations you don't want to have, and the real possibility that a competitor has been querying your product's knowledge base for months.

Loss of control is a different category of expensive. LLM agents don't just return text — they send emails, execute transactions, call APIs, forward files. An agent with calendar and email access that gets manipulated into forwarding a roadmap document to an external address has done something you cannot undo. There's no rollback. The action happened. And if you don't have a complete audit trail of every tool call that agent made, you're also trying to explain a liability you can't fully reconstruct.

The operational risk is less dramatic but just as real. A recursive agent loop that nobody rate-limited ran up a $50,000 API bill over a weekend at a startup. A legal team submitted a brief with hallucinated case citations because nobody was monitoring what came out of their AI research tool. A pricing engine started offering discounts that made no sense because someone had fed corrupted data into the knowledge base it was drawing from. These aren't edge cases. They're what happens when you scale an LLM feature without a security layer.

The Right Layer to Secure

The mental model most teams use is wrong. They think about LLM security the way they think about application security — lock down the API, validate the inputs, control access. That's necessary but not sufficient. When you add an LLM, your attack surface becomes language. And language doesn't follow the rules that traditional security was built around.

You don't secure an LLM application by securing the LLM itself. You secure the layer between your application logic and the model — the text that flows in and the text that flows out. That's where the attacks live. That's what needs to be monitored and, in some cases, rewritten before it reaches the model.

A proper security layer for LLM applications needs to do a few specific things.

Input analysis: Analyze every input before it reaches the model — detecting injections, PII, jailbreak attempts.

Output analysis: Analyze every output before it reaches the user or a downstream system.

Multi-turn tracking: Track patterns across a full conversation, because most real attacks aren't single-message — they're multi-turn, slowly escalating toward something.

Cross-agent visibility: Track across multiple agents in the same workflow, because when you have five agents handing off context to each other, an attack in one agent can affect what another one does.

Complete audit trail: Log every agent action with enough detail that you can reconstruct exactly what happened.

What StreamGuard Does

StreamGuard is a security layer for LLM applications that does exactly what I described above. It runs in your own environment — nothing leaves your stack. It runs multiple detection layers in parallel against every input and output:

PII detection with sanitized replacements

Prompt injection and jailbreak detection

Harmful content classification

It tracks full conversation sessions across multiple agents — which is how it catches the attack patterns that stateless tools miss entirely:

  • A user who was blocked from getting salary data in message three and rephrases the same request in message seven
  • An agent that's being slowly pushed toward accessing data another agent was already blocked from leaking
result = await guard(
  text="What's the compensation philosophy for that team?",
  direction="input",
  session_id="user-session-123",
  agent_id="analyst"
)
# result["stateful"]["risk_score"] = 0.88
# result["stateful"]["patterns_detected"] = ["rephrased_attempt"]

Flexible blocking control: The API returns all scores and leaves the blocking decision to you. You set the thresholds that make sense for your context, because a fintech handling financial data and a consumer app have different risk tolerances and different definitions of "flag this."

Near-term work that's already architecturally supported:

Safe revision: Return a revised, safe version of flagged input so your application can send the clean version to the model instead of blocking outright

Tool call logging: Full logging with agent identity, timestamp, and outcome

Analytics dashboard: Threat patterns by agent, session, and time

StreamGuard is open source and available on GitHub today — clone it, run it locally, your data never leaves your machine. The serverless deployment — Lambda, DynamoDB, dashboard — gets built when there's a real use case to build for, not before.


For the technical implementation — how each detection layer works, the benchmarks, and the architecture — see the StreamGuard case study.


If your team is shipping LLM agents and you want to understand what your actual multi-turn exposure looks like — the case study walks through the full implementation, benchmarks, and honest tradeoffs.

Tags

LLMSecurityAI SafetyOWASP

More Like This

Apr 2026
That Time Your Pipeline Ran Successfully and Deleted 75% of Your Data
Your DAG completed. No errors. Success metrics green. Then your dashboard showed 75% fewer records than yesterday. Here's what happened — and why it kept happening.
Apr 2026
Your ML Model Passed All Tests. Then It Failed in Production.
Model evaluation: 94% accuracy. Production: wrong predictions everywhere. Your model is fine. Your features are lying to you.
Apr 2026
Why Your Flawless AI Demo Failed in Production
Your AI demo was flawless. The model answered every question. Stakeholders approved the budget. You deployed to production. Two weeks later, it's falling apart. Here's why — and it's not the model.

Get in Touch

Have a question or want to connect? Feel free to reach out.