Perspective

What happens after an AI agent does something it shouldn't?

May 2, 2026 · Kevin Minn, Founder, Vindicara

A map of AI agent security tooling, and the layer most teams don't realize they're missing.

It is 2:14 AM. A multi-agent customer support system has been running for six months, handling 40,000 conversations a day. Tonight one of the agents called a refund tool 31 times in 90 seconds, all to the same customer account, none of which a human approved. The customer notices in the morning and calls in. The SOC opens the trace.

What the SOC has is a JSON file in whatever shape LangSmith, or Arize, or Datadog, or your own logger, writes. The trace is mutable. It is in your application's own format. It is unsigned. It contains the prompt, the model output, the tool call arguments, the tool return value. It does not contain a cryptographically verifiable chain that says "this exact decision was made by this exact agent at this exact time, before anyone could touch it."

So what do you tell the customer? What do you tell the bank? What do you tell the regulator who shows up six months later because three of those refunds got flagged as money laundering?

The AI agent security market has spent two years building tools that prevent bad things from happening. A smaller wedge of the market has built tools that observe bad things happening. The third part of the lifecycle, what happens after, is where the tooling thins out. This post is a map of who builds what, where the gaps are, and what we built to fill one of them.

Three layers

Classic enterprise security has a three-layer shape that has been stable since the 1990s.

A WAF blocks malicious requests before they reach the app. A SIEM aggregates logs and alerts on suspicious patterns. Forensic imaging captures disk and memory state when an incident is being investigated, in a chain-of-custody form that a regulator, court, or insurer will accept.

These three layers do different jobs. They compose: a mature security stack runs all three. Removing any one of them breaks the others. Without prevention you drown in incidents. Without observability you cannot tell prevention is working. Without forensics you cannot prove what happened, contest a finding, file an insurance claim, or comply with post-incident regulatory obligations.

AI agent security is going through the same evolution. The pre-incident and during-incident layers have multiple credible vendors. The post-incident layer, for agent semantics specifically, is sparse.

Three layers, mapped to AI agents:

Pre-incident. Prevention. Block prompts, gate tool calls, scan inputs, enforce output schemas.
During-incident. Observability. Traces, evals, dashboards, latency, prompt versioning, cost.
Post-incident. Forensics. Reconstruct what happened. Sign it. Verify it against an authorization scope. Hand it to legal, regulators, or insurers in a form that holds up.

The rest of this post walks through each layer, names the vendors that operate in it, says honestly what the layer does well and what it cannot do, and ends at the third layer, which is where Project AIR ships.

Pre-incident: runtime defense

The first layer is prevention. Tools here intercept prompts, outputs, or tool calls in real time and decide whether to allow them through.

The names: Lakera Guard sits in front of LLM endpoints and classifies prompt-injection attempts. Protect AI Guardian does runtime model scanning and policy enforcement. HiddenLayer focuses on ML supply-chain risk and runtime detection. NeMo Guardrails (NVIDIA) is a declarative input/output gating framework you wire into LangChain or LlamaIndex agents. Guardrails AI enforces structured output schemas and policy violations on LLM responses. CalypsoAI and Robust Intelligence sit adjacent, broadening into governance and red-teaming respectively.

What this layer does well: it stops the obvious bad things before they reach a tool call. Prompt injection, PII leakage, jailbreak attempts, schema violations, the easy 80%. Anyone deploying agents to production should be running something at this layer. The cost of not running prevention is paid in incidents that should never have happened.

What this layer cannot do: tell you what happened after a block decision was logged. The block fires, the request is denied, the log entry is a row in a stream nobody designed for evidentiary use. The log is mutable, in the application's own format, and signed by no one. When a regulator or insurer asks "show me what your agent did at 2:14 AM," the prevention log answers part of the question, "we blocked these requests," but does not produce a record that can be independently verified against tampering.

This is not a criticism. WAFs do not solve every security problem either. The point is that prevention is the first layer of a stack, not the whole stack. When prevention fails, and prevention fails, you need the next layers to be there.

During-incident: observability

The second layer is observability. Traces, evals, dashboards, latency, prompt versioning.

The names: LangSmith (LangChain's hosted observability), Arize Phoenix (open source agent traces with LLM-eval primitives), Langfuse (open source LLM observability and prompt management), Helicone (LLM proxy with built-in tracing), Weights & Biases Weave (the W&B extension into LLM observability), Datadog LLM Observability (the obvious enterprise default for shops already on Datadog), New Relic AI Monitoring (same play for the New Relic shops), Honeycomb for teams using it for general distributed tracing.

What this layer does well: ops visibility on a complex stack. You can see what your agent did, how long each step took, which prompts were used, what the model returned, what the cost was, and which evals fired. This is what your platform team needs to keep the system running. The vendors here are good at what they do, the open source options are mature, and the hosted offerings have reasonable pricing for the value.

What this layer cannot do: produce evidence with integrity guarantees. A trace is mutable. It can be edited, truncated, replayed in a different order, or dropped entirely. Nothing in the schema enforces "this is what actually happened, signed by the agent's runtime, before anyone could tamper with it." The trace tells operations "the system is healthy, here is what it is doing." It does not tell legal "here is signed, tamper-evident evidence the agent did X at time T." Those are different questions and they require different artifacts.

A trace is the output of an observability layer. A signed forensic record is the output of a forensics layer. The fact that both contain similar fields, prompt, output, tool calls, latency, does not make them interchangeable for the use case post-incident workflows actually need.

Post-incident: forensics

The third layer is where it gets thinner.

You can pipe LLM traces into general SIEM tooling. Splunk, Datadog Cloud SIEM, Elastic Security, Sumo Logic, all of them ingest application logs and apply detection rules. They are mature, they are battle-tested, they have integrations with everything. What they do not do is model agent semantics at the schema level. They treat an agent trace as one more application log stream. The concepts that matter for agent forensics, capsules, tool authorization scope, signed envelope chains, multi-agent coordination, are not first-class in any general SIEM today. You can build them on top, the same way you can build anything on top of Splunk, but you are doing schema work the SIEM does not help with.

Below the SIEM layer is AWS CloudTrail, GuardDuty, and equivalents from GCP and Azure. These are infrastructure-level. They tell you a Lambda invoked another Lambda. They are agent-blind by design.

In the agent-aware corner, there are two pieces worth naming.

First, AgDR (the Agent Data Record format), originated by accountability.ai and the work of Mahmoud Mohamed Anwar (me2resh). AgDR is an open schema specification for signed agent records. It defines a record envelope, a payload structure, a signing primitive, and a verification model designed specifically for agent forensics. It is the closest thing the field has to a canonical schema for this layer. We adopted it.

Second, Project AIR, the open source SDK and CLI we ship as projectair on PyPI under MIT. AIR is the reference implementation that produces AgDR-format Signed Intent Capsules from running agents. The capsule is signed with Ed25519 over a BLAKE3-hashed payload, chained to the previous capsule in the session, verifiable with a published key, and emitted in real time as the agent runs. AIR ships 16 detectors: 10 covering the OWASP Top 10 for Agentic Applications (ASI01 through ASI10), 3 covering the OWASP Top 10 for LLM Applications categories most relevant to agent runtimes (LLM01 prompt injection, LLM06 sensitive information disclosure, LLM04 model denial of service), and 3 AIR-native detectors (forensic chain integrity, plus NemoGuard safety and corroboration). 14 run offline out of the box; the 2 NemoGuard detectors activate with an NVIDIA NemoGuard NIM. ASI10 is implemented as Zero-Trust behavioral-scope enforcement against an operator-declared scope, which is what the OWASP spec mitigation describes. It is not anomaly detection. The learned-baseline anomaly variant is on the roadmap.

AIR also ships air report article72, an EU AI Act Article 72 post-market monitoring evidence generator. We believe we are the only OSS project shipping that today.

That is the layer. SIEMs that do not model agent semantics. Infrastructure logs that are agent-blind. AgDR as the open schema. AIR as the OSS reference SDK that produces it.

Why this matters

Three forcing functions are converging on the third layer in 2026.

Compliance. The EU AI Act came into force in stages through 2025 and 2026. Article 72 specifically obligates providers of high-risk AI systems to maintain a post-market monitoring system that documents incidents, behavioral changes, and corrective actions. NIST AI RMF organizes its MEASURE and MANAGE functions around evidence that an AI system's behavior is being monitored, documented, and acted on. SOC 2 AI controls, in the form they are starting to take in 2026 audits, ask for the same thing in different language: "show me the records." The records have to come from somewhere.

Insurance. Cyber insurance carriers underwriting AI workloads are starting to ask reconstructibility questions during renewal. "If your agent caused damage at 2:14 AM, can you produce a record that holds up to subrogation?" The answer "we have a LangSmith trace" is being followed up by "is the trace signed?" If the answer is no, the carrier either prices the risk higher or excludes the workload from the policy.

Litigation. When an agent causes damage to a customer or counterparty, the question of who decided what, when, with what authorization, becomes a legal question. Signed records are the difference between "we have logs that suggest the agent did X" and "here is tamper-evident evidence, verifiable by anyone holding our public key, that the agent did X at time T under authorization scope Y." One of those is a story. The other is a fact pattern in a legal sense.

These three are not theoretical. They are showing up in renewal questionnaires, audit findings, and regulator letters today. The teams building production agents in 2026 will have answers ready or be caught flat-footed when the first incident lands.

How the layers compose

The three layers are not competing. They compose.

A mature stack runs all three:

Prevention sits in front, blocking the easy 80% before they become incidents.
Observability runs continuously, telling ops the system is healthy and triaging the live anomalies prevention let through.
Forensics records every agent action in a signed, tamper-evident form, so when prevention misses something and observability flags it late, there is a record that holds up to scrutiny.

In conversations with teams running agents in production, most have layers 1 and 2 in some form. Layer 3, for agent semantics specifically, is the one most teams have not built. Some have nothing. Some have application logs they assume will hold up if needed. A small number have built bespoke signing on top of their observability, which is good but expensive to maintain.

Project AIR is our contribution to making the third layer cheap to adopt. MIT license. Pip install. Sixty-second demo. Drop the callback into your LangChain agent or wrap your OpenAI client and you are emitting signed records.

What we shipped

projectair 0.3.2 is live on PyPI under MIT license. Ten detectors covering the full OWASP Top 10 for Agentic Applications shipped in 0.3.0 on April 22, alongside the Article 72 evidence generator. 0.3.1 added a LlamaIndex integration. 0.3.2 added the Google Gemini SDK and Google ADK integrations.

pip install projectair
air demo

Sixty seconds, end to end. You see signed Intent Capsules emitted as the demo agent runs, the chain verified, and a forensic report generated. From there, drop the callback into your real agent: one line for LangChain, a wrap for OpenAI, Anthropic, LlamaIndex, Gemini, or ADK, and you are recording.

The schema is AgDR-compatible, with credit to accountability.ai and Mahmoud Mohamed Anwar (me2resh). The reference implementation is open source under MIT, open contribution, open standard.

Try it

pip install projectair
air demo

Source: github.com/vindicara-inc/projectair. Issues, PRs, and security disclosures welcome. If you are evaluating for a regulated workload, the Article 72 generator is in the OSS package today.

By industry

By use case

By framework

Highlights

Company

Resources

Community