Implementing Trustworthy Agents: A Forensic Evidence Layer for Production
Anthropic's April 9 paper on trustworthy agents names three places the ecosystem must step up. Project AIR is our answer to evidence sharing and open standards, and a concrete contribution to a problem no single company can solve alone.
On April 9, Anthropic published Trustworthy Agents in Practice. It is the most honest thing I have read from a frontier lab about where agent security actually stands.
Two lines from the paper have been sitting with me for two weeks:
"The security and reliability of agents cannot be achieved by any single company working alone."
"This is the kind of infrastructure no single company can build alone."
That is not marketing. That is an admission. And it is a direct invitation to the rest of us building in this space.
What is missing today is not another prevention layer. It is a way to answer, after an incident: what did the agent actually do, and can you prove it?
Anthropic names three places where the ecosystem needs to step up: shared benchmarks, evidence sharing, and open standards. Project AIR is our answer to the second one, and a down payment on the third.
The four components, and where the gap is
The paper identifies four components that determine how any agent behaves: the model, the harness (instructions and guardrails), the tools it can call, and the environment it runs in. Anthropic is upfront that most industry conversation centers on the model, "and understandably so," but that agent behavior depends on all four layers working together.
That is where the gap is.
The model layer has Anthropic, OpenAI, Google. The harness layer has LangChain, LlamaIndex, CrewAI. The tools layer has MCP, which Anthropic created and donated to the Linux Foundation. The environment layer has every cloud provider on earth.
What none of those layers produce is a signed, classified, exportable record of what the agent actually did when something went wrong.
That is the gap AIR fills.
What AIR is
AIR stands for Agent Incident Response. It ships as three surfaces sharing one evidence chain:
air: the CLI. MIT-licensed. Ingests any agent trace, runs detectors covering all 10 OWASP ASI categories, three OWASP LLM categories, and an AIR-native chain-integrity check, then produces a signed forensic timeline. pip install projectair, then air trace my-app.log.
airsdk: the Python SDK. MIT-licensed. Drop-in LangChain callback handler. Every agent decision is written as an AgDR (AI Decision Record) with BLAKE3 content hashing, Ed25519 signatures, and UUIDv7 ordering, forward-chained for tamper evidence; a sketch of the chain mechanics follows the integration snippet below.
AIR Cloud: hosted incident response. Real-time dashboards, SIEM integrations, compliance exports, insurance-ready evidence packs. Coming soon.
The SDK integration is three lines:
from airsdk import AIRCallbackHandler
handler = AIRCallbackHandler(key="...")
agent = AgentExecutor(callbacks=[handler])

Every tool call, every environment interaction, every refusal or acceptance gets hashed into a forward chain. The chain is cryptographically verifiable. The classification is based on an open, shared taxonomy. The exports are in formats that legal teams, SOC analysts, and insurance underwriters already accept.
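The chain mechanics are worth seeing up close. Here is a minimal sketch, not the airsdk implementation: the field names (record_id, prev_hash, payload) and the exact hashing layout are illustrative assumptions, though the primitives (BLAKE3, Ed25519, UUIDv7) are the ones the SDK uses.

import json
import uuid  # uuid.uuid7() ships in Python 3.14+; older versions need a backport
from blake3 import blake3  # pip install blake3
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

signing_key = Ed25519PrivateKey.generate()

def make_record(prev_hash: str, payload: dict) -> dict:
    # UUIDv7 gives time-ordered IDs; prev_hash forward-links to the prior record
    body = {"record_id": str(uuid.uuid7()), "prev_hash": prev_hash, "payload": payload}
    digest = blake3(json.dumps(body, sort_keys=True).encode()).hexdigest()
    signature = signing_key.sign(bytes.fromhex(digest)).hex()
    return {**body, "hash": digest, "sig": signature}

# Two chained records: editing the first invalidates the second's prev_hash link
genesis = make_record("0" * 64, {"event": "tool_call", "tool": "search"})
nxt = make_record(genesis["hash"], {"event": "tool_result", "ok": True})

Because each record commits to the hash of its predecessor, deleting or editing any single record invalidates everything downstream. That is what tamper evidence means here.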
Why this matters for the four-layer framework
Anthropic's paper is specific about where the model layer can and cannot help. Prompt injection, they write, has "no single line of defense" that is sufficient. They train the model, monitor traffic, red-team their systems. And still, as the paper puts it: "even together, these safeguards are not a guarantee."
That is honest. And it means when something does go wrong, the response layer matters as much as the prevention layer.
AIR operates across the tools and environment layers. When an agent calls a tool it should not have had access to, AIR signs and classifies that call. When the environment changes, whether a new MCP server is registered, a permission is escalated, or a file is written to a sensitive path, AIR records the transition. When the agent completes a task or fails one, AIR produces evidence you can verify.
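Continuing the hypothetical record layout sketched above, an environment transition is just another link in the same chain:

# Environment transitions chain exactly like tool calls
# (make_record and nxt come from the sketch above; field values are illustrative)
env_event = make_record(nxt["hash"], {
    "event": "env_transition",
    "detail": "new MCP server registered",
    "server": "example-mcp",
})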
In the framework's language: AIR is how the transparency principle gets made real after the fact. You can claim transparency. Or you can prove it.
The forensic evidence layer is open source today
air and airsdk are MIT-licensed. pip install projectair and the signed chain starts now.
Why OWASP ASI, not a vendor taxonomy
The classification layer matters almost as much as the signing layer. A signed blob of data nobody can interpret is just authenticated noise.
AIR classifies every recorded event against the OWASP Top 10 for Agentic Applications 2026 (ASI01 through ASI10), plus three OWASP LLM categories (LLM01, LLM04, LLM06) and one AIR-native chain-integrity check.
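To make that concrete, here is a minimal sketch of the shape a rule-based detector in this family could take. The event fields and rule logic are illustrative assumptions, not AIR's actual detectors; the category semantics follow the incident mappings later in this post.

# Hypothetical detector sketch; not the airsdk API
DESTRUCTIVE_TOOLS = {"shell_rm", "drop_table", "delete_repo"}

def classify(event: dict) -> list[str]:
    tags = []
    # ASI02 (tool misuse): destructive call auto-approved without human review
    if event.get("tool") in DESTRUCTIVE_TOOLS and event.get("auto_approved"):
        tags.append("ASI02")
    # ASI01 (goal hijack): instructions arriving via untrusted content
    if event.get("instruction_source") == "untrusted_content":
        tags.append("ASI01")
    return tags

print(classify({"tool": "shell_rm", "auto_approved": True}))  # ['ASI02']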
OWASP ASI is an open, shared taxonomy, vendor-neutral, and already on the radar of every security team paying attention. We did not invent a taxonomy because there is no reason to fragment the field further.
The same reasoning Anthropic used when they donated MCP to the Linux Foundation applies here: open protocols let security properties be designed in once, rather than patched together one deployment at a time. Open protocols also keep competition focused on the quality and safety of the agent, rather than on who controls the integrations.
Real incidents, real mappings
Every public agent breach in the last eighteen months maps to an ASI signature.
ForcedLeak (Salesforce Agentforce): ASI01. Goal hijack via indirect prompt injection in trusted CRM records.
Salesloft Drift: ASI03. Inherited OAuth credentials reused to escalate access into systems the operator never authorised the agent to reach.
GitHub Copilot YOLO mode: ASI02. Tool misuse through auto-approved destructive shell calls.
ServiceNow Now Assist: ASI01 + ASI03. Indirect injection from user-supplied ticket fields driving the agent into actions outside its authorised scope.
Each of those incidents left behind fragmented, unsigned traces scattered across logs. None of them produced a single evidence bundle a legal team, SOC analyst, or insurance carrier could act on without weeks of reconstruction.
The incidents table on vindicara.io walks through what AIR's detection signatures would have caught at the step the breach actually happened. Every mapping is against the OWASP 2026 taxonomy.
What we are doing, and what we are asking for
projectair ships on PyPI today. The MIT SDK and CLI are live. The design partner program for AIR Cloud opens May 4: three production LangChain deployments, sixty days of feedback, preferred pricing in return.
But the larger ask is for the ecosystem. Anthropic's paper names the three gaps: benchmarks, evidence sharing, open standards.
AIR contributes to evidence sharing and open standards. We would like to work with labs, standards bodies, and infrastructure providers on making the evidence format interoperable, so an incident detected in a LangChain agent on AWS produces the same AgDR record as one detected in a CrewAI agent on Azure, verifiable by any downstream consumer.
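Producer-independent verification is what makes that interoperability worth having. Here is a sketch, assuming the illustrative record layout from earlier: any downstream consumer holding the signer's public key can replay the chain without knowing which framework or cloud emitted it.

import json
from blake3 import blake3
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

def verify_chain(records: list[dict], public_key: Ed25519PublicKey) -> bool:
    prev = "0" * 64
    for r in records:
        # Recompute the content hash from the record body alone
        body = {k: r[k] for k in ("record_id", "prev_hash", "payload")}
        digest = blake3(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if r["prev_hash"] != prev or r["hash"] != digest:
            return False  # chain broken or record contents altered
        try:
            public_key.verify(bytes.fromhex(r["sig"]), bytes.fromhex(digest))
        except InvalidSignature:
            return False  # record not signed by the claimed key
        prev = digest
    return True

Note that the verifier never calls the producer. That is the property an open evidence standard has to preserve.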
If you are running agents in production, try it. pip install projectair and run air demo. You will see exactly what your current tooling does not capture.
The paper is right. This is not infrastructure one company can build alone.
Build the evidence layer with us.
pip install projectair to start. Open issues, PRs, and threat-model contributions welcome on GitHub.
Kevin Minn is the founder of Vindicara. AIR is MIT-licensed at vindicara.io. The CLI and SDK are on PyPI: pip install projectair. Source at github.com/vindicara-inc/projectair.
References
- Anthropic, Trustworthy Agents in Practice (April 9, 2026): anthropic.com/research/trustworthy-agents
- OWASP Top 10 for Agentic Applications 2026 (ASI01 through ASI10)
- Model Context Protocol (Linux Foundation Agentic AI Foundation): modelcontextprotocol.io