Building MCP servers that survive a regulator's audit
Most LLM tutorials demonstrate one thing: how to call a model. How to wire a tool. How to get a response. Almost none demonstrate how to make that call survive an audit. When an LLM executes a SQL query in a regulated workflow, the tool call is the action. It touches customer data. It produces output that lands in a report someone signs off on. If a regulator asks what happened three months later, "the model probably called the right function" is not an answer.
The gap between a working MCP server and an auditable one is four primitives that most implementations skip. auditguard-mcp is a reference implementation that ships all four. This post walks through what they are, why they are not optional, and how they compose into a seven-step pipeline that produces a regulator-readable audit trail.
The architectural question
In regulated workflows, LLM tool use is not a convenience feature. It is an operational boundary. When a model decides to run sql_query against a customer database, it crosses from reasoning into action. That crossing point is where compliance lives.
Most MCP servers treat tool calls as transparent passthroughs. The LLM asks for SQL. The server runs SQL. The LLM gets results. This is fast. It is also un-auditable. If the query returned customer SSNs to an intern, there is no record of who saw what, under which policy, with which PII redactions applied. The tool executed. Nothing else happened.
The fix is not a bolt-on. The fix is a pipeline that gates every tool call through three stages before execution (RBAC, inbound PII scan, inbound policy) and three more after it (outbound PII scan, outbound policy, audit logging). Those stages are built from four primitives: RBAC, PII detection, policy enforcement, and structured audit logging.
The four primitives are not features. They are the entry condition. If a request cannot pass all four, the tool never runs.
Four primitives
1. RBAC. The cheapest denial path.
Role-based access control answers one question before anything else: is this actor allowed to call this tool at all? The check is O(1) set membership. For SQL queries, it also parses the query AST and validates that every table and column is in the role's allowlist. For API calls, it checks restricted fields. If any check fails, the pipeline halts before PII scanning, before policy, before execution. Fail-closed.
Three roles ship in the reference implementation: intern (blocked from everything), analyst (allowed tables and columns minus SSN/account_number), compliance_officer (all columns, strict policy). The role-to-permissions mapping is a Pydantic model, not a YAML file. Type errors surface at import time, not at request time. SQL queries are parsed with sqlglot to extract table and column references from the AST. Regex-based SQL parsing would miss aliases, subqueries, and CTEs.
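The fail-closed shape is easier to see in code. The repo models roles with Pydantic and extracts columns from the sqlglot AST; the stdlib-only sketch below takes pre-extracted column references as input and keeps only the gate logic (role names match the repo, everything else is illustrative):

```python
import dataclasses

@dataclasses.dataclass(frozen=True)
class Role:
    """Simplified stand-in for the repo's Pydantic role model."""
    name: str
    allowed_tools: frozenset
    allowed_columns: frozenset  # table.column allowlist; empty means deny all

ANALYST = Role(
    name="analyst",
    allowed_tools=frozenset({"sql_query", "customer_lookup"}),
    allowed_columns=frozenset({"customers.name", "customers.balance"}),
)
INTERN = Role(name="intern", allowed_tools=frozenset(), allowed_columns=frozenset())

def rbac_check(role: Role, tool: str, columns: list) -> bool:
    """Fail-closed: any miss denies before PII scanning or execution."""
    if tool not in role.allowed_tools:  # O(1) set membership
        return False
    # In the repo, `columns` comes from a sqlglot AST parse, never from the caller.
    return all(col in role.allowed_columns for col in columns)
```

Note that the column list must come from parsing the query, not from trusting the client; that is exactly why the repo walks the sqlglot AST instead of pattern-matching the SQL string.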
2. PII safety. Detection, not regex.
Most production PII detection is regex-based. Regex catches SSN patterns and email addresses. It misses contextual PII. A sentence like "the Henderson trust's primary contact" contains no syntactic PII pattern, but a token classifier identifies "Henderson" as a private_person span. Regex does not.
auditguard-mcp uses OpenAI's Privacy Filter, a 1.5B-parameter sparse mixture-of-experts model with 128 experts and top-4 routing (50M active parameters per inference), released April 22, 2026.
"1.5B total parameters (50M active), runs in a browser or laptop. Built for on-premises deployment." Source: Privacy Filter model card, April 2026
It runs locally on CPU. No data sent to any API. Supports 8 PII categories using constrained Viterbi decoding over a BIOES tagging scheme to extract coherent spans rather than independent token predictions. Robust BIOES span decoding maps token predictions back to exact character offsets in the original text. This is the unglamorous part most integrations skip and the part that determines whether detections are correct or off-by-several-characters.
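To make the offset-mapping point concrete, here is a minimal BIOES span decoder. It is an illustrative sketch, not the model's constrained Viterbi decoder: it assumes each token already carries a tag, category, and character offsets, and it simply merges valid B/I/E runs (and S singletons) into character-level spans:

```python
def decode_bioes(tokens):
    """tokens: list of (tag, category, start_char, end_char) tuples.
    Returns (category, start, end) spans with exact character offsets."""
    spans, open_span = [], None
    for tag, cat, start, end in tokens:
        if tag == "S":                  # single-token entity
            spans.append((cat, start, end))
            open_span = None
        elif tag == "B":                # begin a multi-token entity
            open_span = [cat, start, end]
        elif tag in ("I", "E") and open_span and open_span[0] == cat:
            open_span[2] = end          # extend the span to this token's end
            if tag == "E":
                spans.append(tuple(open_span))
                open_span = None
        else:                           # O, or an inconsistent sequence: drop it
            open_span = None
    return spans
```

Constrained Viterbi guarantees the tag sequence is already valid (no I without a B, no dangling E), so the production decoder never hits the drop branch; the sketch keeps it because raw argmax predictions do produce such sequences.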
3. Policy enforcement. Six actions, config-driven.
Once PII spans are detected, the policy engine decides what to do with each one. Six actions are available:
ALLOW: pass through unchanged.
REDACT: replace with [category].
HASH: replace with [category:sha256-first-8]. Preserves identity consistency for correlation.
VAULT: store the raw text in a vault file, replace with a UUID reference.
REVIEW: leave intact but flag for human review, writing to a review queue.
BLOCK: halt the request immediately and raise a PolicyViolation.
The engine processes detections in reverse offset order to preserve character positions during mutation. It returns a SanitizedInput with the mutated text plus a structured record of every mutation applied. Not just a redacted string. A decision trail. These six actions map directly to regulatory requirements: REDACT and HASH implement data minimization. VAULT implements the right to audit with access controls. REVIEW implements human-in-the-loop for high-risk decisions. BLOCK implements the principle that some data must never transit an LLM context.
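The reverse-offset trick is worth seeing directly. A sketch of the mutation loop, covering only REDACT and HASH for brevity (the function name and shapes are illustrative, not the repo's exact API):

```python
import hashlib

def apply_policy(text, detections, actions):
    """detections: list of (category, start, end); actions: category -> action.
    Mutates in reverse offset order so earlier spans' positions stay valid."""
    mutations = []
    for cat, start, end in sorted(detections, key=lambda d: d[1], reverse=True):
        action = actions.get(cat, "ALLOW")
        if action == "REDACT":
            replacement = f"[{cat}]"
        elif action == "HASH":
            digest = hashlib.sha256(text[start:end].encode()).hexdigest()[:8]
            replacement = f"[{cat}:{digest}]"
        else:  # ALLOW; VAULT/REVIEW/BLOCK omitted from this sketch
            continue
        mutations.append({"category": cat, "action": action, "span": (start, end)})
        text = text[:start] + replacement + text[end:]
    return text, mutations  # sanitized text plus the decision trail
```

Processing left-to-right instead would shift every later offset as soon as a replacement changed the string length; reverse order makes each mutation independent.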
4. Structured audit logging. Hashes, not raw data.
The audit log is an append-only JSONL file. One line per request. Every record contains the actor identity, tool name, SHA-256 hashes of raw input and raw output, inbound and outbound PII detections with raw text stripped, every policy decision applied, the policy version and model version in use, latency, and terminal status.
The log never contains raw PII. Detections record category, offset, and confidence. The raw text that was detected is hashed. A regulator can verify that the pipeline ran, inspect every decision, and confirm the policy version in effect at the time of the request. They cannot reconstruct the customer's SSN from the audit log. That is the point.
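A sketch of what one such record looks like under these constraints. The field names and the policy version string here are illustrative, but the invariant is the source's: hashes and offsets go in, raw PII never does:

```python
import hashlib
import json
import time

def audit_record(actor, tool, raw_input, raw_output, detections, decisions, status):
    """One JSONL line per request: hashes and decisions, never raw PII."""
    record = {
        "ts": time.time(),
        "actor": actor,                 # role, user_id, session_id
        "tool": tool,
        "input_sha256": hashlib.sha256(raw_input.encode()).hexdigest(),
        "output_sha256": hashlib.sha256(raw_output.encode()).hexdigest(),
        # detections carry category and offsets only; raw text is stripped
        "detections": [
            {"category": c, "start": s, "end": e} for c, s, e in detections
        ],
        "decisions": decisions,
        "policy_version": "strict_financial@v1",  # illustrative version string
        "status": status,
    }
    return json.dumps(record, sort_keys=True)
```

A verifier can recompute the input hash from a disputed query and match it against the log, without the log ever having stored the query itself.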
The seven-step pipeline
Every tool call flows through _run_pipeline_v2() in server.py, which routes to the async or Temporal backend based on the AUDITGUARD_BACKEND environment variable. Both backends call the same pure stage functions in pipeline/stages.py. Built on FastMCP, this function is the hub where all four primitives compose:
Layer 1: RBAC gate. O(1) tool name check, SQL AST parse, API field validation.
Layer 2: Inbound PII scan. Privacy Filter on the raw query.
Layer 3: Inbound policy. Apply actions to detected PII spans.
Layer 4: Tool dispatch. Execute with timeout, return raw output.
Layer 5: Outbound PII scan. Privacy Filter on canonical JSON result.
Layer 6: Outbound policy. Apply actions to output detections.
Layer 7: Audit logger. Write the structured JSONL record.
The ordering is deliberate. Cheapest checks first. RBAC denies without touching the PII model. The PII model runs before the tool, so a blocked secret never reaches execution. The audit logger fires last, in a finally-equivalent path, so it captures timeouts, errors, and blocked requests in addition to successful ones. A failed request that RBAC-denied an intern produces the same audit record shape as a successful request that ran SQL and redacted three account numbers.
The pipeline is shared. Every MCP tool (sql_query, customer_lookup, customer_search) calls the same _run_pipeline_v2() with tool-specific executors. Adding a new tool means writing the executor function and wiring it to the pipeline. The RBAC, PII, policy, and audit layers are inherited automatically.
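The hub shape can be sketched in a few lines. The stage functions below are trivial stubs standing in for the repo's pipeline/stages.py; the point is the control flow: cheapest check first, and the audit write in a finally block so it fires on every terminal path:

```python
AUDIT = []

# Stand-in stage functions; the real ones live in pipeline/stages.py.
def rbac_check(actor, tool, query):
    return actor != "intern"          # stub: interns are denied everything

def scan_pii(text):
    return []                         # stub: no detections

def apply_policy(text, detections):
    return text, []                   # stub: pass through unchanged

def write_audit(actor, tool, status):
    AUDIT.append({"actor": actor, "tool": tool, "status": status})

def run_pipeline(actor, tool, query, executor):
    """Seven-stage hub: cheapest checks first, audit always last."""
    status, result = "error", None
    try:
        if not rbac_check(actor, tool, query):         # 1. RBAC gate
            status = "rbac_denied"
            return None
        inbound = scan_pii(query)                      # 2. inbound PII scan
        safe_query, _ = apply_policy(query, inbound)   # 3. inbound policy
        raw = executor(safe_query)                     # 4. tool dispatch
        outbound = scan_pii(raw)                       # 5. outbound PII scan
        result, _ = apply_policy(raw, outbound)        # 6. outbound policy
        status = "success"
        return result
    finally:
        write_audit(actor, tool, status)               # 7. always fires
```

Adding a tool means supplying a new executor callable; every other stage is inherited unchanged.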
Two policy philosophies, one code path
The repo ships two bundled policies that demonstrate how one detection pipeline serves opposite compliance philosophies. The policy is a config object. No code changes required.
permissive_analyst prioritizes data usability. Person names are replaced with [private_person:abc12345] using a SHA-256 short hash. An analyst can still GROUP BY that hash to correlate records belonging to the same entity across tables without ever seeing the entity's name. Account numbers are redacted. Secrets are vaulted. The policy trusts the analyst to do their job but leaves no raw PII in query results.
strict_financial prioritizes absolute privacy. Person names are replaced with [private_person]. No hash. No correlation possible. Account numbers, addresses, emails, phones, and URLs are all redacted. Even statistical correlation across records is blocked. The same detection pipeline, the same engine, the same audit logger. Only the config differs.
The assignment is role-based. The ANALYST role maps to permissive_analyst. The COMPLIANCE_OFFICER role maps to strict_financial. The role-to-permissions mapping lives in rbac.py. The role-to-PolicyMode mapping lives in _policy_mode_for_role() in server.py. The mode-to-config resolution lives in _get_policy_config() in stages.py. Three layers of indirection, each testable in isolation.
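The three layers collapse into a short sketch. The dict contents here are simplified placeholders; the function names echo the repo's _policy_mode_for_role() and _get_policy_config(), but the shapes are illustrative:

```python
# Layer 1: role -> permissions (rbac.py in the repo)
ROLE_PERMISSIONS = {"analyst": {"sql_query"}, "compliance_officer": {"sql_query"}}

# Layer 2: role -> policy mode (_policy_mode_for_role in server.py)
def policy_mode_for_role(role):
    return {"analyst": "permissive_analyst",
            "compliance_officer": "strict_financial"}[role]

# Layer 3: mode -> policy config (_get_policy_config in stages.py)
POLICY_CONFIGS = {
    "permissive_analyst": {"private_person": "HASH", "account_number": "REDACT"},
    "strict_financial": {"private_person": "REDACT", "account_number": "REDACT"},
}

def resolve_policy(role):
    return POLICY_CONFIGS[policy_mode_for_role(role)]
```

Each layer can be unit-tested in isolation, and swapping a role's policy philosophy is a one-line change to the layer-2 mapping.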
What Privacy Filter gets wrong
Shipping an honest limitation is more useful than hiding it. Privacy Filter sometimes over-redacts public entities. If a query returns a transaction counterparty named "Bennett Group," the model may tag it as three separate private_person spans. The company name is public. The detection is aggressive.
We also observe phone number false positives on numeric financial values. A balance like 496959.67 is flagged as private_phone because the digit sequence resembles a phone number pattern. The demo ships a post-detection numeric guard that checks whether a phone detection falls on a purely numeric value inside a JSON number context and suppresses the redaction when the span is unlikely to be a real phone number. This guard lives in policy.py under _is_numeric_json_value().
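A guard of that shape is small. This is a sketch of the idea behind _is_numeric_json_value(), not the repo's exact implementation: a phone detection whose span is a bare decimal number is treated as a false positive and suppressed:

```python
import re

def suppress_numeric_phone(detection, source_text):
    """Post-detection guard: drop private_phone hits that land on a bare
    decimal number like 496959.67 inside a JSON value."""
    category, start, end = detection
    if category != "private_phone":
        return False
    span = source_text[start:end]
    # A plain decimal number, with none of the separators real phone numbers use.
    return re.fullmatch(r"\d+(\.\d+)?", span) is not None
```

Real phone numbers almost always carry separators or a leading plus sign, so the fullmatch on bare digits is a cheap, explainable discriminator, and the suppression itself can be written to the audit log.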
The design philosophy: detection is a primitive, not a pipeline. The model detects. The policy layer decides. If your use case requires suppressing company name false positives, the correct place is a post-detection filter, not model fine-tuning. Suppress the span. Document the suppression. Let the audit log record that a detection was overridden and why.
Detection is a primitive, not a pipeline. The model's job is to flag spans. The policy layer's job is to decide what to do with them. Conflating the two, trying to make the model smarter about what not to detect, creates a system that is harder to audit and impossible to tune per use case.
The audit trail
A single request produces a single JSONL record. Here is what it contains and what it omits.
What it contains: the actor (role, user_id, session_id), SHA-256 hashes of raw input and raw output, inbound and outbound detections with raw text replaced by category placeholders, every policy decision applied with action and reason, the policy version string, the Privacy Filter model version, latency in milliseconds, and terminal status (success, rbac_denied, blocked, timeout, error, review_queued).
What it omits: raw PII values. The actual SSN. The full account number. The customer name that was detected. Those exist in the vault file if the policy action was VAULT, and the vault file is access-controlled separately. The audit log proves the pipeline ran correctly without becoming a secondary PII store.
This is what a regulator wants to see. An ISO 42001 AI management system audit or a SOC 2 Type II examination asks the same question: can you prove, with immutable records, that every access followed the stated policy and that no protected data leaked? The audit log answers that question.
From async to durable: the Temporal backend
The default async backend (async_runner.py) runs all seven stages in a single Python process. It is fast and zero-dependency. It has one structural limitation: if the process crashes after Stage 4 (tool execution) but before Stage 7 (audit logging), the request and its audit record are lost. The client must retry from scratch.
Temporal solves this. The temporal_runner.py backend wraps each pipeline stage as a Temporal activity with its own retry policy, heartbeat timeout, and error semantics. If the process crashes, Temporal resumes from the last completed activity. No stage runs twice that completed successfully. The audit log gets written even if the worker restarts mid-pipeline.
The tradeoffs are explicit:
| | Async (default) | Temporal |
|---|---|---|
| Latency overhead | None | ~50-100ms per stage |
| Durability | Lost on crash | Resumes from last completed stage |
| Human-in-the-loop | No | 24-hour signal timeout |
| Retry per stage | No | Independent policies (RBAC: 3 attempts, Audit: 20) |
| Operational cost | Zero | Temporal cluster and worker process |
The REVIEW action is synchronous in the async backend. In the Temporal backend, it becomes a durable wait. The workflow emits a signal, pauses for up to 24 hours, and resumes when a human reviewer approves or denies the request. This is the production shape of human-in-the-loop compliance.
The Temporal backend also handles ActivityError unwrapping for RBAC denial (so a denied request produces a clean RBACDenied status, not a stack trace), heartbeating for long model inference on CPU, and independent retry policies per stage. The audit logger gets 20 retry attempts because an audit record must never be lost. RBAC gets 3 attempts because denial is fast and idempotent. The worker pre-loads the Privacy Filter model before registering activities, so the first inference does not hit the 60-second activity timeout during model load.
Both backends call the same stage functions in pipeline/stages.py. The architecture is ports-and-adapters. Stage logic is pure and side-effect-free. The adapter chooses async or Temporal via the AUDITGUARD_BACKEND env var. Adding a third backend (AWS Step Functions, Celery, Airflow) is roughly 100 lines of adapter code, not a rewrite.
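The adapter seam is thin enough to sketch. The function and registry names below are illustrative, but the pattern is the source's: pure stages, one registry of backends, and an env var choosing among them:

```python
import os

def run_async_backend(stages, request):
    """In-process execution of the pure stage functions (async_runner.py's role)."""
    for stage in stages:
        request = stage(request)
    return request

# A Temporal adapter (or Step Functions, Celery, Airflow) registers here
# the same way; the stage functions never change.
BACKENDS = {"async": run_async_backend}

def dispatch(stages, request):
    backend = os.environ.get("AUDITGUARD_BACKEND", "async")
    return BACKENDS[backend](stages, request)
```

Because stages are pure, a new backend only has to decide how to sequence them and where to persist progress; the compliance logic is untouched.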
Temporal tests use an in-memory server. No Docker needed for CI.
What this is not
auditguard-mcp is a reference implementation, not a production-hardened service. Three gaps are documented explicitly:
- The REVIEW action is synchronous in the async backend. The Temporal backend makes it a durable wait (see "From async to durable" above), but the async path still returns inline.
- The vault is a local JSONL file. In production, replace it with AWS KMS or HashiCorp Vault. The vault entry model is already structured for a key management service.
- The transport is stdio, which assumes a trusted local client. For SSE transport, add mTLS client certificate validation to strictly identify the actor.
The repo also does not include an async review queue, streaming architecture, or multi-node scaling. It is the minimum viable pipeline that proves the four primitives compose correctly. 108 tests cover unit, integration, and pipeline stages, and a 15-case golden-set eval harness reports 100% pass rates on RBAC accuracy, status accuracy, inbound PII detection, and audit completeness.
The source is at github.com/ree2raz/auditguard-mcp (Apache 2.0). A live demo runs at auditguard.rituraj.info. If you are building MCP servers for regulated workflows, issues and PRs are welcome. If the four-primitive pattern maps to something you are building and you want to trade notes, I am at ree2raz@proton.me.
References
- OpenAI Privacy Filter model card. Model architecture, BIOES decoding, operating point calibration.
- ISO 42001:2023 - AI management systems. The standard for auditable AI governance.
- SOC 2 Type II. Service organization controls for security, availability, and confidentiality.
- FastMCP. The MCP framework the server is built on.
- BIOES tagging scheme. The span labeling scheme used by Privacy Filter's constrained Viterbi decoder.
- sqlglot. SQL parser used for RBAC column-level AST validation.
- Temporal. Durable workflow orchestration for the production backend.