Blog
-
May 22, 2026
Five numbers that change your LLM self-hosting cost estimate
The cost of running your own LLM inference is a function of five numbers most estimates miss: total parameters (not active), KV cache dtype, throughput bottleneck, replica count, and tensor-parallel topology. I built a calculator to get them right and benchmarked the results.
-
May 3, 2026
Building MCP servers that survive a regulator's audit
Most LLM tutorials show how to call a model. Almost none show how to make that call survive an audit. This post walks through a seven-step compliance pipeline for MCP tool servers with RBAC, PII detection, policy enforcement, and structured audit logging.
-
Apr 28, 2026
Auditing debt-collection calls against FDCPA with a single LLM call
Manual QA samples 2% of calls. Scrutiny scores 12 FDCPA/Reg F rules in 60 seconds with a dual-path evaluator: one LLM call for semantic rules, deterministic Python for metadata rules. Here is the architecture and where it breaks.
-
Apr 22, 2026
Designing LLM-based rubric graders for high-stakes compliance
Phrase-matching covers 60-75% of call-center QA. The remaining 25-40% is where regulatory exposure lives. A well-designed LLM rubric grader raises coverage to 90%+. Here is the pattern for doing that without hiding the limitations.
-
Apr 21, 2026
An RL Environment for Regulatory Compliance Auditing: Design Decisions and Baseline Findings
RegTriage is designed to train RL agents to audit contact center transcripts for compliance violations. Baselines across three open-source models show that Gemma 4 31B nearly solves the Hero Agent trap zero-shot. The RL training signal is still there, but it is weaker than we thought.