Blog | ree2raz

May 22, 2026 Five numbers that change your LLM self-hosting cost estimate

The cost of running your own LLM inference depends on five numbers most estimates miss: total parameters (not active), KV cache dtype, throughput bottleneck, replica count, and tensor-parallel topology. I built a calculator to get them right and benchmarked the results.

May 3, 2026 Building MCP servers that survive a regulator's audit

Most LLM tutorials show how to call a model. Almost none show how to make that call survive an audit. This post lays out a seven-step compliance pipeline for MCP tool servers, with RBAC, PII detection, policy enforcement, and structured audit logging.

Apr 28, 2026 Auditing debt-collection calls against FDCPA with a single LLM call

Manual QA samples 2% of calls. Scrutiny scores 12 FDCPA/Reg F rules in 60 seconds with a dual-path evaluator: one LLM call for semantic rules, deterministic Python for metadata rules. Here is the architecture and where it breaks.

Apr 22, 2026 Designing LLM-based rubric graders for high-stakes compliance

Phrase-matching covers 60-75% of call-center QA. The remaining 25-40% is where regulatory exposure lives. A well-designed LLM rubric grader raises coverage to 90%+. Here is the pattern for getting there without hiding the limitations.

Apr 21, 2026 An RL Environment for Regulatory Compliance Auditing: Design Decisions and Baseline Findings

RegTriage is designed to train RL agents to audit contact center transcripts for compliance violations. Baselines across three open-source models show that Gemma 4 31B nearly solves the Hero Agent trap zero-shot. The RL training signal is still there, but it is weaker than we thought.