Regression Detection¶
Overview¶
F12 (bijotel regression) watches your chain for statistical drift.
It catches:
- Model degradation (responses suddenly shorter / longer / more expensive)
- Provider config changes (silent model swaps, token-counting changes)
- Cost anomalies (a rogue agent burning budget overnight)
- Token-usage drift (prompt template change cascading through downstream)
Without making LLM calls — it operates on your existing chain data.
CLI¶
# One-shot regression check on the last 100 entries
bijotel regression --db chain.db --window 100
# Set thresholds
bijotel regression --db chain.db --window 100 --z-threshold 3.0
Output is a JSON-friendly summary: dimensions checked, anomaly counts, sample seqs.
Cron pattern¶
# Hourly regression sweep, log to file
0 * * * * /usr/local/bin/bijotel regression --db /data/chain.db --window 200 >> /var/log/bijotel/regression.log 2>&1
GENA runs this hourly in production.
REST API¶
# Latest result (cached, fast)
curl http://localhost:8080/api/regression/latest
# History
curl http://localhost:8080/api/regression/history?limit=50
# Run on demand
curl -X POST http://localhost:8080/api/regression/run \
-H "Content-Type: application/json" \
-d '{"window": 100}'
Dimensions tracked¶
| Dimension | What drift indicates |
|---|---|
input_tokens |
Prompt template change |
output_tokens |
Model verbosity drift, truncation, refusal patterns |
cost_usd |
Provider price change, model swap, runaway agent |
latency_ms |
Provider degradation, network issues |
Detection methods¶
Two statistical tests, both reported:
- Z-score: anomaly if
|z| > 3.0(configurable). Sensitive to outliers, fast. - IQR: anomaly if value beyond
Q1 - 1.5·IQRorQ3 + 1.5·IQR. Robust to a few extreme outliers.
A dimension is flagged when either method fires.
Reading the results¶
{
"ran_at": "2026-05-25T08:00:00Z",
"window": 100,
"dimensions": {
"input_tokens": {
"mean": 412.3,
"stdev": 89.1,
"anomalies_z": [],
"anomalies_iqr": []
},
"output_tokens": {
"mean": 285.4,
"stdev": 142.0,
"anomalies_z": [
{"seq": 5847, "value": 4096, "z": 26.83}
],
"anomalies_iqr": [
{"seq": 5847, "value": 4096}
]
}
},
"total_anomalies": 1
}
In this example, seq 5847 hit max_tokens (4096) — likely a runaway
loop or a prompt that elicited the full token budget. The signal lets
the operator investigate that specific entry via the dashboard.
Real production catch¶
On 2026-05-23 GENA's regression cron flagged a span where Sonnet
returned 3,847 output tokens against a 90-token historical mean — a
38× spike. Inspecting via bijotel inspect --seq 5234 showed the
prompt had asked for "a complete report" without max_tokens,
resulting in a $0.08 burn on a single call. The signal preceded the
billing alert by 4 hours.