Safety, cost, and cache
Three pre-send guards — secret scanner, .commitbrief/ guard, cost preflight — plus the local response cache and how to manage it.
CommitBrief runs three guards before any LLM call and keeps a local response cache so re-running a review on an unchanged diff is essentially free.
Secret scanner
Pre-send guard that flags credential-shaped strings in the diff before the provider call. Prevents accidental upload of API keys, private keys, tokens, etc. to a third-party LLM.
What it scans
- The diff: only added lines (+-prefixed, excluding the +++ b/path header). Removed and context lines are skipped; the goal is to catch new leaks, not re-flag history that is already on disk somewhere.
- User-authored rules content: any non-default COMMITBRIEF.md or OUTPUT.md. These join the system prompt verbatim, so a credential pasted into either file would leak just as surely as one in the diff. Embedded defaults are presumed clean and skipped.
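As a rough illustration of the "added lines only" rule, the Python sketch below walks a git-style unified diff and keeps only lines added inside a hunk. It is not CommitBrief's implementation: the function name `added_lines` and the choice to number lines by their position in the diff text are assumptions made for the example.

```python
def added_lines(diff_text):
    """Return (line number within the diff, content) for each added line.

    Hypothetical helper: assumes a git-style diff where every file section
    starts with "diff --git" and every hunk starts with "@@".
    """
    added = []
    in_hunk = False
    for number, line in enumerate(diff_text.splitlines(), start=1):
        if line.startswith("diff --git"):
            # New file section: "---" / "+++ b/path" headers follow, skip them.
            in_hunk = False
        elif line.startswith("@@"):
            in_hunk = True
        elif in_hunk and line.startswith("+"):
            # Strip the leading "+" so only the new content is scanned.
            added.append((number, line[1:]))
    return added
```

Tracking hunk state, rather than skipping every line that starts with `+++`, avoids dropping an added line whose own content begins with `++`.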
Patterns matched
| Name | Pattern (regex) |
|---|---|
| AWS Access Key | AKIA[0-9A-Z]{16} |
| GitHub Token | gh[pousr]_[A-Za-z0-9]{36,} |
| GitLab Token | glpat-[A-Za-z0-9_-]{20,} |
| Anthropic API Key | sk-ant-[A-Za-z0-9_-]{40,} |
| OpenAI API Key | sk-(?:proj-|live-)?[A-Za-z0-9]{40,} |
| JWT | eyJ[A-Za-z0-9_-]{8,}\.eyJ[A-Za-z0-9_-]{8,}\.[A-Za-z0-9_-]{8,} |
| Stripe Live Key | sk_live_[A-Za-z0-9]{24,} |
| PEM Private Key | -----BEGIN [A-Z ]*PRIVATE KEY----- |
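The table above can be exercised with a short Python sketch. The regexes are copied from the table; the names `PATTERNS`, `scan`, and `format_finding` are hypothetical, and the real scanner may differ in matching order and reporting details. Only line numbers and pattern names are returned, never the matched text.

```python
import re

# Patterns copied from the table; dictionary order follows the table rows.
PATTERNS = {
    "AWS Access Key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "GitHub Token": re.compile(r"gh[pousr]_[A-Za-z0-9]{36,}"),
    "GitLab Token": re.compile(r"glpat-[A-Za-z0-9_-]{20,}"),
    "Anthropic API Key": re.compile(r"sk-ant-[A-Za-z0-9_-]{40,}"),
    "OpenAI API Key": re.compile(r"sk-(?:proj-|live-)?[A-Za-z0-9]{40,}"),
    "JWT": re.compile(r"eyJ[A-Za-z0-9_-]{8,}\.eyJ[A-Za-z0-9_-]{8,}\.[A-Za-z0-9_-]{8,}"),
    "Stripe Live Key": re.compile(r"sk_live_[A-Za-z0-9]{24,}"),
    "PEM Private Key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def scan(lines):
    """lines: iterable of (line_number, text). Returns [(line_number, [pattern names])]."""
    findings = []
    for number, text in lines:
        names = [name for name, rx in PATTERNS.items() if rx.search(text)]
        if names:
            findings.append((number, names))
    return findings


def format_finding(number, names):
    # Report only the line number and pattern names, not the secret itself.
    return f"line {number}: {', '.join(names)}"
```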
What the user sees
⚠ Possible secrets detected in diff (2 line(s)):
line 42: AWS Access Key
line 87: GitHub Token, OpenAI API Key
Send to LLM anyway? [y/N]:
The matched substring is never printed — only line numbers and pattern names. This keeps the secret out of stderr (and any CI log that captures it) even when the scanner fires.
Behavior matrix
| Context | Outcome |
|---|---|
| TTY, no --allow-secrets, no --yes | Prompt; default no → abort. |
| TTY, --yes | Prompt still shows. --yes does not bypass the scanner. |
| Non-TTY, no --allow-secrets | Abort with Aborted (non-interactive); pass --allow-secrets to override. |
| Any context, --allow-secrets | Skip the scanner entirely. |
| guard.secret_scan: false in config | Skip the scanner entirely. |
The matched-line list is still printed to stderr in the
--allow-secrets case — you opted in, but you still see what
was flagged.
Disabling
# Per-invocation
commitbrief --staged --allow-secrets
# Per-config (whole binary)
commitbrief config set guard.secret_scan false
The whole-binary opt-out is appropriate if you have a separate secrets-management layer (gitleaks, trufflehog, pre-commit hooks) doing the same job and the prompt is just noise.
Cost preflight
Pre-send guard that estimates the dollar cost of a review and prompts/aborts when it exceeds the configured threshold. Catches “oops, I pasted a 50k-line generated file” before tokens are actually spent.
When it runs
Right after the cache lookup, before the provider call. A cache hit skips the preflight entirely (no provider call, no cost).
How the estimate is computed
estimated_input_tokens = chars / 4 (well-known approximation)
estimated_output_tokens = clamp(input / 4, 200, 1500)
estimated_cost = provider.Pricing(model).Cost(...)
Output tokens are floored at 200 and capped at 1500. The floor matters because underestimating output would let a model with expensive output tokens systematically understate its actual bill.
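The arithmetic can be worked through in a few lines of Python. This is a sketch under stated assumptions: `input` in the clamp is read as the estimated input token count, division is rounded down, and prices are passed as hypothetical USD-per-million-token figures, since the source does not show how `provider.Pricing(model).Cost(...)` is parameterized.

```python
def clamp(value, low, high):
    return max(low, min(value, high))


def estimate_tokens(prompt_chars):
    """Return (estimated input tokens, estimated output tokens)."""
    input_tokens = prompt_chars // 4  # chars / 4 approximation (floor is an assumption)
    output_tokens = clamp(input_tokens // 4, 200, 1500)
    return input_tokens, output_tokens


def estimate_cost(prompt_chars, input_usd_per_mtok, output_usd_per_mtok):
    """Hypothetical pricing model: separate per-million-token prices for input and output."""
    input_tokens, output_tokens = estimate_tokens(prompt_chars)
    return (input_tokens * input_usd_per_mtok
            + output_tokens * output_usd_per_mtok) / 1_000_000
```

For example, a 400-character prompt estimates to 100 input tokens, and the output estimate is lifted from 25 to the 200-token floor; a 40,000-character prompt estimates to 10,000 input tokens with output capped at 1500.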
Threshold
cost:
warn_threshold_usd: 0.50 # default; <=0 disables
Bump it for scheduled jobs (a CI runner doing 100 reviews/day cares less about a single $1 review):
commitbrief config set cost.warn_threshold_usd 5.0
Behavior matrix
| Estimate | Context | Outcome |
|---|---|---|
| <= threshold | any | Silent. Review proceeds. |
| > threshold | TTY, no --no-cost-check, no --yes | Prompt; default no → abort. |
| > threshold | TTY, --yes | Prompt still shows. --yes does not bypass. |
| > threshold | Non-TTY, no --no-cost-check | Abort. |
| any | --no-cost-check | Skip the preflight entirely. |
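Read as a decision function, the matrix above could look like the following Python sketch. The function name and the string outcomes are hypothetical; the logic only restates the table and the `<=0 disables` note on the threshold.

```python
def cost_preflight(estimate_usd, threshold_usd, is_tty, no_cost_check=False):
    """Return "skip", "proceed", "prompt", or "abort" (hypothetical outcome labels).

    --yes is deliberately not a parameter: per the matrix it does not
    bypass the cost preflight.
    """
    if no_cost_check:
        return "skip"
    if threshold_usd <= 0:
        # A non-positive threshold disables the check.
        return "proceed"
    if estimate_usd <= threshold_usd:
        return "proceed"
    return "prompt" if is_tty else "abort"
```

A zero-priced provider always lands in the `estimate_usd <= threshold_usd` branch for any positive threshold, which matches the silent short-circuit described for such providers.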
Providers with zero pricing
Ollama, claude-cli, and gemini-cli return Pricing{} (all
zero). The preflight short-circuits silently — estimated cost
is always 0, never above the threshold. The verbose footer shows
— instead of a dollar figure.
The .commitbrief/ guard
Any diff touching files under .commitbrief/ (excluding the root
COMMITBRIEF.md and .commitbriefignore) prompts before any LLM
call. The rationale: .commitbrief/ files are usually
user-specific (per-repo config, OUTPUT.md template) and committing
them may break other developers’ configurations or leak API keys.
| Context | Outcome |
|---|---|
| TTY | Prompt; n → abort, y → proceed. |
| Non-TTY, no --yes | Auto-abort. |
| Any, --yes | Proceed without prompting. |
This is the one guard --yes still bypasses (since v0.9.1
the secret scanner and cost preflight have dedicated bypass flags).
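A minimal Python sketch of this guard, assuming changed paths are given relative to the repo root with forward slashes. The function names and outcome labels are hypothetical. The root-level COMMITBRIEF.md and .commitbriefignore fall outside the directory prefix, so they never trigger the guard here.

```python
def touches_commitbrief_dir(paths):
    """True if any changed path lives under the .commitbrief/ directory."""
    return any(path.startswith(".commitbrief/") for path in paths)


def commitbrief_guard(paths, is_tty, yes=False):
    """Return "proceed", "prompt", or "abort" following the table above."""
    if not touches_commitbrief_dir(paths):
        return "proceed"
    if yes:
        # --yes bypasses this guard in any context.
        return "proceed"
    return "prompt" if is_tty else "abort"
```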
Cache
CommitBrief caches LLM responses on disk so re-running a review on an unchanged diff is essentially free.
Cache key
SHA-256 over the concatenation:
diff_text + system_prompt + provider_name + model + lang_code + schema_version
Change any one input and you get a fresh review:
- Editing the diff (staging more files, amending the commit, etc.).
- Editing COMMITBRIEF.md (system prompt differs).
- Switching providers (--provider gemini).
- Switching models (--model gpt-4o-mini).
- Switching locale (--lang tr).
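The key derivation can be sketched in Python with the standard `hashlib` module. The field order follows the concatenation shown above. The NUL separator between fields is an assumption added here so that adjacent fields cannot run together; the source only says "concatenation" and does not specify a delimiter or encoding.

```python
import hashlib


def cache_key(diff_text, system_prompt, provider_name, model, lang_code, schema_version):
    """Hypothetical helper: SHA-256 hex digest over the six key inputs."""
    parts = [diff_text, system_prompt, provider_name, model, lang_code, str(schema_version)]
    # Assumption: join with NUL and encode as UTF-8 before hashing.
    payload = "\x00".join(parts).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

Changing any single argument, for example the model, yields a different digest, so the lookup misses and a fresh review is requested.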
Location
<repo-root>/.commitbrief/cache/ — one JSON file per cached
review. Per-repo: commitbrief cache clear in one repo does not
touch another repo’s cache. Gitignored automatically.
TTL and disable
cache:
enabled: true # false skips reads + writes entirely
ttl_days: 7 # 0 → DefaultTTL (7 days); cannot be negative
Per-invocation: --no-cache to force a fresh provider call.
Management commands
# Wipe everything for this repo.
commitbrief cache clear
# Bounded cleanup with defaults (keep newest 500 + last 7 days).
commitbrief cache prune
# Trim aggressively.
commitbrief cache prune --keep-last 100 --older-than 1d
# Per-provider/model scope.
commitbrief cache prune --provider anthropic --model claude-opus-4-7
commitbrief cache prune accepts whole-number durations with the units
d / w / m (30 days) / y (365 days). Decimals, negatives, and stdlib
h/m/s shorthand are all rejected, so m always means 30 days, never
minutes, and off-by-one surprises are avoided.
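A strict parser for this duration syntax might look like the Python sketch below. The name `parse_days` is hypothetical, and whether the real command accepts a zero count or multiple digits is not stated in the source; this version accepts any non-negative whole number followed by exactly one unit.

```python
import re

# Days per unit; m and y use the fixed lengths given in the text.
_UNIT_DAYS = {"d": 1, "w": 7, "m": 30, "y": 365}
_DURATION = re.compile(r"[0-9]+[dwmy]")


def parse_days(text):
    """Convert a duration such as "2w" to a number of days, or raise ValueError."""
    if not _DURATION.fullmatch(text):
        # Rejects decimals ("1.5d"), negatives ("-1d"), and h/s units ("1h", "90s").
        raise ValueError(f"invalid duration: {text!r}")
    return int(text[:-1]) * _UNIT_DAYS[text[-1]]
```

Under this grammar `30m` parses as 900 days, not 30 minutes, which is the behavior the text describes for the `m` unit.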
On-disk format
{
  "version": 1,
  "created_at": "2026-05-27T18:29:13Z",
  "ttl": 604800,
  "key": {
    "diff_hash": "sha256:abc123...",
    "system_prompt_hash": "sha256:def456...",
    "provider": "anthropic",
    "model": "claude-opus-4-7",
    "lang": "en"
  },
  "result": {
    "content": "<LLM response>",
    "format": "json",
    "tokens": { "input": 2105, "output": 526, "cached": 0 }
  }
}
result.format is json on the happy path, markdown-fallback
when the LLM produced unparseable JSON twice, or plain-text for
CLI-tool-backed providers.
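Given an entry in the format above, a freshness check could be sketched in Python as follows. It assumes `ttl` is expressed in seconds (604800 seconds equals the 7-day default) and that `created_at` is always a UTC timestamp in the form shown; neither is stated explicitly in the source, and `is_fresh` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(entry, now):
    """True while `now` is before created_at + ttl (ttl assumed to be seconds)."""
    created = datetime.strptime(
        entry["created_at"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return now < created + timedelta(seconds=entry["ttl"])
```

With the sample entry, a lookup on 1 June 2026 would still be a hit, while one on 4 June 2026 would fall past the seven-day window.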
Cache-hit reporting
On a hit, the verbose footer shows Saved: $X (the cost figure
that would have been spent) and tokens are marked as
(local cache hit). The pipeline still rebuilds the prompt (so
the dry-run cache-key matches) but skips the provider call.
See also
- Configuration: guard.secret_scan, cost.warn_threshold_usd, cache.* fields.
- Global flags: --allow-secrets, --no-cost-check, --no-cache, --yes.
- Troubleshooting: when guards fire unexpectedly.