Troubleshooting¶
Common failures and how to fix them. Run kuroi doctor first — it checks
most of these in one go.
kuroi doctor output¶
$ kuroi doctor
kuroi version      0.1.0                                                         ok
Python version     3.12.4                                                        ok
Anthropic API key  ANTHROPIC_API_KEY is not set; cloud redaction is unavailable  PROBLEM
tesseract          tesseract not found in PATH (optional for v0.1)               warn
qpdf               /usr/bin/qpdf                                                 ok
Provider           anthropic                                                     ok
Model              claude-opus-4-7                                               ok
One or more checks failed. See messages above.
Each line shows a label, a detail message, and a status (ok, warn,
or PROBLEM). A PROBLEM exits non-zero; warn is informational. Fix
the highest-listed PROBLEM first.
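Because a PROBLEM makes kuroi doctor exit non-zero, the command can gate a script or CI job. A minimal sketch, assuming only that exit-status behavior; the preflight function name is made up for this example:

```shell
# Hypothetical helper: stop early when kuroi doctor reports a PROBLEM.
preflight() {
  if kuroi doctor; then
    echo "preflight: all required checks passed"
  else
    status=$?   # exit status of the kuroi doctor call above
    echo "preflight: kuroi doctor exited with status $status" >&2
    return 1
  fi
}
```

A script could then call preflight before its first kuroi run and skip the run when it returns non-zero. Since warn lines are informational, they should not trip this gate.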
"ANTHROPIC_API_KEY not set"¶
Export ANTHROPIC_API_KEY in the current shell, and persist it by adding the same export line to your shell rc file. Or switch providers:
- Claude CLI (no API key, uses your Claude Code subscription) — see Claude CLI errors below for setup quirks.
- Ollama (local, offline) — see LLM providers.
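Setting the key for the current session can look like the sketch below. The value is a placeholder; replace it with your own key, and copy the same export line into your shell rc file (for example ~/.bashrc or ~/.zshrc) to make it permanent.

```shell
# Placeholder value for illustration; substitute your actual key.
export ANTHROPIC_API_KEY="replace-with-your-key"

# Confirm that child processes (such as kuroi) will inherit the variable.
if sh -c 'test -n "$ANTHROPIC_API_KEY"'; then
  echo "ANTHROPIC_API_KEY is exported"
fi
```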
"File is locked" / LockHeldError¶
kuroi creates an advisory lock file alongside each redacted output to
prevent two runs from racing on the same destination. The lockfile path
is the output path with .kuroi.lock appended. For an output written to
document.redacted.pdf, the lock is document.redacted.pdf.kuroi.lock.
If a previous run crashed without releasing the lock, the next run will fail with a message that names the stray file. Delete it manually:
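For the example output used above, the cleanup could look like this. It is a sketch that assumes no other kuroi run is still writing to the same destination; check that first, because removing a live lock defeats its purpose.

```shell
# Lock path = output path + ".kuroi.lock" (example file name from this section).
lock="document.redacted.pdf.kuroi.lock"

# -f keeps the command quiet if the lock is already gone; -- guards odd file names.
rm -f -- "$lock"
```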
Stale locks are intentionally not auto-removed: they're a real signal of a prior crash, and the user clears them deliberately.
Provider rate limits¶
If you see 429 Too Many Requests from Anthropic, wait a moment before
retrying. kuroi run processes one PDF per invocation, so when you're
batching in a shell loop, sleep between iterations or break the input
list into smaller chunks.
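A batching loop with a pause between files might look like the following sketch. The function name redact_all and the KUROI_PAUSE variable are hypothetical names for this example, not kuroi features, and the 10-second default is an arbitrary starting point rather than a documented limit.

```shell
# Hypothetical helper: redact each PDF in turn, pausing between invocations.
redact_all() {
  pause="${KUROI_PAUSE:-10}"   # seconds to wait between files (assumed value)
  for pdf in "$@"; do
    if ! kuroi run "$pdf" -o "${pdf%.pdf}.redacted.pdf"; then
      echo "failed: $pdf" >&2
    fi
    sleep "$pause"
  done
}
```

Call it as redact_all reports/*.pdf. If 429 responses persist, raise the pause rather than retrying immediately.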
Claude CLI errors¶
The claude-cli provider shells out to the claude binary (bundled with
claude-agent-sdk, or installed by the @anthropic-ai/claude-code npm
package). The following failure modes are common:
"claude binary not found"¶
The bundled binary is missing or not on PATH. Reinstall it with one of:
$ pip install --upgrade claude-agent-sdk
# or, if you installed kuroi with pipx:
$ pipx inject kuroi claude-agent-sdk --force
# or:
$ npm install -g @anthropic-ai/claude-code
If you have a system install elsewhere, point at it:
"Not authenticated" / login required¶
Run the one-time login flow:
This stores OAuth credentials in your home directory. kuroi never sees the credential — it only invokes the binary.
ANTHROPIC_API_KEY shadowing your subscription¶
If you set ANTHROPIC_API_KEY and select --provider claude-cli, the
claude binary will silently prefer per-token API billing over your
subscription. kuroi prints a warning at startup:
warning: ANTHROPIC_API_KEY is set; claude-cli will bill against the API
key, not your subscription. Unset the variable to force subscription
billing.
Unset the variable in the calling shell to force subscription billing:
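In practice that is a single command. The check afterwards is only there to confirm the variable is really gone from the current shell:

```shell
unset ANTHROPIC_API_KEY

# ${VAR+x} expands to nothing only when VAR is unset (not merely empty).
if [ -z "${ANTHROPIC_API_KEY+x}" ]; then
  echo "ANTHROPIC_API_KEY is unset"
fi
```

If your shell rc file also exports the variable, new shells will set it again, so remove or comment out that line as well.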
Per-call timeout¶
Long documents may exceed the default 300-second per-call budget. Raise
it with --claude-cli-timeout, or persist via:
"PDF is too large to extract"¶
Very large PDFs (hundreds of megabytes / thousands of pages) can exceed PyMuPDF's word-extraction limits. Split the file at the shell level first, then redact each part:
$ qpdf --split-pages large.pdf parts.pdf
$ for part in parts*.pdf; do kuroi run "$part" -o "${part%.pdf}.redacted.pdf"; done
Re-merge the redacted parts with
qpdf --empty --pages parts*.redacted.pdf -- merged.pdf.
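Splitting into single pages means one kuroi invocation per page. qpdf's --split-pages option also accepts a group size, so a variant that works in larger chunks can be sketched as below. The chunk size of 50, the chunk.pdf base name, and the redact_in_chunks function are assumptions for illustration; qpdf inserts a zero-padded page range before .pdf when it names the pieces.

```shell
# Hypothetical helper: split, redact each chunk, then merge the results.
redact_in_chunks() {
  input="$1"
  pages_per_chunk="${2:-50}"   # assumed chunk size; tune to your documents
  qpdf --split-pages="$pages_per_chunk" "$input" chunk.pdf || return 1
  for part in chunk-*.pdf; do
    case "$part" in
      *.redacted.pdf) continue ;;   # skip outputs left over from an earlier run
    esac
    kuroi run "$part" -o "${part%.pdf}.redacted.pdf" || return 1
  done
  # The glob expands in sorted order, which matches page order because
  # the page numbers in the file names are zero-padded.
  qpdf --empty --pages chunk-*.redacted.pdf -- merged.pdf
}
```

Run it in an empty working directory so that stale chunk files from other documents are not merged by accident.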
"Verification gate failed"¶
kuroi run calls kuroi verify on its own output before writing it; if
the verifier finds residual sensitive text, the run aborts and nothing is
written to the destination. To reproduce a problematic run with a fixed
seed and full request/response logs:
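Based on the flag placement described in this section, such a command could be shaped like the sketch below. The seed value 42, the file names, and the log file name are placeholders, and the idea that --seed takes an integer is an assumption; check kuroi run --help for the exact form.

```shell
# Hypothetical wrapper: repeat a run with debug logging and a fixed seed.
reproduce_run() {
  input="$1"
  seed="${2:-42}"   # placeholder seed; any fixed value makes runs comparable
  # -vv goes before the subcommand; --seed belongs to "kuroi run".
  # The -vv trace goes to stderr, so capture it in a file for later inspection.
  kuroi -vv run "$input" --seed "$seed" -o "${input%.pdf}.redacted.pdf" 2> kuroi-debug.log
}
```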
-v/-vv are top-level flags on the kuroi command and must come
before the subcommand. --seed is a flag on kuroi run (not on
kuroi verify); verify takes only the PDF path. Then re-run
kuroi verify <output> to see exactly which spans the verifier flagged.
Where logs live¶
- Audit records (per finding): ~/.local/share/kuroi/audit/<timestamp>.jsonl, one JSONL file per run. Configurable with kuroi run --audit-dir <path>.
- Backups (full PDFs): ~/.local/share/kuroi/backups/<timestamp>/<filename>.pdf (or $XDG_DATA_HOME/kuroi/backups/... when the env var is set), one timestamped subdirectory per run. Configurable with kuroi run --backup-dir <path> (and kuroi undo --backup-dir <path>). Skip the backup copy with kuroi run --no-backup.
- Config: ~/.config/kuroi/config.toml (or $XDG_CONFIG_HOME/kuroi/config.toml).
-v (info) and -vv (debug), placed before the subcommand, print a runtime
trace to stderr for any command.
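To pull up the most recent audit file, something like the following works as a sketch. It assumes the default audit directory listed above and that no --audit-dir override was used; latest_audit is a hypothetical helper name.

```shell
# Hypothetical helper: print the path of the newest audit file.
latest_audit() {
  dir="${1:-$HOME/.local/share/kuroi/audit}"
  # ls -t sorts newest first; head keeps only the most recent file.
  ls -t "$dir"/*.jsonl 2>/dev/null | head -n 1
}
```

For example, tail -n 5 "$(latest_audit)" shows the last five records of the latest run. Pass a directory as the first argument if you use a custom audit location.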
Still stuck?¶
Open an issue at github.com/ICIJ/kuroi/issues with:
- The command you ran.
- kuroi --version and kuroi doctor output.
- The relevant audit JSONL excerpt (with include_text = false under [audit] so you don't leak the data you're trying to redact).
Transient provider failures¶
If kuroi run aborts with Batch N (pages X–Y) failed K times and was
aborted. (where K is max_retries + 1), the orchestrator hit a hard
provider failure (HTTP error, timeout, or malformed response) on every
attempt. Two knobs help:
- Slow Ollama / large model: raise --max-retries and/or --retry-backoff to give the daemon more time, e.g. --max-retries 5 --retry-backoff 5. For a long-term setting, pin it in ~/.config/kuroi/config.toml:
- Fail fast in CI: --max-retries 0 disables retry so transient errors surface immediately instead of waiting through backoff.
Run with -v to see per-attempt logs (retrying batch 3/12 (pages 9–12)
in 4s (attempt 2/3)) and the upstream WARNINGs that triggered the
retry.
"Could not be processed even after subdividing"¶
Hitting a BatchError: page N (M words) could not be processed even
after subdividing to the minimum chunk size? The active model can't
handle even a small slice of that page.
- Try a model with a larger context window (Anthropic Sonnet/Opus, larger Ollama models).
- For Ollama, raise --max-retries to give a slow deployment more per-attempt wall clock. The timeout grows linearly with the attempt number (120s × (attempt + 1)), so with --max-retries 5 the final attempt gets a 720s budget.
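The arithmetic behind that figure can be checked with a short loop. This sketch assumes attempts are numbered from 0 and that --max-retries N allows N + 1 attempts in total, which is what the 720s figure implies; attempt_timeout is a made-up helper, not part of kuroi.

```shell
# Per-attempt timeout in seconds: 120 × (attempt + 1).
attempt_timeout() {
  echo $((120 * ($1 + 1)))
}

max_retries=5
attempt=0
while [ "$attempt" -le "$max_retries" ]; do
  echo "attempt $attempt: $(attempt_timeout "$attempt")s"
  attempt=$((attempt + 1))
done
```

The first line printed is attempt 0: 120s and the last is attempt 5: 720s.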