Troubleshooting

Common failures and how to fix them. Run kuroi doctor first — it checks most of these in one go.

kuroi doctor output

$ kuroi doctor
  kuroi version 0.1.0                                                        ok
  Python version 3.12.4                                                      ok
  Anthropic API key                ANTHROPIC_API_KEY is not set; cloud redaction is unavailable  PROBLEM
  tesseract                        tesseract not found in PATH (optional for v0.1)               warn
  qpdf                             /usr/bin/qpdf                                                 ok
  Provider                         anthropic                                                     ok
  Model                            claude-opus-4-7                                               ok

One or more checks failed. See messages above.

Each line shows a label, a detail message, and a status (ok, warn, or PROBLEM). Any PROBLEM makes kuroi doctor exit non-zero; warn is informational. When several checks fail, fix the PROBLEM listed highest first.
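
For scripted pre-flight checks, the status column can be tallied from captured output. The following is a minimal sketch, assuming the status is always the last whitespace-separated field on a line, as in the sample above. The function name count_doctor_problems is hypothetical and not part of kuroi.

```shell
# Hypothetical helper: count lines of `kuroi doctor` output whose final
# field is PROBLEM. Assumes the status is the last whitespace-separated
# token on each line, as in the sample output above.
count_doctor_problems() {
    awk '$NF == "PROBLEM" { n++ } END { print n + 0 }'
}

# Usage: kuroi doctor | count_doctor_problems
```

When only pass/fail matters, checking the exit status of kuroi doctor directly is simpler, since a PROBLEM already exits non-zero.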

"ANTHROPIC_API_KEY not set"

$ export ANTHROPIC_API_KEY=sk-ant-...

To persist the setting, add the export line to your shell rc file. Alternatively, switch providers:

  • Claude CLI (no API key, uses your Claude Code subscription) — see Claude CLI errors below for setup quirks.
  • Ollama (local, offline) — see LLM providers.

"File is locked" / LockHeldError

kuroi creates an advisory lock file alongside each redacted output to prevent two runs from racing on the same destination. The lockfile path is the output path with .kuroi.lock appended. For an output written to document.redacted.pdf, the lock is document.redacted.pdf.kuroi.lock.

If a previous run crashed without releasing the lock, the next run will fail with a message that names the stray file. Delete it manually:

$ rm path/to/document.redacted.pdf.kuroi.lock

Stale locks are intentionally not auto-removed: they are a real signal of a prior crash, so clearing them is left as a deliberate manual step.
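
The naming rule above can be sketched as a small helper that reports a leftover lock without deleting it. The function names are hypothetical. A lock that exists is not necessarily stale, so this sketch assumes you confirm no other kuroi run is active before removing anything.

```shell
# Hypothetical helper: derive the lock path for an output path, following
# the documented rule (output path with ".kuroi.lock" appended).
kuroi_lock_path() {
    printf '%s.kuroi.lock\n' "$1"
}

# Report a lock file if one exists; returns 1 when a lock is present.
# Deliberately does not delete anything.
report_lock() {
    lock=$(kuroi_lock_path "$1")
    if [ -e "$lock" ]; then
        printf 'lock present: %s\n' "$lock"
        return 1
    fi
    return 0
}

# Usage: report_lock path/to/document.redacted.pdf
```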

Provider rate limits

If you see 429 Too Many Requests from Anthropic, wait a moment before retrying. kuroi run processes one PDF per invocation, so when you're batching in a shell loop, sleep between iterations or break the input list into smaller chunks.
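
A batching loop with a pause between invocations might look like the sketch below. KUROI_BIN and DELAY_S are hypothetical variables of this script, not kuroi options, and the 10-second default is an arbitrary starting point rather than a documented limit.

```shell
# Sketch: redact each PDF in turn, pausing between invocations.
# KUROI_BIN (binary to run) and DELAY_S (pause in seconds) are hypothetical
# knobs for this script only.
redact_all() {
    delay=${DELAY_S:-10}
    first=1
    for pdf in "$@"; do
        # Sleep before every invocation except the first.
        if [ "$first" -eq 0 ]; then
            sleep "$delay"
        fi
        first=0
        # Stop at the first failure so the error is not buried in later output.
        "${KUROI_BIN:-kuroi}" run "$pdf" -o "${pdf%.pdf}.redacted.pdf" || return 1
    done
}

# Usage: redact_all reports/*.pdf
```

Stopping at the first failure is a design choice for this sketch. Drop the `|| return 1` if you would rather continue with the remaining files.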

Claude CLI errors

The claude-cli provider shells out to the claude binary (bundled with claude-agent-sdk, or installed by the @anthropic-ai/claude-code npm package). Four failure modes are common:

"claude binary not found"

$ kuroi run report.pdf --provider claude-cli
ConfigError: claude CLI not found

The bundled binary is missing or not on PATH. Reinstall it with one of the following:

$ pip install --upgrade claude-agent-sdk
# or, if you installed kuroi with pipx:
$ pipx inject kuroi claude-agent-sdk --force
# or:
$ npm install -g @anthropic-ai/claude-code

If you have a system install elsewhere, point at it:

$ kuroi run report.pdf --provider claude-cli \
    --claude-cli-path /opt/claude/bin/claude

"Not authenticated" / login required

Run the one-time login flow:

$ claude /login

This stores OAuth credentials in your home directory. kuroi never sees the credential — it only invokes the binary.

ANTHROPIC_API_KEY shadowing your subscription

If you set ANTHROPIC_API_KEY and select --provider claude-cli, the claude binary will silently prefer per-token API billing over your subscription. kuroi prints a warning at startup:

warning: ANTHROPIC_API_KEY is set; claude-cli will bill against the API
key, not your subscription. Unset the variable to force subscription
billing.

Unset the variable in the calling shell to force subscription billing:

$ unset ANTHROPIC_API_KEY
$ kuroi run report.pdf --provider claude-cli
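
If other tools in the same session still need the key, it can be removed for a single command instead. The wrapper below is a sketch with a hypothetical name (without_api_key). It runs the command in a subshell, so the variable stays set in the calling shell.

```shell
# Hypothetical wrapper: run a command with ANTHROPIC_API_KEY removed from
# its environment. The parentheses start a subshell, so the unset does not
# leak back into the calling shell.
without_api_key() {
    (
        unset ANTHROPIC_API_KEY
        "$@"
    )
}

# Usage: without_api_key kuroi run report.pdf --provider claude-cli
```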

Per-call timeout

Long documents may exceed the default 300-second per-call budget. Raise it for a single run with --claude-cli-timeout, or persist the setting in config.toml:

[claude_cli]
timeout_s = 600

"PDF is too large to extract"

Very large PDFs (hundreds of megabytes / thousands of pages) can exceed PyMuPDF's word-extraction limits. Split the file at the shell level first, then redact each part:

$ qpdf --split-pages large.pdf parts.pdf
$ for part in parts*.pdf; do kuroi run "$part" -o "${part%.pdf}.redacted.pdf"; done

Re-merge the redacted parts with qpdf --empty --pages parts*.redacted.pdf -- merged.pdf.
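
qpdf inserts a zero-padded page number before the .pdf extension of each part (for example parts-001.pdf), or a page range when a group size is given as --split-pages=n. Because redacted outputs also match parts*.pdf, re-running the loop after an interruption would pick them up as inputs. The sketch below skips them. KUROI_BIN is a hypothetical override used only by this script, and the existence check assumes that an output file already on disk is complete.

```shell
# Sketch: redact every split part, skipping files produced by an earlier
# pass so the loop can be re-run after an interruption.
# KUROI_BIN is a hypothetical override, not a kuroi setting.
redact_parts() {
    for part in "$@"; do
        case "$part" in
            *.redacted.pdf) continue ;;   # an earlier output, not an input
        esac
        out="${part%.pdf}.redacted.pdf"
        # Assumption: an existing output is complete, so skip this part.
        if [ -e "$out" ]; then
            continue
        fi
        "${KUROI_BIN:-kuroi}" run "$part" -o "$out" || return 1
    done
}

# Usage, after splitting into 200-page groups:
#   qpdf --split-pages=200 large.pdf parts.pdf
#   redact_parts parts-*.pdf
```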

"Verification gate failed"

kuroi run calls kuroi verify on its own output before writing it; if the verifier finds residual sensitive text, the run aborts and nothing is written to the destination. To reproduce a problematic run with a fixed seed and full request/response logs:

$ kuroi -vv run --seed 42 input.pdf -o input.redacted.pdf

-v/-vv are top-level flags on the kuroi command and must come before the subcommand. --seed is a flag on kuroi run (not on kuroi verify); verify takes only the PDF path. Then re-run kuroi verify <output> to see exactly which spans the verifier flagged.

Where logs live

  • Audit records (per finding): ~/.local/share/kuroi/audit/<timestamp>.jsonl — one JSONL file per run. Configurable with kuroi run --audit-dir <path>.
  • Backups (full PDFs): ~/.local/share/kuroi/backups/<timestamp>/<filename>.pdf (or $XDG_DATA_HOME/kuroi/backups/... when the env var is set) — one timestamped subdirectory per run. Configurable with kuroi run --backup-dir <path> (and kuroi undo --backup-dir <path>). Skip the backup copy with kuroi run --no-backup.
  • Config: ~/.config/kuroi/config.toml (or $XDG_CONFIG_HOME/kuroi/config.toml).

-v (info) and -vv (debug), placed before the subcommand, print a runtime trace to stderr for any command.

Still stuck?

Open an issue at github.com/ICIJ/kuroi/issues with:

  1. The command you ran.
  2. kuroi --version and kuroi doctor output.
  3. The relevant audit JSONL excerpt (with include_text = false under [audit] so you don't leak the data you're trying to redact).

Transient provider failures

If kuroi run aborts with "Batch N (pages X–Y) failed K times and was aborted." (where K is max_retries + 1), the orchestrator hit a hard provider failure (HTTP error, timeout, or malformed response) on every attempt. Two knobs help:

  • Slow Ollama / large model: raise --max-retries and/or --retry-backoff to give the daemon more time, e.g. --max-retries 5 --retry-backoff 5. For a long-term setting, pin it in ~/.config/kuroi/config.toml:
    [retry]
    max_retries = 5
    backoff = 5.0
  • Fail fast in CI: --max-retries 0 disables retry so transient errors surface immediately instead of waiting through backoff.

Run with -v to see per-attempt logs (retrying batch 3/12 (pages 9–12) in 4s (attempt 2/3)) and the upstream WARNINGs that triggered the retry.

"Could not be processed even after subdividing"

A "BatchError: page N (M words) could not be processed even after subdividing to the minimum chunk size" means the active model can't handle even a small slice of that page.

  • Try a model with a larger context window (Anthropic Sonnet/Opus, larger Ollama models).
  • For Ollama, raise --max-retries to give a slow deployment more wall-clock time per attempt. The timeout grows linearly with the attempt number (120s × (attempt + 1)), so with --max-retries 5 the final attempt (attempt 5) gets a budget of 120s × 6 = 720s.
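
The timeout formula above can be worked through for every attempt. This sketch assumes attempts are numbered from 0, which is what the 720s figure for --max-retries 5 implies. The function names are illustrative only.

```shell
# Per-attempt timeout from the documented formula: 120s x (attempt + 1).
# Assumes attempts are numbered from 0, so --max-retries N ends at attempt N.
attempt_timeout_s() {
    echo $(( 120 * ($1 + 1) ))
}

# Print the budget for every attempt allowed by a given --max-retries value.
timeout_schedule() {
    attempt=0
    while [ "$attempt" -le "$1" ]; do
        printf 'attempt %d: %ds\n' "$attempt" "$(attempt_timeout_s "$attempt")"
        attempt=$(( attempt + 1 ))
    done
}

timeout_schedule 5
# Last line printed: attempt 5: 720s
```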