Audit, diff, undo & backups

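Every kuroi run is reversible and auditable by default. This page covers the four commands you need to inspect changes, verify output, restore originals, and clean up backups: kuroi diff, kuroi verify, kuroi undo, and kuroi backups.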

See what changed: kuroi diff

kuroi diff takes the original and the redacted PDF and prints, per page, the bounding box and before-text snippet of every redaction:

$ kuroi diff report.pdf report.redacted.pdf
Page 3: 2 redactions
  - [40,120,180,138]  'j.doe@example.com'
  - [200,400,310,418]  '+33 6 12 34 56 78'
Page 7: 1 redaction
  - [60,210,220,230]  'Jane M. Doe'

Use --format json for one JSON record per page (machine-readable), or --format html -o diff.html for a side-by-side view. Pass -o <path> with any format to write to a file instead of stdout.
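If you only have the plain-text output, it can be parsed line by line. The sketch below is based solely on the sample output shown above; the Redaction class and parse_diff_text function are hypothetical helpers, not part of kuroi, and the real output may contain variations this does not handle.

```python
import re
from dataclasses import dataclass

# Patterns derived from the sample output on this page (an assumption,
# not a documented grammar).
PAGE_RE = re.compile(r"^Page (\d+): \d+ redactions?$")
ITEM_RE = re.compile(
    r"^\s*- \[([\d.]+),([\d.]+),([\d.]+),([\d.]+)\]\s+'(.*)'$"
)


@dataclass
class Redaction:
    page: int
    bbox: tuple[float, float, float, float]
    text: str


def parse_diff_text(output: str) -> list[Redaction]:
    """Turn plain-text `kuroi diff` output into a flat list of redactions."""
    redactions: list[Redaction] = []
    page = None
    for line in output.splitlines():
        page_match = PAGE_RE.match(line.strip())
        if page_match:
            page = int(page_match.group(1))
            continue
        item_match = ITEM_RE.match(line)
        if item_match and page is not None:
            x0, y0, x1, y1 = (float(g) for g in item_match.groups()[:4])
            redactions.append(Redaction(page, (x0, y0, x1, y1), item_match.group(5)))
    return redactions
```

For anything beyond a quick check, --format json is likely the more robust input, since it is intended to be machine-readable.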

Re-check a redacted PDF: kuroi verify

verify runs the same regex rules over the redacted output to catch anything that slipped through:

$ kuroi verify report.redacted.pdf
  PASS  report.redacted.pdf: no residual leaks

If it finds anything, the exit code is non-zero and the leaked spans are listed. Wire kuroi verify into your batch pipeline as a gate.
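A minimal sketch of such a gate, relying only on the documented behavior that kuroi verify exits non-zero when it finds a leak. The function name failed_verifications and the injectable run parameter are illustrative choices, not part of kuroi.

```python
import subprocess
from typing import Callable, Iterable


def failed_verifications(
    paths: Iterable[str],
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> list[str]:
    """Return the paths for which `kuroi verify` exited non-zero."""
    failures = []
    for path in paths:
        # The only contract relied on here is the exit code.
        result = run(["kuroi", "verify", path], capture_output=True, text=True)
        if result.returncode != 0:
            failures.append(path)
    return failures
```

A batch job could call failed_verifications on every redacted file and abort the release step if the returned list is non-empty. The run parameter exists so the gate can be tested without kuroi installed.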

Restore the original: kuroi undo

$ kuroi undo
  Last backup: 2026-04-30T09-31-02Z-a1b2c3
  Will restore: /home/you/work/report.pdf
Restore now? [Y/n]: y
  Restored.

undo takes no positional argument: it restores the most recent backup in the configured backup directory. The default directory is $XDG_DATA_HOME/kuroi/backups/, falling back to ~/.local/share/kuroi/backups/ when XDG_DATA_HOME is unset; override it with --backup-dir. Restoring does not delete the backup, which is retained until it falls outside the retention window.
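The lookup order described above can be sketched as follows. This is an illustration of the documented fallback, not kuroi's actual code; resolve_backup_dir is a hypothetical name, and treating an empty XDG_DATA_HOME as unset follows the XDG Base Directory convention rather than anything stated on this page.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_backup_dir(
    override: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> Path:
    """Resolve the backup root: explicit override, then XDG, then ~/.local/share."""
    if override:
        # Corresponds to passing --backup-dir on the command line.
        return Path(override)
    data_home = env.get("XDG_DATA_HOME")
    base = Path(data_home) if data_home else Path.home() / ".local" / "share"
    return base / "kuroi" / "backups"
```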

Pass --no-backup to kuroi run to skip the backup copy entirely. Combined with --in-place, this makes the original file unrecoverable, so kuroi prints a warning before proceeding. With -o, the source PDF is left untouched, so no backup is needed for rollback.

List & garbage-collect backups

Each backup is a timestamped subdirectory of the backup root containing a manifest.json and the original PDF.

$ kuroi backups list
  2026-04-28T08-00-01Z-44ab12  49h ago  /home/you/finance/invoice.pdf
  2026-04-29T14-15-22Z-9f0e21  19h ago  /home/you/work/report.pdf
  2026-04-30T09-31-02Z-a1b2c3  0h ago   /home/you/work/report.pdf

Rows are sorted by directory name (oldest first). Each row shows the session directory name, an age in hours, and the original path recorded in the manifest.
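Because the session directory name starts with a UTC timestamp, the age column can be reproduced from the name alone. The sketch below assumes the name format seen in the listing above (YYYY-MM-DDTHH-MM-SSZ followed by a suffix) and assumes ages are rounded down to whole hours; kuroi may instead compute the age from the manifest.

```python
from datetime import datetime, timezone


def session_age_hours(session_name: str, now: datetime) -> int:
    """Whole hours between the timestamp in a session name and `now` (UTC)."""
    stamp = session_name[:20]  # e.g. "2026-04-30T09-31-02Z"
    created = datetime.strptime(stamp, "%Y-%m-%dT%H-%M-%SZ").replace(
        tzinfo=timezone.utc
    )
    return int((now - created).total_seconds() // 3600)
```

The fixed-width timestamp prefix is also why sorting by directory name yields oldest-first order.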

Destructive: review before running

kuroi backups gc permanently deletes backups older than --max-age hours (default: 24). There is no preview / dry-run flag yet; run kuroi backups list --root <dir> first to see what would be eligible, or pass --max-age 0 to disable pruning.

$ kuroi backups gc --max-age 24
  Pruned 1 backup.
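Since there is no dry-run flag, you can approximate a preview from the session names printed by kuroi backups list. This is a rough sketch: gc_candidates is a hypothetical helper, it assumes "older than" is a strict comparison against the timestamp in the directory name, and it treats a max age of 0 as "prune nothing" as described above.

```python
from datetime import datetime, timedelta, timezone


def gc_candidates(
    session_names: list[str], max_age_hours: float, now: datetime
) -> list[str]:
    """List the sessions that a gc with this max age would likely prune."""
    if max_age_hours <= 0:
        return []  # 0 disables pruning
    cutoff = now - timedelta(hours=max_age_hours)
    candidates = []
    for name in sorted(session_names):
        created = datetime.strptime(name[:20], "%Y-%m-%dT%H-%M-%SZ").replace(
            tzinfo=timezone.utc
        )
        if created < cutoff:
            candidates.append(name)
    return candidates
```

With the three backups from the listing and a 24-hour window, only the 49-hour-old invoice backup is selected, which matches the "Pruned 1 backup." output.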

To change the default retention window globally, edit ~/.config/kuroi/config.toml and set:

[backup]
retention_hours = 168   # one week

retention_hours = 0 keeps every backup (legal-hold mode).

Where audit logs live

Each run writes a JSONL record to:

~/.local/share/kuroi/audit/<timestamp>.jsonl

The directory is configurable with kuroi run --audit-dir <path>; the filename is the same timestamp string used for the matching backup (e.g. 2026-04-30T09-31-02Z-a1b2c3.jsonl). Each file contains a session_start header, one finding line per applied redaction, and a session_end footer with the verification status. The dataclasses for each line live in src/kuroi/core/audit_records.py.
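A quick way to sanity-check an audit file is to count its records by kind. The sketch below assumes each line is a JSON object with a type key whose values are session_start, finding, and session_end; that key name is a guess for illustration only, so check src/kuroi/core/audit_records.py for the real field names before relying on it.

```python
import json
from collections import Counter
from typing import Iterable


def summarize_audit(lines: Iterable[str]) -> Counter:
    """Count audit records by their (assumed) "type" field."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        counts[record.get("type", "unknown")] += 1
    return counts


def looks_complete(counts: Counter) -> bool:
    """True when the file has exactly one header and one footer record."""
    return counts["session_start"] == 1 and counts["session_end"] == 1
```

Pass an open file object to summarize_audit to process a log on disk; a file without a session_end record may indicate an interrupted run.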

Set include_text = true under [audit] in ~/.config/kuroi/config.toml to capture matched snippets in the audit log. This is off by default and must be opted into, because audit logs themselves can be sensitive:

[audit]
include_text = true

Prompt-cache token fields

Each chunk record carries two cache-related token counters:

  • cache_creation_input_tokens: input tokens written to Anthropic's prompt cache on this call. These are billed at the cache-write rate (1.25× input by default), but subsequent matching prompts pay the cheaper read rate.
  • cache_read_input_tokens: input tokens served from cache on this call. These are billed at the cache-read rate (0.1× input by default). A growing share of cache reads across a run is a sign that caching is working and should reduce total cost on repetitive batches.

Cache fields are zero for providers that do not support prompt caching (Ollama, Claude CLI). They are populated by the Anthropic provider when the request structure permits caching.
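To see what the multipliers mean in practice, the arithmetic can be sketched as below. The 1.25× and 0.1× factors are the defaults quoted above; the input_tokens argument (uncached input tokens) is an assumption about what else the record carries, and both function names are hypothetical.

```python
CACHE_WRITE_MULTIPLIER = 1.25  # default cache-write rate relative to input
CACHE_READ_MULTIPLIER = 0.1    # default cache-read rate relative to input


def billed_input_equivalent(
    input_tokens: int,
    cache_creation_input_tokens: int,
    cache_read_input_tokens: int,
) -> float:
    """Input cost expressed in units of ordinary (uncached) input tokens."""
    return (
        input_tokens
        + cache_creation_input_tokens * CACHE_WRITE_MULTIPLIER
        + cache_read_input_tokens * CACHE_READ_MULTIPLIER
    )


def cache_read_share(
    input_tokens: int,
    cache_creation_input_tokens: int,
    cache_read_input_tokens: int,
) -> float:
    """Fraction of all input tokens that were served from cache."""
    total = input_tokens + cache_creation_input_tokens + cache_read_input_tokens
    return cache_read_input_tokens / total if total else 0.0
```

For example, a call with 1,000 uncached, 2,000 cache-written, and 8,000 cache-read tokens costs roughly the same as 4,300 ordinary input tokens instead of 11,000.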

Next steps