kuroi.providers.base

base

The Provider Protocol every LLM client implements.

Provider

Bases: Protocol

Interface kuroi uses to talk to any LLM, cloud or local.

detect_redactions

detect_redactions(pages: tuple[Page, ...], llm_category_ids: tuple[str, ...], *, instructions: tuple[str, ...] = (), seed: int | None = None, attempt: int = 0, layout_aware: bool = False, model: str | None = None) -> tuple[list[Finding], list[ChunkRecord]]

Identify spans to redact across the supplied pages.

Parameters:

pages (tuple[Page, ...], required): Word-indexed pages produced by the PDF extractor.

llm_category_ids (tuple[str, ...], required): Category ids the provider is responsible for.

instructions (tuple[str, ...], default ()): Free-text redaction instructions from the user.

seed (int | None, default None): Optional sampling seed for reproducible runs.

attempt (int, default 0): Zero-based retry index from the chunker. Providers may use it to scale per-call timeouts (e.g. Ollama gives slow models more time on each retry).

layout_aware (bool, default False): When True, wrap the prompt with PyMuPDF block boundaries so the model sees paragraph and other layout-detected structure.

model (str | None, default None): If set, dispatch this call against the named model instead of the provider instance's configured self.model. Used by the chunker for per-category routing. When None, the instance default is used.
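To make the attempt and model parameters concrete, here is a minimal sketch of how a provider might interpret them. It is not kuroi's implementation: the helper names timeout_for_attempt and resolve_model, the 60-second base, the linear scaling, and the 600-second cap are all assumptions chosen for illustration.

```python
from typing import Optional


def timeout_for_attempt(
    attempt: int,
    base_seconds: float = 60.0,   # assumed base timeout, not a kuroi default
    cap_seconds: float = 600.0,   # assumed upper bound, not a kuroi default
) -> float:
    """Hypothetical helper: grow the per-call timeout linearly with the retry index."""
    if attempt < 0:
        raise ValueError("attempt is zero-based and must be non-negative")
    # attempt 0 gets the base timeout, attempt 1 twice that, and so on up to the cap.
    return min(base_seconds * (attempt + 1), cap_seconds)


def resolve_model(override: Optional[str], instance_default: str) -> str:
    """Hypothetical helper: a per-call model override wins over the instance default."""
    return override if override is not None else instance_default
```

A provider following this pattern would call resolve_model(model, self.model) once per detect_redactions call, so per-category routing never mutates the instance's configured model.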

Returns:

tuple[list[Finding], list[ChunkRecord]]: A tuple of (findings, chunk_records). findings are the proposed redactions; chunk_records audit the prompts and raw responses for each chunk dispatched to the model.
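The sketch below shows what conforming to this Protocol could look like, using a toy provider that matches a keyword instead of calling an LLM. The Page, Finding, and ChunkRecord dataclasses are hypothetical stand-ins (this section does not show their real fields), and KeywordProvider is invented for illustration; only the detect_redactions signature is taken from the documentation above.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


# Hypothetical stand-ins: the real kuroi types and their fields are not shown here.
@dataclass(frozen=True)
class Page:
    number: int
    words: tuple[str, ...]


@dataclass(frozen=True)
class Finding:
    page: int
    word_index: int
    category_id: str


@dataclass(frozen=True)
class ChunkRecord:
    prompt: str
    raw_response: str


class Provider(Protocol):
    """Mirrors the documented signature; any class with this method conforms."""

    def detect_redactions(
        self,
        pages: tuple[Page, ...],
        llm_category_ids: tuple[str, ...],
        *,
        instructions: tuple[str, ...] = (),
        seed: Optional[int] = None,
        attempt: int = 0,
        layout_aware: bool = False,
        model: Optional[str] = None,
    ) -> tuple[list[Finding], list[ChunkRecord]]: ...


class KeywordProvider:
    """Toy provider (assumption): flags the word "secret" rather than querying a model."""

    model = "toy-keyword-matcher"

    def detect_redactions(self, pages, llm_category_ids, *, instructions=(),
                          seed=None, attempt=0, layout_aware=False, model=None):
        findings: list[Finding] = []
        records: list[ChunkRecord] = []
        if not llm_category_ids:
            return findings, records
        category = llm_category_ids[0]
        for page in pages:
            hits = [i for i, word in enumerate(page.words) if word.lower() == "secret"]
            findings.extend(Finding(page.number, i, category) for i in hits)
            # One record per page, standing in for one chunk sent to a model.
            records.append(ChunkRecord(prompt=" ".join(page.words), raw_response=repr(hits)))
        return findings, records


# Structural typing: KeywordProvider never inherits from Provider.
provider: Provider = KeywordProvider()
pages = (Page(1, ("the", "secret", "code")), Page(2, ("nothing", "here")))
findings, chunk_records = provider.detect_redactions(pages, ("pii",))
```

Because Provider is a Protocol, callers depend only on the method's shape, which is what lets cloud and local clients be swapped without a shared base class.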