kuroi.providers.base

base

The Provider Protocol every LLM client implements.

Provider

Bases: Protocol

Interface kuroi uses to talk to any LLM, cloud or local.

detect_redactions

detect_redactions(pages: tuple[Page, ...], llm_category_ids: tuple[str, ...], *, instructions: tuple[str, ...] = (), seed: int | None = None, attempt: int = 0, layout_aware: bool = False, model: str | None = None) -> tuple[list[Finding], list[ChunkRecord]]

Identify spans to redact across the supplied pages.

Parameters:

Name	Type	Description	Default
`pages`	`tuple[Page, ...]`	Word-indexed pages produced by the PDF extractor.	required
`llm_category_ids`	`tuple[str, ...]`	Category ids the provider is responsible for.	required
`instructions`	`tuple[str, ...]`	Free-text redaction instructions from the user.	`()`
`seed`	`int \| None`	Optional sampling seed for reproducible runs.	`None`
`attempt`	`int`	Zero-based retry index from the chunker. Providers may use it to scale per-call timeouts (e.g. Ollama gives slow models more time on each retry).	`0`
`layout_aware`	`bool`	When True, wrap the prompt with PyMuPDF block boundaries (…) so the model sees paragraphs and other layout-detected structure.	`False`
`model`	`str \| None`	If set, dispatch this call to the named model instead of the provider instance's configured `self.model`. Used by the chunker for per-category routing. When None, the instance default is used.	`None`
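
The `attempt` and `model` parameters can be illustrated with a short sketch. The helper names `timeout_for_attempt` and `resolve_model`, the 60-second base, and the doubling factor are all assumptions made for illustration; they are not part of kuroi and do not describe how any bundled provider actually scales its timeouts.

```python
from __future__ import annotations


def timeout_for_attempt(
    attempt: int, base_seconds: float = 60.0, factor: float = 2.0
) -> float:
    """Hypothetical helper: grow the per-call timeout with each retry.

    `attempt` is zero-based, so the first call uses `base_seconds` unchanged.
    """
    return base_seconds * factor**attempt


def resolve_model(instance_model: str, override: str | None = None) -> str:
    """Hypothetical helper: a per-call `model` wins over the instance default."""
    return instance_model if override is None else override


first_try = timeout_for_attempt(0)
third_try = timeout_for_attempt(2)
default_model = resolve_model("instance-model")
routed_model = resolve_model("instance-model", "category-model")
```

A provider with fixed latency could reasonably ignore `attempt` altogether; the table only says providers may use it.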

Returns:

Type	Description
`tuple[list[Finding], list[ChunkRecord]]`	A tuple of `(findings, chunk_records)`: `findings` are the proposed redactions; `chunk_records` audit the prompts and raw responses for each chunk dispatched to the model.
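
As a rough sketch of what conforming to this Protocol might look like, the example below defines a stub provider and unpacks its return value. The `Page`, `Finding`, and `ChunkRecord` classes here are simplified stand-ins with hypothetical fields, and `KeywordProvider` is an invented toy that calls no model; only the `detect_redactions` signature follows the one documented above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol, runtime_checkable


# Stand-in types with hypothetical fields; kuroi's real classes may differ.
@dataclass(frozen=True)
class Page:
    words: tuple[str, ...]


@dataclass(frozen=True)
class Finding:
    page_index: int
    word_index: int
    category_id: str


@dataclass(frozen=True)
class ChunkRecord:
    prompt: str
    raw_response: str


@runtime_checkable
class Provider(Protocol):
    def detect_redactions(
        self,
        pages: tuple[Page, ...],
        llm_category_ids: tuple[str, ...],
        *,
        instructions: tuple[str, ...] = (),
        seed: int | None = None,
        attempt: int = 0,
        layout_aware: bool = False,
        model: str | None = None,
    ) -> tuple[list[Finding], list[ChunkRecord]]: ...


class KeywordProvider:
    """Toy provider: flags the literal word SECRET instead of calling an LLM."""

    model = "stub-model"

    def detect_redactions(
        self,
        pages: tuple[Page, ...],
        llm_category_ids: tuple[str, ...],
        *,
        instructions: tuple[str, ...] = (),
        seed: int | None = None,
        attempt: int = 0,
        layout_aware: bool = False,
        model: str | None = None,
    ) -> tuple[list[Finding], list[ChunkRecord]]:
        findings: list[Finding] = []
        records: list[ChunkRecord] = []
        category = llm_category_ids[0] if llm_category_ids else "unknown"
        for page_index, page in enumerate(pages):
            # Treat each page as one chunk and record what was "sent" and "received".
            prompt = " ".join(page.words)
            hits = [i for i, word in enumerate(page.words) if word == "SECRET"]
            findings.extend(Finding(page_index, i, category) for i in hits)
            records.append(ChunkRecord(prompt=prompt, raw_response=str(hits)))
        return findings, records


provider = KeywordProvider()
pages = (Page(words=("the", "SECRET", "plan")),)
findings, chunk_records = provider.detect_redactions(pages, ("names",))
```

Because `Provider` is a `Protocol`, `KeywordProvider` does not need to inherit from it; matching the method signature is enough for static type checkers. The `runtime_checkable` decorator is added here only so that `isinstance(provider, Provider)` works in the sketch, and it checks method presence, not the signature.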