Adding an LLM provider¶
A "provider" is anything that implements kuroi.providers.base.Provider.
This guide walks through adding one, using the existing Ollama provider
as a reference.
The Provider Protocol¶
class Provider(Protocol):
name: str # e.g. "anthropic"
model: str # e.g. "claude-opus-4-7"
def detect_redactions(
self,
pages: tuple[Page, ...],
llm_category_ids: tuple[str, ...],
*,
instructions: tuple[str, ...] = (),
seed: int | None = None,
attempt: int = 0,
layout_aware: bool = False,
model: str | None = None,
) -> tuple[list[Finding], list[ChunkRecord]]: ...
Two attributes for identification, one method that takes word-indexed
pages and returns a flat list of Findings plus per-chunk audit
records. Three keyword-only parameters are advisory — providers may
ignore them when their backend doesn't support the feature:
instructions— natural-language redaction instructions from the user (-i/--instruct). Append to the prompt; on Ollama, the chunker has already decomposed multi-rule instructions into atomic sub-rules.attempt— zero-based retry index. Use it to scale per-call budgets (e.g. Ollama gives slow models more time on each retry).layout_aware— whenTrue, wrap the prompt with PyMuPDF block boundaries so the model sees paragraph structure.model— per-call override forself.model, used by the chunker when a category in the rule pack carries its ownmodel:field.
Step 1: Write the client¶
Create src/kuroi/providers/<name>.py. Use providers/ollama.py as a
template — read it end-to-end first; it's about 150 lines and
demonstrates every required pattern.
Skeleton:
from __future__ import annotations
import hashlib
import json
import time
from typing import Any
import httpx
from kuroi.core.audit_records import ChunkRecord
from kuroi.core.findings import Finding
from kuroi.core.pdf import Page, serialize_for_llm
from kuroi.providers._shared import parse_findings_payload
SYSTEM_PROMPT = "..." # static system prompt
OUTPUT_SCHEMA_HINT = "..." # JSON schema the model must emit
def build_user_prompt(
pages: tuple[Page, ...],
llm_category_ids: tuple[str, ...],
) -> str:
"""Construct the user-message body sent to the model."""
doc = serialize_for_llm(pages)
cats = ", ".join(llm_category_ids) if llm_category_ids else "(none)"
return (
f"Active LLM categories: {cats}\n\n"
f"Output schema: {OUTPUT_SCHEMA_HINT}\n\n"
f"<document>\n{doc}\n</document>"
)
class MyProvider:
name = "myprovider"
def __init__(
self,
*,
model: str,
client: Any | None = None,
) -> None:
self.model = model
self._client = client or httpx.Client(...)
def detect_redactions(
self,
pages: tuple[Page, ...],
llm_category_ids: tuple[str, ...],
*,
instructions: tuple[str, ...] = (),
seed: int | None = None,
attempt: int = 0,
layout_aware: bool = False,
model: str | None = None,
) -> tuple[list[Finding], list[ChunkRecord]]:
if not llm_category_ids:
return [], []
user_prompt = build_user_prompt(pages, llm_category_ids)
prompt_sha = hashlib.sha256(user_prompt.encode("utf-8")).hexdigest()
started = time.monotonic()
# ... call your model, get a JSON string back as `content` ...
duration_ms = int((time.monotonic() - started) * 1000)
response_sha = hashlib.sha256(content.encode("utf-8")).hexdigest()
chunk = ChunkRecord(
chunk_idx=0,
pages=tuple(p.number for p in pages),
temperature=0.0,
seed_requested=seed,
seed_honored=seed is not None, # True if your API honored it
system_fingerprint=None, # or whatever the API exposes
prompt_sha256=prompt_sha,
response_sha256=response_sha,
tokens_in=...,
tokens_out=...,
cache_creation_input_tokens=0, # populate if your provider returns
cache_read_input_tokens=0, # cache-write/read counters
duration_ms=duration_ms,
)
try:
payload = json.loads(content)
except json.JSONDecodeError:
return [], [chunk]
if not isinstance(payload, dict):
return [], [chunk]
return (
parse_findings_payload(payload, pages, source="llm"),
[chunk],
)
Use providers/_shared.py:parse_findings_payload — it validates
indices and confidences, drops hallucinated entries, and produces
canonical Findings. Don't roll your own parser.
Step 2: Register in the factory¶
Edit src/kuroi/providers/factory.py to dispatch your provider name:
def make_provider(config: Config) -> Provider:
if config.provider == "anthropic":
return AnthropicProvider(model=config.model)
if config.provider == "claude-cli":
return ClaudeCliProvider(
model=config.model,
cli_path=config.claude_cli_path,
timeout_s=config.claude_cli_timeout_s,
)
if config.provider == "ollama":
return OllamaProvider(model=config.model, url=config.ollama_url)
if config.provider == "myprovider":
return MyProvider(model=config.model)
raise ValueError(f"Unknown provider: {config.provider!r}")
Then widen ProviderName and VALID_PROVIDERS in
src/kuroi/core/config.py:
ProviderName = Literal["anthropic", "claude-cli", "ollama", "myprovider"]
VALID_PROVIDERS: tuple[ProviderName, ...] = (
"anthropic", "claude-cli", "ollama", "myprovider",
)
If your provider needs new config keys (a base URL, a model alias),
follow the [ollama] table pattern in core/config.py: read the
value via _read_string(file_data, "myprovider.url"), add a CLI flag
in cli/run.py, and wire it through ConfigOverrides.
Step 3: Add pricing¶
If your provider charges per token, add an entry to
src/kuroi/data/pricing.json keyed by the model id (the same string
returned by kuroi models). The required shape is:
"my-model-id": {
"input_per_million": 1.00,
"output_per_million": 5.00,
"cache_write_multiplier": 1.25,
"cache_read_multiplier": 0.1
}
Free / local providers can leave the file unchanged — estimate_cost
already handles missing rates by returning $0.00. To refresh a cached
copy, ship a new pricing.json and run
kuroi config refresh-pricing --from path/to/pricing.json.
Step 4: Tests¶
Write tests/providers/test_<name>.py using
tests/providers/test_ollama.py as the template. The pattern: a small
stub class with a post() method that satisfies the httpx.Client
interface, seeded with a canned JSON response. Assert that
detect_redactions emits the expected Findings, that the
ChunkRecord records prompt and response SHAs, and that error paths
(HTTP errors, JSON-decode errors, timeouts) return [], [chunk] (or
[], [] when there are no LLM categories) with no findings.
Step 5: Update the docs¶
- Add a row to the provider comparison table in
docs/user-guide/providers.md. - Add a
=== "MyProvider"tab to the "Configure a provider" section. - Document any new config keys.
The CLI reference and rule-schema reference update automatically when
you re-run make docs-gen (which regenerates them from the live Typer
app and dataclasses). make docs-build then verifies the strict
zensical build.
Worked example: Ollama¶
The shipping Ollama provider (src/kuroi/providers/ollama.py, ~150
lines) is the canonical reference. It shows how to:
- Build the system + user prompts with
<document>tags that defang prompt-injection in extracted text. - Wrap an
httpx.Clientwith the right connect/read timeouts. - Translate Ollama's
/api/chatenvelope (prompt_eval_count,eval_count) into the canonicalChunkRecordtoken counts. - Hash prompts and responses with SHA-256 so the audit log captures reproducibility evidence.
- Fail closed: any HTTP, JSON, or schema error returns
([], [chunk])without crashing the run.