Caribou is an OTEL-based SDK for logging agent interactions. It records messages, tool calls, grading results, and lifecycle events as structured traces and spans, exporting them to ClickHouse with optional Langfuse integration.

Quick start

import caribou
from caribou import Message, ToolExecution

caribou.configure(run_id="my-project", phase="prod")

with caribou.trace_run({"datapoint_id": "req_123", "phase": "prod"}) as trace_id:
    caribou.log(Message(role="user", content="Hello"))
    caribou.log(Message(role="assistant", content="Hi!", model="gpt-4o", input_tokens=10, output_tokens=5))
    caribou.set_status(caribou.RolloutStatus.COMPLETED)

caribou.flush()

Installation

Within the monorepo

uv pip install -e caribou

External (CodeArtifact)

With pip:
aws codeartifact login --tool pip --domain applied-compute --repository pypi-store --region us-east-1
pip install caribou
With uv, add to .env:
UV_INDEX_CODEARTIFACT_USERNAME=aws
UV_INDEX_CODEARTIFACT_PASSWORD=<token-from-aws-codeartifact-get-authorization-token>
And to pyproject.toml:
[project]
dependencies = ["caribou>=0.1.0"]

[[tool.uv.index]]
name = "codeartifact"
url = "https://applied-compute-324441770720.d.codeartifact.us-east-2.amazonaws.com/pypi/pypi-store/simple/"

[tool.uv.sources]
caribou = { index = "codeartifact" }

Lifecycle

| Function | Description |
|---|---|
| configure(run_id, phase, ...) | Initialize tracing. Safe to call multiple times. Auto-configures from env vars. |
| trace_run(meta) | Context manager for a rollout. Creates root span, writes trace summary on exit. Returns trace_id. |
| flush(timeout=5.0) | Flush pending spans to exporters. |
| shutdown() | Flush and release resources. Call at process exit. |
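
The order of the lifecycle calls can be sketched as a small wrapper for a one-shot script. This is an illustration only: `run_traced` and `FakeSDK` are hypothetical names, and the stand-in merely records calls so the example runs without Caribou or any credentials. In real code the `caribou` module would be passed in place of the fake, assuming it exposes the functions with the signatures listed above.

```python
from contextlib import contextmanager
from typing import Any, Callable


def run_traced(sdk: Any, run_id: str, meta: dict, work: Callable[[str], None]) -> str:
    """Configure tracing, run one rollout, and always shut down afterwards."""
    sdk.configure(run_id=run_id, phase=meta["phase"])
    try:
        with sdk.trace_run(meta) as trace_id:
            work(trace_id)
        return trace_id
    finally:
        # Per the table above, shutdown() flushes pending spans before
        # releasing resources, so no separate flush() call is made here.
        sdk.shutdown()


class FakeSDK:
    """Hypothetical stand-in that records the order of lifecycle calls."""

    def __init__(self) -> None:
        self.calls = []

    def configure(self, **kwargs) -> None:
        self.calls.append("configure")

    @contextmanager
    def trace_run(self, meta: dict):
        self.calls.append("trace_run:enter")
        try:
            yield "trace-0"  # placeholder trace ID
        finally:
            self.calls.append("trace_run:exit")

    def shutdown(self) -> None:
        self.calls.append("shutdown")


fake = FakeSDK()
run_traced(fake, "my-project", {"datapoint_id": "req_123", "phase": "prod"}, lambda trace_id: None)
print(fake.calls)
# ['configure', 'trace_run:enter', 'trace_run:exit', 'shutdown']
```

A long-lived service would more likely call flush() after each request and reserve shutdown() for process exit.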

trace_run metadata

| Key | Required | Description |
|---|---|---|
| datapoint_id | Yes | Input identifier |
| phase | Yes | "train", "eval", or "prod" |
| run_id | No | Kestrel run ID |
| step_number | No | Training/eval step (omit for prod) |
| env_name | No | Environment name |
| model_name | No | Model identifier |
| grader_name | No | Grader name (eval/train only) |
| trace_id | No | Manual trace ID for multi-request continuity |
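
The required and optional keys above can be checked before a rollout starts. The sketch below uses a hypothetical helper, `validate_trace_meta`, which is not part of Caribou. It assumes the table lists every accepted key and that unknown keys should be reported, which the SDK itself may not enforce.

```python
REQUIRED_KEYS = {"datapoint_id", "phase"}
OPTIONAL_KEYS = {"run_id", "step_number", "env_name", "model_name", "grader_name", "trace_id"}
VALID_PHASES = {"train", "eval", "prod"}


def validate_trace_meta(meta: dict) -> list[str]:
    """Return a list of problems; an empty list means the dict looks valid."""
    problems = []
    keys = set(meta)
    for key in sorted(REQUIRED_KEYS - keys):
        problems.append(f"missing required key: {key}")
    # Assumption: keys outside the documented table are treated as mistakes.
    for key in sorted(keys - REQUIRED_KEYS - OPTIONAL_KEYS):
        problems.append(f"unknown key: {key}")
    if "phase" in meta and meta["phase"] not in VALID_PHASES:
        problems.append(f"invalid phase: {meta['phase']!r}")
    return problems


print(validate_trace_meta({"datapoint_id": "req_123", "phase": "prod"}))  # []
```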

Metadata and status

caribou.add_meta({"tool_call_count": 5, "lines_changed": 42})
caribou.add_tags("ablation_v2", "dataset_v2")
caribou.set_status(caribou.RolloutStatus.COMPLETED)
caribou.set_error("Timeout after 30s")

| Function | Description |
|---|---|
| add_meta(data) | Add trace-level metadata (stored in entity_metadata table) |
| add_tags(*tags) | Add filterable string tags |
| set_status(status, error_message) | Set rollout outcome |
| set_error(message) | Mark current span as errored |

Available statuses: PENDING, COMPLETED, TRUNCATED, ABORTED, ENV_SETUP_ERROR, STEP_ERROR, GRADER_TIMEOUT, GRADER_ERROR, TOOL_ERROR, ERROR

Turn index management

Turn and intra-turn indices are managed automatically by log(Message(...)), but can be controlled manually:

| Function | Description |
|---|---|
| increment_turn() | Advance to next turn (resets intra to 0) |
| increment_intra() | Advance intra-turn index |
| set_turn_index(n) | Set turn index directly |
| set_intra_index(n) | Set intra index directly |
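
As a mental model for how the two indices relate, here is a small stand-in class. It is not Caribou's implementation: the class name `TurnCounter` and the starting values of 0 are assumptions, as is leaving the intra index untouched when the turn index is set directly. Only the reset on a new turn comes from the table above.

```python
from dataclasses import dataclass


@dataclass
class TurnCounter:
    """Hypothetical model of the turn and intra-turn indices."""

    turn: int = 0   # assumed starting value
    intra: int = 0  # assumed starting value

    def increment_turn(self) -> None:
        # Moving to the next turn resets the intra-turn index.
        self.turn += 1
        self.intra = 0

    def increment_intra(self) -> None:
        self.intra += 1

    def set_turn_index(self, n: int) -> None:
        # Assumption: setting the turn directly does not touch intra.
        self.turn = n

    def set_intra_index(self, n: int) -> None:
        self.intra = n


counter = TurnCounter()
counter.increment_intra()
counter.increment_intra()
counter.increment_turn()
print(counter)  # TurnCounter(turn=1, intra=0)
```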

Eval/train vs production

| | Eval / train | Production |
|---|---|---|
| Phase | "eval" or "train" | "prod" |
| Run ID | Unique per eval run (from Kestrel) | Must equal project name |
| Step number | Groups traces into training steps | Not used |
| Kestrel registration | Runs, run steps, and datapoints | Only datapoints |
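
The production-specific rules in the table can be expressed as a pre-flight check. The function below is a hypothetical sketch, not a Caribou API. It only encodes what the table states (the run ID equals the project name and no step number is used in production) and assumes nothing further about eval or train runs.

```python
from typing import Optional


def check_phase_rules(
    phase: str,
    run_id: str,
    project_name: str,
    step_number: Optional[int] = None,
) -> list[str]:
    """Return rule violations for the given phase; an empty list means none found."""
    problems = []
    if phase == "prod":
        if run_id != project_name:
            problems.append("prod: run_id must equal the project name")
        if step_number is not None:
            problems.append("prod: step_number is not used")
    elif phase not in ("eval", "train"):
        problems.append(f"unknown phase: {phase!r}")
    return problems


print(check_phase_rules("prod", "my-project", "my-project"))  # []
```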

Environment variables

| Variable | Description |
|---|---|
| CLICKHOUSE_URL | ClickHouse HTTP URL |
| CLICKHOUSE_DATABASE | ClickHouse database name |
| CLICKHOUSE_USER | ClickHouse username |
| CLICKHOUSE_PASSWORD | ClickHouse password |
| KESTREL_API_URL | Kestrel server URL |
| KESTREL_API_KEY | Kestrel API key |
| LANGFUSE_PUBLIC_KEY | Optional. Langfuse public key |
| LANGFUSE_SECRET_KEY | Optional. Langfuse secret key |
| LANGFUSE_HOST | Optional. Langfuse host (default: https://us.cloud.langfuse.com) |
| CARIBOU_TRACE_DEBUG | Optional. Enable console span logging |
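
A startup check can surface missing configuration before any spans are exported. The helper below is a hypothetical sketch, not part of Caribou. It assumes that every variable not marked optional above is required, which may be stricter than what the SDK actually needs.

```python
import os
from typing import Mapping, Optional

# Assumption: variables not marked "Optional" in the table are required.
REQUIRED_VARS = [
    "CLICKHOUSE_URL",
    "CLICKHOUSE_DATABASE",
    "CLICKHOUSE_USER",
    "CLICKHOUSE_PASSWORD",
    "KESTREL_API_URL",
    "KESTREL_API_KEY",
]


def missing_env_vars(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing configuration:", ", ".join(missing))
```

Passing a mapping instead of reading `os.environ` directly keeps the check easy to exercise in unit tests.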

Langfuse integration

Set credentials to enable OTLP export to Langfuse (runs alongside ClickHouse):
export LANGFUSE_PUBLIC_KEY="pk-lf-..."
export LANGFUSE_SECRET_KEY="sk-lf-..."
Messages with LLM metadata automatically include OpenLLMetry semantic conventions (gen_ai.system, gen_ai.request.model, gen_ai.usage.input_tokens, etc.).
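
To make the attribute names concrete, the sketch below builds the span attributes that a message with LLM metadata might carry. It does not show Caribou's internal mapping. The function name `gen_ai_attributes` is hypothetical, `gen_ai.usage.output_tokens` is assumed as the counterpart to the listed input-token attribute, and the `system` value is taken as a parameter because the text does not say how it is derived.

```python
from typing import Optional


def gen_ai_attributes(
    system: str,
    model: str,
    input_tokens: Optional[int] = None,
    output_tokens: Optional[int] = None,
) -> dict:
    """Build a dict of gen_ai.* span attributes, skipping unknown token counts."""
    attrs = {
        "gen_ai.system": system,
        "gen_ai.request.model": model,
    }
    if input_tokens is not None:
        attrs["gen_ai.usage.input_tokens"] = input_tokens
    if output_tokens is not None:
        # Assumed attribute name; not listed explicitly in the text above.
        attrs["gen_ai.usage.output_tokens"] = output_tokens
    return attrs


attrs = gen_ai_attributes("openai", "gpt-4o", input_tokens=10, output_tokens=5)
for key, value in attrs.items():
    print(f"{key} = {value}")
```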