Production logging

Production logging uses the same Message, ToolExecution, and Event primitives as eval/train, with a few key differences in how you configure and register traces.

Key differences from eval/train

	Eval / train	Production
Phase	`"eval"` or `"train"`	`"prod"`
Run ID	Unique per eval run (from Kestrel)	Set to the project name
Step number	Groups traces into training steps	Not used
Grading	Score each rollout with `Grade`	Typically not used
Kestrel registration	Runs, run steps, and datapoints	Project only (optionally datapoints)

Setup

Configure Caribou once at application startup. Set run_id to your project name and phase to "prod":

import caribou

PROJECT_NAME = "my-project"
caribou.configure(run_id=PROJECT_NAME, phase="prod")

kestrel = caribou.get_kestrel_client()
if kestrel.enabled:
    kestrel.get_or_create_project(name=PROJECT_NAME)

You don’t create runs or run steps in production. Every trace is grouped under the project name you passed as run_id.

Per-request tracing

Wrap each incoming request in trace_run. Use a request ID or session ID as the datapoint_id:

import caribou
from caribou import Message, ToolExecution

# get_completion and execute_tool stand for your application's own helpers.

async def handle_request(request_id: str, user_input: str):
    with caribou.trace_run({
        "datapoint_id": request_id,
        "phase": "prod",
        "model_name": "gpt-4o",
    }) as trace_id:
        caribou.log(Message(role="user", content=user_input))

        completion = await get_completion(user_input)
        caribou.log(Message(
            role="assistant",
            content=completion["content"],
            tool_calls=completion.get("tool_calls"),
            model="gpt-4o",
            provider="openai",
            input_tokens=completion["usage"]["input"],
            output_tokens=completion["usage"]["output"],
            cost_usd=completion["usage"]["cost"],
        ))

        if completion.get("tool_calls"):
            for tc in completion["tool_calls"]:
                result = await execute_tool(tc)
                caribou.log(ToolExecution(
                    name=tc["function"]["name"],
                    call_id=tc["id"],
                    arguments=tc["function"]["arguments"],
                    result=result,
                    requestor="agent",
                ))

        caribou.set_status(caribou.RolloutStatus.COMPLETED)

    caribou.flush()
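
If some requests arrive without an ID, you need a fallback before calling trace_run. The following is a minimal sketch of one way to choose a datapoint_id, not part of Caribou itself: pick_datapoint_id and the "req_" prefix are hypothetical names chosen for illustration.

```python
import uuid
from typing import Optional


def pick_datapoint_id(
    request_id: Optional[str] = None,
    session_id: Optional[str] = None,
) -> str:
    """Return the first usable ID, or generate a random one.

    Hypothetical helper: prefers the request ID, then the session ID,
    and only falls back to a random value when both are missing or blank.
    """
    for candidate in (request_id, session_id):
        if candidate and candidate.strip():
            return candidate.strip()
    # Assumption: any unique string is acceptable as a datapoint_id.
    return f"req_{uuid.uuid4().hex}"
```

The returned value can then be passed as "datapoint_id" in the trace_run metadata shown above.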

Multi-turn sessions

For multi-turn conversations that span multiple requests, pass a consistent trace_id in the metadata to maintain continuity:

with caribou.trace_run({
    "datapoint_id": session_id,
    "phase": "prod",
    "trace_id": f"trc_{session_id}",
    "model_name": "gpt-4o",
}) as trace_id:
    # ...
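
Continuity depends on every request in a session producing exactly the same trace_id. One way to guarantee that is to derive it in a single place. This is a sketch only: session_trace_id is a hypothetical helper, and the "trc_" prefix simply mirrors the example above rather than a documented requirement.

```python
def session_trace_id(session_id: str) -> str:
    """Derive a deterministic trace ID from a session ID.

    The same session_id always yields the same trace_id, so separate
    requests in one conversation land in the same trace.
    """
    cleaned = session_id.strip()
    if not cleaned:
        # An empty ID would make unrelated sessions share one trace.
        raise ValueError("session_id must not be empty")
    return f"trc_{cleaned}"
```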

Metadata and tags

Attach metadata and tags the same way as eval/train. These are searchable in Kestrel.

with caribou.trace_run({...}) as trace_id:
    # ... agent execution ...
    caribou.add_meta({"customer_id": "cust_123", "latency_ms": 450})
    caribou.add_tags("api_v2", "premium_tier")
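
A value such as latency_ms has to be measured before it can be attached. The sketch below shows one way to do that with the standard library; timed_meta is a hypothetical helper, and the resulting dict is what you would hand to add_meta.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def timed_meta(meta: Dict[str, object], key: str = "latency_ms") -> Iterator[Dict[str, object]]:
    """Record the elapsed wall-clock time of the block into meta[key]."""
    start = time.perf_counter()
    try:
        yield meta
    finally:
        # Stored in whole milliseconds, even if the block raised.
        meta[key] = round((time.perf_counter() - start) * 1000)
```

For example, wrap the agent call in `with timed_meta({"customer_id": "cust_123"}) as meta:` and pass `meta` to add_meta once the block exits.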

Error handling

Exceptions raised inside trace_run are captured automatically: the trace status is set to ERROR and the traceback is recorded. If you catch an exception yourself, set the status and error message explicitly:

with caribou.trace_run({...}) as trace_id:
    try:
        result = await risky_operation()
    except TimeoutError:
        caribou.set_status(caribou.RolloutStatus.TRUNCATED)
        caribou.set_error("Operation timed out after 30s")
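
The timeout itself has to come from somewhere. A minimal sketch using asyncio.wait_for follows; run_with_timeout is a hypothetical helper, and the status strings only mirror the RolloutStatus names used in this page. Note that on Python versions before 3.11, asyncio.wait_for raises asyncio.TimeoutError, which is a different class from the built-in TimeoutError, so the sketch catches the asyncio name.

```python
import asyncio
from typing import Any, Awaitable, Optional, Tuple


async def run_with_timeout(
    awaitable: Awaitable[Any],
    timeout_s: float,
) -> Tuple[str, Optional[str], Any]:
    """Await with a deadline and report (status_name, error_message, result)."""
    try:
        result = await asyncio.wait_for(awaitable, timeout=timeout_s)
    except asyncio.TimeoutError:
        # Alias of the built-in TimeoutError from Python 3.11 onward.
        return "TRUNCATED", f"Operation timed out after {timeout_s:g}s", None
    return "COMPLETED", None, result
```

Inside trace_run, the caller would translate the returned status name into the matching RolloutStatus member and pass the message to set_error when it is not None.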

Shutdown

Call shutdown() on application exit to flush remaining spans:

caribou.shutdown()
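
To make sure the call happens on every normal exit path, you can register it with the standard library's atexit module. The sketch below is one possible arrangement, not a Caribou feature: register_shutdown and shutdown_once are hypothetical helpers that accept any zero-argument callable, such as caribou.shutdown.

```python
import atexit
import threading
from typing import Callable


def shutdown_once(shutdown_fn: Callable[[], None]) -> Callable[[], None]:
    """Wrap shutdown_fn so that repeated calls run it only once."""
    lock = threading.Lock()
    done = False

    def wrapper() -> None:
        nonlocal done
        with lock:
            if done:
                return
            done = True
        shutdown_fn()

    return wrapper


def register_shutdown(shutdown_fn: Callable[[], None]) -> Callable[[], None]:
    """Register a run-once shutdown hook for normal interpreter exit."""
    hook = shutdown_once(shutdown_fn)
    atexit.register(hook)
    return hook
```

Calling `register_shutdown(caribou.shutdown)` right after configure would cover normal interpreter exit. atexit handlers do not run if the process is killed by an unhandled signal, so a signal handler or your framework's own shutdown event can call the returned hook as well without flushing twice.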

Complete example

import caribou
from caribou import Message, ToolExecution

PROJECT_NAME = "my-agent"
caribou.configure(run_id=PROJECT_NAME, phase="prod")

kestrel = caribou.get_kestrel_client()
if kestrel.enabled:
    kestrel.get_or_create_project(name=PROJECT_NAME)


async def handle_request(request_id: str, messages: list[dict]):
    with caribou.trace_run({
        "datapoint_id": request_id,
        "phase": "prod",
        "model_name": "gpt-4o",
    }) as trace_id:
        for msg in messages:
            caribou.log(Message(role=msg["role"], content=msg["content"]))

        completion = await get_completion(messages)
        caribou.log(Message(
            role="assistant",
            content=completion["content"],
            model="gpt-4o",
            input_tokens=completion["usage"]["input"],
            output_tokens=completion["usage"]["output"],
            cost_usd=completion["usage"]["cost"],
        ))

        caribou.add_meta({"request_id": request_id})
        caribou.set_status(caribou.RolloutStatus.COMPLETED)

    caribou.flush()
