The log() function accepts typed dataclasses for structured logging. Each type creates a specific span in the trace.
Message
Log conversation messages. Assistant messages can include LLM metadata, and grader LLM calls are identified by the grader_name field:
| Field | Type | Description |
|---|---|---|
| role | str | "user", "assistant", or "system" |
| content | str \| list[dict] \| None | Text or OpenAI-style multimodal content |
| tool_calls | list[dict] \| None | Tool calls made by assistant |
| tool_call_id | str \| None | Tool call ID (for tool responses) |
| name | str \| None | Function name (for tool responses) |
| model | str \| None | LLM model name |
| provider | str \| None | LLM provider |
| input_tokens | int \| None | Input token count |
| output_tokens | int \| None | Output token count |
| cost_usd | float \| None | Cost in USD |
| duration_ms | int \| None | Duration in milliseconds |
| grader_name | str \| None | Grader name (for grader LLM calls) |
| prompt | str \| None | Grader prompt summary |
Span naming
| Scenario | span_kind | span_name |
|---|---|---|
| Regular message | message | message.{role} |
| Message with grader_name | grader.{grader_name} | grader.{grader_name} |
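
The naming rule in the table can be sketched in a few lines. The `Message` class below is a stand-in that mirrors only a subset of the documented fields (it is not the library's own dataclass, and its defaults are assumed), and `span_name_for` is a hypothetical helper written for illustration; the real span is created inside log().

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Message:
    """Stand-in mirroring a subset of the documented Message fields."""
    role: str
    content: Union[str, list, None] = None
    model: Optional[str] = None
    input_tokens: Optional[int] = None
    output_tokens: Optional[int] = None
    grader_name: Optional[str] = None


def span_name_for(message: Message) -> str:
    """Hypothetical helper: apply the span naming table to a message."""
    if message.grader_name:
        return f"grader.{message.grader_name}"
    return f"message.{message.role}"


# A regular assistant reply carrying LLM metadata.
reply = Message(role="assistant", content="The capital is Paris.",
                model="example-model", input_tokens=12, output_tokens=6)
# A grader LLM call, identified by grader_name.
judge = Message(role="assistant", content="PASS", grader_name="accuracy")

print(span_name_for(reply))  # message.assistant
print(span_name_for(judge))  # grader.accuracy
```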
ToolExecution
Log tool calls with arguments, results, and error status.
| Field | Type | Description |
|---|---|---|
| name | str | Tool name |
| call_id | str \| None | Tool call ID |
| arguments | dict \| str \| None | Tool input |
| result | str \| list[dict] \| None | Tool result (text or multimodal) |
| error | str \| None | Error message if failed |
| requestor | str \| None | "agent", "user", or "system" |
| duration_ms | int \| None | Duration in milliseconds |
Creates a tool.{name} span.
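
A minimal sketch of how a tool call might be captured as a record with the fields listed above. `ToolExecution` here is a stand-in dataclass built from the field table with assumed defaults, and `run_tool` and `divide` are hypothetical names; none of this is the library's implementation.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Optional, Union


@dataclass
class ToolExecution:
    """Stand-in mirroring the documented ToolExecution fields."""
    name: str
    call_id: Optional[str] = None
    arguments: Union[dict, str, None] = None
    result: Union[str, list, None] = None
    error: Optional[str] = None
    requestor: Optional[str] = None
    duration_ms: Optional[int] = None


def run_tool(name: str, fn: Callable[..., Any], arguments: dict,
             call_id: Optional[str] = None) -> ToolExecution:
    """Hypothetical wrapper: run a tool and record result, error, and timing."""
    start = time.perf_counter()
    result, error = None, None
    try:
        result = str(fn(**arguments))
    except Exception as exc:  # record the failure instead of raising
        error = f"{type(exc).__name__}: {exc}"
    duration_ms = int((time.perf_counter() - start) * 1000)
    return ToolExecution(name=name, call_id=call_id, arguments=arguments,
                         result=result, error=error, requestor="agent",
                         duration_ms=duration_ms)


def divide(a: float, b: float) -> float:
    return a / b


ok = run_tool("divide", divide, {"a": 6, "b": 3}, call_id="call_1")
failed = run_tool("divide", divide, {"a": 1, "b": 0}, call_id="call_2")

print(ok.result)     # 2.0
print(failed.error)  # set because the call raised ZeroDivisionError
```

Filling `error` and leaving `result` as None on failure keeps the two outcomes easy to tell apart when the records are later passed to log().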
Grade
Log grading results with scores, reasoning, and per-criterion breakdowns.
| Field | Type | Description |
|---|---|---|
| score | float | Score (0.0 to 1.0) |
| grader_type | str \| None | Grader class name |
| reasoning | str \| None | Reasoning explanation |
| criteria | list[dict] \| None | Individual criteria results |
| passed_count | int \| None | Number of passed criteria |
| total_count | int \| None | Total criteria count |
Creates a grader or grader.{grader_type} span. Each criterion in criteria creates a child grader.criterion span.
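
The relationship between the criteria list, the counts, and the resulting spans can be sketched as follows. Several things here are assumptions rather than documented behavior: `Grade` is a stand-in dataclass built from the field table, the `"name"` and `"passed"` keys in each criterion dict are hypothetical, the score is taken to be the fraction of passed criteria, and the parent span is assumed to be `grader.{grader_type}` only when grader_type is set.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Grade:
    """Stand-in mirroring the documented Grade fields."""
    score: float
    grader_type: Optional[str] = None
    reasoning: Optional[str] = None
    criteria: Optional[list] = None
    passed_count: Optional[int] = None
    total_count: Optional[int] = None


def grade_from_criteria(criteria: list,
                        grader_type: Optional[str] = None) -> Grade:
    """Hypothetical helper: score is the fraction of criteria that passed."""
    total = len(criteria)
    passed = sum(1 for c in criteria if c["passed"])
    score = passed / total if total else 0.0
    return Grade(score=score, grader_type=grader_type, criteria=criteria,
                 passed_count=passed, total_count=total)


def expected_span_names(grade: Grade) -> list:
    """One parent span plus one grader.criterion child per criterion."""
    parent = f"grader.{grade.grader_type}" if grade.grader_type else "grader"
    children = ["grader.criterion"] * len(grade.criteria or [])
    return [parent] + children


criteria = [
    {"name": "cites_source", "passed": True},
    {"name": "correct_answer", "passed": True},
    {"name": "concise", "passed": False},
    {"name": "polite", "passed": True},
]
grade = grade_from_criteria(criteria, grader_type="RubricGrader")

print(grade.score)  # 0.75
print(expected_span_names(grade))
```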
Event
Log lifecycle events with optional details.
| Field | Type | Description |
|---|---|---|
| name | str | Event name |
| details | dict \| None | Additional event details |
Creates an event.{name} span.
Multimodal content
Caribou processes base64 images in message content, tool results, tool arguments, grader artifacts, and event details. Images are uploaded to S3 and replaced with caribou-image:// URLs, which Kestrel renders automatically.
Use OpenAI-style content parts:
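
One way to build such a list, as a sketch: the part shapes below follow OpenAI's chat format (`text` and `image_url` parts), and it is an assumption that a base64 image is carried as a `data:` URL inside `image_url`. The helper names `text_part` and `image_part` are hypothetical, and the image bytes are a placeholder rather than a real image.

```python
import base64


def text_part(text: str) -> dict:
    """Build an OpenAI-style text part."""
    return {"type": "text", "text": text}


def image_part(image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Build an OpenAI-style image part carrying a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


# Placeholder bytes (the PNG file signature) stand in for a real image.
fake_png = b"\x89PNG\r\n\x1a\n"

# This list would be used as the content of a message or as a tool result.
content = [
    text_part("What is in this screenshot?"),
    image_part(fake_png),
]

print(content[1]["image_url"]["url"])  # data:image/png;base64,iVBORw0KGgo=
```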