Overview - AC2

Manta is the Python read layer for Kestrel. It queries traces and spans from ClickHouse and wraps them in typed context objects that make it easy to work with conversations, tool calls, grader results, and more.

What Manta provides

Querying

MantaService is the entry point for loading data from ClickHouse. You can query traces by run, phase, step, or datapoint — and load their spans with filtering by kind, role, tool name, and more.

from manta.data import MantaService

service = MantaService()

traces = service.load_traces(run_id="run_xxx", phase="eval")
spans = service.load_spans_for_traces(
    trace_ids=[t.trace_id for t in traces],
    span_kinds=["message", "tool"],
)

Typed context objects

Raw trace and span rows are wrapped in context objects with typed accessors. TraceContext is the main one — it gives you methods like messages(), tools(), grader_result(), and conversation() instead of parsing raw span data yourself.

from manta import TraceContext

ctx = TraceContext(trace_row, span_loader=lambda tid: spans[tid])

ctx.messages(role="assistant")   # list[MessageData]
ctx.tools(name="bash")           # list[ToolData]
ctx.grader_result()              # GraderResultData | None
ctx.conversation()               # list[dict] — full conversation
ctx.failed_tools()               # list[ToolData]
ctx.has_errors()                 # bool

Analysis pipelines

For batch analysis across a full run, Manta provides a MapReduce pipeline that runs a function per trace, aggregates results up the entity hierarchy, and optionally persists computed metadata back to ClickHouse.

from manta import MapReduceComputation, EntityLevel

pipeline = (
    MapReduceComputation("My analysis", project_id="prj_xxx")
    .add_level(EntityLevel.TRACE, map_fn, TraceMetrics)
    .add_level(EntityLevel.DATAPOINT, reduce_fn, DatapointMetrics)
)
result = pipeline.run(run_id="run_xxx")

Core concepts

Entity hierarchy

Traces are organized in a hierarchy:

Level	What it represents
Trace	A single agent rollout — one execution of a datapoint
Datapoint	A problem instance or sample (may have multiple traces across steps)
Step	A training or eval step within a run
Run	A complete training or evaluation run

Span kinds

Each trace contains spans — individual events that happened during the rollout:

Kind	What it captures
`message`	A conversation message (user, assistant, system, tool)
`tool`	A tool call with arguments, result, and error status
`grader`	Grading results, criteria, and grader LLM messages
`llm`	Raw LLM invocations with token counts and cost
`system`	Lifecycle events (setup, teardown, etc.)

Trace properties

Each trace carries summary data from ClickHouse:

Property	Type	Description
`trace_id`	`str`	Unique trace ID
`run_id`	`str`	Parent run
`step_number`	`int`	Training/eval step
`phase`	`str`	`"train"`, `"eval"`, or `"prod"`
`datapoint_id`	`str`	Parent datapoint
`score`	`float \| None`	Grader score
`status`	`str`	`"completed"`, `"error"`, etc.
`cost_usd`	`float`	Total cost
`duration_ms`	`int`	Trace duration
`turn_count`	`int`	Number of turns
`tokens_input`	`int`	Input tokens
`tokens_output`	`int`	Output tokens
`model_name`	`str`	Model used
`env_name`	`str`	Environment name
`meta`	`dict`	Trace metadata

Caribou

Manta

​What Manta provides

​Querying

​Typed context objects

​Analysis pipelines

​Core concepts

​Entity hierarchy

​Span kinds

​Trace properties