Manta is the Python read layer for Kestrel. It queries traces and spans from ClickHouse and wraps them in typed context objects that make it easy to work with conversations, tool calls, grader results, and more.

What Manta provides

Querying

MantaService is the entry point for loading data from ClickHouse. You can query traces by run, phase, step, or datapoint, then load their spans filtered by kind, role, tool name, and more.
from manta.data import MantaService

service = MantaService()

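# Load the eval-phase traces for one run.
traces = service.load_traces(run_id="run_xxx", phase="eval")
# Load only the message and tool spans belonging to those traces.
spans = service.load_spans_for_traces(
    trace_ids=[t.trace_id for t in traces],
    span_kinds=["message", "tool"],
)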
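The span_loader argument in the TraceContext example below is called with a trace ID and indexes spans[tid], so it expects spans grouped per trace. This page doesn't document what load_spans_for_traces returns. If it turns out to be a flat list of rows that each carry a trace_id, a grouping step like the following would produce that shape. SpanRow and group_spans_by_trace are hypothetical stand-ins for illustration, not Manta APIs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SpanRow:
    """Hypothetical stand-in for a span row; only the fields used here."""
    trace_id: str
    kind: str


def group_spans_by_trace(spans: list[SpanRow]) -> dict[str, list[SpanRow]]:
    """Bucket a flat list of spans into {trace_id: [spans...]}, preserving order."""
    grouped: dict[str, list[SpanRow]] = defaultdict(list)
    for span in spans:
        grouped[span.trace_id].append(span)
    return dict(grouped)  # plain dict, so missing keys raise instead of auto-creating


rows = [
    SpanRow("t1", "message"),
    SpanRow("t2", "tool"),
    SpanRow("t1", "tool"),
]
by_trace = group_spans_by_trace(rows)
print({tid: [s.kind for s in ss] for tid, ss in by_trace.items()})
# {'t1': ['message', 'tool'], 't2': ['tool']}
```

With this shape, a loader such as `lambda tid: by_trace.get(tid, [])` returns an empty list for traces that have no matching spans instead of raising KeyError.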

Typed context objects

Raw trace and span rows are wrapped in context objects with typed accessors. TraceContext is the main one: it provides methods such as messages(), tools(), grader_result(), and conversation(), so you don't have to parse raw span data yourself.
from manta import TraceContext

ctx = TraceContext(trace_row, span_loader=lambda tid: spans[tid])

ctx.messages(role="assistant")   # list[MessageData]
ctx.tools(name="bash")           # list[ToolData]
ctx.grader_result()              # GraderResultData | None
ctx.conversation()               # list[dict] — full conversation
ctx.failed_tools()               # list[ToolData]
ctx.has_errors()                 # bool
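Because TraceContext takes a span_loader callback, a context can plausibly fetch its spans on demand rather than up front. The sketch below illustrates that lazy, cached loading pattern and shows how accessors like tools(), failed_tools(), and has_errors() can be derived from span data. It is not Manta's implementation. MiniTraceContext and ToolSpan are hypothetical, and treating a non-None error field as a failed call is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToolSpan:
    """Hypothetical stand-in for a tool span (not Manta's ToolData)."""
    name: str
    error: Optional[str] = None  # assumption: None means the call succeeded


class MiniTraceContext:
    """Illustrative wrapper that loads spans on first use, then caches them."""

    def __init__(self, trace_id: str, span_loader: Callable[[str], list[ToolSpan]]):
        self.trace_id = trace_id
        self._span_loader = span_loader
        self._spans: Optional[list[ToolSpan]] = None
        self.load_count = 0  # demo only: how many times the loader was called

    def _load(self) -> list[ToolSpan]:
        if self._spans is None:  # first access: fetch and cache
            self._spans = self._span_loader(self.trace_id)
            self.load_count += 1
        return self._spans

    def tools(self, name: Optional[str] = None) -> list[ToolSpan]:
        # This sketch stores only tool spans, so no filtering by kind is needed.
        return [t for t in self._load() if name is None or t.name == name]

    def failed_tools(self) -> list[ToolSpan]:
        return [t for t in self.tools() if t.error is not None]

    def has_errors(self) -> bool:
        return len(self.failed_tools()) > 0


store = {"t1": [ToolSpan("bash"), ToolSpan("bash", error="exit 1"), ToolSpan("python")]}
ctx = MiniTraceContext("t1", span_loader=lambda tid: store[tid])
print(len(ctx.tools(name="bash")), ctx.has_errors(), ctx.load_count)  # 2 True 1
```

Caching after the first call means repeated accessor calls on the same context don't hit the loader (and, in a real setup, the database) again.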

Analysis pipelines

For batch analysis across a full run, Manta provides a MapReduce pipeline that runs a function per trace, aggregates results up the entity hierarchy, and optionally persists computed metadata back to ClickHouse.
from manta import MapReduceComputation, EntityLevel

pipeline = (
    MapReduceComputation("My analysis", project_id="prj_xxx")
    .add_level(EntityLevel.TRACE, map_fn, TraceMetrics)
    .add_level(EntityLevel.DATAPOINT, reduce_fn, DatapointMetrics)
)
result = pipeline.run(run_id="run_xxx")
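In the pipeline above, map_fn, reduce_fn, TraceMetrics, and DatapointMetrics are not imported from Manta, so they appear to be user-supplied. The plain-Python sketch below shows the map-then-reduce shape for the first two levels: one result per trace, then one aggregate per datapoint. Everything here is an assumption for illustration: the field names, the 0.5 pass threshold, and the function signatures. MapReduceComputation may call your functions with different arguments (for example, TraceContext objects), and persisting results to ClickHouse is not shown.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class TraceRow:
    """Hypothetical trace row using a few fields from the Trace properties table."""
    trace_id: str
    datapoint_id: str
    score: Optional[float]
    cost_usd: float


@dataclass
class TraceMetrics:  # hypothetical per-trace output schema
    trace_id: str
    datapoint_id: str
    passed: bool
    cost_usd: float


@dataclass
class DatapointMetrics:  # hypothetical per-datapoint output schema
    datapoint_id: str
    n_traces: int
    pass_rate: float
    total_cost_usd: float


def map_fn(trace: TraceRow) -> TraceMetrics:
    # Per-trace step: a trace "passes" if it has a score >= 0.5 (arbitrary threshold).
    passed = trace.score is not None and trace.score >= 0.5
    return TraceMetrics(trace.trace_id, trace.datapoint_id, passed, trace.cost_usd)


def reduce_fn(datapoint_id: str, children: list[TraceMetrics]) -> DatapointMetrics:
    # Per-datapoint step: aggregate the trace-level results one level up.
    return DatapointMetrics(
        datapoint_id=datapoint_id,
        n_traces=len(children),
        pass_rate=mean(1.0 if m.passed else 0.0 for m in children),
        total_cost_usd=sum(m.cost_usd for m in children),
    )


def run_two_levels(traces: list[TraceRow]) -> dict[str, DatapointMetrics]:
    groups: dict[str, list[TraceMetrics]] = defaultdict(list)
    for t in traces:  # map over every trace
        m = map_fn(t)
        groups[m.datapoint_id].append(m)  # group results by parent datapoint
    return {dp: reduce_fn(dp, ms) for dp, ms in groups.items()}  # reduce per group


traces = [
    TraceRow("t1", "dp1", 1.0, 0.02),
    TraceRow("t2", "dp1", 0.0, 0.03),
    TraceRow("t3", "dp2", None, 0.01),
]
out = run_two_levels(traces)
print(out["dp1"].pass_rate, out["dp2"].pass_rate)  # 0.5 0.0
```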

Core concepts

Entity hierarchy

Traces are organized in a hierarchy:
| Level | What it represents |
| --- | --- |
| Trace | A single agent rollout: one execution of a datapoint |
| Datapoint | A problem instance or sample (may have multiple traces across steps) |
| Step | A training or eval step within a run |
| Run | A complete training or evaluation run |
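Each trace records its position in this hierarchy through the run_id, step_number, and datapoint_id fields listed under Trace properties. As a hypothetical sketch (build_hierarchy is not a Manta function), grouping flat trace records by those fields rebuilds the nesting run → step → datapoint → trace. The same datapoint_id can reappear under several steps.

```python
from collections import defaultdict


def build_hierarchy(traces: list[dict]) -> dict:
    """Nest flat trace records as run_id -> step_number -> datapoint_id -> [trace_id]."""
    tree: dict = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for t in traces:
        tree[t["run_id"]][t["step_number"]][t["datapoint_id"]].append(t["trace_id"])
    return tree


traces = [
    {"trace_id": "t1", "run_id": "run_a", "step_number": 0, "datapoint_id": "dp1"},
    {"trace_id": "t2", "run_id": "run_a", "step_number": 0, "datapoint_id": "dp1"},
    {"trace_id": "t3", "run_id": "run_a", "step_number": 1, "datapoint_id": "dp1"},
]
tree = build_hierarchy(traces)
print(tree["run_a"][0]["dp1"])  # ['t1', 't2']
```

Here dp1 has two traces at step 0 and one at step 1, matching the note that a datapoint may have multiple traces across steps.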

Span kinds

Each trace contains spans, the individual events that occurred during the rollout:
| Kind | What it captures |
| --- | --- |
| message | A conversation message (user, assistant, system, tool) |
| tool | A tool call with arguments, result, and error status |
| grader | Grading results, criteria, and grader LLM messages |
| llm | Raw LLM invocations with token counts and cost |
| system | Lifecycle events (setup, teardown, etc.) |
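When iterating over a trace's spans, it can help to pin the documented kinds down in a type. The sketch below declares the five kinds from the table above as a Literal alias and counts spans per kind, tallying any kind outside that set as "other". SpanKind, KNOWN_KINDS, and kind_histogram are illustrative names, not part of Manta, and the input is assumed to be a plain list of kind strings.

```python
from collections import Counter
from typing import Literal, get_args

# The five span kinds from the table above, as a type alias (illustrative).
SpanKind = Literal["message", "tool", "grader", "llm", "system"]
KNOWN_KINDS: frozenset[str] = frozenset(get_args(SpanKind))


def kind_histogram(kinds: list[str]) -> Counter:
    """Count spans per kind; anything outside the documented set is tallied as "other"."""
    return Counter(k if k in KNOWN_KINDS else "other" for k in kinds)


counts = kind_histogram(["system", "message", "llm", "message", "tool", "custom"])
print(counts["message"], counts["other"])  # 2 1
```

Bucketing unknown kinds instead of raising keeps the count usable if new span kinds appear later.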

Trace properties

Each trace carries summary data from ClickHouse:
| Property | Type | Description |
| --- | --- | --- |
| trace_id | str | Unique trace ID |
| run_id | str | Parent run |
| step_number | int | Training or eval step |
| phase | str | "train", "eval", or "prod" |
| datapoint_id | str | Parent datapoint |
| score | float or None | Grader score |
| status | str | "completed", "error", etc. |
| cost_usd | float | Total cost (USD) |
| duration_ms | int | Trace duration (milliseconds) |
| turn_count | int | Number of turns |
| tokens_input | int | Input tokens |
| tokens_output | int | Output tokens |
| model_name | str | Model used |
| env_name | str | Environment name |
| meta | dict | Trace metadata |
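A typed record that mirrors the table makes run-level summaries straightforward. TraceSummary below is a hypothetical dataclass, not the class Manta returns, and the example values are made up. summarize() handles two details explicitly: score may be None, so unscored traces are left out of the mean, and token usage is split into input and output counts.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Optional


@dataclass
class TraceSummary:
    """Typed mirror of the trace properties table (illustrative, not Manta's class)."""
    trace_id: str
    run_id: str
    step_number: int
    phase: str  # "train", "eval", or "prod"
    datapoint_id: str
    score: Optional[float]
    status: str  # "completed", "error", ...
    cost_usd: float
    duration_ms: int
    turn_count: int
    tokens_input: int
    tokens_output: int
    model_name: str
    env_name: str
    meta: dict = field(default_factory=dict)

    @property
    def tokens_total(self) -> int:
        return self.tokens_input + self.tokens_output


def summarize(traces: list[TraceSummary]) -> dict:
    scored = [t.score for t in traces if t.score is not None]  # skip unscored traces
    return {
        "n_traces": len(traces),
        "n_errors": sum(t.status == "error" for t in traces),
        "mean_score": mean(scored) if scored else None,
        "total_cost_usd": sum(t.cost_usd for t in traces),
        "total_tokens": sum(t.tokens_total for t in traces),
    }


# Hypothetical values shared by both example traces.
base = dict(run_id="run_xxx", step_number=0, phase="eval", datapoint_id="dp1",
            duration_ms=1200, turn_count=3, model_name="model-a", env_name="env-a")
traces = [
    TraceSummary(trace_id="t1", score=1.0, status="completed", cost_usd=0.5,
                 tokens_input=100, tokens_output=20, **base),
    TraceSummary(trace_id="t2", score=None, status="error", cost_usd=0.25,
                 tokens_input=50, tokens_output=0, **base),
]
print(summarize(traces))
```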