Save traces of model executions for debugging and distillation

An eight-factor AI application maintains comprehensive traces of all model interactions and process executions. A trace captures the complete context, execution path, and results of each operation. Traces should be structured and searchable, and suitable both for debugging and for improving the system, for example by distilling successful traces into training data.

Traces in an eight-factor app serve multiple purposes:

  • Debugging complex interactions
  • Performance optimization
  • Cost monitoring
  • Quality assurance
  • Training data collection
  • Process improvement

A complete trace includes:

  • Input context and prompts used
  • Model responses and parameters
  • Tool calls and results
  • Reasoning steps taken
  • Resource usage metrics
  • Timing information
  • Final outcomes
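One way to make the list above concrete is a single record type with one field per item. The sketch below assumes hypothetical `TraceRecord` and `ToolCall` dataclasses serialized to JSON; it illustrates the shape of a trace, not a required schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ToolCall:
    # Hypothetical record of one tool invocation and what it returned
    name: str
    arguments: Dict[str, Any]
    result: Any


@dataclass
class TraceRecord:
    # Input context and prompt used
    prompt: str
    context: Dict[str, Any]
    # Model and the parameters that produced the response
    model: str
    params: Dict[str, Any]
    response: Optional[str] = None
    # Tool calls and intermediate reasoning steps
    tool_calls: List[ToolCall] = field(default_factory=list)
    reasoning_steps: List[str] = field(default_factory=list)
    # Resource usage
    input_tokens: int = 0
    output_tokens: int = 0
    # Timing, as epoch seconds
    started_at: float = 0.0
    ended_at: float = 0.0
    # Final outcome, e.g. "success" or "error"
    outcome: Optional[str] = None

    def to_json(self) -> str:
        # asdict also converts nested dataclasses such as ToolCall
        return json.dumps(asdict(self), sort_keys=True)
```

Keeping every item in one flat, serializable record makes traces easy to store as JSON and to query later.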

Bad practice - minimal or unstructured logging:

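import logging

def process_request(request):
    # Free-text log lines: no trace ID, timing, token counts, or parameters
    logging.info(f"Processing request: {request}")
    prompt = build_prompt(request)  # hypothetical prompt builder
    result = model.generate(prompt)
    logging.info(f"Got result: {result}")
    return result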

Good practice - structured tracing:

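from contextlib import contextmanager
from datetime import datetime, timezone
from typing import Optional

# Trace and TraceStorage are application-defined types
class Tracer:
    def __init__(self, storage: TraceStorage):
        self.storage = storage
        self.current_trace = None

    @contextmanager
    def trace(self, operation_type: str, metadata: Optional[dict] = None):
        trace = Trace(
            operation_type=operation_type,
            metadata=metadata or {},
            timestamp=datetime.now(timezone.utc)
        )
        # Remember the enclosing trace so nested traces restore it on exit
        parent = self.current_trace
        self.current_trace = trace
        try:
            yield trace
            trace.status = "success"
        except Exception as e:
            trace.status = "error"
            trace.error = str(e)
            raise
        finally:
            trace.end_timestamp = datetime.now(timezone.utc)
            self.current_trace = parent
            # Store every trace, including failures
            self.storage.store(trace)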

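import time

class ModelInteractionTracer(Tracer):
    def __init__(self, storage: TraceStorage, model):
        super().__init__(storage)
        self.model = model

    async def traced_generate(
        self,
        prompt: str,
        context: Context,
        **params
    ) -> TraceResult:
        with self.trace("model_generation") as trace:
            trace.record_input(prompt, context, params)

            start = time.perf_counter()
            result = await self.model.generate(prompt, **params)
            # trace.duration depends on end_timestamp, which is only set
            # when the trace context exits, so time the call directly
            latency = time.perf_counter() - start

            trace.record_output(
                result,
                # count_tokens is assumed to wrap the model's tokenizer
                token_usage=self.count_tokens(prompt, result),
                latency=latency
            )

            return TraceResult(result, trace)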
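For experimentation, the tracer pattern above can be exercised end to end with in-memory stand-ins. This sketch assumes a hypothetical `InMemoryTraceStorage` in place of `TraceStorage`, an `EchoModel` in place of a real model client, and a whitespace token count in place of a real tokenizer. It measures time with the monotonic `time.perf_counter` rather than wall-clock `datetime`, and stores inputs and outputs in a plain `data` dict instead of `record_input`/`record_output`.

```python
import asyncio
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Trace:
    operation_type: str
    metadata: Dict[str, Any]
    started: float = field(default_factory=time.perf_counter)
    ended: Optional[float] = None
    status: Optional[str] = None
    error: Optional[str] = None
    data: Dict[str, Any] = field(default_factory=dict)

    @property
    def duration(self) -> Optional[float]:
        # None while the trace is still open
        return None if self.ended is None else self.ended - self.started


class InMemoryTraceStorage:
    """Hypothetical stand-in for TraceStorage: keeps traces in a list."""

    def __init__(self) -> None:
        self.traces: List[Trace] = []

    def store(self, trace: Trace) -> None:
        self.traces.append(trace)


class Tracer:
    def __init__(self, storage: InMemoryTraceStorage) -> None:
        self.storage = storage

    @contextmanager
    def trace(self, operation_type: str, metadata: Optional[dict] = None):
        trace = Trace(operation_type, metadata or {})
        try:
            yield trace
            trace.status = "success"
        except Exception as e:
            trace.status = "error"
            trace.error = str(e)
            raise
        finally:
            trace.ended = time.perf_counter()
            self.storage.store(trace)  # store failures as well as successes


class EchoModel:
    """Hypothetical model client that upper-cases the prompt."""

    async def generate(self, prompt: str, **params: Any) -> str:
        await asyncio.sleep(0.01)  # simulate network latency
        return prompt.upper()


class ModelInteractionTracer(Tracer):
    def __init__(self, storage: InMemoryTraceStorage, model: Any) -> None:
        super().__init__(storage)
        self.model = model

    @staticmethod
    def count_tokens(prompt: str, result: str) -> int:
        # Whitespace split as a placeholder for the model's real tokenizer
        return len(prompt.split()) + len(result.split())

    async def traced_generate(self, prompt: str, **params: Any) -> str:
        with self.trace("model_generation") as trace:
            trace.data.update(prompt=prompt, params=params)
            start = time.perf_counter()
            result = await self.model.generate(prompt, **params)
            trace.data.update(
                output=result,
                token_usage=self.count_tokens(prompt, result),
                latency=time.perf_counter() - start,
            )
            return result
```

Calling `asyncio.run(tracer.traced_generate("hello world", temperature=0.0))` returns `"HELLO WORLD"` and leaves one trace in `storage.traces` with `status == "success"` and `token_usage == 4`. A model whose `generate` raises leaves a trace with `status == "error"` and the exception message instead.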

Traces should be structured hierarchically:

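from contextlib import contextmanager

# Step, StepTrace, Timeline, and TimelineEvent are application-defined types
class WorkflowTrace(Trace):
    def __init__(self, **trace_fields):
        super().__init__(**trace_fields)  # keep the base Trace fields
        self.steps = []
        self.current_step = None

    @contextmanager
    def trace_step(self, step: Step):
        step_trace = StepTrace(step)
        previous_step = self.current_step
        self.current_step = step_trace
        self.steps.append(step_trace)

        try:
            yield step_trace
        finally:
            # Close the step even if it raised, then restore the outer step
            step_trace.complete()
            self.current_step = previous_step

    def to_timeline(self) -> Timeline:
        return Timeline([
            TimelineEvent(
                step=step.name,
                start=step.start_time,
                end=step.end_time,
                details=step.details
            )
            for step in self.steps
        ])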
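The same hierarchy can be tracked by giving each step an ID and a pointer to its parent, the parent/child model that distributed-tracing systems use for spans. A minimal sketch, assuming hypothetical `Span` and `WorkflowTracer` names; a stack of currently open spans determines each new span's parent.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Span:
    name: str
    span_id: str
    parent_id: Optional[str]
    start: float
    end: Optional[float] = None


class WorkflowTracer:
    def __init__(self) -> None:
        self.spans: List[Span] = []  # every span, in start order
        self._open: List[Span] = []  # currently open spans, innermost last

    @contextmanager
    def step(self, name: str):
        parent = self._open[-1] if self._open else None
        span = Span(
            name=name,
            span_id=uuid.uuid4().hex,
            parent_id=parent.span_id if parent is not None else None,
            start=time.perf_counter(),
        )
        self.spans.append(span)
        self._open.append(span)
        try:
            yield span
        finally:
            span.end = time.perf_counter()
            self._open.pop()

    def timeline(self) -> List[Tuple[int, str, float]]:
        """(depth, name, seconds) per span; assumes every span has closed."""
        depth: Dict[str, int] = {}
        rows = []
        for s in self.spans:
            # Parents start before their children, so their depth is known
            d = depth[s.parent_id] + 1 if s.parent_id else 0
            depth[s.span_id] = d
            rows.append((d, s.name, s.end - s.start))
        return rows
```

Nesting `with wf.step("retrieve_context"):` inside `with wf.step("handle_request"):` produces a depth-0 row followed by a depth-1 row. The single stack assumes one workflow runs at a time per tracer; concurrent async tasks would need per-task state, such as `contextvars`.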

This approach enables:

  • Process visualization
  • Performance analysis
  • Cost attribution
  • Quality monitoring
  • Continuous improvement
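Cost attribution follows directly from the hierarchy: a step's cost is its own usage plus that of everything nested under it. A minimal sketch, assuming a hypothetical `StepRecord` with per-step token counts and a single flat price per 1,000 tokens (real pricing usually differs by model and by input versus output tokens):

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical flat price, for illustration only
PRICE_PER_1K_TOKENS = 0.002


@dataclass
class StepRecord:
    step_id: str
    parent_id: Optional[str]
    tokens: int  # tokens consumed directly by this step, excluding children


def cost_by_step(steps: List[StepRecord]) -> Dict[str, float]:
    """Map each step ID to the cost of that step plus everything nested in it."""
    children: Dict[str, List[StepRecord]] = {}
    for s in steps:
        if s.parent_id is not None:
            children.setdefault(s.parent_id, []).append(s)

    def subtree_tokens(s: StepRecord) -> int:
        return s.tokens + sum(subtree_tokens(c) for c in children.get(s.step_id, []))

    return {s.step_id: subtree_tokens(s) / 1000 * PRICE_PER_1K_TOKENS for s in steps}
```

Rolling costs up the tree lets a team see not only that a workflow is expensive but which step, or which tool call inside it, accounts for the spend.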

Traces should be treated as valuable data:

  • Stored in a queryable format
  • Retained according to policy
  • Protected for privacy
  • Indexed for search
  • Analyzed for patterns
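A small local store can cover several of these properties at once. The sketch below uses Python's built-in `sqlite3` for a queryable, indexed table, masks email addresses before anything is written, and deletes traces older than a cutoff to enforce retention. The `SQLiteTraceStore` class and the email regex are illustrative assumptions; a real privacy policy would cover far more than email addresses.

```python
import json
import re
import sqlite3
from typing import Any, Dict, List

# Hypothetical redaction rule: mask anything that looks like an email address
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact(value: Any) -> Any:
    # Walk nested dicts and lists, masking emails in every string
    if isinstance(value, str):
        return EMAIL_RE.sub("[REDACTED_EMAIL]", value)
    if isinstance(value, dict):
        return {k: redact(v) for k, v in value.items()}
    if isinstance(value, list):
        return [redact(v) for v in value]
    return value


class SQLiteTraceStore:
    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS traces ("
            " id INTEGER PRIMARY KEY,"
            " operation_type TEXT NOT NULL,"
            " started_at REAL NOT NULL,"
            " body TEXT NOT NULL)"
        )
        # Index the columns that queries filter on
        self.conn.execute(
            "CREATE INDEX IF NOT EXISTS idx_op_time"
            " ON traces (operation_type, started_at)"
        )

    def store(self, operation_type: str, started_at: float, body: Dict[str, Any]) -> None:
        # Redact before writing so raw personal data never reaches storage
        self.conn.execute(
            "INSERT INTO traces (operation_type, started_at, body) VALUES (?, ?, ?)",
            (operation_type, started_at, json.dumps(redact(body))),
        )
        self.conn.commit()

    def query(self, operation_type: str, since: float, until: float) -> List[Dict[str, Any]]:
        rows = self.conn.execute(
            "SELECT body FROM traces"
            " WHERE operation_type = ? AND started_at >= ? AND started_at < ?"
            " ORDER BY started_at",
            (operation_type, since, until),
        )
        return [json.loads(body) for (body,) in rows]

    def purge_older_than(self, cutoff: float) -> int:
        # Retention policy: delete traces that started before the cutoff
        cur = self.conn.execute("DELETE FROM traces WHERE started_at < ?", (cutoff,))
        self.conn.commit()
        return cur.rowcount
```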

Applications should support trace analysis:

from typing import Any, Dict, List

# TimeRange, Analysis, Metrics, Trace, and TraceStorage are application-defined
class TraceAnalyzer:
    def __init__(self, storage: TraceStorage):
        self.storage = storage

    def analyze_traces(
        self,
        timeframe: TimeRange,
        filters: Dict[str, Any]
    ) -> Analysis:
        traces = self.storage.query(timeframe, filters)

        return Analysis(
            performance=self.analyze_performance(traces),
            costs=self.analyze_costs(traces),
            quality=self.analyze_quality(traces),
            patterns=self.detect_patterns(traces),
            anomalies=self.detect_anomalies(traces)
        )

    def analyze_performance(self, traces: List[Trace]) -> Metrics:
        return Metrics({
            "p50_latency": self.percentile(traces, "duration", 50),
            "p95_latency": self.percentile(traces, "duration", 95),
            # Mean tokens per trace
            "token_usage": self.average(traces, "tokens_used"),
            "success_rate": self.success_rate(traces)
        })
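The helpers `TraceAnalyzer` relies on (`percentile`, `average`, `success_rate`, `detect_anomalies`) are left undefined above. A minimal sketch of them over plain records, assuming a hypothetical `TraceSummary` type, the nearest-rank definition of a percentile, and a simple rule that flags any trace slower than three times the median latency:

```python
import math
import statistics
from dataclasses import dataclass
from typing import List


@dataclass
class TraceSummary:
    duration: float  # seconds
    tokens_used: int
    status: str  # "success" or "error"


def percentile(values: List[float], p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of data at or below it."""
    if not values:
        raise ValueError("percentile of empty data")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def analyze_performance(traces: List[TraceSummary]) -> dict:
    durations = [t.duration for t in traces]
    return {
        "p50_latency": percentile(durations, 50),
        "p95_latency": percentile(durations, 95),
        "token_usage": statistics.mean([t.tokens_used for t in traces]),
        "success_rate": sum(t.status == "success" for t in traces) / len(traces),
    }


def detect_latency_anomalies(traces: List[TraceSummary], factor: float = 3.0) -> List[TraceSummary]:
    # A simple threshold rule relative to the median, not a statistical test
    median = statistics.median([t.duration for t in traces])
    return [t for t in traces if t.duration > factor * median]
```

The median-based threshold is deliberately crude; it is robust to a few extreme outliers, but a production system would likely compare against a per-operation baseline over time.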

This pattern enables:

  • Performance optimization
  • Cost optimization
  • Quality improvement
  • Pattern discovery
  • Anomaly detection

Trace implementation should optimize for:

  • Completeness - capturing all relevant data
  • Structure - organized for analysis
  • Accessibility - easy to query
  • Privacy - protecting sensitive data
  • Efficiency - managing storage costs
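Completeness and efficiency pull against each other. One common compromise, swapped in here as an illustration rather than taken from the text above, is head-based sampling: keep every failed trace, and keep a fixed fraction of successful ones chosen by hashing the trace ID. The `should_keep` function and its 10% default rate are assumptions.

```python
import hashlib


def should_keep(trace_id: str, status: str, sample_rate: float = 0.1) -> bool:
    # Always keep failures: they are the traces most often needed for debugging
    if status == "error":
        return True
    # Map the ID to a 64-bit integer; comparing it against sample_rate * 2**64
    # keeps roughly sample_rate of successes, and the same ID always gets the
    # same decision no matter which service evaluates it
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")
    return bucket < sample_rate * 2**64
```

Because the decision depends only on the trace ID, every part of a distributed workflow keeps or drops the same trace consistently. Lowering `sample_rate` cuts storage at the cost of fewer successful examples for analysis and distillation.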

Each trace should capture:

  • Full execution context
  • All model interactions
  • Resource usage
  • Timing information
  • Process decisions
  • Final outcomes

Traces can also feed back into other factors:

  • Examples - identifying good cases for the example suite
  • Workflows - optimizing process flows
  • Reasoning - improving decision patterns
  • Context - refining context selection
  • Tools - measuring tool effectiveness
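The distillation use named in the title follows from the first item: successful, high-quality traces can be exported as training examples for a smaller or cheaper model. A minimal sketch, assuming each trace is a dict with hypothetical `status`, `quality_score`, `prompt`, and `response` keys, and that the target training job accepts prompt/completion JSON lines (check the format your training tool actually expects):

```python
import json
from typing import Iterable, List


def export_distillation_set(traces: Iterable[dict], min_score: float = 0.8) -> List[str]:
    """Turn high-quality successful traces into prompt/completion JSON lines."""
    lines = []
    seen_prompts = set()
    for t in traces:
        # Keep only traces that succeeded and passed the quality bar
        if t.get("status") != "success" or t.get("quality_score", 0.0) < min_score:
            continue
        # Skip repeated prompts so one case is not over-weighted
        if t["prompt"] in seen_prompts:
            continue
        seen_prompts.add(t["prompt"])
        lines.append(json.dumps({"prompt": t["prompt"], "completion": t["response"]}))
    return lines
```

Where `quality_score` comes from (automated evaluation, user feedback, or human review) determines how trustworthy the resulting dataset is; the threshold here is only a placeholder.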