OCX 26 Experience
OC for AI

AI-Powered Trace Analysis

Adding MCP to TMLL

Matthew Khouzam

Ericsson Research · Open Community Experience 2026

Agenda

  • Tracing, Trace Compass & TMLL
  • What is MCP & why it matters
  • Performance, cost & privacy
  • How to start: create a CLI, wrap it in MCP
  • Debug with Theia & MCP Inspector
  • See results in Kiro, Theia & Gemini
  • Key takeaways

Who Am I?

  • Matthew Khouzam, Principal Researcher & Open Source Developer, Ericsson, Montréal
  • 16 years in FOSS: Eclipse Trace Compass, Linux tracing, TMLL
  • University collaborations (Polytechnique Montréal, Concordia)
  • Former (and future ;) ) board of directors member, CoC committee member, former chair of eCDT

I have the privilege of being paid by Ericsson to make the world a better place through open source.

Tracing & Trace Compass

Context before the code

What is Tracing?

  • Recording timestamped events from a running system: kernel, userspace, network
  • Non-intrusive: nanosecond overhead, always-on capable
  • Produces massive datasets: a few minutes → gigabytes of events

What is Trace Compass?

  • Open-source Eclipse project for trace visualization & analysis
  • Supports LTTng, CTF, perf, ftrace, and more
  • Powerful, but requires expertise to use effectively
  • Trace Server Protocol (TSP) exposes analysis via REST API
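
A quick way to see TSP in action, a minimal sketch assuming a trace server on its default port 8080 and the endpoint paths from the published TSP spec:

import requests

# Talk to a locally running trace server over the Trace Server Protocol (TSP)
base = "http://localhost:8080/tsp/api"
print(requests.get(f"{base}/health").json())       # liveness check
print(requests.get(f"{base}/experiments").json())  # experiments the server knows about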

Why Add MCP?

  • Users shouldn't need to be tracing experts to get insights
  • "Find CPU anomalies in this trace" is easier than navigating 20 views
  • AI handles the how, the user focuses on the what

Personal motivation: make trace analysis accessible to everyone, not just the experts who built it

What's New in Trace Compass

  • CTF2 support: JSON-based trace metadata, human-readable and AI-parseable
  • Opens the door to smarter AI-driven analysis
  • AI can read trace structure without binary parsing

JSON metadata means AI agents can understand trace schemas natively
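
For a flavour, CTF2 metadata is a stream of JSON fragments; per the CTF2 specification it opens with a preamble fragment like the one below (the trace, stream, and event record class fragments that follow are elided):

{"type": "preamble", "version": 2}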

TMLL, The ML Layer

  • Anomaly detection: iforest, z-score, IQR, moving average, combined
  • Change point analysis: single, z-score, voting, PCA
  • Correlation analysis: Pearson, Spearman, Kendall
  • Memory leak detection, idle resources, capacity planning

Python ML library on top of the Trace Server Protocol: powerful, but requires code to use

What is MCP?

Model Context Protocol, the universal adapter between AI and tools

MCP in 30 Seconds

  • Open protocol: JSON-RPC over stdio or HTTP
  • Write a tool once → use from any MCP-compatible agent
  • Adopted by OpenAI, Anthropic, Google, all major IDEs

AI Agent → MCP Server → Your Tool / API
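
On the wire, one tool invocation is a single JSON-RPC request; the method and params shape come from the MCP spec, while the tool name and arguments anticipate the TMLL server shown later (values illustrative):

{"jsonrpc": "2.0", "id": 1, "method": "tools/call",
 "params": {"name": "detect_anomalies",
            "arguments": {"experiment_id": "exp-42", "method": "iforest"}}}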

Why MCP Matters for You

Deterministic tools + AI reasoning = best of both worlds

Deterministic Execution

  • AI decides what to call, your tool decides how to execute
  • No hallucinated analysis, real code runs on real data
  • Reproducible results: same input → same output, every time

Save Tokens

  • Don't paste raw data into the prompt, let the tool process it server-side
  • Return summaries, not megabytes of CSV
  • Progressive discovery: only load schemas for tools you actually use (~80% savings)

Improve Your Existing Tools

  • Your CLI already works, MCP makes it AI-accessible
  • No rewrite needed: wrap, don't replace
  • Users who can't write Python can now use your tool via natural language

MCP doesn't replace your tools, it gives them a new audience

Performance & Cost

Why not just feed the trace to the LLM?

Raw Trace → LLM vs MCP + TMLL

Approach         | Tokens | Cost (est.)        | Result
babeltrace → LLM | ~1B+   | $1000s per query   | Hallucinated, non-deterministic
MCP + TMLL       | ~2K    | fraction of a cent | Deterministic ML analysis

A 1 GB kernel trace ≈ billions of text tokens, which exceeds every context window

  • LLMs can't reliably do statistics on raw events
  • MCP approach: AI sends one tool call, gets deterministic results back
  • Trace Server + TMLL do the heavy lifting, AI just interprets

MCP Overhead

Operation            | Direct TMLL API | MCP (via CLI)
Experiment creation  | ~200 ms         | ~800 ms
Anomaly detection    | ~2 s            | ~3 s
Correlation analysis | ~1.5 s          | ~2.5 s

  • ~1 s overhead per call (subprocess + Python startup)
  • Negligible next to LLM inference latency
  • Future: in-process mode eliminates the cost

Runs Locally, Privacy & Free Compute

  • Trace Compass, TMLL, and the MCP server all run on your machine
  • Your traces never leave your network: critical for regulated, proprietary, or customer data
  • Uses compute you already own, no new cloud bill, no GPU rental
  • The AI only sees the small, aggregated result, not the raw events

We already had the hardware, the server, and the library. MCP just unlocked them.

How to Start

Step 1: Create a CLI

Create a CLI

#!/usr/bin/env python3
"""tmll_cli.py, 12 subcommands wrapping the TMLL library."""
import argparse
from tmll.tmll_client import TMLLClient
# The AnomalyDetection import and the get_experiment() helper live
# elsewhere in the file; elided here to fit the slide.

def detect_anomalies(args):
    client = TMLLClient(args.host, args.port)
    experiment = get_experiment(client, args.experiment)
    outputs = experiment.find_outputs(keyword=args.keywords, type=['xy'])
    ad = AnomalyDetection(client, experiment, outputs)
    result = ad.find_anomalies(method=args.method)
    # shape assumed: result.anomalies maps each output to its anomaly list
    total = sum(len(a) for a in result.anomalies.values())
    print(f"Found {total} anomalies across {len(result.anomalies)} outputs")

# ... 11 more subcommands ...

Standard argparse, nothing MCP-specific yet
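
So you can exercise it from a shell before any AI is involved; the subcommand name and -k/-m flags below are inferred from the MCP wrapper on the next slide, so treat the invocation as illustrative:

./tmll_cli.py anomaly <experiment-id> -k "cpu usage" -m iforest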

Step 2: Wrap CLI in MCP

~615 lines of Python total

The MCP Wrapper

from mcp.server.fastmcp import FastMCP
mcp = FastMCP("tmll-cli-mcp-server")

@mcp.tool()
def detect_anomalies(experiment_id: str,
                     keywords: list[str] | None = None,
                     method: str | None = None) -> str:
    """Detect anomalies in trace data using ML methods."""
    args = build_args({
        "keywords": ("-k", keywords or ["cpu usage"]),
        "method": ("-m", method or "iforest"),
    })
    return run_cli("anomaly", experiment_id, *args)

Any CLI → MCP tool in 5 lines of glue
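
The build_args and run_cli helpers aren't shown on the slide; here is a plausible sketch, assuming the CLI takes the subcommand and experiment ID positionally as in the call above (everything else is guesswork):

import subprocess
import sys

def build_args(mapping: dict) -> list[str]:
    """Flatten {name: (flag, value)} pairs into a CLI argument list."""
    args: list[str] = []
    for flag, value in mapping.values():
        if value is None:
            continue  # unset option, let the CLI apply its own default
        values = value if isinstance(value, list) else [value]
        args += [flag, *map(str, values)]
    return args

def run_cli(command: str, experiment_id: str, *args: str) -> str:
    """Run the argparse CLI as a subprocess and return its output as text."""
    proc = subprocess.run(
        [sys.executable, "tmll_cli.py", command, experiment_id, *args],
        capture_output=True, text=True, timeout=300,
    )
    return proc.stdout if proc.returncode == 0 else f"Error: {proc.stderr}"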

The Stack

TMLL, Python ML library for trace analysis
↓ wrapped as
CLI, argparse, 12 commands
↓ wrapped as
MCP Server, FastMCP + subprocess
↓ used by
Any AI agent: Kiro, Gemini, Theia, Goose…

Why MCP → CLI → Library?

  • Separation of concerns: you can run and test the CLI independently, no AI needed
  • MCP is new (2024); building on a proven CLI layer puts it on tested footing
  • One path to maintain: why have two code paths when one works for both humans and AI?

12 Tools Exposed

  • ensure_server
  • create_experiment
  • list_experiments
  • list_outputs
  • fetch_data
  • delete_experiment
  • detect_anomalies
  • detect_memory_leak
  • detect_changepoints
  • analyze_correlation
  • detect_idle_resources
  • plan_capacity
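
A typical agent session chains a handful of these, for example (tool names from the list above, arguments purely illustrative):

ensure_server()
create_experiment(traces=["my_kernel_trace"])    # returns an experiment_id
detect_anomalies(experiment_id, method="iforest")
plan_capacity(experiment_id)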

Progressive Discovery

Eager Loading (old)

  • All 12 tool schemas sent upfront
  • ~2,200 tokens consumed before the user says anything

Progressive Discovery (new)

  • Schemas loaded on demand
  • Typical session: 2–3 tools → ~80% savings

FastMCP's progressive discovery means the AI only pays for what it uses
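
Back-of-envelope: 2,200 tokens for 12 schemas is ~185 tokens per tool, so a 2–3 tool session costs roughly 370–550 tokens, in line with the ~80% saving quoted above.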

MCP Apps

Rich content returned inline

Images in AI Output

from mcp.server.fastmcp import Image  # re-exported by the MCP Python SDK

@mcp.tool()
def plot_xy_with_anomalies(experiment_id: str,
                           as_image: bool = True) -> Image:
    """Detect anomalies and return annotated charts."""
    # ... run analysis, render a matplotlib figure into `buf` (a BytesIO) ...
    return Image(data=buf.getvalue(), format="png")

The AI doesn't just describe the anomaly, it shows you the chart
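
For completeness, one way the elided rendering step could fill in buf, a generic headless-matplotlib sketch with nothing TMLL-specific in it:

import io

import matplotlib
matplotlib.use("Agg")  # render off-screen; the MCP server has no display
import matplotlib.pyplot as plt

def render_png(fig: plt.Figure) -> io.BytesIO:
    """Serialize a matplotlib figure into an in-memory PNG buffer."""
    buf = io.BytesIO()
    fig.savefig(buf, format="png", bbox_inches="tight")
    plt.close(fig)  # release the figure once serialized
    return buf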

Debugging MCP

Theia IDE & MCP Inspector

Theia, AI Agent History

Theia MCP debugging view

MCP Inspector

MCP Inspector, test tools interactively without an AI client
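
One way to launch it against this server (the inspector package name is the official one; the server filename is illustrative):

npx @modelcontextprotocol/inspector python tmll_mcp_server.py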

See the Results

One MCP server, every AI client

Kiro IDE

Kiro running TMLL MCP

Eclipse Theia & Gemini CLI

Theia MCP results

Theia, inline anomaly chart

Gemini CLI resource usage

Gemini, 1 GB trace, 85K tokens, 5 tool calls

Key Takeaways

  • ~615 lines of Python to make TMLL AI-accessible
  • Pattern: Library → CLI → MCP works for any tool
  • One MCP server → works across Kiro, Gemini, Theia, Goose
  • Progressive discovery saves ~80% of token overhead
  • Runs locally, your traces stay on your machine, on hardware you already own
  • MCP Apps: charts and images inline in AI responses
  • The bottleneck is never the MCP glue; it's the ML analysis and LLM inference
  • Many MCP tools in the wild are sub-optimal; don't judge the technology by a few bad implementations

MCP is a protocol, not a product. The quality is in your hands.


Thank You

matthew.khouzam@ericsson.com

github.com/eclipse-tmll/tmll/pull/16

Merged last night!

Copyright © Eclipse Foundation AISBL and contributors. Made available under CC-BY-SA 4.0 International.
