Wintermute Framework, Part 4: Cartridges, MCP, and Surgeon
In Part 3 we wired up the AI subsystem. This post is about the capability surface the AI plays with: cartridges (in-process Python plugins whose methods auto-register as AI tools), the MCP runtime that lets us mount external MCP servers like Burp/Maltego/JIRA, and Surgeon (a shipped MCP server for firmware emulation hook generation and AFL++ fuzzing).
By the end of this post we have:
- tpm20, jtag, and firmware_analysis cartridges loaded against the
  IoT-camera engagement,
- a custom Burp-Suite MCP server registered alongside Surgeon,
- a worked I²C-EEPROM extraction → static analysis chain that hands a
  Vulnerability back into the operation.
Hardware-side, this post focuses on OpenOCD/JTAG, firmware static analysis, and TPM 2.0 quirks. Bring an HS2 dongle if you’re following along on real silicon.
What a Cartridge Actually Is
A cartridge is a single Python module under wintermute/cartridges/ that
exposes one primary class. The shipped ones are:
| Cartridge module | Primary class | Purpose |
|---|---|---|
| wintermute/cartridges/tpm20.py | tpm20 | TPM 2.0 command builder + transport (PCR state, DA lockout, fuzzing). |
| wintermute/cartridges/jtag.py | JTAGCartridge | OpenOCD telnet RPC: halt/resume, register/memory read, firmware dump. |
| wintermute/cartridges/firmware_analysis.py | FirmwareAnalysisCartridge | Stateless blob analysis: entropy, secrets, strings, basefind. |
CartridgeManager (wintermute/cartridges/manager.py) is a singleton. On
load("name") it imports the module, finds the primary class (by exact-name
match, Cartridge suffix, or “first class defined in the module”),
instantiates it, walks its public methods, and feeds each one through
register_tools(...) from wintermute/ai/utils/tool_factory.py. The
generated Tool objects are inserted into the global ToolRegistry. On
unload("name") they are removed and an observer callback fires, so
WintermuteMCP can broadcast notifications/tools/list_changed to
connected clients.
The mental model: cartridges turn ordinary Python class instances into AI tools, and adding/removing a cartridge changes the AI’s capability surface live.
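The discovery step described above can be sketched in a few lines. This is a minimal illustration, not the framework's code: find_primary_class and public_methods are hypothetical names, and the in-memory demo module stands in for a real file under wintermute/cartridges/.

```python
import inspect
import types
from typing import Any, Callable


def find_primary_class(module: types.ModuleType, name: str) -> type:
    """Pick the primary class: exact name, then *Cartridge suffix, then first defined."""
    # Keep only classes defined in this module (not imported ones); vars()
    # preserves definition order, which the "first class" fallback relies on.
    classes = [
        obj for obj in vars(module).values()
        if inspect.isclass(obj) and obj.__module__ == module.__name__
    ]
    if not classes:
        raise LookupError(f"no class defined in module {module.__name__!r}")
    for cls in classes:
        if cls.__name__ == name:
            return cls
    for cls in classes:
        if cls.__name__.endswith("Cartridge"):
            return cls
    return classes[0]


def public_methods(instance: object) -> dict[str, Callable[..., Any]]:
    """Return bound methods whose names do not start with an underscore."""
    return {
        attr: method
        for attr, method in inspect.getmembers(instance, predicate=inspect.ismethod)
        if not attr.startswith("_")
    }


# Build a throwaway module in memory instead of touching the filesystem.
demo = types.ModuleType("i2c")
exec(
    "class Helper:\n"
    "    pass\n"
    "\n"
    "class I2CCartridge:\n"
    "    def detect(self):\n"
    "        return [0x50]\n"
    "    def _reset(self):\n"
    "        pass\n",
    demo.__dict__,
)

primary = find_primary_class(demo, "i2c")
tools = public_methods(primary())
print(primary.__name__, sorted(tools))
```

Each entry in the resulting mapping is what a tool factory would then wrap and register; private helpers such as _reset never reach the AI.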
Loading From the Console — and What Happens
    onoSendai [acme-iotcam-2026-Q2] > cartridges
    onoSendai [acme-iotcam-2026-Q2/cartridges] > list
    📦 Available Cartridges
    ┃ Name              ┃ Loaded ┃
    ┃ firmware_analysis ┃        ┃
    ┃ jtag              ┃        ┃
    ┃ tpm20             ┃        ┃
    onoSendai [.../cartridges] > load tpm20
    ✔ Loaded cartridge tpm20 — 9 tool(s) registered with the AI.
    [*] Exposed functions: get_random, read_public, nv_read, nv_write, start_auth_session, test_pcr_state, test_da_lockout, fuzz_command, execute
    onoSendai [.../cartridges] > load firmware_analysis
    ✔ Loaded cartridge firmware_analysis — 4 tool(s) registered with the AI.
    [*] Exposed functions: analyze_entropy, scan_for_secrets, extract_strings, find_base_address
    onoSendai [.../cartridges] > tpm20
    onoSendai [.../cartridges/tpm20] > list
    ⚙️ Cartridge: tpm20 (tpm20)
    ┃ Function        ┃ Signature               ┃ Description                       ┃
    ┃ get_random      ┃ (num_bytes: int)        ┃ Request num_bytes of randomness…  ┃
    ┃ test_pcr_state  ┃ (pcr_index: int)        ┃ Read a PCR; verify it changes…    ┃
    ┃ test_da_lockout ┃ (max_attempts: int = 5) ┃ Force a DA lockout to test reset… ┃
    ┃ ...             ┃                         ┃                                   ┃
    onoSendai [.../cartridges/tpm20] > run test_pcr_state 0
    {'pcr_index': 0, 'value': '0x000...', 'changed_after_extend': True}
The same run is what the LLM does via tool-call: tools.call("test_pcr_state", {"pcr_index": 0}). The shape is identical because there is a single registry.
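To make the "single registry" point concrete, here is a minimal sketch of a registry with change observers. MiniToolRegistry is a hypothetical stand-in for the real ToolRegistry, and notifying on registration as well as removal is an assumption; the only detail taken from the MCP specification is the notifications/tools/list_changed method name.

```python
import json
from typing import Any, Callable

# JSON-RPC notification defined by MCP for "the tool list changed".
LIST_CHANGED = {"jsonrpc": "2.0", "method": "notifications/tools/list_changed"}


class MiniToolRegistry:
    """Hypothetical registry: one name -> callable table shared by every caller."""

    def __init__(self) -> None:
        self._tools: dict[str, Callable[..., Any]] = {}
        self._observers: list[Callable[[], None]] = []

    def subscribe(self, callback: Callable[[], None]) -> None:
        self._observers.append(callback)

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self._tools[name] = fn
        self._notify()

    def unregister(self, name: str) -> None:
        self._tools.pop(name, None)
        self._notify()

    def call(self, name: str, arguments: dict[str, Any]) -> Any:
        # Console "run" and LLM tool calls both end up on this code path.
        return self._tools[name](**arguments)

    def names(self) -> list[str]:
        return sorted(self._tools)

    def _notify(self) -> None:
        for callback in self._observers:
            callback()


outbox: list[str] = []  # stands in for messages broadcast to MCP clients
registry = MiniToolRegistry()
registry.subscribe(lambda: outbox.append(json.dumps(LIST_CHANGED)))

registry.register("test_pcr_state", lambda pcr_index: {"pcr_index": pcr_index})
console_result = registry.call("test_pcr_state", {"pcr_index": 0})
llm_result = registry.call("test_pcr_state", {"pcr_index": 0})
registry.unregister("test_pcr_state")
print(console_result == llm_result, len(outbox), registry.names())
```

Because both callers go through call(), there is no separate code path to keep in sync when a cartridge is loaded or unloaded.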
Programmatic Cartridge Use
From examples/07-Programmatic-Hardware-Cartridges.ipynb, the canonical
JTAG dump-then-analyse chain (here in production form, not the notebook’s
in-memory fake transport):
    from wintermute.cartridges.jtag import JTAGCartridge, OpenOCDConfig, OpenOCDTransport
    from wintermute.cartridges.firmware_analysis import FirmwareAnalysisCartridge
    from wintermute.utils.blob_manager import WorkspaceManager

    # Real OpenOCD running locally on :4444 against the iot-cam-01 JTAG
    transport = OpenOCDTransport(OpenOCDConfig(host="localhost", port=4444))
    workspace = WorkspaceManager()  # defaults to ./wintermute_workspace
    jtag = JTAGCartridge(transport=transport, workspace=workspace)
    fa = FirmwareAnalysisCartridge()

    # Explicit check rather than assert: asserts are stripped under python -O,
    # which would silently skip the halt.
    if not jtag.halt_core():
        raise RuntimeError("failed to halt the core over JTAG")

    descriptor = jtag.dump_firmware(
        start_address="0x08000000",
        size_bytes=0x100000,
        filename="iotcam-flash.bin",
    )
    # descriptor: {"file_path": "...sha256-prefix.bin", "size_bytes": ...,
    #              "sha256": "...", "type": "binary_blob"}

    # Every analysis step takes the blob's path, never its bytes.
    entropy = fa.analyze_entropy(descriptor["file_path"])
    secrets = fa.scan_for_secrets(descriptor["file_path"])
    strings = fa.extract_strings(descriptor["file_path"], min_length=8)
    base = fa.find_base_address(
        descriptor["file_path"],
        arch="arm",
        min_addr=0x08000000,
        max_addr=0x40000000,
    )
Three properties of this code are crucial for the per-test-case sub-agents in Part 7:
- The dump never enters the LLM context. dump_firmware writes the
  bytes through WorkspaceManager and returns a descriptor. When the AI
  calls this tool, the descriptor is what the model sees; the bytes stay on
  disk, addressable by file_path.
- Each follow-up tool consumes file_path, not bytes. analyze_entropy,
  scan_for_secrets, extract_strings, and find_base_address all take a
  file_path. This composes cleanly: the agent says “dump → analyze entropy
  → find base address,” each step is a tool call, and no step ever attempts
  to put 1 MiB of flash into a prompt.
- find_base_address is multiprocess-bounded. The cartridge caps
  workers at os.cpu_count() // 2 and silences basefind's progress bars,
  so an agent running this without supervision does not eat the whole
  machine.
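The descriptor pattern behind the first two properties can be sketched as follows. This is an illustration under stated assumptions: store_blob and shannon_entropy are hypothetical helpers, not WorkspaceManager or FirmwareAnalysisCartridge methods, and the file naming (a SHA-256 prefix) only mirrors the descriptor shape shown in the comment above.

```python
import hashlib
import math
import tempfile
from collections import Counter
from pathlib import Path


def store_blob(data: bytes, workspace: Path, suffix: str = ".bin") -> dict:
    """Write bytes to disk and return a small descriptor instead of the payload."""
    digest = hashlib.sha256(data).hexdigest()
    workspace.mkdir(parents=True, exist_ok=True)
    path = workspace / f"{digest[:12]}{suffix}"
    path.write_bytes(data)
    return {
        "file_path": str(path),
        "size_bytes": len(data),
        "sha256": digest,
        "type": "binary_blob",
    }


def shannon_entropy(file_path: str) -> float:
    """Shannon entropy in bits per byte, computed from a path rather than from bytes."""
    data = Path(file_path).read_bytes()
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


workspace = Path(tempfile.mkdtemp())
fake_flash = bytes(range(256)) * 16  # 4 KiB with a uniform byte distribution
descriptor = store_blob(fake_flash, workspace)
entropy = shannon_entropy(descriptor["file_path"])
print(descriptor["size_bytes"], round(entropy, 3))
```

The descriptor is a few hundred bytes regardless of dump size, so it is the only thing that ever needs to appear in a tool result.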
Tying a Cartridge Method To a Live Test Case
Sub-agents in Part 7 will repeatedly do the following: take a
TestCaseRun.bound[i].object_id (the peripheral hostname/alias), find the
right cartridge method, and invoke it with the right arguments.
For IOT-HW-UART-001:iot-cam-01:debug-uart, the steps in the test plan are
“capture boot output,” “test BREAK interrupt,” “verify root shell.” None of
those map 1-to-1 to JTAG cartridge methods, but the JTAG cartridge is the
one that proves a UART exposes interruptible U-Boot — by halting the core,
reading PC after a BREAK, and verifying it landed in the bootloader. The
sub-agent is the thing that bridges the natural-language step in the test
case to the typed method on a cartridge.
For IOT-HW-I2C-001:iot-cam-01:mcio-eeprom-1, however, none of the shipped
cartridges has an I²C method — and that is fine. We will add a small
I2CCartridge next, and because cartridges hot-load, the orchestrator picks
it up without restart.
Writing Your Own Cartridge
Drop a new file wintermute/cartridges/i2c.py:
    from __future__ import annotations

    from smbus2 import SMBus  # apt install python3-smbus2


    class I2CCartridge:
        """Direct I²C bus access via Linux i2c-dev for EEPROM extraction."""

        def __init__(self, bus_number: int = 2) -> None:
            self.bus = SMBus(bus_number)

        def detect(self) -> list[int]:
            """Probe addresses 0x03..0x77 and return the list of responders."""
            found = []
            for addr in range(0x03, 0x78):
                try:
                    self.bus.read_byte(addr)
                    found.append(addr)
                except OSError:
                    # No ACK at this address: nothing is listening there.
                    pass
            return found

        def dump_eeprom(self, address: int, size: int = 256) -> bytes:
            """Sequential read of `size` bytes from device `address`.

            As a tool, this yields a workspace blob descriptor, never raw
            bytes: tool_factory offloads the returned payload, so the LLM
            never sees it.
            """
            # i2c sequential read: write 0x00, then read `size` bytes
            self.bus.write_byte(address, 0)
            data = bytes(self.bus.read_byte(address) for _ in range(size))
            return data  # tool_factory.LARGE_PAYLOAD_THRESHOLD_BYTES handles offload

        def write_byte(self, address: int, register: int, value: int) -> dict[str, int]:
            """Write one byte to `register` on device `address` and echo the write."""
            self.bus.write_byte_data(address, register, value)
            return {"address": address, "register": register, "value": value}
That is the entire cartridge. Now:
    onoSendai [.../cartridges] > load i2c
    ✔ Loaded cartridge i2c — 3 tool(s) registered with the AI.
    [*] Exposed functions: detect, dump_eeprom, write_byte
    onoSendai [.../cartridges/i2c] > run detect
    [80, 81]  # 0x50 0x51
    onoSendai [.../cartridges/i2c] > run dump_eeprom 0x50 256
    {'file_path': './wintermute_workspace/blob-3a4f.../...bin', 'size_bytes': 256, 'sha256': '...', 'type': 'binary_blob'}
Two specific Wintermute behaviors made this easy:
- tool_factory.function_to_tool reads type hints. Returning bytes
  (dump_eeprom) automatically goes through _maybe_offload_payload and
  becomes a workspace blob descriptor.
- CartridgeManager._find_primary_class picks I2CCartridge because
  its name ends in Cartridge. No registration boilerplate.
This is what “agentic framework for hardware red teams” actually means —
turning a 30-line wrapper around smbus2 into something the orchestrator
can compose with TPM, JTAG, and firmware analysis without writing any
glue. We will use this exact I2CCartridge in Part 6 and Part 7.
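For readers curious how type hints alone can drive tool creation, here is a minimal sketch. function_to_schema and wrap_bytes_return are hypothetical names and do not reproduce tool_factory; the real code also applies a size threshold whose value is not given here, so this version offloads every bytes result.

```python
import inspect
from typing import Any, Callable, get_type_hints

# Python annotation -> JSON Schema type; anything unknown falls back to "string".
JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def function_to_schema(fn: Callable[..., Any]) -> dict[str, Any]:
    """Describe a function in OpenAI function-calling format using its hints."""
    hints = get_type_hints(fn)
    properties: dict[str, Any] = {}
    required: list[str] = []
    for name, param in inspect.signature(fn).parameters.items():
        properties[name] = {"type": JSON_TYPES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": inspect.getdoc(fn) or "",
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


def wrap_bytes_return(
    fn: Callable[..., Any], offload: Callable[[bytes], dict[str, Any]]
) -> Callable[..., Any]:
    """If fn is annotated to return bytes, swap its result for a descriptor."""
    returns_bytes = get_type_hints(fn).get("return") is bytes

    def wrapper(*args: Any, **kwargs: Any) -> Any:
        result = fn(*args, **kwargs)
        if returns_bytes and isinstance(result, bytes):
            return offload(result)
        return result

    return wrapper


def dump_eeprom(address: int, size: int = 256) -> bytes:
    """Sequential read of `size` bytes from device `address`."""
    return bytes(size)  # placeholder payload instead of a real bus read


schema = function_to_schema(dump_eeprom)
tool = wrap_bytes_return(
    dump_eeprom,
    offload=lambda blob: {"size_bytes": len(blob), "type": "binary_blob"},
)
result = tool(0x50, 16)
print(schema["function"]["parameters"]["required"], result)
```

The cartridge author writes an ordinary annotated method; the schema the model sees and the offload behaviour both fall out of the annotations.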
The MCP Side — Two Directions
Wintermute speaks MCP both ways:
- Outbound (integrations/mcp_runtime.py): MCPRuntime and the
  mcp register/start/stop console family let Wintermute consume an
  external MCP server. Tools from that server land in the same global
  registry as cartridge methods, so the LLM does not see the seam.
- Inbound (WintermuteMCP.py): wintermute-mcp runs as an MCP server
  exposing 80+ tools (operation CRUD, devices, vulnerabilities, AI chat,
  reports, cartridges, SSH, depthcharge, burp ingest…). Any MCP client
  (Claude Desktop, Cursor, or our own orchestrator from Part 6) drives the
  framework remotely.
Outbound: Mounting an External MCP Server From the Console
Suppose we have a Burp Suite MCP server (a community project that exposes Burp’s REST API as MCP tools). Register, start, list:
    onoSendai > mcp register burpsuite uvx burp-mcp --proxy http://127.0.0.1:8080
    [*] Registered MCP server burpsuite (uvx burp-mcp --proxy http://127.0.0.1:8080)
        Saved to ~/.config/wintermute/mcp_servers.json
    onoSendai > mcp start burpsuite
    [*] Server burpsuite started (PID 84211)
    onoSendai > tools mcp
    🔌 External MCP Tools
    ┃ Name             ┃ Description                     ┃ Server    ┃
    ┃ burp_active_scan ┃ Trigger an active scan against… ┃ burpsuite ┃
    ┃ burp_get_issues  ┃ Pull current issue list…        ┃ burpsuite ┃
    ┃ ...              ┃                                 ┃           ┃
    onoSendai > tools list
    🧰 Native AI Tools
    ┃ Name             ┃ Description         ┃ Source   ┃
    ┃ get_random       ┃ TPM 2.0 randomness… ┃ internal ┃
    ┃ analyze_entropy  ┃ Shannon entropy…    ┃ internal ┃
    ┃ burp_active_scan ┃ ...                 ┃ mcp      ┃
The agent now sees the union — analyze_entropy (cartridge), burp_active_scan
(external MCP), nv_read (TPM), dump_firmware (JTAG) — all callable by
name through ToolsRuntime.run_tool. This is how we drop a Maltego MCP, a
JIRA MCP, an internal Confluence MCP onto an engagement and have the
orchestrator (Part 6) discover them on the fly.
Inbound: Driving Wintermute From Claude Desktop
Run the server from a different terminal:
$ wintermute-mcp --transport stdio
In ~/.config/Claude/claude_desktop_config.json:
    {
      "mcpServers": {
        "wintermute": {
          "command": "wintermute-mcp",
          "args": ["--transport", "stdio"]
        }
      }
    }
Now Claude Desktop has tools like create_operation, add_device,
add_peripheral_to_device, setup_storage_backend, add_test_plan_from_json,
generate_test_runs, update_test_run_status, add_vulnerability_to_test_run,
generate_report, run_ssh_command, execute_depthcharge_catalog,
execute_depthcharge_memory_dump, ingest_burp_scan, attach_evidence,
load_cartridge — every one of which mutates the same Operation
container we built in Part 2.
A red-team workflow becomes “Claude, load the IoT camera engagement, attach TestPlans/TP-HW-BLACKBOX-001.json, generate runs, then for the I²C-EEPROM run, dump it via the i2c cartridge and analyze the dump.” Claude composes the calls in the right order; Wintermute persists the result.
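Under the hood, each step Claude composes is an MCP tools/call request sent as JSON-RPC 2.0 over the stdio transport, one JSON message per line. The sketch below builds such requests; the tool names come from the list above, but the argument names and values are assumptions for illustration, since the real parameter schemas are not shown in this post.

```python
import itertools
import json
from typing import Any

_request_ids = itertools.count(1)


def tools_call_request(name: str, arguments: dict[str, Any]) -> str:
    """Serialize one MCP tools/call request as a single-line JSON-RPC message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # json.dumps emits no raw newlines by default, which the stdio framing needs.
    return json.dumps(message)


# Hypothetical arguments: only the tool names are taken from the post.
workflow = [
    tools_call_request("load_cartridge", {"name": "i2c"}),
    tools_call_request("generate_test_runs", {}),
]
for line in workflow:
    print(line)
```

A client such as Claude Desktop handles this framing itself; the point is only that every natural-language instruction resolves to a short, inspectable sequence of named calls.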
Surgeon: Firmware Hooks and AFL++ Fuzzing
Surgeon is Wintermute’s purpose-built MCP server for firmware emulation
hook generation. Source: wintermute/integrations/surgeon/server.py.
The exposed MCP tools (and the offensive use case for each):
| Surgeon tool | Purpose |
|---|---|
| create_hook_skeleton | Generate a C hook for a peripheral’s MMIO region (UART/WIFI/JTAG/PCIE/USB/…). |
| list_firmware_symbols | nm over the firmware ELF: finds candidate hook addresses (functions to instrument). |
| write_config_file | Write YAML/JSON configs into the Surgeon project tree. |
| build_firmware | make build FIRMWARE=&lt;name&gt;: compiles the instrumented binary. |
| start_fuzzing | make run-fuzz FIRMWARE=&lt;name&gt;: launches the AFL++ docker container. |
| get_fuzzer_stats | Read fuzzer_stats from the AFL output dir. |
The interesting one is create_hook_skeleton. It takes a peripheral type
(UART, WIFI, BLUETOOTH, ETHERNET, USB, PCIE, JTAG, TPM,
ZIGBEE) and a malicious_snippet of C code, and produces an emulator
hook with read/write callbacks at the peripheral’s MMIO base. The
malicious_snippet lets a red-team scenario inject arbitrary fault-injection
behavior — integer overflow on a register read, randomized RX bytes on a
radio, fake PCIe vendor ID:
    # Driven via MCP from the orchestrator
    hook = await surgeon_session.call_tool("create_hook_skeleton", {
        "firmware_name": "iotcam_v3",
        "peripheral_name": "wifi_chip",
        "address_base": "0x40001000",
        "peripheral_type": "WIFI",
        "malicious_snippet": "if (offset == 0x08 && (rand() % 100 < 5)) "
                             "    *val = 0xDEADBEEF;  // 5% radio noise injection",
    })
This produces a C file under <SURGEON_ROOT>/src/runtime/handlers/iotcam_v3/wifi_chip.c,
ready for make build FIRMWARE=iotcam_v3. Hook in an SDR-shaped radio
abstraction (TX writes are logged, RX reads can be poisoned), build, then
call start_fuzzing. AFL++ inside the Surgeon docker container drives the
emulated firmware against the hook, looking for crashes. That is the
offensive playbook for “find a memory-corruption bug in the WiFi RX path
of a firmware blob you can’t run on real hardware.”
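Once the fuzzer is running, get_fuzzer_stats reads AFL++'s fuzzer_stats file, which is plain text with one "key : value" pair per line. The sketch below parses that format; it is not Surgeon's implementation, the sample values are made up, and the crash-counter fallback reflects the rename from unique_crashes to saved_crashes in AFL++ 4.x.

```python
def parse_fuzzer_stats(text: str) -> dict[str, str]:
    """Parse AFL++ fuzzer_stats content into a key -> value mapping."""
    stats: dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first colon only: values such as command_line may
        # themselves contain colons.
        key, sep, value = line.partition(":")
        if sep:
            stats[key.strip()] = value.strip()
    return stats


def crash_count(stats: dict[str, str]) -> int:
    """Return the crash counter, whichever name this AFL++ version uses."""
    for key in ("saved_crashes", "unique_crashes"):
        if key in stats:
            return int(stats[key])
    return 0


# Illustrative content only; real files carry many more fields.
SAMPLE = """\
start_time        : 1700000000
execs_done        : 1234567
execs_per_sec     : 2057.61
saved_crashes     : 2
command_line      : afl-fuzz -i in -o out -- ./iotcam_v3
"""

stats = parse_fuzzer_stats(SAMPLE)
print(crash_count(stats), stats["execs_per_sec"])
```

A non-zero crash count is the signal an orchestrator would use to stop fuzzing and pull the crashing inputs for triage.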
Wiring Surgeon Into Wintermute
SurgeonBackend is the in-process bridge. From examples/04-AI-Enrichment-and-Tools.ipynb
extended to Surgeon:
    from wintermute.ai.tools_runtime import ToolsRuntime
    from wintermute.integrations.surgeon.backend import SurgeonBackend

    runtime = ToolsRuntime()
    backend = SurgeonBackend(surgeon_root="/opt/surgeon")
    await backend.start()
    runtime.register_backend(backend)

    # Now the LLM can call create_hook_skeleton, build_firmware, start_fuzzing
    # alongside cartridge methods
    all_tools = await runtime.get_all_tools()
    print([t["function"]["name"] for t in all_tools])
SurgeonBackend.start() (integrations/surgeon/backend.py) spawns the
Surgeon FastMCP server as a subprocess and connects via stdio. get_ai_tools()
returns the FastMCP tool list converted into OpenAI function-calling
format; execute_tool() round-trips through MCP. From the LLM’s vantage,
Surgeon tools are indistinguishable from cartridge methods.
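The backend contract described above (list tools in function-calling format, execute by name) can be sketched with a toy runtime. ToolBackend, StaticBackend, and MiniRuntime are hypothetical names, and treating get_ai_tools as a coroutine is an assumption; the sketch only shows why the caller cannot tell which backend served a call.

```python
import asyncio
from typing import Any, Callable, Protocol


class ToolBackend(Protocol):
    async def get_ai_tools(self) -> list[dict[str, Any]]: ...
    async def execute_tool(self, name: str, arguments: dict[str, Any]) -> Any: ...


class StaticBackend:
    """Toy backend serving plain callables; a real one would round-trip via MCP."""

    def __init__(self, handlers: dict[str, Callable[..., Any]]) -> None:
        self._handlers = handlers

    async def get_ai_tools(self) -> list[dict[str, Any]]:
        return [
            {
                "type": "function",
                "function": {
                    "name": name,
                    "parameters": {"type": "object", "properties": {}},
                },
            }
            for name in self._handlers
        ]

    async def execute_tool(self, name: str, arguments: dict[str, Any]) -> Any:
        return self._handlers[name](**arguments)


class MiniRuntime:
    def __init__(self) -> None:
        self._backends: list[ToolBackend] = []

    def register_backend(self, backend: ToolBackend) -> None:
        self._backends.append(backend)

    async def get_all_tools(self) -> list[dict[str, Any]]:
        tools: list[dict[str, Any]] = []
        for backend in self._backends:
            tools.extend(await backend.get_ai_tools())
        return tools

    async def run_tool(self, name: str, arguments: dict[str, Any]) -> Any:
        # Route to the first backend that advertises the requested tool.
        for backend in self._backends:
            advertised = {t["function"]["name"] for t in await backend.get_ai_tools()}
            if name in advertised:
                return await backend.execute_tool(name, arguments)
        raise KeyError(f"unknown tool: {name}")


async def main() -> tuple[list[str], Any]:
    runtime = MiniRuntime()
    runtime.register_backend(StaticBackend({"analyze_entropy": lambda file_path: 7.9}))
    runtime.register_backend(StaticBackend({"build_firmware": lambda firmware_name: "ok"}))
    names = [t["function"]["name"] for t in await runtime.get_all_tools()]
    built = await runtime.run_tool("build_firmware", {"firmware_name": "iotcam_v3"})
    return names, built


tool_names, build_result = asyncio.run(main())
print(tool_names, build_result)
```

Adding a third backend is one register_backend call; nothing upstream of run_tool changes.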
A Composed Pentest Step: I²C-EEPROM Extraction Into a Vulnerability
Putting everything in this post together, here is the sequence the Part-7
sub-agent will execute autonomously for IOT-HW-I2C-001:iot-cam-01:mcio-eeprom-1:
    # Manual transcript; Part 7 will let the LLM make these calls itself
    from wintermute.cartridges.manager import CartridgeManager
    from wintermute.ai.tools_runtime import tools as registry
    from wintermute.findings import ReproductionStep, Vulnerability

    # Assumes `op` (the live Operation built in Part 2) and RunStatus are
    # already in scope.

    # 0. Cartridges available on the active operation
    mgr = CartridgeManager()
    mgr.load("i2c")
    mgr.load("firmware_analysis")

    # 1. Detect addresses on bus 2
    addrs = registry.call("detect", {})["result"]
    assert 0x50 in addrs

    # 2. Dump the EEPROM: descriptor returned, no bytes in context
    desc = registry.call("dump_eeprom", {"address": 0x50, "size": 256})

    # 3. Static analysis on the dump
    strings = registry.call("extract_strings", {"file_path": desc["file_path"], "min_length": 8})["result"]
    secrets = registry.call("scan_for_secrets", {"file_path": desc["file_path"]})["result"]

    # 4. Decide & write back into the live operation
    interesting = strings["top_20_interesting_strings"]
    creds_present = any("password" in s or "admin" in s for s in interesting)
    pem_present = bool(secrets["matches"].get("pem_block", []))

    run = next(r for r in op.test_runs if r.run_id == "IOT-HW-I2C-001:iot-cam-01:mcio-eeprom-1")
    run.start()

    if creds_present or pem_present:
        vuln = Vulnerability(
            title="Hardcoded credentials/keys in I2C EEPROM (MCIO bus 2, 0x50)",
            description=(
                f"Recovered top strings: {interesting[:3]} ... \n"
                f"PEM blocks at offsets: {secrets['matches'].get('pem_block', [])}"
            ),
            cvss=8,
            threat="unauthorized device access via static credentials",
            reproduction_steps=[
                ReproductionStep(
                    title="Detect I2C devices on bus 2",
                    tool="i2c.detect", action="probe", confidence=80,
                    arguments=[],
                ),
                ReproductionStep(
                    title="Sequential read of 256 bytes from 0x50",
                    tool="i2c.dump_eeprom", action="read", confidence=90,
                    arguments=["0x50", "256"],
                ),
                ReproductionStep(
                    title="Extract printable strings (>=8 chars)",
                    tool="firmware_analysis.extract_strings", action="analyze", confidence=80,
                    arguments=[desc["file_path"], "8"],
                ),
            ],
        )
        run.findings.append(vuln)
        run.status = RunStatus.failed  # vulnerability found = run "failed" target
    else:
        run.status = RunStatus.passed
    run.finish()
    op.save()
That is the shape of every per-test-case sub-agent in Part 7. The
sub-agent’s job is to translate the natural-language steps in
TestCase.steps into the right cartridge / MCP / Surgeon tool calls in the
right order, then write the verdict into the live TestCaseRun. The framework
already ships:
- the cartridge surface (in-process, hot-loadable),
- the MCP surface (external tools, hot-mountable),
- the workspace (large blobs offloaded automatically),
- the run/finding/repro-step model.
The agent layer above is genuinely small once the framework underneath does this much work for it.
What’s Next
Part 5 builds our first end-to-end agentic flow: a single-prompt Claude call that, given the IoT camera operation, chooses whether to dump the EEPROM, runs through tool_calling_chat's loop, and writes a finding. It is intentionally simple (one agent, one test case) because Part 6 and Part 7 generalize it into the orchestrator and the per-test-case sub-agents.





