Wintermute Framework, Part 2: Operations, Storage, and the Console

In Part 1 we mapped the architecture and the engagement data model. In this post we drive Wintermute by hand: build an Operation, persist it, drive it from the REPL, and learn the console’s context-stack idioms — because every later post (the agent in Part 5, the orchestrator in Part 6, the sub-agents in Part 7) sits on top of these plumbing primitives.

This post is a field manual for the operator. No AI yet.

The Reference Engagement: a Raspberry-Pi-Based IoT Camera

I’ll use a single fictional engagement throughout the series so the data accumulates. Today is the intake; later posts will fill in the agentic execution.

Engagement name: acme-iotcam-2026-Q2
Scope: a managed IoT camera (iot-cam-01, IP 10.0.0.5), an MCIO board with
an I²C EEPROM at 0x50, and a backing AWS account acme-prod with an IAM
role we suspect is over-privileged.
Analyst: Case (case, case@acme.com).
Stakeholder: Acme’s security lead, Robert Smith.

We’ll add hardware peripherals (UART on J3, JTAG on J5, the I²C-EEPROM as a peripheral attached to the device), attach one declarative TestPlan (the shipped TestPlans/TP-HW-BLACKBOX-001.json), and persist the lot to disk so it can be loaded on a different host or by an MCP client.

Building the Operation Programmatically

			
from wintermute.core import Operation, TestPlan
from wintermute.peripherals import JTAG, UART, Peripheral
from wintermute.hardware import Architecture, Processor
op = Operation("acme-iotcam-2026-Q2",
               start_date="04/01/2026", end_date="04/30/2026")
op.addAnalyst("Case", "case", "case@acme.com")
op.addUser(uid="rsmith", name="Robert Smith",
           email="robert@acme.com", teams=["stakeholder"])
# Target device
op.addDevice("iot-cam-01", "10.0.0.5", operatingsystem="Linux 5.10")
dev = op.getDeviceByHostname("iot-cam-01")
# Tag it with processor + architecture
dev.processor = Processor(
    processor="BCM2837",
    manufacturer="Broadcom",
    processor_family="Cortex-A",
    architecture=Architecture(
        core="Cortex-A53", instruction_set="ARMv8-A",
        cpu_cores=4,
        key_features={"trustzone": True, "neon": True},
    ),
    endianness="little",
)
# Hardware peripherals — exactly the shape examples/03-Hardware-Security-Testing.ipynb
# uses, just with our own pinouts
dev.peripherals.extend([
    UART(name="debug-uart", baudrate=115200, device_path="/dev/ttyUSB0",
         pins={"tx": "J3-1", "rx": "J3-2", "gnd": "J3-3"}),
    JTAG(name="main-jtag", device_path="/dev/jtag0",
         pins={"tck": "J5-1", "tms": "J5-2", "tdi": "J5-3",
               "tdo": "J5-4", "gnd": "J5-5"}),
    Peripheral(name="mcio-eeprom-1", pType="I2C",
               pins={"scl": "MCIO-7", "sda": "MCIO-9"}),
])
# Network services on the device
dev.addService(portNumber=443, app="lighttpd",
               protocol="ipv4", transport_layer="HTTPS")
dev.addService(portNumber=22, app="dropbear",
               protocol="ipv4", transport_layer="SSH")
# Cloud account in scope
op.addAWSAccount("acme-prod", account_id="111122223333")

		

Three details are non-obvious enough that they bite people:

addDevice and addAnalyst use upsert-with-merge semantics (Operation._merge_attributes, core.py:1068). Calling them twice with the same hostname / userid does not create duplicates: list fields are extended uniquely, dicts are shallow-merged, and scalar fields overwrite only when the new value is truthy. This is the contract every cartridge and AI tool relies on, so partial enrichment is safe.

Peripheral is the generic class. There are dedicated subclasses for UART, JTAG, Wifi, Bluetooth, USB, PCIe, Ethernet, and TPMPeripheral (TPM 2.0 transport-aware). For a one-off bus where there is no dedicated subclass (I²C, SPI, SWD, Zigbee), use Peripheral(name=..., pType=...).

dev.peripherals.extend([...]) and dev.services.append(...) are perfectly legitimate. The helpers addService / addPeripheral exist for de-dup, but direct list manipulation is what Operation.from_dict does internally, so it is part of the contract.
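
To make the merge rules in the first point concrete, here is a minimal sketch of upsert-with-merge on plain dicts. It is not the code behind Operation._merge_attributes; the function name merge_attributes and the tags field are assumptions made for illustration, and only the three rules themselves come from the description above.

```python
from typing import Any


def merge_attributes(existing: dict[str, Any], incoming: dict[str, Any]) -> dict[str, Any]:
    """Illustrative upsert-with-merge; not Wintermute's actual implementation."""
    merged = dict(existing)
    for key, new in incoming.items():
        old = merged.get(key)
        if isinstance(old, list) and isinstance(new, list):
            # Lists: extend, skipping items that are already present.
            combined = list(old)
            for item in new:
                if item not in combined:
                    combined.append(item)
            merged[key] = combined
        elif isinstance(old, dict) and isinstance(new, dict):
            # Dicts: shallow merge; incoming keys win.
            merged[key] = {**old, **new}
        elif new:
            # Scalars: overwrite only when the new value is truthy.
            merged[key] = new
    return merged


# Field names here are hypothetical stand-ins for device attributes.
first = {"hostname": "iot-cam-01", "operatingsystem": "Linux 5.10", "tags": ["camera"]}
second = {"hostname": "iot-cam-01", "operatingsystem": "", "tags": ["camera", "uart"]}
merged = merge_attributes(first, second)
print(merged)
```

The second call carries an empty operatingsystem, so the earlier value survives, while the tag list grows by exactly one entry. That is the property that makes partial enrichment safe.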

Loading a Hardware Test Plan

The shipped TestPlans/TP-HW-BLACKBOX-001.json covers seven categories of hardware testing. We attach it via Operation.addTestPlan(...):

			
import json
from pathlib import Path
from wintermute.core import TestPlan
tp_data = json.loads(Path("TestPlans/TP-HW-BLACKBOX-001.json").read_text())
op.addTestPlan(TestPlan.from_dict(tp_data))

		

Now look at one specific test case in that plan — IOT-HW-UART-001 — to see how scoping resolves against our Operation:

			
{
  "code": "IOT-HW-UART-001",
  "name": "UART discovery & console exposure",
  "execution_mode": "per_binding",
  "execution_binding": "uart",
  "target_scope": {
    "tags": ["uart", "console"],
    "bindings": [
      { "kind": "device",     "name": "dut",  "where": {}, "cardinality": "one" },
      { "kind": "peripheral", "name": "uart",
        "where": { "device": "dut", "pType": "UART" }, "cardinality": "many" }
    ]
  },
  "steps": [
    { "title": "Capture boot-time serial output", "tool": "serial-capture",
      "action": "collect", "confidence": 80, "arguments": ["capture_boot_log"] },
    ...
  ]
}
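
As a rough illustration of how the where clause of the uart binding could be evaluated, the sketch below matches peripherals on a dict-based device. The data layout and the helper name match_peripherals are assumptions for this example; the real resolveBindings() works on Operation objects and may differ in detail.

```python
from typing import Any

# Hypothetical, dict-based stand-in for the Operation's device list.
devices = [{
    "hostname": "iot-cam-01",
    "peripherals": [
        {"name": "debug-uart", "pType": "UART"},
        {"name": "main-jtag", "pType": "JTAG"},
        {"name": "mcio-eeprom-1", "pType": "I2C"},
    ],
}]


def match_peripherals(device: dict[str, Any], where: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the device's peripherals whose fields equal every selector in `where`.

    The "device" key names the parent binding rather than a peripheral field,
    so it is left out of the field comparison.
    """
    selectors = {k: v for k, v in where.items() if k != "device"}
    return [p for p in device["peripherals"]
            if all(p.get(k) == v for k, v in selectors.items())]


dut = devices[0]  # an empty `where` on the device binding matches any device
uarts = match_peripherals(dut, {"device": "dut", "pType": "UART"})
print([p["name"] for p in uarts])
```

With cardinality "many", every peripheral returned here would become its own binding for the test case.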

		

Now generate runs:

			
runs = op.generateTestRuns()
for r in runs:
    print(f"{r.run_id:<55}  {r.test_case_code}  {r.status.value}")

generateTestRuns() (core.py:1008) walks every TestCase across attached plans, calls resolveBindings() to match the case’s selectors against op.devices and op.cloud_accounts, then createRunsForTestCase() (core.py:936) fans the case out per the execution_mode:

once → one run with run_id = "TC_CODE:once".
per_device → one run per matched DUT, with run_id = "TC_CODE:DEVICE".
per_binding → one run per matched peripheral, with run_id = "TC_CODE:DEVICE:OBJECT".

For our IoT camera, the UART case yields a single run (IOT-HW-UART-001:iot-cam-01:debug-uart) because we only declared one UART peripheral. The I²C-EEPROM case (IOT-HW-I2C-001 in the same plan) likewise binds to mcio-eeprom-1. Add a second device with the same peripherals tomorrow and the per_device and per_binding cases fan out to it as well, with no change to the plan.
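
The fan-out by execution_mode can be sketched in a few lines. This is a simplified illustration, not createRunsForTestCase() itself: the helper name run_ids and its arguments are hypothetical, and the per_device ID format is inferred from run IDs such as IOT-HW-GEN-001:iot-cam-01 in the console listing later in this post.

```python
def run_ids(code: str, mode: str, device: str, bound: list[str]) -> list[str]:
    """Build run IDs for one test case against one matched device."""
    if mode == "once":
        return [f"{code}:once"]
    if mode == "per_device":
        return [f"{code}:{device}"]
    if mode == "per_binding":
        # One run per bound object (here: peripheral names).
        return [f"{code}:{device}:{name}" for name in bound]
    raise ValueError(f"unknown execution_mode: {mode}")


uart_runs = run_ids("IOT-HW-UART-001", "per_binding", "iot-cam-01", ["debug-uart"])
gen_runs = run_ids("IOT-HW-GEN-001", "per_device", "iot-cam-01", [])
print(uart_runs, gen_runs)
```

Declaring a second UART on the same device would add a second entry to bound and therefore a second run, which is the whole point of per_binding.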

This selector + cardinality machinery is the lever the orchestrator pulls in Part 6. The agent does not invent test cases out of nothing — it loads a real TestPlan, resolves it against the live operation, and dispatches one sub-agent per generated run.

Persistence: `JsonFileBackend`, `DynamoDBBackend`, and the Protocol

			
from wintermute.backends.json_storage import JsonFileBackend
Operation.register_backend("json", JsonFileBackend(base_path="./.wintermute_data"),
                           make_default=True)
op.save()                       # writes acme-iotcam-2026-Q2.json

StorageBackend (wintermute/storage.py) is a four-method protocol — save, load, list_all, delete. The two shipped implementations are:

| Backend | Module | Notes |
| --- | --- | --- |
| `JsonFileBackend` | `backends/json_storage.py` | TinyDB underneath; one file per operation; ideal for laptops. |
| `DynamoDBBackend` | `backends/dynamodb.py` | Single-table; uses operation name as partition key. |

Switching at runtime is a single call:

			
Operation.use_backend("dynamodb")
op.save()                       # now writes to DynamoDB

This matters during a multi-analyst engagement: the field operator runs the console with JsonFileBackend to sync to a thumb drive; the home base runs the MCP server with DynamoDBBackend so other analysts can pick up the state. The agent does not care which is active.
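
One way to picture the register_backend / use_backend pair is a class-level registry with a single active name. The sketch below is an assumption about the shape, not Wintermute's code: OperationStore and DictBackend are hypothetical names, and only save() is modeled.

```python
from typing import Any, Optional


class OperationStore:
    """Hypothetical stand-in for the registry behind register_backend / use_backend."""

    _backends: dict[str, Any] = {}
    _active: Optional[str] = None

    @classmethod
    def register_backend(cls, name: str, backend: Any, make_default: bool = False) -> None:
        cls._backends[name] = backend
        if make_default:
            cls._active = name

    @classmethod
    def use_backend(cls, name: str) -> None:
        if name not in cls._backends:
            raise KeyError(f"backend {name!r} is not registered")
        cls._active = name

    @classmethod
    def save(cls, operation_id: str, data: dict[str, Any]) -> bool:
        if cls._active is None:
            raise RuntimeError("no default backend selected")
        return cls._backends[cls._active].save(operation_id, data)


class DictBackend:
    """Toy backend that keeps operations in a dict; only save() is sketched."""

    def __init__(self) -> None:
        self.rows: dict[str, dict[str, Any]] = {}

    def save(self, operation_id: str, data: dict[str, Any]) -> bool:
        self.rows[operation_id] = dict(data)
        return True


local, shared = DictBackend(), DictBackend()
OperationStore.register_backend("json", local, make_default=True)
OperationStore.register_backend("dynamodb", shared)

OperationStore.save("acme-iotcam-2026-Q2", {"devices": 1})   # lands in `local`
OperationStore.use_backend("dynamodb")
OperationStore.save("acme-iotcam-2026-Q2", {"devices": 1})   # lands in `shared`
print(sorted(local.rows), sorted(shared.rows))
```

The caller's save() never changes; only the active name does, which is why code higher up the stack can stay backend-agnostic.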

To implement a custom backend (Postgres, S3, Notion):

			
class MyPostgresBackend:
    def save(self, operation_id: str, data: dict) -> bool: ...
    def load(self, operation_id: str) -> dict | None: ...
    def list_all(self) -> list[str]: ...
    def delete(self, operation_id: str) -> bool: ...
Operation.register_backend("postgres", MyPostgresBackend(...), make_default=True)

		

No subclassing, no abstract base. The same Protocol pattern is used for TicketBackend, ReportBackend, and the ToolBackend an MCP server exposes, so there is exactly one mental model for “swap a subsystem.”
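
For readers unfamiliar with structural typing, the sketch below shows how a four-method protocol can be expressed with typing.Protocol and satisfied by a class that never inherits from it. The method signatures are copied from the custom-backend example above; whether Wintermute's StorageBackend is declared exactly this way (for instance with runtime_checkable) is an assumption, and InMemoryStorage is a hypothetical backend.

```python
from typing import Any, Optional, Protocol, runtime_checkable


@runtime_checkable
class StorageBackend(Protocol):
    """The four methods named in the post, expressed as a typing.Protocol."""

    def save(self, operation_id: str, data: dict[str, Any]) -> bool: ...
    def load(self, operation_id: str) -> Optional[dict[str, Any]]: ...
    def list_all(self) -> list[str]: ...
    def delete(self, operation_id: str) -> bool: ...


class InMemoryStorage:
    """Hypothetical backend; it never inherits from StorageBackend."""

    def __init__(self) -> None:
        self._rows: dict[str, dict[str, Any]] = {}

    def save(self, operation_id: str, data: dict[str, Any]) -> bool:
        self._rows[operation_id] = dict(data)
        return True

    def load(self, operation_id: str) -> Optional[dict[str, Any]]:
        return self._rows.get(operation_id)

    def list_all(self) -> list[str]:
        return sorted(self._rows)

    def delete(self, operation_id: str) -> bool:
        return self._rows.pop(operation_id, None) is not None


backend = InMemoryStorage()
backend.save("acme-iotcam-2026-Q2", {"operation_name": "acme-iotcam-2026-Q2"})
# runtime_checkable only verifies that the methods exist, not their signatures.
print(isinstance(backend, StorageBackend), backend.list_all())
```

A static type checker would additionally compare the signatures, so a backend with a misspelled or mis-typed method is caught before it is ever registered.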

The Console: Context Stack and Builders

wintermute (the binary) is a Metasploit-style REPL. Three idioms make it worth using over scripted Python:

1. Operation creation and direct field setting.

			
onoSendai > operation create acme-iotcam-2026-Q2
onoSendai [acme-iotcam-2026-Q2] > set start_date 04/01/2026
onoSendai [acme-iotcam-2026-Q2] > set end_date 04/30/2026

2. Add commands with inline arguments.

			
onoSendai [acme-iotcam-2026-Q2] > add device iot-cam-01 10.0.0.5
onoSendai [acme-iotcam-2026-Q2] > add analyst Case case case@acme.com
onoSendai [acme-iotcam-2026-Q2] > add user rsmith "Robert Smith" robert@acme.com
onoSendai [acme-iotcam-2026-Q2] > add cloudaccount acme-prod aws
onoSendai [acme-iotcam-2026-Q2] > add awsaccount acme-prod 111122223333

		

3. Domain context drilldown. Typing a domain name (devices, analysts, users) drops the prompt one level:

			
onoSendai [acme-iotcam-2026-Q2] > devices
onoSendai [acme-iotcam-2026-Q2/devices] > list
onoSendai [acme-iotcam-2026-Q2/devices] > iot-cam-01     # bare-id drilldown == edit
onoSendai [acme-iotcam-2026-Q2/devices/iot-cam-01] > add peripheral uart
onoSendai [acme-iotcam-2026-Q2/devices/iot-cam-01/uart] > set name debug-uart
onoSendai [.../uart] > set baudrate 115200
onoSendai [.../uart] > set device_path /dev/ttyUSB0
onoSendai [.../uart] > save

		

The builder pattern is implemented in BuilderContext (WintermuteConsole.py:111) and dispatched through _dispatch_builder_command (WintermuteConsole.py:4869). save materializes the partial-built object and attaches it to the parent’s list field (peripherals in this case). back pops one frame.
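
The context stack plus builder can be modeled as a list of frames, where the top frame collects set values until save attaches them to the parent. This is a simplified sketch under stated assumptions: Frame and ContextStack are hypothetical names, objects are plain dicts, and the real BuilderContext certainly carries more state.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Frame:
    """One prompt level. Builder frames carry pending fields and a parent list name."""
    label: str
    target: Optional[dict[str, Any]] = None       # object this frame edits, if any
    pending: dict[str, Any] = field(default_factory=dict)
    attach_to: Optional[str] = None                # parent's list field, for builders


class ContextStack:
    def __init__(self, root: str) -> None:
        self.frames = [Frame(root)]

    def prompt(self) -> str:
        return "/".join(f.label for f in self.frames)

    def push(self, frame: Frame) -> None:
        self.frames.append(frame)

    def back(self) -> None:
        # Pop one frame, but never the root operation frame.
        if len(self.frames) > 1:
            self.frames.pop()

    def set(self, key: str, value: Any) -> None:
        self.frames[-1].pending[key] = value

    def save(self) -> None:
        """Materialize the builder frame and append it to the parent's list field."""
        frame, parent = self.frames[-1], self.frames[-2].target
        if frame.attach_to is None or parent is None:
            raise RuntimeError("current frame is not a builder under an editable parent")
        parent.setdefault(frame.attach_to, []).append(dict(frame.pending))


device = {"hostname": "iot-cam-01", "peripherals": []}
stack = ContextStack("acme-iotcam-2026-Q2")
stack.push(Frame("devices"))
stack.push(Frame("iot-cam-01", target=device))
stack.push(Frame("uart", pending={"pType": "UART"}, attach_to="peripherals"))
stack.set("name", "debug-uart")
stack.set("baudrate", 115200)
stack.save()
stack.back()
print(stack.prompt(), device["peripherals"])
```

Nothing touches the device until save runs, so an operator can abandon a half-filled builder with back and leave the Operation unchanged.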

The reason this pattern matters for agents: when the AI is on, every add/edit/set is also exposed as an MCP tool (e.g., add_device, add_peripheral_to_device, edit_device). The same builder hierarchy is how the MCP server’s ObjectRegistry exposes nested objects, so the agent fundamentally does what the operator does.

Test Run Drilldown — the `testruns` Domain

Once a plan is attached and runs are generated, testruns is its own context:

			
onoSendai [acme-iotcam-2026-Q2] > testruns load TestPlans/TP-HW-BLACKBOX-001.json
[*] Loaded test plan TP-HW-BLACKBOX-001 (24 test case(s))
onoSendai [acme-iotcam-2026-Q2] > testruns generate
[*] Generated 17 new test run(s). Total runs: 17
onoSendai [acme-iotcam-2026-Q2] > testruns
onoSendai [.../testruns] > list
🧪 Test Runs
┃ Run ID                                            ┃ Test Case          ┃ Bound / Target            ┃ Status   ┃
┃ IOT-HW-GEN-001:iot-cam-01                         ┃ IOT-HW-GEN-001     ┃ dut=iot-cam-01            ┃ not_run  ┃
┃ IOT-HW-DISC-001:iot-cam-01                        ┃ IOT-HW-DISC-001    ┃ dut=iot-cam-01            ┃ not_run  ┃
┃ IOT-HW-UART-001:iot-cam-01:debug-uart             ┃ IOT-HW-UART-001    ┃ uart=debug-uart           ┃ not_run  ┃
┃ ...
onoSendai [.../testruns] > IOT-HW-UART-001:iot-cam-01:debug-uart
onoSendai [.../testruns/IOT-HW-UART-001:iot-cam-01:debug-uart] > show
onoSendai [.../testruns/...] > start
[*] Run IOT-HW-UART-001:iot-cam-01:debug-uart -> in_progress
onoSendai [.../testruns/...] > note "Captured boot log; bootloader allows interrupt"
onoSendai [.../testruns/...] > vuln "Unauthenticated U-Boot console" 8
onoSendai [.../testruns/...] > fail
[*] Run ... -> failed

		

In this sequence we executed a test run by hand. Every step is also reachable programmatically (run.start(), run.findings.append(...), run.status = RunStatus.failed, run.finish()) and via MCP (update_test_run_status, add_note_to_test_run, add_vulnerability_to_test_run). The point of Part 7’s per-test-case sub-agent is to run exactly this loop (start → exec tools → attach findings → status → finish) without an operator typing.
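
The same loop, written as a toy state machine, looks roughly like this. RunSketch is a hypothetical stand-in for Wintermute's test-run class: the status values not_run, in_progress, and failed appear in the console output above, while passed, the notes list, and the guard conditions are assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum


class RunStatus(Enum):
    not_run = "not_run"
    in_progress = "in_progress"
    passed = "passed"        # assumed name for the passing verdict
    failed = "failed"


@dataclass
class RunSketch:
    run_id: str
    status: RunStatus = RunStatus.not_run
    notes: list[str] = field(default_factory=list)
    findings: list[dict[str, object]] = field(default_factory=list)
    finished: bool = False

    def start(self) -> None:
        if self.status is not RunStatus.not_run:
            raise ValueError(f"{self.run_id} already started")
        self.status = RunStatus.in_progress

    def finish(self) -> None:
        # A run can only be closed once a verdict has been recorded.
        if self.status not in (RunStatus.passed, RunStatus.failed):
            raise ValueError("record a verdict before finishing")
        self.finished = True


run = RunSketch("IOT-HW-UART-001:iot-cam-01:debug-uart")
run.start()
run.notes.append("Captured boot log; bootloader allows interrupt")
run.findings.append({"title": "Unauthenticated U-Boot console"})
run.status = RunStatus.failed
run.finish()
print(run.run_id, run.status.value, len(run.findings))
```

An agent driving this loop needs no extra concepts: it calls the same four operations in the same order the operator typed them.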

Tickets and Reports: Two More Pluggable Backends

A pentest is not done when a run is marked failed; it is done when the deliverable is on the customer’s desk and the bug is in their tracker. Tickets and reports are both first-class subsystems.

			
from wintermute.tickets import InMemoryBackend, Status, Ticket
Ticket.register_backend("mem", InMemoryBackend(), make_default=True)
tid = Ticket.create(title="Unauthenticated U-Boot console on iot-cam-01",
                    description="UART J3 exposes interruptible U-Boot. ...")
Ticket.update(tid, status=Status.IN_PROGRESS)
Ticket.comment(tid, text="Repro confirmed at 115200 8N1", author="case")

		

Swap InMemoryBackend for BugzillaBackend(url, api_key, product, component) and the same code talks to Bugzilla. The metaclass pattern (TicketMeta in wintermute/tickets.py) routes the static Ticket.create / read / update / comment calls to the active backend. There is no per-call backend argument because the cartridge code (Part 4) and AI workflows (Part 5+) just call Ticket.create(...) and trust the engagement-level configuration.
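
The metaclass routing can be illustrated with a small stand-alone example. Methods defined on a metaclass are callable on the class itself, which is what lets Ticket.create(...) work without an instance or a backend argument. The body below is a sketch, not the contents of wintermute/tickets.py: MemoryTickets, the ID format, and the default "open" status are all assumptions.

```python
import itertools
from typing import Any, Optional


class TicketMeta(type):
    """Sketch of a metaclass that forwards class-level calls to the active backend."""

    _backends: dict[str, Any] = {}
    _default: Optional[str] = None

    def register_backend(cls, name: str, backend: Any, make_default: bool = False) -> None:
        TicketMeta._backends[name] = backend
        if make_default:
            TicketMeta._default = name

    def _active(cls) -> Any:
        if TicketMeta._default is None:
            raise RuntimeError("no ticket backend registered as default")
        return TicketMeta._backends[TicketMeta._default]

    def create(cls, **fields: Any) -> str:
        return cls._active().create(**fields)

    def update(cls, ticket_id: str, **fields: Any) -> None:
        cls._active().update(ticket_id, **fields)


class Ticket(metaclass=TicketMeta):
    """Empty on purpose: every call is routed by the metaclass."""


class MemoryTickets:
    """Hypothetical in-memory backend with made-up ticket IDs."""

    def __init__(self) -> None:
        self.rows: dict[str, dict[str, Any]] = {}
        self._ids = itertools.count(1)

    def create(self, **fields: Any) -> str:
        ticket_id = f"T-{next(self._ids)}"
        self.rows[ticket_id] = {"status": "open", **fields}
        return ticket_id

    def update(self, ticket_id: str, **fields: Any) -> None:
        self.rows[ticket_id].update(fields)


tickets = MemoryTickets()
Ticket.register_backend("mem", tickets, make_default=True)
tid = Ticket.create(title="Unauthenticated U-Boot console on iot-cam-01")
Ticket.update(tid, status="in_progress")
print(tid, tickets.rows[tid]["status"])
```

Registering a different backend object under make_default=True would redirect the same two calls with no change at the call sites.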

The same shape applies to reports. From examples/01-Basic-Examples.ipynb:

			
from wintermute.backends.docx_reports import DocxTplPerVulnBackend
from wintermute.reports import Report, ReportSpec
Report.register_backend("docx", DocxTplPerVulnBackend(
    template_dir="templates",
    main_template="report_main.docx",
    vuln_template="report_vuln.docx",
), make_default=True)
Report.save(
    ReportSpec(title="ACME IoT Camera Hardware Assessment",
               author="Case", summary="Findings on iot-cam-01 ..."),
    [op],
    "out.docx",
)

		

DocxTplPerVulnBackend walks the operation graph (devices → services → peripherals → vulnerabilities and runs → findings) via collect_vulnerabilities and collect_test_runs (reports.py:297, reports.py:393), composing a per-vulnerability section from templates/report_vuln.docx, a per-test-run section from templates/report_test_run.docx, and stitching them under templates/report_main.docx. Templates ship in the templates/ directory; copy and customize for your client palette.
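
A graph walk of this kind can be sketched as a generator over nested dicts. The layout below, with vulnerabilities hanging off devices, services, and peripherals, is an assumption made for illustration, and walk_vulnerabilities is a hypothetical helper rather than the collect_vulnerabilities function referenced above.

```python
from typing import Any, Iterator


def walk_vulnerabilities(op: dict[str, Any]) -> Iterator[tuple[str, dict[str, Any]]]:
    """Yield (location, vulnerability) pairs from a nested operation dict."""
    for dev in op.get("devices", []):
        host = dev["hostname"]
        for vuln in dev.get("vulnerabilities", []):
            yield host, vuln
        for svc in dev.get("services", []):
            for vuln in svc.get("vulnerabilities", []):
                yield f"{host}:{svc['portNumber']}", vuln
        for per in dev.get("peripherals", []):
            for vuln in per.get("vulnerabilities", []):
                yield f"{host}/{per['name']}", vuln


# Hypothetical operation snapshot; only the names come from the engagement above.
operation = {
    "devices": [{
        "hostname": "iot-cam-01",
        "services": [{"portNumber": 22, "vulnerabilities": []}],
        "peripherals": [{
            "name": "debug-uart",
            "vulnerabilities": [{"title": "Unauthenticated U-Boot console"}],
        }],
    }],
}

found = list(walk_vulnerabilities(operation))
print(found)
```

Each yielded pair is what a per-vulnerability template would need: where the issue lives and the record describing it.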

A Real-World Pentest Workflow — Without an LLM Yet

Pulling the threads together, here is what a typical engagement looks like with the console alone:

Day 0: Onboarding.
wintermute → operation create acme-iotcam-2026-Q2 → set dates → add analyst, add user, add device, add cloudaccount → backend setup json ./data → save.

Day 1: Recon and modeling.
Manual visual board survey. Drill into [devices/iot-cam-01], add peripheral uart / jtag / spi, set pinouts.

Day 1–N: Plan-driven execution.
testruns load TestPlans/TP-HW-BLACKBOX-001.json → testruns generate → drilldown per run → note, vuln, pass/fail.

Day N+1: Deliverable.
setup_report_backend docx ... → generate_report ./out/iotcam.docx.

Day N+2: Tracking.
Each Vulnerability becomes a Ticket row in Bugzilla via the same Ticket.create(...) calls.

This is the workflow we will automate in the rest of the series. Every phase has an MCP tool counterpart, every phase mutates the same Operation object, every phase is a place where an agent can plug in.

What’s Next

Part 3 covers the AI subsystem proper: the Router, the four shipped LLM providers, the RAG engine (vector_store_type: "local" vs Qdrant), how tools.json glues binary locations to the LLM tool surface, and the difference between simple_chat, tool_calling_chat, and the global ToolRegistry. After that, every post is agent-shaped.
