Wintermute Framework, Part 3: The AI Router, RAG, and Tool Registry
Part 1 sketched the architecture; Part 2 drove the framework by hand. Now we plug in the AI. By the end of this post we'll have:
- a Router set up against AWS Bedrock with a Groq fallback for cheap tasks,
- a RAG knowledge base of red-team manuals queried per-engagement,
- a tools.json that path-resolves binaries the LLM is allowed to invoke,
- a tool-calling loop that enriches the BCM2837 processor on iot-cam-01 using nothing but simple_chat and registered Python tools.
Every line of code lands inside the same Operation we built in Part 2.
The Three Pieces of the Subsystem
ChatRequest ┌────────────────────┐
• messages │ Router.choose() │
• model │ default_provider │──┐
• tools (ToolSpec list) │ task_tag override │ │
• task_tag │ ("cheap" → groq) │ │
• response_format └─────────┬──────────┘ │
│ │
┌────────────▼───────────┐ │
│ LLMRegistry │ │
│ bedrock │ │
│ openai │ │
│ groq │ │
│ local_embedder │ │
│ rag-tiny_hardware_test│ │
│ rag-red_team_manuals │ │
└────────────┬───────────┘ │
│ │
┌────────────▼───────────┐ │
│ LLMProvider │ │
│ .chat(req) │◀┘
│ .embed(...) │
│ .list_models() │
│ .count_tokens(...) │
└────────────┬───────────┘
│ ChatResponse
│ • content
│ • tool_calls[]
▼
┌────────────────────────┐
│ ToolsRuntime │
│ .run_tool(name, args) │
│ 1. dynamic backends │
│ (MCP, Surgeon) │
│ 2. local ToolRegistry│
└────────────────────────┘
The three pieces are Router + LLMRegistry + LLMProvider (the AI), the
RAG providers that wrap base providers transparently, and the tool registry
underneath.
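Before wiring anything up, it helps to see the shapes the diagram implies. The following is a minimal sketch of those types, not Wintermute's actual definitions; the field and method names follow the diagram, and everything else (defaults, exact typing) is assumed.

from dataclasses import dataclass, field
from typing import Protocol

@dataclass
class Message:
    role: str                              # "system" | "user" | "assistant" | "tool"
    content: str

@dataclass
class ToolSpec:
    name: str                              # what the LLM calls
    description: str                       # shown to the model (incl. absolute path, see below)
    input_schema: dict                     # JSON Schema for the arguments

@dataclass
class ChatRequest:
    messages: list[Message]
    model: str | None = None
    tools: list[ToolSpec] = field(default_factory=list)
    task_tag: str | None = None            # e.g. "cheap" -> route to Groq
    response_format: dict | None = None

@dataclass
class ChatResponse:
    content: str
    tool_calls: list[dict] = field(default_factory=list)

class LLMProvider(Protocol):
    def chat(self, req: ChatRequest) -> ChatResponse: ...
    def embed(self, texts: list[str]) -> list[list[float]]: ...
    def list_models(self) -> list[str]: ...
    def count_tokens(self, text: str) -> int: ...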
Bootstrapping the Router
init_router() (wintermute/ai/bootstrap.py) registers every available
provider and discovers RAG knowledge bases automatically. The full code:
import os

from wintermute.ai.bootstrap import init_router
from wintermute.ai.provider import llms

router = init_router()
print("Providers:", llms.providers())
print("Default :", router.default_provider, router.default_model)
Output on a machine with AWS + Groq credentials and one local KB indexed:
Providers: ['bedrock', 'groq', 'openai', 'local_embedder', 'rag-tiny_hardware_test', 'rag-red_team_manuals']
Default : bedrock bedrock/us.anthropic.claude-sonnet-4-6:0
Inside init_router():
register_bedrock(region=os.getenv("AWS_REGION", "us-east-1"))
register_groq(api_key=os.getenv("GROQ_API_KEY"))
register_openai(api_key=os.getenv("OPENAI_API_KEY"))
register_huggingface(as_name="local_embedder")
bootstrap_rags(llms)
return Router(default_provider="bedrock", default_model=os.getenv("BEDROCK_MODEL_ID"))
Each register_* is permissive: missing credentials (an API key for Groq/OpenAI, AWS credentials for Bedrock) skip registration without failing. Run with no env vars at all and you get just local_embedder —
useful when simple_chat is not the goal but you want the global registry
populated for an MCP server that brings its own model.
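The permissive behavior amounts to a guard clause. A sketch of the pattern, assuming nothing about the real function beyond what's described above:

import os

# Sketch of the skip-on-missing-credentials pattern; the real register_groq
# in wintermute/ai/bootstrap.py may differ in detail.
def register_groq(api_key: str | None = None) -> None:
    if not api_key:
        return                     # no key: silently skip, never raise
    # ... construct the provider and add it to the global registry ...

register_groq(api_key=os.getenv("GROQ_API_KEY"))   # no-op when GROQ_API_KEY is unset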
Routing By Task Tag
Set task_tag="cheap" on a request and Router.choose (provider.py:99)
swaps to Groq if it is registered:
from wintermute.ai.use import simple_chat

# Bedrock by default
print(simple_chat(router, "Summarize JTAG vs SWD risks"))

# Routed to Groq because of the tag, no other change
print(simple_chat(router, "Same question, fast", task_tag="cheap"))
This is the lever we will pull in Part 7 to keep the per-test-case sub-agents fast and inexpensive (Groq Llama 3.3 70B) while reserving Claude/GPT for the orchestrator’s reasoning.
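For intuition, the override logic reduces to a few lines. A hypothetical sketch of Router.choose; llms.get() is an assumed accessor on the registry, and only the "swap to Groq when registered" rule comes from the behavior above:

from wintermute.ai.provider import llms          # global registry (see above)
from wintermute.ai.types import ChatRequest

# Hypothetical sketch; llms.get() is an assumed accessor.
class Router:
    def __init__(self, default_provider: str, default_model: str):
        self.default_provider = default_provider
        self.default_model = default_model

    def choose(self, req: ChatRequest):
        name = self.default_provider
        if req.task_tag == "cheap" and "groq" in llms.providers():
            name = "groq"                        # swap provider, nothing else changes
        return llms.get(name), req.model or self.default_model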
Routing By RAG
router.set_default(provider="rag-<name>") flips the default for every
subsequent call, including tool-calling:
# Stock Bedrock — no document context
router.set_default(provider="bedrock")
print(simple_chat(router, "What voltage does the VCC_CORE pin use on the BCM2837?"))

# RAG-augmented — Bedrock + retrieved chunks from the KB
router.set_default(provider="rag-tiny_hardware_test")
print(simple_chat(router, "What voltage does the VCC_CORE pin use on the BCM2837?"))
The first call answers with the model’s training data. The second call
queries the LlamaIndex vector store, prepends the retrieved chunks, and
forwards to the configured base provider. From the source
(wintermute/ai/providers/rag_provider.py), the RAGProvider wraps any
other registered provider as its base_provider — so the same KB works
backed by Bedrock, OpenAI, or Groq without rebuilding the index.
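Conceptually the wrap is retrieve, prepend, forward. A sketch under those assumptions; as_retriever/retrieve/get_content are LlamaIndex APIs, while the attribute names on the class here are guesses, not the real rag_provider.py:

from wintermute.ai.types import ChatRequest, Message

# Sketch of the retrieve-prepend-forward flow; attribute names are assumed.
class RAGProviderSketch:
    def __init__(self, base_provider, index, top_k: int = 4):
        self.base_provider = base_provider                   # bedrock, openai, or groq
        self.retriever = index.as_retriever(similarity_top_k=top_k)

    def chat(self, req: ChatRequest):
        question = req.messages[-1].content
        nodes = self.retriever.retrieve(question)            # vector-store lookup
        context = "\n\n".join(n.get_content() for n in nodes)
        req.messages.insert(0, Message(role="system", content=f"Context:\n{context}"))
        return self.base_provider.chat(req)                  # forward to the unchanged base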
Building a Red-Team Knowledge Base — rag-red_team_manuals
For a real engagement you want the agent grounded in your own offensive references — exploit notes, vendor errata, prior-engagement reports — not just generic web data. Layout:
knowledge_bases/
└── red_team_manuals/
    ├── rag_config.json
    ├── docs/
    │   ├── uboot-attack-cheatsheet.md
    │   ├── i2c-eeprom-extraction.pdf
    │   ├── tpm20-quirks-2024.md
    │   └── prior-engagements/
    │       ├── 2024-Q3-acme-router.docx
    │       └── 2025-Q1-foo-cam.docx
    └── storage_db/        # written after indexing
rag_config.json:
{ "rag_id": "red_team_manuals", "description": "Embedded systems and red team exploit manuals + prior eng. reports.", "base_provider_id": "bedrock", "embed_provider_id": "local_embedder", "embedding_model": "BAAI/bge-small-en-v1.5", "vector_store_type": "local", "document_types": ["pdf", "markdown", "text", "docx"]}
Index with LlamaIndex + the local embedder so nothing leaves the host:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

from wintermute.ai.providers.huggingface_provider import HuggingFaceProvider
from wintermute.ai.providers.rag_provider import LlamaIndexEmbeddingWrapper

embed = HuggingFaceProvider(name="local_embedder")
embed_model = LlamaIndexEmbeddingWrapper(
    provider=embed,
    model_name="BAAI/bge-small-en-v1.5",
)

docs = SimpleDirectoryReader("knowledge_bases/red_team_manuals/docs").load_data()
index = VectorStoreIndex.from_documents(docs, embed_model=embed_model)
index.storage_context.persist(
    persist_dir="knowledge_bases/red_team_manuals/storage_db"
)
Once storage_db/ exists, the next init_router() registers it as
rag-red_team_manuals automatically — no code change. Switch to it from the
console:
onoSendai [acme-iotcam-2026-Q2] > ai rag list
  rag-tiny_hardware_test   Hardware technical reference...
  rag-red_team_manuals     Embedded systems and red team exploit manuals...
onoSendai [acme-iotcam-2026-Q2] > ai rag use red_team_manuals
[*] AI default provider set to rag-red_team_manuals
onoSendai [acme-iotcam-2026-Q2] > ai How did we extract the I2C EEPROM on the foo-cam in 2025-Q1?
For shared engagements with several analysts, swap vector_store_type to
"qdrant" and point at a remote server ("qdrant_url") — same code, same
RAG provider, indices now centralized.
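The config delta is small; a sketch of the changed keys (the URL is a placeholder, and collection naming/auth are deployment-specific and omitted):

{
  "rag_id": "red_team_manuals",
  "vector_store_type": "qdrant",
  "qdrant_url": "http://qdrant.lab.example:6333",
  ...
}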
Tools, Path Mapping, and the Global Registry
The tool registry is one of Wintermute’s most useful primitives because it collapses three different ways of giving the LLM a capability into a single surface.
1. Pure Python tools via register_tools
wintermute/ai/utils/tool_factory.py introspects a Python function’s
signature and type hints, generates a Pydantic model, derives a JSON Schema,
wraps the call so the LLM sees {"result": ...} envelopes, and crucially
offloads oversized payloads (raw bytes always, strings >1 KiB) to the
WorkspaceManager so blob-returning tools never poison the context window.
from wintermute.ai.tools_runtime import tools, spec_from_tool
from wintermute.ai.utils.tool_factory import register_tools

def lookup_cve(cve_id: str) -> str:
    """Look up a CVE by ID and return vulnerability details."""
    # ... query NVD or local DB ...
    return "..."

def check_port(host: str, port: int) -> str:
    """Check if a network port is open on the host."""
    # ... your scanner ...
    return f"Port {port} on {host}: open"

for tool in register_tools([lookup_cve, check_port]):
    tools.register(tool)
The factory pulls the docstring as the description, the parameter annotations
as the input schema, and the return type as the output. This is the same
mechanism CartridgeManager._register_instance_methods uses on cartridge
classes — your hand-written tools and a loaded cartridge’s methods land in
the same registry.
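The offload rule is worth seeing concretely. A sketch of the envelope logic; store_blob() is a hypothetical stand-in for the WorkspaceManager call, and the thresholds come from the prose above:

# Sketch of the result envelope + offload rule described above.
# workspace.store_blob() is a hypothetical stand-in for WorkspaceManager.
def wrap_result(workspace, result):
    if isinstance(result, bytes):                          # raw bytes: always offloaded
        return {"result": workspace.store_blob(result)}
    if isinstance(result, str) and len(result) > 1024:     # strings > 1 KiB: offloaded
        return {"result": workspace.store_blob(result.encode())}
    return {"result": result}                              # small values stay inline

# store_blob() would return a descriptor rather than the payload, e.g.
# {"file_path": "...", "size_bytes": 123456, "sha256": "...", "type": "binary_blob"}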
2. Path-mapped binaries via tools.json
ToolRegistry.load_tool_configs(path) reads a JSON list mapping tool name
to a (directory, executable) pair, resolved against WINTERMUTE_TOOLS_ROOT
(default /opt). When that name is later registered, the absolute path is
appended to the tool’s description so the LLM sees:
{ "name": "openocd", "description": "Open On-Chip Debugger for JTAG/SWD access\n\nAbsolute Path: /opt/openocd/bin/openocd", ...}
Concrete example (examples/05-RAG-Knowledge-Bases.ipynb):
[ { "name": "openocd", "directory": "openocd/bin", "executable": "openocd" }, { "name": "flashrom", "directory": "flashrom/bin", "executable": "flashrom" }, { "name": "depthcharge", "directory": "depthcharge/bin","executable": "depthcharge-inspect" }, { "name": "tpm2_tools", "directory": "tpm2-tools/bin", "executable": "tpm2_getcap" }]
Why this matters: the RAG context (the red-team manual you just indexed) will
say “use openocd to halt the core.” The agent translates that to a tool call.
Without path mapping, the LLM either guesses /usr/local/bin/openocd or
prepends sudo randomly. With path mapping, the description is grounded
against the deployment, so the generated subprocess.run([...]) call is
correct on this host whether you’re on a Linux laptop, a NUC in the lab, or
a CI runner where everything is in /srv/tools.
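The resolution itself is mechanical. A minimal sketch, assuming the JSON layout shown above (the real load_tool_configs lives on ToolRegistry and may differ):

import json
import os
from pathlib import Path

# Minimal sketch of the name -> absolute-path resolution described above.
def load_tool_configs(path: str) -> dict[str, str]:
    root = Path(os.getenv("WINTERMUTE_TOOLS_ROOT", "/opt"))
    entries = json.loads(Path(path).read_text())
    return {e["name"]: str(root / e["directory"] / e["executable"]) for e in entries}

# load_tool_configs("tools.json")["openocd"] -> "/opt/openocd/bin/openocd"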
3. Dynamic backends — Surgeon and any MCP server
ToolsRuntime.register_backend(backend) wires a ToolBackend (the
get_ai_tools() + execute_tool(...) protocol) into the runtime. The
shipped SurgeonBackend (integrations/surgeon/backend.py) is a managed
subprocess that speaks MCP over stdio; on start() it spawns the
integrations/surgeon/server.py FastMCP server and exposes its tools
(create_hook_skeleton, list_firmware_symbols, build_firmware,
start_fuzzing, get_fuzzer_stats, write_config_file) via the same
get_ai_tools() interface used by the static registry.
from wintermute.ai.tools_runtime import ToolsRuntime
from wintermute.integrations.surgeon.backend import SurgeonBackend

runtime = ToolsRuntime()
backend = SurgeonBackend(surgeon_root="/opt/surgeon")
await backend.start()
runtime.register_backend(backend)

# Now ToolsRuntime.run_tool() resolves Surgeon tools first,
# falls back to local registry. The LLM sees one merged list.
This is the seam we will use in Part 4 to mount custom MCP servers next to Surgeon — a Burp Suite MCP, a Maltego MCP, a customer-specific JIRA MCP — without altering anything in the agent code.
End-to-End Example: AI-Enriched Hardware Inventory
Let’s tie it together with the most useful “small” agent task: take a
half-blank Processor (“BCM2837 by Broadcom”) and let the LLM populate the
Architecture, capabilities, and pinout fields. This is exactly what
enrich_processor (the enrich_processor_via_ai MCP tool) does.
from wintermute.ai.bootstrap import init_router
from wintermute.ai.use import simple_chat
from wintermute.ai.tools_runtime import tools, spec_from_tool
from wintermute.ai.utils.tool_factory import register_tools
from wintermute.ai.utils.hardware import enrich_processor
from wintermute.hardware import Processor

# 0. Bring up the router
router = init_router()

# 1. Register a couple of red-team-flavoured tools
def query_red_team_kb(question: str) -> str:
    """Query the red_team_manuals KB and return the top retrieved chunk."""
    from wintermute.ai.types import ChatRequest, Message
    req = ChatRequest(messages=[Message(role="user", content=question)])
    rag = init_router_with(provider="rag-red_team_manuals")  # see helper below
    provider = rag.choose(req)[0]
    return provider.chat(req).content

def cve_for(cpe: str) -> str:
    """Return CVE summaries for a given CPE 2.3 string."""
    # placeholder — wire to NVD
    return "CVE-2024-XXXX: ..."

for t in register_tools([query_red_team_kb, cve_for]):
    tools.register(t)

# 2. Enrich the processor on iot-cam-01 — uses tool_calling_chat under the hood
proc = Processor(processor="BCM2837", manufacturer="Broadcom")
proc = enrich_processor(proc, router=router)
print(proc.architecture.instruction_set, proc.architecture.cpu_cores)
# ARMv8-A 4
enrich_processor (wintermute/ai/utils/hardware.py) builds a
ChatRequest with the processor name and manufacturer, lets the model call
query_red_team_kb and cve_for to ground its answer, then parses the
final response into a populated Processor object. The same flow is exposed
from MCP as enrich_processor_via_ai (WintermuteMCP.py:2199) so an
external Claude Desktop / Cursor / custom client invokes it identically.
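If you want a mental model of the loop under the hood, it follows the standard tool-calling shape. A hedged sketch; the tool_calls entry layout and the role="tool" feedback message are assumptions, while run_tool's resolution order (dynamic backends first, then the local registry) comes from the diagram:

from wintermute.ai.types import ChatRequest, Message

# Hypothetical sketch of the loop enrich_processor drives; message shapes
# and the tool_calls dict layout are assumptions.
def tool_calling_chat(router, req: ChatRequest, runtime, max_rounds: int = 8) -> str:
    provider, _model = router.choose(req)
    for _ in range(max_rounds):
        resp = provider.chat(req)
        if not resp.tool_calls:                  # no more calls: final answer
            return resp.content
        # record the assistant turn, then feed each tool result back
        req.messages.append(Message(role="assistant", content=resp.content or ""))
        for call in resp.tool_calls:
            # dynamic backends (MCP, Surgeon) first, then the local registry
            result = runtime.run_tool(call["name"], call["arguments"])
            req.messages.append(Message(role="tool", content=str(result)))
    return resp.content                          # bail out after max_rounds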
For the IoT camera engagement, run the same enrichment over every device on the operation:
for dev in op.devices:
    if dev.processor and not dev.processor.architecture:
        dev.processor = enrich_processor(dev.processor, router=router)
op.save()
Two minutes of agent time, two analyst-hours saved, every device on the operation now carries an architecture record the report templates can render.
Operational Notes
- Cost control. task_tag="cheap" is the cheapest knob. For genuinely long jobs (orchestrator + dozens of sub-agents), set BEDROCK_MODEL_ID=bedrock/us.anthropic.claude-haiku-4-5-20251001-v1:0 and reserve Sonnet for the orchestrator only. Or do the inverse: Sonnet for the orchestrator, Groq for the sub-agents — exactly what we do in Part 7.
- Air-gapped engagements. Skip Bedrock entirely. Register a HuggingFace embedding provider, build local-only RAG indices, and supply your own on-prem provider that implements LLMProvider. The registry is provider-agnostic — litellm is the default vehicle but not required.
- Workspace hygiene. WorkspaceManager (wintermute/utils/blob_manager.py) defaults to ./wintermute_workspace/. Set WINTERMUTE_WORKSPACE_ROOT to put dumped firmware on a dedicated SSD; descriptors handed to the LLM are {file_path, size_bytes, sha256, type: "binary_blob"} regardless.
What’s Next
Part 4 covers the cartridge system,
the MCPRuntime, and how Surgeon (firmware emulation hooks + AFL++ fuzzing)
plugs in. We’ll load tpm20, jtag, and firmware_analysis on a live
target, compose them, and watch how the cartridge methods become AI tools
the orchestrator (Part 6) and the per-run sub-agents (Part 7) call by name.