Wintermute Framework, Part 3: The AI Router, RAG, and Tool Registry

Part 1 sketched the architecture; Part 2 drove the framework by hand. Now we plug in the AI. By the end of this post we have:

  • a Router set up against AWS Bedrock with a Groq fallback for cheap tasks,
  • a RAG knowledge base of red-team manuals queried per-engagement,
  • a tools.json that path-resolves binaries the LLM is allowed to invoke,
  • a tool-calling loop that enriches the BCM2837 processor on iot-cam-01
    using nothing but simple_chat and registered Python tools.

Every line of code lands inside the same Operation we built in Part 2.

The Three Pieces of the Subsystem

   ChatRequest                 ┌────────────────────┐
   • messages                  │   Router.choose()  │
   • model                     │  default_provider  │──┐
   • tools (ToolSpec list)     │  task_tag override │  │
   • task_tag                  │  ("cheap" → groq)  │  │
   • response_format           └─────────┬──────────┘  │
                                         │             │
                            ┌────────────▼───────────┐ │
                            │     LLMRegistry        │ │
                            │   bedrock              │ │
                            │   openai               │ │
                            │   groq                 │ │
                            │   local_embedder       │ │
                             │  rag-tiny_hardware_test│ │
                            │   rag-red_team_manuals │ │
                            └────────────┬───────────┘ │
                                         │             │
                            ┌────────────▼───────────┐ │
                            │    LLMProvider         │ │
                            │  .chat(req)            │◀┘
                            │  .embed(...)           │
                            │  .list_models()        │
                            │  .count_tokens(...)    │
                            └────────────┬───────────┘
                                         │ ChatResponse
                                         │  • content
                                         │  • tool_calls[]
                                         ▼
                            ┌────────────────────────┐
                            │    ToolsRuntime        │
                            │  .run_tool(name, args) │
                            │   1. dynamic backends  │
                            │      (MCP, Surgeon)    │
                            │   2. local ToolRegistry│
                            └────────────────────────┘

The three pieces are Router + LLMRegistry + LLMProvider (the AI), the RAG providers that wrap base providers transparently, and the tool registry underneath.

Bootstrapping the Router

init_router() (wintermute/ai/bootstrap.py) registers every available provider and discovers RAG knowledge bases automatically. In use:

from wintermute.ai.bootstrap import init_router
from wintermute.ai.provider import llms

router = init_router()
print("Providers:", llms.providers())
print("Default :", router.default_provider, router.default_model)

Output on a machine with AWS + Groq credentials and one local KB indexed:

Providers: ['bedrock', 'groq', 'openai', 'local_embedder',
'rag-tiny_hardware_test', 'rag-red_team_manuals']
Default : bedrock bedrock/us.anthropic.claude-sonnet-4-6:0

Inside init_router():

register_bedrock(region=os.getenv("AWS_REGION", "us-east-1"))
register_groq(api_key=os.getenv("GROQ_API_KEY"))
register_openai(api_key=os.getenv("OPENAI_API_KEY"))
register_huggingface(as_name="local_embedder")
bootstrap_rags(llms)
return Router(default_provider="bedrock",
              default_model=os.getenv("BEDROCK_MODEL_ID"))

Each register_* is permissive: a missing API key skips registration without failing. Run with no env vars at all and you get just local_embedder — useful when simple_chat is not the goal but you want the global registry populated for an MCP server that brings its own model.
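In miniature, the pattern looks like this. A hedged sketch: the registry dict and helper name are illustrative assumptions, not the bootstrap source.

import os

# Sketch of the permissive register_* pattern used by init_router().
# The registry dict and factory callback are illustrative assumptions.
_registry: dict[str, object] = {}

def register_if_configured(name: str, env_var: str, factory) -> None:
    key = os.getenv(env_var)
    if not key:
        return  # missing credential: skip registration instead of raising
    _registry[name] = factory(key)

register_if_configured("groq", "GROQ_API_KEY", lambda key: {"api_key": key})
print(sorted(_registry))  # [] without GROQ_API_KEY, ['groq'] with it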

Routing By Task Tag

Set task_tag="cheap" on a request and Router.choose (provider.py:99) swaps to Groq if it is registered:

from wintermute.ai.use import simple_chat
# Bedrock by default
print(simple_chat(router, "Summarize JTAG vs SWD risks"))
# Routed to Groq because of the tag, no other change
print(simple_chat(router, "Same question, fast",
                  task_tag="cheap"))

This is the lever we will pull in Part 7 to keep the per-test-case sub-agents fast and inexpensive (Groq Llama 3.3 70B) while reserving Claude/GPT for the orchestrator’s reasoning.
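The override itself is small. A minimal sketch of the rule in Router.choose, assuming a static tag-to-provider table; the table and function shape here are illustrative, not provider.py verbatim.

# Sketch of the task_tag override in Router.choose (provider.py:99).
# TAG_ROUTES and the function signature are illustrative assumptions.
TAG_ROUTES = {"cheap": "groq"}

def choose_provider(task_tag, default: str, registered: list[str]) -> str:
    override = TAG_ROUTES.get(task_tag)
    if override and override in registered:
        return override   # tag wins when the cheap provider is registered
    return default        # otherwise fall back to the router default

print(choose_provider("cheap", "bedrock", ["bedrock", "groq"]))  # groq
print(choose_provider("cheap", "bedrock", ["bedrock"]))          # bedrock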

Routing By RAG

router.set_default(provider="rag-<name>") flips the default for every subsequent call, including tool-calling:

# Stock Bedrock — no document context
router.set_default(provider="bedrock")
print(simple_chat(router,
    "What voltage does the VCC_CORE pin use on the BCM2837?"))

# RAG-augmented — Bedrock + retrieved chunks from the KB
router.set_default(provider="rag-tiny_hardware_test")
print(simple_chat(router,
    "What voltage does the VCC_CORE pin use on the BCM2837?"))

The first call answers with the model’s training data. The second call queries the LlamaIndex vector store, prepends the retrieved chunks, and forwards to the configured base provider. From the source (wintermute/ai/providers/rag_provider.py), the RAGProvider wraps any other registered provider as its base_provider — so the same KB works backed by Bedrock, OpenAI, or Groq without rebuilding the index.
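Conceptually the wrap is a retrieve-then-forward step. A sketch under stated assumptions: the retriever interface and prompt layout are illustrative, while ChatRequest, Message, and .chat(req).content match the types used elsewhere in this post.

from wintermute.ai.types import ChatRequest, Message

# Sketch of RAGProvider.chat's retrieve-then-forward flow; the retriever
# object and prompt layout are assumptions, not rag_provider.py verbatim.
def rag_chat(base_provider, retriever, question: str, top_k: int = 4) -> str:
    chunks = retriever.retrieve(question)[:top_k]           # vector-store lookup
    context = "\n\n".join(chunk.text for chunk in chunks)   # retrieved chunks
    req = ChatRequest(messages=[Message(
        role="user",
        content=f"Context:\n{context}\n\nQuestion: {question}",
    )])
    return base_provider.chat(req).content  # Bedrock, OpenAI, or Groq alike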

Building a Red-Team Knowledge Base — rag-red_team_manuals

For a real engagement you want the agent grounded in your own offensive references — exploit notes, vendor errata, prior-engagement reports — not just generic web data. Layout:

knowledge_bases/
└── red_team_manuals/
    ├── rag_config.json
    ├── docs/
    │   ├── uboot-attack-cheatsheet.md
    │   ├── i2c-eeprom-extraction.pdf
    │   ├── tpm20-quirks-2024.md
    │   └── prior-engagements/
    │       ├── 2024-Q3-acme-router.docx
    │       └── 2025-Q1-foo-cam.docx
    └── storage_db/            # written after indexing

rag_config.json:

{
  "rag_id": "red_team_manuals",
  "description": "Embedded systems and red team exploit manuals + prior eng. reports.",
  "base_provider_id": "bedrock",
  "embed_provider_id": "local_embedder",
  "embedding_model": "BAAI/bge-small-en-v1.5",
  "vector_store_type": "local",
  "document_types": ["pdf", "markdown", "text", "docx"]
}

Index with LlamaIndex + the local embedder so nothing leaves the host:

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from wintermute.ai.providers.huggingface_provider import HuggingFaceProvider
from wintermute.ai.providers.rag_provider import LlamaIndexEmbeddingWrapper

embed = HuggingFaceProvider(name="local_embedder")
embed_model = LlamaIndexEmbeddingWrapper(
    provider=embed,
    model_name="BAAI/bge-small-en-v1.5",
)
docs = SimpleDirectoryReader("knowledge_bases/red_team_manuals/docs").load_data()
index = VectorStoreIndex.from_documents(docs, embed_model=embed_model)
index.storage_context.persist(
    persist_dir="knowledge_bases/red_team_manuals/storage_db")

Once storage_db/ exists, the next init_router() registers it as rag-red_team_manuals automatically — no code change. Switch to it from the console:

onoSendai [acme-iotcam-2026-Q2] > ai rag list
rag-tiny_hardware_test  Hardware technical reference...
rag-red_team_manuals    Embedded systems and red team exploit manuals...
onoSendai [acme-iotcam-2026-Q2] > ai rag use red_team_manuals
[*] AI default provider set to rag-red_team_manuals
onoSendai [acme-iotcam-2026-Q2] > ai How did we extract the I2C EEPROM on the foo-cam in 2025-Q1?

For shared engagements with several analysts, swap vector_store_type to "qdrant" and point at a remote server ("qdrant_url") — same code, same RAG provider, indices now centralized.
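A sketch of that variant, assuming the remaining keys carry over unchanged and that "qdrant_url" takes the server endpoint; the URL below is a placeholder.

{
  "rag_id": "red_team_manuals",
  "description": "Embedded systems and red team exploit manuals + prior eng. reports.",
  "base_provider_id": "bedrock",
  "embed_provider_id": "local_embedder",
  "embedding_model": "BAAI/bge-small-en-v1.5",
  "vector_store_type": "qdrant",
  "qdrant_url": "http://qdrant.lab.internal:6333",
  "document_types": ["pdf", "markdown", "text", "docx"]
}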

Tools, Path Mapping, and the Global Registry

The tool registry is one of Wintermute’s most useful primitives because it collapses three different ways of giving the LLM a capability into a single surface.

1. Pure Python tools via register_tools

wintermute/ai/utils/tool_factory.py introspects a Python function's signature and type hints, generates a Pydantic model, derives a JSON Schema, and wraps the call so the LLM sees {"result": ...} envelopes. Crucially, it offloads oversized payloads (raw bytes always, strings over 1 KiB) to the WorkspaceManager, so blob-returning tools never poison the context window.

from wintermute.ai.tools_runtime import tools
from wintermute.ai.utils.tool_factory import register_tools

def lookup_cve(cve_id: str) -> str:
    """Look up a CVE by ID and return vulnerability details."""
    # ... query NVD or local DB ...
    return "..."

def check_port(host: str, port: int) -> str:
    """Check if a network port is open on the host."""
    # ... your scanner ...
    return f"Port {port} on {host}: open"

for tool in register_tools([lookup_cve, check_port]):
    tools.register(tool)

The factory pulls the docstring as the description, the parameter annotations as the input schema, and the return type as the output. This is the same mechanism CartridgeManager._register_instance_methods uses on cartridge classes — your hand-written tools and a loaded cartridge’s methods land in the same registry.
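The envelope-and-offload step from the first paragraph, in sketch form; the 1 KiB threshold and the descriptor keys match what this post documents, while store_blob is an illustrative stand-in for the WorkspaceManager call, not its real API.

import hashlib

# Sketch of the {"result": ...} envelope with blob offload; store_blob is
# an illustrative stand-in for the WorkspaceManager call, not its real API.
BLOB_THRESHOLD = 1024  # strings over 1 KiB are offloaded; raw bytes always

def store_blob(data: bytes) -> dict:
    digest = hashlib.sha256(data).hexdigest()
    path = f"wintermute_workspace/{digest[:12]}.bin"
    # ... write data to path ...
    return {"file_path": path, "size_bytes": len(data),
            "sha256": digest, "type": "binary_blob"}

def envelope(result):
    if isinstance(result, bytes):
        return {"result": store_blob(result)}
    if isinstance(result, str) and len(result) > BLOB_THRESHOLD:
        return {"result": store_blob(result.encode())}
    return {"result": result}  # small results pass through inline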

2. Path-mapped binaries via tools.json

ToolRegistry.load_tool_configs(path) reads a JSON list mapping tool name to a (directory, executable) pair, resolved against WINTERMUTE_TOOLS_ROOT (default /opt). When that name is later registered, the absolute path is appended to the tool’s description so the LLM sees:

{
  "name": "openocd",
  "description": "Open On-Chip Debugger for JTAG/SWD access\n\nAbsolute Path: /opt/openocd/bin/openocd",
  ...
}

Concrete example (examples/05-RAG-Knowledge-Bases.ipynb):

[
  { "name": "openocd",     "directory": "openocd/bin",     "executable": "openocd" },
  { "name": "flashrom",    "directory": "flashrom/bin",    "executable": "flashrom" },
  { "name": "depthcharge", "directory": "depthcharge/bin", "executable": "depthcharge-inspect" },
  { "name": "tpm2_tools",  "directory": "tpm2-tools/bin",  "executable": "tpm2_getcap" }
]

Why this matters: the RAG context (the red-team manual you just indexed) will say “use openocd to halt the core.” The agent translates that to a tool call. Without path mapping, the LLM either guesses /usr/local/bin/openocd or prepends sudo randomly. With path mapping, the description is grounded against the deployment, so the generated subprocess.run([...]) call is correct on this host whether you’re on a Linux laptop, a NUC in the lab, or a CI runner where everything is in /srv/tools.
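The resolution itself is plain path joining. A minimal sketch of the rule load_tool_configs applies, assuming only the WINTERMUTE_TOOLS_ROOT fallback described above; the function name is illustrative.

import os
from pathlib import Path

# Sketch of the (directory, executable) resolution behind tools.json;
# the function name is illustrative, the env-var fallback is as documented.
def resolve_tool_path(directory: str, executable: str) -> Path:
    root = Path(os.getenv("WINTERMUTE_TOOLS_ROOT", "/opt"))
    return root / directory / executable

print(resolve_tool_path("openocd/bin", "openocd"))
# /opt/openocd/bin/openocd  (or /srv/tools/... on the CI runner)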

3. Dynamic backends — Surgeon and any MCP server

ToolsRuntime.register_backend(backend) wires a ToolBackend (the get_ai_tools() + execute_tool(...) protocol) into the runtime. The shipped SurgeonBackend (integrations/surgeon/backend.py) is a managed subprocess that speaks MCP over stdio; on start() it spawns the integrations/surgeon/server.py FastMCP server and exposes its tools (create_hook_skeleton, list_firmware_symbols, build_firmware, start_fuzzing, get_fuzzer_stats, write_config_file) via the same get_ai_tools() interface used by the static registry.

from wintermute.ai.tools_runtime import ToolsRuntime
from wintermute.integrations.surgeon.backend import SurgeonBackend

runtime = ToolsRuntime()
backend = SurgeonBackend(surgeon_root="/opt/surgeon")
await backend.start()
runtime.register_backend(backend)

# Now ToolsRuntime.run_tool() resolves Surgeon tools first,
# falls back to local registry. The LLM sees one merged list.

This is the seam we will use in Part 4 to mount custom MCP servers next to Surgeon — a Burp Suite MCP, a Maltego MCP, a customer-specific JIRA MCP — without altering anything in the agent code.
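A sketch of what such a backend has to provide, assuming the protocol is just the methods named above plus the start() lifecycle hook; the class name, schema shape, and async signatures here are illustrative.

# Sketch of a customer-specific ToolBackend; method names follow the
# get_ai_tools()/execute_tool() protocol above, everything else is assumed.
class JiraBackend:
    async def start(self) -> None:
        ...  # connect to (or spawn) the JIRA MCP server

    def get_ai_tools(self) -> list[dict]:
        return [{
            "name": "jira_create_ticket",
            "description": "Open a finding ticket in the customer JIRA",
            "input_schema": {
                "type": "object",
                "properties": {"summary": {"type": "string"}},
                "required": ["summary"],
            },
        }]

    async def execute_tool(self, name: str, args: dict) -> dict:
        ...  # forward the call over MCP, return the tool result
        return {"result": "PROJ-123 created"}

# runtime.register_backend(JiraBackend()) merges these into the same list.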

End-to-End Example: AI-Enriched Hardware Inventory

Let’s tie it together with the most useful “small” agent task: take a half-blank Processor (“BCM2837 by Broadcom”) and let the LLM populate the Architecture, capabilities, and pinout fields. This is exactly what enrich_processor (the enrich_processor_via_ai MCP tool) does.

from wintermute.ai.bootstrap import init_router
from wintermute.ai.provider import Router
from wintermute.ai.use import simple_chat
from wintermute.ai.tools_runtime import tools
from wintermute.ai.utils.tool_factory import register_tools
from wintermute.ai.utils.hardware import enrich_processor
from wintermute.hardware import Processor

# 0. Bring up the router
router = init_router()

# 1. Register a couple of red-team-flavoured tools
def query_red_team_kb(question: str) -> str:
    """Answer a question from the red_team_manuals KB via RAG-augmented chat."""
    # One-off Router defaulting to the RAG provider (Router is the class
    # init_router() returns; see provider.py). Leaves the main router intact.
    rag_router = Router(default_provider="rag-red_team_manuals",
                        default_model=router.default_model)
    return simple_chat(rag_router, question)

def cve_for(cpe: str) -> str:
    """Return CVE summaries for a given CPE 2.3 string."""
    # placeholder — wire to NVD
    return "CVE-2024-XXXX: ..."

for t in register_tools([query_red_team_kb, cve_for]):
    tools.register(t)

# 2. Enrich the processor on iot-cam-01 — uses tool_calling_chat under the hood
proc = Processor(processor="BCM2837", manufacturer="Broadcom")
proc = enrich_processor(proc, router=router)
print(proc.architecture.instruction_set, proc.architecture.cpu_cores)
# ARMv8-A 4

enrich_processor (wintermute/ai/utils/hardware.py) builds a ChatRequest with the processor name and manufacturer, lets the model call query_red_team_kb and cve_for to ground its answer, then parses the final response into a populated Processor object. The same flow is exposed from MCP as enrich_processor_via_ai (WintermuteMCP.py:2199) so an external Claude Desktop / Cursor / custom client invokes it identically.
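Under stated assumptions, that loop reduces to the sketch below. The content and tool_calls[] fields come from the ChatResponse in the diagram and run_tool from ToolsRuntime; the loop mechanics, tool_call field names, and the role="tool" reply shape are illustrative, not the tool_calling_chat source.

from wintermute.ai.types import ChatRequest, Message

# Sketch of a tool-calling loop; mechanics and field names are assumptions.
def tool_loop(provider, runtime, req: ChatRequest, max_turns: int = 8) -> str:
    resp = provider.chat(req)
    for _ in range(max_turns):
        if not resp.tool_calls:                       # no more tool requests
            return resp.content                       # final answer
        for call in resp.tool_calls:                  # execute each request
            result = runtime.run_tool(call.name, call.args)
            req.messages.append(Message(role="tool", content=str(result)))
        resp = provider.chat(req)                     # let the model continue
    return resp.content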

For the IoT camera engagement, run the same enrichment over every device on the operation:

for dev in op.devices:
    if dev.processor and not dev.processor.architecture:
        dev.processor = enrich_processor(dev.processor, router=router)
op.save()

Two minutes of agent time, two analyst-hours saved, every device on the operation now carries an architecture record the report templates can render.

Operational Notes

  • Cost control. task_tag="cheap" is the cheapest knob. For genuinely
    long jobs (orchestrator + dozens of sub-agents), set
    BEDROCK_MODEL_ID=bedrock/us.anthropic.claude-haiku-4-5-20251001-v1:0 and
    reserve Sonnet for the orchestrator only. Or mix the other way: Sonnet
    for the orchestrator and Groq for the sub-agents, which is exactly what
    we do in Part 7.
  • Air-gapped engagements. Skip Bedrock entirely. Register a HuggingFace
    embedding provider, build local-only RAG indices, and supply your own
    on-prem provider that implements LLMProvider. The registry is
    provider-agnostic — litellm is the default vehicle but not required.
  • Workspace hygiene. WorkspaceManager (wintermute/utils/blob_manager.py)
    defaults to ./wintermute_workspace/. Set WINTERMUTE_WORKSPACE_ROOT to
    put dumped firmware on a dedicated SSD; descriptors handed to the LLM are
    {file_path, size_bytes, sha256, type: "binary_blob"} regardless.

What’s Next

Part 4 covers the cartridge system, the MCPRuntime, and how Surgeon (firmware emulation hooks + AFL++ fuzzing) plugs in. We’ll load tpm20, jtag, and firmware_analysis on a live target, compose them, and watch how the cartridge methods become AI tools the orchestrator (Part 6) and the per-run sub-agents (Part 7) call by name.
