Whether your AI agent gives a useful answer or confidently makes something up depends, more than anything, on context management—how conversation history, tool outputs, and external knowledge get packed into a model's finite context window. Before OpenClaw 2026.3.7, that logic was hardcoded. Now it's a plugin.
The Problem
OpenClaw used sliding-window compaction: conversation gets too long, old messages get summarized, new messages get room. It worked. But it had trade-offs, and not everyone was happy with them:
- Summarization loses detail. A coding agent needs to reference a function from 50 turns ago? Gone.
- No memory across sessions. Close the session, lose the context. Every conversation starts from zero.
- One strategy, take it or leave it. Want RAG-based assembly? Conversation branching? Custom token budgets? No clean way in.
People worked around it—monkey-patching internals, forking the core, wrapping it in external orchestration. None of it was sustainable.
ContextEngine: A Plugin Slot for Context Management
2026.3.7 takes the entire context lifecycle, extracts it into a well-defined interface, and opens it up as a plugin slot. Write a plugin, implement the interface, and you own context management.
How It Plugs In
Register your engine through the plugin API:
```typescript
// In your plugin's bootstrap
export default function myContextPlugin(api: PluginAPI) {
  api.registerContextEngine('my-engine', (config) => {
    return new MyCustomContextEngine(config);
  });
}
```
Select it in config:
```yaml
plugins:
  slots:
    contextEngine: my-engine
```
No plugin configured? OpenClaw wraps the old behavior in a LegacyContextEngine. Nothing changes for existing users.
Seven Lifecycle Hooks
The interface gives you seven hooks—one for every stage that matters in a conversation turn:
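Collected in one place, the surface looks roughly like the sketch below. The hook names come from the release; the type shapes (`Message`, `TokenBudget`, and so on) are assumptions for illustration, not OpenClaw's actual definitions, and `BufferEngine` is a deliberately trivial example implementation.

```typescript
// Assumed type shapes -- placeholders, not OpenClaw's real definitions.
interface Message { role: "user" | "assistant" | "tool"; content: string }
interface TokenBudget { soft: number; hard: number }
interface AssembledContext { system: string; messages: Message[]; tokenEstimate: number }
interface Turn { messages: Message[] }
interface Context { taskId: string }
interface SubagentContext { messages: Message[]; metadata: Record<string, unknown> }
interface SubagentResult { taskId: string; summary: string }

// The seven lifecycle hooks, one method each.
interface ContextEngine {
  bootstrap(): Promise<void>;
  ingest(message: Message): Promise<void>;
  assemble(budget: TokenBudget): Promise<AssembledContext>;
  compact(): Promise<void>;
  afterTurn(turn: Turn): Promise<void>;
  prepareSubagentSpawn(parentContext: Context): SubagentContext;
  onSubagentEnded(result: SubagentResult): Promise<void>;
}

// Smallest possible engine: keep everything in memory, assemble the tail.
class BufferEngine implements ContextEngine {
  private log: Message[] = [];
  async bootstrap(): Promise<void> {}
  async ingest(message: Message): Promise<void> { this.log.push(message); }
  async assemble(budget: TokenBudget): Promise<AssembledContext> {
    const messages = this.log.slice(-budget.soft); // naive: budget as message count
    return { system: "", messages, tokenEstimate: messages.length };
  }
  async compact(): Promise<void> { this.log = this.log.slice(-1); }
  async afterTurn(_turn: Turn): Promise<void> {}
  prepareSubagentSpawn(_parentContext: Context): SubagentContext {
    return { messages: [], metadata: {} };
  }
  async onSubagentEnded(_result: SubagentResult): Promise<void> {}
}
```

A real engine replaces the in-memory array with whatever store and retrieval strategy it wants; the hooks below walk through each method in turn.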
1. `bootstrap()`
Engine starts up. Connect to your vector DB, build your graph, load saved state.
```typescript
async bootstrap(): Promise<void> {
  this.vectorStore = await connectToVectorDB(this.config.dbUrl);
  this.sessionGraph = new DAG();
}
```
2. `ingest(message: Message)`
A new message lands—user input, assistant response, tool output. You decide how to store and index it.
```typescript
async ingest(message: Message): Promise<void> {
  // Add to the DAG
  const node = this.sessionGraph.addNode(message);
  // Index for retrieval
  const embedding = await embed(message.content);
  await this.vectorStore.upsert(node.id, embedding, message);
}
```
3. `assemble(budget: TokenBudget): Promise<AssembledContext>`
The big one. Before every model call, OpenClaw hands you a token budget and asks: build me a context. What you return is exactly what the model sees.
Different engines, radically different strategies:
```typescript
async assemble(budget: TokenBudget): Promise<AssembledContext> {
  const recentMessages = this.sessionGraph.getRecent(budget.soft * 0.6);
  const relevantHistory = await this.vectorStore.query(
    this.currentQuery,
    budget.soft * 0.3
  );
  const systemContext = this.buildSystemPrompt(budget.soft * 0.1);
  return {
    system: systemContext,
    messages: [...relevantHistory, ...recentMessages],
    tokenEstimate: this.estimateTokens([
      systemContext,
      ...relevantHistory,
      ...recentMessages,
    ]),
  };
}
```
4. `compact()`
Context blew past the hard token limit. Time to slim down. The default engine summarizes old messages. Your plugin could prune graph nodes, offload to a vector store, or skip it entirely if assemble() already stays within budget.
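One compaction strategy that keeps the originals recoverable can be sketched as follows. This is an illustrative standalone function, not the default engine's code; `summarize` stands in for a real model call, and the 4-characters-per-token estimate is a rough heuristic.

```typescript
// Hypothetical compact() policy: when the estimated size exceeds the hard
// budget, summarize the oldest half of the live window, but move the
// originals into an archive instead of discarding them.
interface Msg { id: number; content: string }

function estimateTokens(msgs: Msg[]): number {
  // Rough heuristic: ~4 characters per token.
  return Math.ceil(msgs.reduce((n, m) => n + m.content.length, 0) / 4);
}

function summarize(msgs: Msg[]): Msg {
  // Placeholder for a real summarization call to the model.
  return { id: -1, content: `[summary of ${msgs.length} messages]` };
}

function compact(live: Msg[], archive: Msg[], hardBudget: number): Msg[] {
  if (estimateTokens(live) <= hardBudget) return live; // already in budget
  const cut = Math.floor(live.length / 2);
  const old = live.slice(0, cut);
  archive.push(...old); // originals survive for later recall
  return [summarize(old), ...live.slice(cut)];
}
```

An engine whose `assemble()` already selects within budget (retrieval-based assembly, for instance) can make `compact()` a no-op.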
5. `afterTurn(turn: Turn)`
A full turn is done—user spoke, agent responded. Good time to persist state, update indexes, or kick off background work.
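A minimal persistence sketch, assuming a JSON snapshot on disk (the path, file format, and function shapes here are illustrative, not OpenClaw's API):

```typescript
// afterTurn() checkpoints state so a later bootstrap() can restore it.
// A real engine might append to a log or update an index instead of
// rewriting the whole history.
import { writeFileSync, readFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

interface TurnRecord { user: string; assistant: string }

const statePath = join(tmpdir(), "context-engine-state.json");

function afterTurn(history: TurnRecord[], turn: TurnRecord): void {
  history.push(turn);
  writeFileSync(statePath, JSON.stringify(history));
}

function bootstrap(): TurnRecord[] {
  try {
    return JSON.parse(readFileSync(statePath, "utf8"));
  } catch {
    return []; // first run: no saved state yet
  }
}
```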
6. `prepareSubagentSpawn(parentContext: Context): SubagentContext`
The agent is spawning a subagent. How much context does the child get? Everything would blow its token budget. Nothing would leave it blind. This hook lets you be precise about it.
```typescript
prepareSubagentSpawn(parentContext: Context): SubagentContext {
  // Give the subagent a focused slice of context
  const relevantNodes = this.sessionGraph.getSubtree(parentContext.taskId);
  return {
    messages: relevantNodes.map(n => n.message),
    metadata: { parentSessionId: this.sessionId },
  };
}
```
7. `onSubagentEnded(result: SubagentResult)`
The subagent finished. Its results need to come back into the parent context somehow—merge everything, summarize, cherry-pick. Your call.
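One possible merge policy, sketched standalone (an assumption, not the default behavior): fold the subagent's result back in as a single synthetic tool message rather than replaying its whole transcript into the parent context.

```typescript
// Hypothetical onSubagentEnded() policy: one synthetic tool message
// summarizing the child's work, keeping the parent context compact.
interface Message { role: string; content: string }
interface SubagentResult { taskId: string; transcript: Message[]; summary: string }

function onSubagentEnded(parent: Message[], result: SubagentResult): Message[] {
  return [
    ...parent,
    {
      role: "tool",
      content: `Subagent ${result.taskId} finished: ${result.summary}`,
    },
  ];
}
```

A transcript-merging or cherry-picking engine would instead select from `result.transcript`; the hook makes that a local decision.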
Architecture: Slots vs. Hooks
ContextEngine is a slot, not a hook. Hooks are additive—ten plugins can all listen to onMessage. Slots are exclusive. One ContextEngine at a time.
```
┌─────────────────────────────────────┐
│  Plugin Registry                    │
│                                     │
│  Hooks (additive):                  │
│    onMessage → [plugin1, plugin2]   │
│    onTool    → [plugin3]            │
│                                     │
│  Slots (exclusive):                 │
│    contextEngine → my-engine        │
│    (default: LegacyContextEngine)   │
└─────────────────────────────────────┘
```
At startup, OpenClaw reads plugins.slots.contextEngine, finds the registered factory, and instantiates it. If the engine doesn't exist, startup fails loudly. No silent fallback—that's a deliberate choice. You should know what context engine you're running.
Subagent isolation uses AsyncLocalStorage: each child gets its own scoped runtime. Plugin state doesn't leak across agent boundaries.
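The isolation pattern itself is standard Node.js `AsyncLocalStorage`: each `run()` call gets its own store, and nested scopes shadow their parents. A minimal sketch (the scope shape and function names here are illustrative, not OpenClaw's internals):

```typescript
// Each agent runs inside its own AsyncLocalStorage scope, so state set
// in the parent is invisible to the child and vice versa.
import { AsyncLocalStorage } from "node:async_hooks";

interface AgentScope { sessionId: string; notes: string[] }

const scope = new AsyncLocalStorage<AgentScope>();

function spawnAgent(sessionId: string, work: () => void): void {
  scope.run({ sessionId, notes: [] }, work);
}

const seen: string[] = [];
spawnAgent("parent", () => {
  scope.getStore()!.notes.push("parent-only");
  spawnAgent("child", () => {
    // The child sees a fresh scope, not the parent's notes.
    seen.push(`${scope.getStore()!.sessionId}:${scope.getStore()!.notes.length}`);
  });
  seen.push(`${scope.getStore()!.sessionId}:${scope.getStore()!.notes.length}`);
});
// seen is ["child:0", "parent:1"]
```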
What People Are Building Already
Lossless-Claw
GitHub · Martian Engineering
The first serious ContextEngine plugin. Replaces sliding-window compaction with a DAG-based summarization system that keeps every original message while staying within token limits.
- Every message becomes a node in a directed acyclic graph
- Nodes cluster into "episodes" by topic
- Over budget? Old episodes get summarized, but the originals stay in the graph
- If a later turn references old content, the engine pulls the original—not the summary
If you've ever had a coding agent forget a function it wrote an hour ago, this is the fix.
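The core idea can be sketched in a few lines. This is my reconstruction from the description above, not Lossless-Claw's actual code: summarized episodes contribute their summary to the assembled view, but the originals stay reachable for exact recall.

```typescript
// Episodes keep both a summary (used for assembly once compacted) and
// the original messages (used when a later turn references old content).
interface Msg { id: number; content: string }
interface Episode { summary: string | null; originals: Msg[] }

function assembleView(episodes: Episode[]): string[] {
  // Summarized episodes contribute their summary; live ones their originals.
  return episodes.flatMap(e =>
    e.summary !== null ? [e.summary] : e.originals.map(m => m.content)
  );
}

function recallOriginal(episodes: Episode[], id: number): Msg | undefined {
  // Exact recall works even from a summarized episode.
  return episodes.flatMap(e => e.originals).find(m => m.id === id);
}
```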
MemOS Cloud Plugin
GitHub · MemTensor
Does one thing: gives your agent memory that survives between sessions.
- On `bootstrap()`: pulls relevant memories from MemOS Cloud based on the opening message
- On `afterTurn()`: saves new turns back to the cloud
- On `assemble()`: injects recalled memories into system context
Your agent remembers last week's conversation, your preferences, your projects. You stop repeating yourself.
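The `assemble()`-time injection step might look like the sketch below. The `Memory` shape and scoring are placeholders, not the MemOS SDK:

```typescript
// Inject the top-scoring recalled memories into the system prompt,
// leaving it untouched when nothing relevant was recalled.
interface Memory { text: string; score: number }

function injectMemories(systemPrompt: string, recalled: Memory[], max: number): string {
  const top = [...recalled].sort((a, b) => b.score - a.score).slice(0, max);
  if (top.length === 0) return systemPrompt;
  return `${systemPrompt}\n\nRelevant memories:\n${top.map(m => `- ${m.text}`).join("\n")}`;
}
```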
What's Next
From GitHub issues and Discord, people are working on:
- RAG-native assembly: Skip message history, assemble context from retrieved document chunks. OpenClaw as a conversational search engine.
- Multi-agent shared memory: Multiple agents sharing a knowledge graph. Collaborative workflows without redundant context.
- Token-budget optimization: Dynamically tuning context composition based on which model you're using—its pricing, its strengths, its context length.
- Conversation branching: Tree-structured context. Explore different paths, switch back, keep everything.
Why This Changes the Game
OpenClaw could always add channels, models, and tools. But context—how the agent actually thinks—was locked inside the core. You couldn't touch it without forking.
ContextEngine cracks that open. And once it's open, things start to compound:
1. Plugin developers can finally compete on the hardest problem in agent UX: context quality.
2. Enterprise users get compliance controls—redact before the model sees it, enforce retention, audit every prompt.
3. Researchers can test new context strategies without maintaining a fork.
4. Model providers can ship plugins tuned to their architectures. Long-context models and short-context models shouldn't use the same strategy.
That's what a framework becoming a platform actually looks like. Not a rebrand. Not a blog post announcing a "vision." A concrete architectural change that makes the ecosystem self-reinforcing. More plugins → more users → more developers → more plugins. Once that loop starts, it's hard to stop.
The lobster just grew a new claw.