You Can't Retrieve What You Don't Know You're Missing

There is a field in one of the systems I work on that has been deprecated for years and comes back empty every single time you read it. But it is named in such a way that every agent thinks that is the one field that they should really really focus on. And it's always empty! Thus, the agent declares it's the wrong environment and starts searching for the correct one among the local containers, tries to see if it assumed the wrong launch config, and basically goes tokenmaxxing on a wild goose chase.

The agent did nothing dumb, but still failed. The field had a name that sounded like it mattered, so it used it. "But is this secretly deprecated tho?" was missing from the agent's idea of what questions were worth asking. This would get a human maybe once or twice tops. But it will get the poor agent every time I don't end the prompt with "...and watch for that field. It's deprecated but not marked as such". This is basically how you get the next HAL, the next Skynet, the next Butlerian Jihad...

This is a long-standing problem, because it is the one that no amount of better retrieval fixes, and almost everyone building agent memory right now is building better retrieval. We, the agentic AI gourmands, have a tendency towards a bit of a herd mentality. The sheer volume of buzz surrounding LLMs often turns a reasonable-sounding hypothesis into an assumed fact, cascading into widespread architectural fallacies.

We have seen this happen with RAG. People assumed that with enough cosine similarity (or HNSW nearest-neighbor search, etc.) the relevant parts of context would elegantly surface. When this did not happen, it was "RAG is dead" or "Graph RAG is everything". In the process, RAG has changed meaning from simply augmenting generation via retrieval to running graph or vector searches with each prompt.

Ultimately, the puzzle of retrieving the context that the agent doesn't know it doesn't know has not been solved yet. I didn't solve it either. But I will propose a design pattern that I believe is quietly and spontaneously climbing its way towards becoming the de facto standard. I will lay it out along with a lightweight library that will let you observe for yourself why this pattern is empirically sound.

Unknown unknowns

I'm but a humble dev, and far be it from me to criticize Karpathy the magnifique. But he can at times be a great trigger of the herd mentality. He has recently come up with a lovely knowledge management paradigm called "LLM Wiki", a fairly decent pattern that stands on 3 legs: ingestion, querying and gardening. This 3-pronged scheme has a glaring weakness that I'd like to criticize: it requires the agent to initiate the querying. The agent realizes it is unsure about something, forms a query, calls a tool, gets context back, carries on. Vector store, knowledge graph, grep over a notes folder, a Jira MCP, it does not matter. The pattern is the same. The agent decides it has a gap, then goes and fills it.

There is a precondition buried in that loop: the agent has to know the gap is there before it can pull. While recent models have shown improvement in this regard, you just can't rely on them as of now. LLMs are still notorious for papering over the gap with something made up. If a model sees "topic" and "publish" in docs that never explicitly name the technology, it assumes Kafka and is like, "GCP who?". If it is supposed to pull a DLQ entry, it will waste a quarter of its context window trying to figure out the Kafka setup. It may even conclude the implementation is missing and start to implement one.

The half that is missing, then, is push: something that finds the fact the agent lacks and injects it in the background. Something that will ask "Oh are we doing CQRS? Alright let me go find out about the messaging architecture".

Somatic memory with hooks

Firstly, we need to briefly explore what hooks are. Hooks are runtime events that the harness exposes. These are usually things like "Stop", which fires when the LLM finishes responding, or "UserPromptSubmit", which fires when the user sends the prompt but before it reaches the model. The user can attach executables to these events. These executables run "automatically", without intentional invocation by the agent. For instance, every time my model makes a tool call, a programlet can check if it contains banned commands. Or on SessionEnd, I can upload the conversation somewhere.
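To make the mechanics concrete, here is a minimal sketch of such an executable in Python. It assumes a Claude Code style contract, where the harness pipes a JSON payload with a `prompt` field to the hook's stdin and treats whatever the hook prints on a zero exit as extra context. The `NOTES` table and the `context_for` and `run_hook` names are made up for illustration, and other harnesses may use different field names.

```python
import io
import json
import sys

# Hypothetical notes: keyword -> a fact the agent would never think to ask about.
NOTES = {
    "launch config": "Heads up: that field is deprecated and always comes back empty.",
}

def context_for(prompt: str) -> str:
    """Return every note whose keyword appears in the prompt."""
    lowered = prompt.lower()
    hits = [fact for key, fact in NOTES.items() if key in lowered]
    return "\n".join(hits)

def run_hook(stdin, stdout) -> int:
    """Read the harness payload, print any context, return the exit code."""
    try:
        payload = json.load(stdin)
    except json.JSONDecodeError:
        return 0  # a malformed payload should never block the prompt
    context = context_for(payload.get("prompt", ""))
    if context:
        stdout.write(context + "\n")
    return 0

# Demo with an in-memory payload instead of a real harness.
# In a real hook the last line would be: sys.exit(run_hook(sys.stdin, sys.stdout))
fake_stdin = io.StringIO(json.dumps({"prompt": "Check the launch config please"}))
fake_stdout = io.StringIO()
exit_code = run_hook(fake_stdin, fake_stdout)
print(fake_stdout.getvalue(), end="")
```

The model never calls anything here. The harness runs the script on every prompt, and the note rides along whether or not the agent suspected a gap.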

Just as MCP has become the standard for pull-oriented tools, an emerging surface for push tooling is the hooks that most harnesses expose. We will be utilizing both. Anthropic first published the MCP standard, and now it looks like their hooks are becoming the standard too. Below is a table of the foremost legacy harnesses and a few up-and-coming ones that adopt hooks, with some divergence in their standards:

| Harness | Cohort | User Submit hook equivalent | Wire contract vs Claude Code |
|---|---|---|---|
| Claude Code (Anthropic) | established | UserPromptSubmit | original gangster |
| Codex CLI (OpenAI) / Qwen Code (Alibaba) | established / 2026 | UserPromptSubmit | adopt it wholesale |
| Gemini CLI (Google) | established | BeforeAgent | adopts it, renames the events |
| Cursor (Anysphere) | established | beforeSubmitPrompt | its own vocabulary but maps to it |
| pi (Mario Zechner) | 2026 newcomer | input (pi.on("input"), can rewrite the prompt) | its own JS extension API, not stdin but adaptable |
| CodeWhale (formerly deepseek-tui) | 2026 newcomer | message_submit | adopting it now (issue #1364) |

Look down that last column. An increasing share of harnesses either takes Claude Code's wire format wholesale or maps a new vocabulary onto it, the most divergent example being pi. When I looked at 7 of the high-popularity harnesses that came out in 2026, I found that 5 of them have hooks comparable to Anthropic's. The rate drops the earlier you go. So this is something of an "underground" movement among the cooler people for now.

What makes it worth pointing at is how little noise it made on the way in. MCP arrived with a campaign behind it. "USB-C for LLMs" was the line, it was a good line, and within a season not adopting MCP had started to look like a decision you would be asked to defend. Hooks got none of that, no "this is now the standard", and the harnesses settled on Claude Code's contract regardless, one after another, each of them arriving at the same need on its own. An "emergent" standard, where the human was very much in the loop.

The retrieval step that sits behind the hook is a free variable. It can be grep over a folder of markdown, a vector store, a knowledge graph, or a NotebookLM instance that is filled with past chats. Swap one for another and nothing above it has to move. You have your body of knowledge and a sense of what the unknown unknowns can be. The design pattern around it can be relatively stable. In fact, let us do a minimal hands-on experiment with it.
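One way to picture that free variable is as a single narrow interface that the hook talks to. The sketch below is only an illustration under my own assumptions: the `Retriever` protocol, the two toy implementations and the sample notes are hypothetical and are not taken from any particular library.

```python
import re
from typing import Protocol

class Retriever(Protocol):
    """Anything that can turn a query into relevant snippets."""
    def search(self, query: str) -> list[str]: ...

def words(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))

class KeywordRetriever:
    """Toy in-memory store: returns notes sharing at least one word with the query."""
    def __init__(self, notes: list[str]) -> None:
        self.notes = notes

    def search(self, query: str) -> list[str]:
        wanted = words(query)
        return [note for note in self.notes if wanted & words(note)]

class EmptyRetriever:
    """Stand-in for a backend that knows nothing."""
    def search(self, query: str) -> list[str]:
        return []

def push_context(prompt: str, retriever: Retriever) -> str:
    """The part that stays put: it does not care which backend it was given."""
    return "\n".join(retriever.search(prompt))

notes = ["Messaging runs on Pub/Sub, not Kafka.", "Deploys happen on Fridays."]
found = push_context("messaging architecture", KeywordRetriever(notes))
nothing = push_context("messaging architecture", EmptyRetriever())
print(found)
```

Swapping `KeywordRetriever` for a vector store or a graph query changes one constructor call, and `push_context` stays as it is.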

A small library you can run

Back near the top I promised a small library you could run yourself to watch the pattern in operation. It is at: github.com/esinecan/somatic-memory

It is called somatic-memory for now. Not the best name, since the MCP half of it is the opposite of somatic. Somatic systems of the body work without our conscious input. The speed of your pulse increases as your muscle output does. It doesn't require you to think "Let me send more blood to my calves". Your iris reacts to light.

Which is why only one of the two components fits the name. The hook is truly "somatic". It fires on your prompt and pushes context in before the model reads a word, no action from the model needed. Like the iris reacting to light. But we do also squint when it's too bright, right? The other component is a plain MCP tool called recall, and the agent reaches for it the moment it knows it has a gap. That one is a deliberate, conscious query, the exact opposite of somatic (agentic?), and corresponds to the squint.

In the end, all this engine does is one small thing: given the conversation, hand back what is relevant. Because it only reads and never writes, the thing it reads from is entirely up to you: a folder, a vault, a database, a live search. The library calls each one an oracle. Treat them as scaffolding for your customizations, although the ones besides the web oracle are legitimately usable as is.

| Oracle | What it reads |
|---|---|
| filesystem | a folder of text files, the zero-config default |
| obsidian | the same, plus it follows wikilinks one hop into linked notes you never searched for |
| sqlite | full-text search over a database something else filled |
| web | a live search fetch, for a source you neither own nor wrote |

Swap one row for another and nothing above it has to change. That is the free variable from earlier. I point my own setup at a cloud notebook I already keep, NotebookLM as it happens, but the filesystem oracle needs no cloud and no account at all. The plainest version is grep over a folder of notes, and that already does most of the work. It is a surprisingly effective approach, and one I will always stand by.
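For a feel of how little the plainest version needs, here is a rough Python sketch of grep over a folder that returns each hit with a few lines on either side. It is not the library's filesystem oracle, only my own stand-in: the `grep_folder` name, the `.txt` filter and the sample note are assumptions for the demo.

```python
import tempfile
from pathlib import Path

def grep_folder(folder: Path, term: str, context: int = 2) -> list[str]:
    """Return every case-insensitive hit with `context` lines on either side."""
    results = []
    for path in sorted(folder.rglob("*.txt")):
        lines = path.read_text(encoding="utf-8", errors="ignore").splitlines()
        for i, line in enumerate(lines):
            if term.lower() in line.lower():
                start = max(0, i - context)
                end = min(len(lines), i + context + 1)
                header = f"{path.name}:{i + 1}"  # file name and 1-based line number
                results.append(header + "\n" + "\n".join(lines[start:end]))
    return results

# Demo on a throwaway folder with one made-up note.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "notes.txt").write_text(
        "line one\nthe DLQ lives in Pub/Sub\nline three\n", encoding="utf-8"
    )
    hits = grep_folder(folder, "dlq", context=1)

print(hits[0])
```

The surrounding lines matter more than they look: a bare matching line rarely explains itself, while its neighbours usually do.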

The design space

The library above aims to be the simplest representation of a design pattern. To me, contextsmithing via harness hooks (setting aside the "which hook" question) has 4 basic axes. These are immediate angles from which to fork and tailor the project above for yourself:

1. Personal against institutional: A continuity setup remembers one person across their own sessions and harnesses, the daemon that has been reading over your shoulder for months. An institutional one holds what a team or an org knows, the shared store every member's agents read from. The engine is the same. What changes is whose memory it is and who it answers for.

2. Cloud against local: The oracle can be a hosted thing you query over the network, a cloud notebook or a managed database, or it can be a folder of files on your own disk that you search with grep. Nothing in the pattern prefers one. A simple folder somewhere in C:/, containing just a bunch of lowly, barely past stone age txt files, that get ripgrepped, synthesized and pushed to the main agent's context in a "somatic" manner, is genuinely surprisingly beneficial.

3. Read only against bidirectional: Everything so far only reads. A bidirectional version also writes, noticing when you correct the agent and keeping that lesson for next time. The write path is where the real difficulty lives, telling a genuine correction from a passing change of mind and keeping the store from rotting as it fills, which is why it is a later piece and not this one.

4. Sync against async: A sync hook blocks the prompt from going to the LLM until the context is assembled. An async one fires and lets the context arrive some time later (depending on what you're having it do), while the agent is already working. A fully autonomous run, with no human, can afford to block, and blocking means the agent does not start down a wrong path. But in interactive sessions a non-blocking somatic retrieval would be the smarter choice. The past context may arrive a little later in the background. I can afford that while I am actively monitoring.
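The fourth axis is the easiest one to show in code. The sketch below contrasts the two shapes with a plain thread and a queue. The `retrieve` function is a hypothetical stand-in for whatever slow lookup you run, and how a real harness delivers late context to the agent is left out on purpose.

```python
import queue
import threading
import time

def retrieve(prompt: str) -> str:
    """Stand-in for a slow retrieval pass (hypothetical)."""
    time.sleep(0.05)
    return f"context for: {prompt}"

def sync_hook(prompt: str) -> str:
    """Block until the context is ready, then hand over an enriched prompt."""
    return retrieve(prompt) + "\n" + prompt

def async_hook(prompt: str, inbox: queue.Queue) -> threading.Thread:
    """Return at once; the context lands in `inbox` whenever retrieval finishes."""
    worker = threading.Thread(
        target=lambda: inbox.put(retrieve(prompt)), daemon=True
    )
    worker.start()
    return worker

# Sync: the model sees nothing until retrieval is done.
enriched = sync_hook("fix the DLQ consumer")

# Async: the agent would already be working here while retrieval runs.
inbox = queue.Queue()
worker = async_hook("fix the DLQ consumer", inbox)
late_context = inbox.get(timeout=2)  # a later turn drains the inbox
print(late_context)
```

The trade is visible in the two call sites: `sync_hook` pays the latency up front, while `async_hook` returns immediately and accepts that the first steps run without the context.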

A fair worry at this point is cost. A fresh retrieval pass on every prompt sounds like a lot. The push agent's brain is yours to size, though. At the cheap end it is regex and heuristic tomfoolery that never calls a model at all. At the other end it is the most capable, most "too dangerous to ship" frontier model you can find. Cost and speed move with that choice.
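At the cheap end, the whole "brain" can be a table of patterns. The following Python sketch never calls a model. The trigger patterns and the notes they push are invented for illustration, loosely following the examples earlier in this article.

```python
import re

# Hypothetical trigger table: pattern -> note to push when the pattern matches.
TRIGGERS = [
    (
        re.compile(r"\b(topic|publish|dlq)\b", re.IGNORECASE),
        "Messaging here is not Kafka; read the messaging notes first.",
    ),
    (
        re.compile(r"\b(wrong|missing) environment\b", re.IGNORECASE),
        "An empty field may just be deprecated, not a sign of the wrong environment.",
    ),
]

def cheap_push(text: str) -> list[str]:
    """Return the notes whose trigger fires on the text. No model involved."""
    return [note for pattern, note in TRIGGERS if pattern.search(text)]

pushed = cheap_push("Please publish the event to the topic")
print(pushed)
```

A table like this costs microseconds per prompt, which makes it a reasonable first layer even when a model-backed pass sits behind it.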

This is just one framing of the architecture, and probably not a complete one. None of the four axes has a default answer either. At work you might want the whole team sharing a cloud, knowledge-graph oracle that can also do backflips. For your very first home assistant, a filesystem ripgrep that returns thirty lines on either side of each hit could be plenty.

The middle ground keeps getting cheaper. Slotting in a small local model to run smarter searches used to be a luxury, but with capable local models everywhere now it often costs less than letting the main model go exploring on its own. Let's walk through a couple of concrete setups:

My home setup could be categorized as "personal, cloud, unidirectional (read only), async" along these axes. For my home PC and my wife's, I have NotebookLM instances that are asked up to 3 questions by a DeepSeek agent that has read the whole context. Once its questions are answered, the same DeepSeek agent synthesizes its tips and injects them back into the context between "<subconscious>" tags. It's a rather hefty setup, but it's great at creating the UX of one continuous identity. The agent only does read-only work. Our chats are ingested wholesale into the NotebookLM instances by a separate cron job.

When I set out to write this article, I scoured the land for similar architectures. It was a great sanity check to find Tomaz Bratanic's Neo4j-based setup, which has a very similar design. I can sense the 4 axes in his thinking as well: "personal, local, bidirectional, and sync". It is personal in the same sense mine is, one developer's memory carried across Claude Code, Codex and Cursor. It is local, a Neo4j DB he runs himself instead of a hosted service. He has a "dream phase" implementation that corresponds to the linting and ingestion phases of Papa Karpathy's "LLM Wiki" and, like mine, it is decoupled from the hook + MCP remembering architecture. But because his hooks log every event into Neo4j, the setup is bidirectional.
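The shape of that home pipeline can be sketched in a few lines of Python. This is only an outline under stated assumptions: the three callables stand in for the questioning agent, the notebook and the synthesis step, and none of the names below come from a real API.

```python
from typing import Callable

MAX_QUESTIONS = 3  # the cap on questions described above

def subconscious_block(
    conversation: str,
    ask_questions: Callable[[str], list[str]],
    answer: Callable[[str], str],
    synthesize: Callable[[list[tuple[str, str]]], str],
) -> str:
    """Ask a few questions, collect answers, wrap the synthesis in tags."""
    questions = ask_questions(conversation)[:MAX_QUESTIONS]
    pairs = [(q, answer(q)) for q in questions]
    tips = synthesize(pairs)
    return f"<subconscious>\n{tips}\n</subconscious>" if tips else ""

# Demo with fake components in place of the agent and the notebook.
calls = []

def fake_answer(question: str) -> str:
    calls.append(question)
    return f"answer to {question}"

block = subconscious_block(
    "user: why is the field empty?",
    ask_questions=lambda convo: ["q1", "q2", "q3", "q4"],
    answer=fake_answer,
    synthesize=lambda pairs: "; ".join(a for _, a in pairs),
)
print(block)
```

Keeping the three steps as plain callables means each one can be swapped independently, which is the same free-variable idea applied one level up.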

I have also found some fellow travelers in CodeWhale. I liked it quite a lot as a community-driven tool and wanted to try it out, but its hooks could only watch a user message go by without changing it, so the somatic injection I lean on elsewhere had no clean way in. So I filed an issue (#1364) asking for two things: a message_submit hook that can actually rewrite the text before it reaches the model, and a turn-end event for the fire-and-forget critique pass. What happened next was the encouraging part. The maintainer (Hmbown) agreed it made sense, another contributor (LeXwDeX) argued the hook layer should just mirror Claude Code's contract one to one, a third (AresNing) wrote the first implementation, and within weeks the maintainer put it together into a PR. I did not have to push any of it along. The thread filled up with people who already wanted the same thing, which is that quiet convergence from the table happening right in front of me, one human-in-the-loop decision at a time.

Pull already has a name, agentic RAG, and the MCP tool in here is exactly that, kept for every time the agent does know it has a gap. The push half never got one, neglected maybe in the backwash of semantic RAG disillusionment. I call it somatic RAG. Agentic RAG, meet your other half. My setup runs both against the same memory stores, and I do not read them as a trade-off.

So go back to that field from the start. The agent that burned a third of its window chasing the wrong environment was not stupid and was not short on retrieval. It failed because the one fact that would have saved it, that the field is dead and comes back empty, was never in its idea of what to ask. No bigger window and no better embedding reaches that, because the query never fires. The only thing that reaches it is something already watching the conversation, probing near that fact, ready to rush it back the moment it surfaces. You cannot retrieve what you do not know you are missing. Something else has to push it to you.

This skeleton will get converged on from a lot of directions, with write-back, team versions, gardeners and ontologies layered on top. The code is at the repo, the name is a placeholder, and there is nothing to install. Clone it, point it at a folder of your own notes, and wait for it to surprise you with some nuance you forgot you knew.

