These days most people using agentic coding will agree on the importance of "context". A common parallel is the new developer: if we optimize for new employees pushing code to production (safely and successfully) in their first week, this generally includes context like clear setup information, task and codebase documentation, defined patterns, style guides, and more.
However, for such an important topic, most of the news I see day to day focuses on the nuts and bolts: memory systems, discoverability, and fresh tactics for tuning the length of context files. There's a gap: how we identify what belongs in our "context", and how we troubleshoot and iterate when it underperforms.
Background: The Agentic Coding Loop
When I talk about context, I sometimes simplify how agents work to a straightforward loop and treat both the prompt we provide and the information the agent can collect as "context".
This is overly simplified, but useful:
flowchart LR
    Begin(Trigger) --> Context[Understand Context]
    Context --> Application[Perform Action]
    Application --> Verification[Verify Output]
    Verification --> |Feedback| Application
    Verification --> |Pass| HumanReview([Human Review])
    HumanReview --> Begin
We trigger a coding agent with a prompt, directly or through a proxy like a Slack message or work ticket. The agent gathers information from the codebase and any other sources we've provided that seem relevant. It codes. Then it runs any guardrails we've provided (sensors) as a feedback loop, before ultimately handing things back to a human for assessment.
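To make the loop concrete, here is a minimal Python sketch of it. Everything in it is an assumption for illustration: `run_agent_loop`, `Verdict`, the three callables, and the `max_attempts` cap are hypothetical names, not the API of any real coding harness.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    """Result of the verification step (the "sensors")."""
    passed: bool
    feedback: str = ""


def run_agent_loop(
    prompt: str,
    gather_context: Callable[[str], str],
    act: Callable[[str, str, str], str],
    verify: Callable[[str], Verdict],
    max_attempts: int = 3,  # hypothetical cap so the feedback loop terminates
) -> tuple[str, bool]:
    """Run one trigger through the loop; return (output, passed_verification)."""
    context = gather_context(prompt)      # Understand Context
    feedback = ""
    output = ""
    for _ in range(max_attempts):
        output = act(prompt, context, feedback)  # Perform Action
        verdict = verify(output)                 # Verify Output
        if verdict.passed:
            return output, True                  # Pass: hand off to human review
        feedback = verdict.feedback              # Feedback: inform the next attempt
    return output, False  # attempts exhausted; a human still reviews the result
```

In this sketch both `prompt` and the gathered `context` reach the action step, which is the sense in which the post treats them together as "context".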
When Context is Missing
Noticing that agent output is unexpected is fairly easy; knowing what missing information led to that result tends to be the hard part. A discussion of memory systems or methods to shorten a system prompt can be interesting, but it doesn't help identify the gap directly in front of us.
For example, we tell the agent to add a new field to the chart of accounts, but the agent (or new employee) only has a shallow understanding of what we're talking about based on general knowledge and some quick searches of the codebase. Unfortunately, we built it to be a "chart of accounts" 3 years ago, then started using it for something very different and never renamed it, and both our new hire and agent are doing their best to copy, paste, and lightly modify neighboring code to close out that ticket.
So let's discuss "context" in more specific terms.
Defining Context
As humans, we often don't need to break context down into clearly defined artifacts. We operate on accumulated experience: a mix of personal experience, tribal knowledge, and zero-to-some available artifacts.
When we talk about proactively building context for an agent, though, a taxonomy can help us identify what types of artifacts need to exist to bring the agent up to speed.
I've grouped these artifacts into four categories:
- Intent
  - Action Directive: the task type and execution the agent is expected to perform
  - Outcome Statement: the business or user goal and success criteria the change will achieve
  - Constraint List: boundaries, non-negotiables, and scope limits for this specific request
- Semantic Context
  - Domain lexicon: definitions of terms with specific business domain meaning
  - Restricted term list: terms with reserved or sensitive meaning to our business that we avoid all other uses of
  - Codebase lexicon: local naming patterns, abbreviations, and identifier conventions specific to this codebase
- Positional Context
  - Business Domain map: business architecture like capability models and value streams, used to locate where changes apply within business and user flows
  - System model: system topology, interconnections, and internal architectural conventions used to locate relevant code and understand the neighboring structure
  - Risk and Impact register: the known risks, costs, and sensitivities associated with systems, components, and business flows used to evaluate the blast radius impact of changes
- Normative Context
  - Performance Standards: internally defined thresholds for performance, accessibility, and reliability the output must meet
  - Compliance register: regulatory, licensing, and compliance requirements and the codified policies for adhering to them
  - Security control list: the security policies and controls the systems must conform to
  - Pattern library: authoritative positive and negative examples that compose style, conventions, and standards into reusable patterns
Intent represents what we are asking for, Semantic Context is understanding the words we're using, Positional Context is understanding the territory, and Normative Context is understanding the standards, rules, and patterns we conform to.
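One way to make the taxonomy usable as a checklist is to write it down as data. The sketch below is only an illustration: the `Category` enum, the `TAXONOMY` mapping, and the `missing_artifacts` helper are hypothetical names, and matching artifacts by their exact label is a simplifying assumption.

```python
from enum import Enum


class Category(Enum):
    INTENT = "Intent"
    SEMANTIC = "Semantic Context"
    POSITIONAL = "Positional Context"
    NORMATIVE = "Normative Context"


# Artifact types per category, copied from the list above.
TAXONOMY: dict[Category, tuple[str, ...]] = {
    Category.INTENT: (
        "Action Directive", "Outcome Statement", "Constraint List",
    ),
    Category.SEMANTIC: (
        "Domain lexicon", "Restricted term list", "Codebase lexicon",
    ),
    Category.POSITIONAL: (
        "Business Domain map", "System model", "Risk and Impact register",
    ),
    Category.NORMATIVE: (
        "Performance Standards", "Compliance register",
        "Security control list", "Pattern library",
    ),
}


def missing_artifacts(available: set[str]) -> dict[Category, list[str]]:
    """Return, per category, the artifact types that are not in `available`.

    Categories with nothing missing are left out of the result.
    """
    gaps: dict[Category, list[str]] = {}
    for category, artifacts in TAXONOMY.items():
        missing = [name for name in artifacts if name not in available]
        if missing:
            gaps[category] = missing
    return gaps
```

Calling `missing_artifacts({"Domain lexicon", "Pattern library"})` would list every other artifact type, grouped by category, as a rough inventory of what has not been written down yet.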
Apply the taxonomy to improve agentic output
Now that we have a taxonomy, we can use it to ask more specific questions when the coding harness produces output that shows a gap in understanding.
For instance:
- Step 1: We execute a prompt and see the result is clearly off in some manner
- Step 2: We review the taxonomy to pin down what type of miss it is, bottom up
  - Normative
    - Did it miss an expected pattern, use one we are avoiding, or follow a poor example?
    - Did it miss a security control or test case?
    - ...and so on...
  - Positional
    - ...
  - Semantic
    - ...
  - Intent
    - ...
- Step 3: Does the context already exist?
  - Yes, but the agent didn't find it
  - Yes, and the agent found it but didn't use it
  - Yes, and the agent found it but used it poorly
  - No
- Step 4: If we run the prompt again with the context pasted in, is the result now aligned with what we were expecting?
- Step 5: Add or adjust the correct type of context, then try again
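The steps above can be captured as a small triage record, so each miss is logged in the same shape. This is a sketch under stated assumptions: `MissType`, `ContextStatus`, `Miss`, and `suggest_next_step` are hypothetical names, and the mapping from a diagnosis to a suggested action is my own illustration, not something the steps prescribe.

```python
from dataclasses import dataclass
from enum import Enum


class MissType(Enum):
    # Listed bottom up, in the order Step 2 reviews them.
    NORMATIVE = "Normative"
    POSITIONAL = "Positional"
    SEMANTIC = "Semantic"
    INTENT = "Intent"


class ContextStatus(Enum):
    # The four possible answers to Step 3.
    NOT_FOUND = "exists, but the agent didn't find it"
    FOUND_UNUSED = "exists and was found, but not used"
    FOUND_MISUSED = "exists and was found, but used poorly"
    MISSING = "does not exist"


@dataclass
class Miss:
    prompt: str               # Step 1: the prompt whose result was off
    miss_type: MissType       # Step 2: which kind of context was missed
    status: ContextStatus     # Step 3: does that context already exist?
    fixed_when_pasted: bool   # Step 4: did pasting the context fix the result?


def suggest_next_step(miss: Miss) -> str:
    """Step 5 helper. The suggested actions are illustrative assumptions."""
    kind = miss.miss_type.value
    if not miss.fixed_when_pasted:
        return "Re-diagnose: pasting the context did not fix the output."
    if miss.status is ContextStatus.MISSING:
        return f"Write the missing {kind} context."
    if miss.status is ContextStatus.NOT_FOUND:
        return f"Make the existing {kind} context easier to find."
    return f"Revise the existing {kind} context so it is applied correctly."
```

For the chart of accounts example, a record with `MissType.SEMANTIC` and `ContextStatus.MISSING` would point toward writing a domain lexicon entry rather than rewording the prompt.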
Our goal is not to fix one prompt, but to build a brain that holds the clues we pieced together in our first 6-12 months on the job, so that it understands what we're saying to a greater and greater degree. We want to raise up the agent, not dumb down our own speech.
Consider automated verifications
I believe adding a suite of automated verifications is necessary for context to work at organizational scale, past the initial go-live. As we identify and correct that pool of context, we need a way to understand what impact our changes are having and whether we've plugged one hole by re-opening another from 6 months earlier.
Take the process example above: if we extracted the prompt and the context we expected the agent to access from that loop, then it should be possible to build a library of automated verifications over time. Our future selves then have a signal if we break an earlier expectation so we have the information needed to make a conscious decision.
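A rough sketch of what one entry in that library could look like follows. It assumes the harness can report which context artifacts the agent accessed for a given prompt; that reporting hook is represented by the `accessed_context` callable, and both it and the `ContextCase` and `find_regressions` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class ContextCase:
    """One saved expectation: for this prompt, the agent should access this context."""
    name: str
    prompt: str
    expected_context: frozenset[str]


def find_regressions(
    cases: Iterable[ContextCase],
    accessed_context: Callable[[str], set[str]],  # assumed hook into the harness
) -> dict[str, set[str]]:
    """Return, per failing case name, the expected artifacts the agent did not access."""
    regressions: dict[str, set[str]] = {}
    for case in cases:
        missing = case.expected_context - accessed_context(case.prompt)
        if missing:
            regressions[case.name] = set(missing)
    return regressions
```

An empty result means every earlier expectation still holds. A non-empty one is the signal described above: it names which expectation broke, and the decision about what to do with that stays with us.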
This pattern is not new. It mirrors the progression companies tend to follow in testing: from relying on tribal experience, to consistent written test plans, to fully automated test suites.
Context is only a flywheel if it's designed
Context is clearly important, in employees as well as agents. Companies that onboard developers to a high level of effectiveness in 3 months tend to outperform those that take 6-12, and now we're trying to onboard agents in a single prompt.
The taxonomy above can help with design and assessment, and it is meant to be used alongside all of those great posts on memory systems, progressive disclosure tactics, access methods, and more. By its nature, managing context is going to be an evolutionary effort. That means many of our existing practices, such as continuous improvement and continuous assessment, will be necessary to keep that flywheel spinning. I expect to follow up on this in a later post.