Skip to content

feat(route): add /humanlayer/blog route#21565

Open
zj1123581321 wants to merge 3 commits intoDIYgod:masterfrom
zj1123581321:feat/humanlayer-blog
Open

feat(route): add /humanlayer/blog route#21565
zj1123581321 wants to merge 3 commits intoDIYgod:masterfrom
zj1123581321:feat/humanlayer-blog

Conversation

@zj1123581321
Copy link
Copy Markdown

Involved Issue / 该 PR 相关 Issue

Close #

Example for the Proposed Route(s) / 路由地址示例

/humanlayer/blog

New RSS Route Checklist / 新 RSS 路由检查表

  • New Route / 新的路由
  • Anti-bot or rate limit / 反爬/频率限制
    • If yes, do your code reflect this sign? / 如果有, 是否有对应的措施?
  • Date and time / 日期和时间
    • Parsed / 可以解析
    • Correct time zone / 时区正确
  • New package added / 添加了新的包
  • Puppeteer

Note / 说明

  • New route: /humanlayer/blog — Scrapes humanlayer.dev/blog for blog posts on AI agents, context engineering, and coding best practices
  • Full article content fetched and cached via cheerio + ofetch
  • Supports ?limit= query parameter
  • Radar rules included for browser extension detection

via HAPI

Co-Authored-By: HAPI noreply@hapi.run

🤖 Generated with Claude Code

Add RSS feed for HumanLayer blog (www.humanlayer.dev/blog), scraping
blog listing and full article content with cheerio.

via [HAPI](https://hapi.run)

Co-Authored-By: HAPI <noreply@hapi.run>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

Successfully generated as following:

http://localhost:1200/humanlayer/blog - Success ✔️
<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
  <channel>
    <title>HumanLayer Blog</title>
    <link>https://www.humanlayer.dev/blog</link>
    <atom:link href="http://localhost:1200/humanlayer/blog" rel="self" type="application/rss+xml"></atom:link>
    <description>HumanLayer Blog - Powered by RSSHub</description>
    <generator>RSSHub</generator>
    <webMaster>contact@rsshub.app (RSSHub)</webMaster>
    <language>en</language>
    <lastBuildDate>Tue, 31 Mar 2026 06:51:52 GMT</lastBuildDate>
    <ttl>5</ttl>
    <item>
      <title>Long-Context Isn&#39;t the Answer</title>
      <description>&lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Anthropic just switched the default model in Claude Code to Opus 4.6 with a 1M context window. We tried it when it launched. But now we&#39;re switching back to Opus 4.5&lt;/p&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;How about some context?&lt;/h2&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Unlike previous Claude Opus models which have a ~ 200k context window, Opus 4.6 has as 1M context window. This was very exciting for us!&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;We spend a lot of time working on hard problems in large enterprise codebases, and if the model can reliably hold more of the codebase in its head at a time, so to speak, then that creates new possibilities for the scale of problem that can be solved in a single context window.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;It would mean less compaction (whether &lt;a href=&quot;https://www.humanlayer.dev/blog/advanced-context-engineering&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;frequent intentional compaction&lt;/a&gt; or auto-compact), and less &quot;context pressure&quot; to solve the problem before your context window fills up.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;As Calvin French-Owen, who helped launch Codex last year, &lt;a href=&quot;https://youtu.be/qwmmWzPnhog?si=ReRWTavUKTItIGGr&amp;amp;t=918&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;puts it in an episode of YC&#39;s Lightcone podcast&lt;/a&gt;:&lt;/p&gt;
        &lt;blockquote class=&quot;border-l-4 pl-4 my-4 italic&quot; style=&quot;border-color:var(--accent);color:var(--text-muted)&quot;&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;...imagine you&#39;re a college student. You&#39;re taking an exam. In the first five minutes of that exam, you&#39;re like, &quot;Oh, I have all the time in the world. I&#39;ll do a great job. I&#39;ll think through each of these problems.&quot;&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Let&#39;s say you have like five minutes left and you still have half the exam left. You&#39;re like, &quot;Oh man, I just got to do whatever I can.&quot; Like, that&#39;s the LLM with a context window&lt;/p&gt;
        &lt;/blockquote&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;More context, less instruction adherence&lt;/h2&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;While the context window is dramatically larger, we noticed over the course of a couple of weeks that instruction adherence was dramatically degraded, and not just at longer context lengths.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Even well-within what we would consider the &lt;a href=&quot;https://youtu.be/rmvDxxNubIg?si=O17nmS3SScaAkpp-&amp;amp;t=355&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;smart zone&lt;/a&gt; of a 200k-context frontier model, the model was less precise. It would ignore design documents and other inputs when writing a plan file. It would make trivial mistakes, or misunderstand simple instructions - or worse, directly disobey them.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;At longer context lengths, the degradation was even steeper - like the user instructions were getting drowned out by the intermediate tool results and mass of accumulated context.&lt;/p&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;What determines instruction adherence?&lt;/h2&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;We&#39;ve written about the concept of &lt;a href=&quot;https://www.humanlayer.dev/blog/skill-issue-harness-engineering-for-coding-agents&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;the instruction budget&lt;/a&gt; before - a measurable property of LLMs which describes how many instructions they can follow reasonably well before instruction adherence drops off.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;It&#39;s different for each model, but it&#39;s a function primarily of the quality of the model&#39;s instruction tuning, and more importantly for our purposes, it is &lt;strong style=&quot;color:var(--text-primary)&quot;&gt;strongly correlated with the size of the model&lt;/strong&gt;.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;img src=&quot;https://www.humanlayer.dev/blog/writing-a-good-claude-md/instructionfollowing.png&quot; alt=&quot;Instruction following&quot; class=&quot;max-w-full h-auto my-4 rounded-none&quot; style=&quot;border:1px solid var(--border-color)&quot; referrerpolicy=&quot;no-referrer&quot;&gt;&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;You can clearly see that the larger models from the &lt;a href=&quot;https://arxiv.org/pdf/2507.11538&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;study&lt;/a&gt; can follow dramatically more instructions before adherence drops off.&lt;/p&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;Instruction adherence at long context lengths&lt;/h2&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Why does this matter?&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;When a lab offers an extended-context version of a model, you&#39;re usually not getting a bigger model with more parameters and therefore a larger &quot;instruction budget&quot; to go with the larger context window - you&#39;re likely getting the same model with some clever math (e.g. &lt;a href=&quot;https://arxiv.org/pdf/2309.00071&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;YaRN&lt;/a&gt;) to extend the sequence length the model can attend to, and probably some more post-training to stabilize the model at the longer sequence lengths.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;This means that while the context window size increases, the instruction budget remains the same. You can fit more context and more instructions in the context window, but the model isn&#39;t actually better at attending to those instructions over the context length.&lt;/p&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;Instruction-following as needle in a haystack&lt;/h2&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;You can think of your context window as a haystack where all of your tool calls and documents and files are hay - every line in your CLAUDE.md file, every instruction in your tool descriptions, every tool result, every instruction in your system prompt, and every user message.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;The quality &amp;amp; correctness of the agent&#39;s next step depends on the LLM&#39;s ability to find a needle in there: namely, the instruction(s) in the context window that are most-relevant to the current state of the context window, which give it the information it needs to make the correct decision about its next action.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Now imagine we increase the size of the haystack by 500% - but the size of the needle remains the same. Unless our ability to find the needle also increases by 500%, we will have a dramatically harder time finding it. The extra context isn&#39;t really helping us - it&#39;s just digging us deeper into the dumb zone.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;img src=&quot;https://www.humanlayer.dev/blog/skill-issue/long-context.jpg&quot; alt=&quot;long context&quot; class=&quot;max-w-full h-auto my-4 rounded-none&quot; style=&quot;border:1px solid var(--border-color)&quot; referrerpolicy=&quot;no-referrer&quot;&gt;&lt;/p&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;What Works Instead&lt;/h2&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Instead of trying to stuff as much information in a context window as possible and having the LLM reason over it, we found that the context management techniques we had been using had to be used even &lt;em style=&quot;color:var(--text-secondary)&quot;&gt;more&lt;/em&gt; aggressively to keep Opus 4.6 1M on track.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;In my post on &lt;a href=&quot;https://www.humanlayer.dev/blog/skill-issue-harness-engineering-for-coding-agents&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;harness engineering&lt;/a&gt;, I wrote about this as the limit case for using sub-agents:&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;img src=&quot;https://www.humanlayer.dev/blog/skill-issue/limit-case.png&quot; alt=&quot;limit case&quot; class=&quot;max-w-full h-auto my-4 rounded-none&quot; style=&quot;border:1px solid var(--border-color)&quot; referrerpolicy=&quot;no-referrer&quot;&gt;&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;The day after I published it, I decided maybe it wasn&#39;t such a bad idea, and sat down and wrote this skill:&lt;/p&gt;
        &lt;pre class=&quot;group&quot;&gt;&lt;pre class=&quot;prismjs rsh-code-block grid relative min-w-[250px] mb-6 bg-gray-400/5 p-4&quot;&gt;&lt;button data-slot=&quot;button&quot; class=&quot;inline-flex items-center justify-center gap-2 whitespace-nowrap rounded-none text-sm font-mono font-medium cursor-pointer disabled:cursor-not-allowed disabled:pointer-events-none disabled:opacity-50 [&amp;amp;_svg]:pointer-events-none [&amp;amp;_svg:not([class*=&#39;size-&#39;])]:size-4 shrink-0 [&amp;amp;_svg]:shrink-0 outline-none focus-visible:ring-[3px] uppercase tracking-wider border bg-transparent text-[var(--accent)] border-transparent hover:bg-[var(--accent)]/10 hover:border-[var(--accent)] focus-visible:border-[var(--accent)] focus-visible:ring-[var(--accent)]/50 size-9 absolute top-2 right-1 h-6 w-6 opacity-0 group-hover:opacity-100 transition-opacity duration-200 z-10&quot; aria-label=&quot;Copy code&quot; title=&quot;Copy code&quot;&gt;&lt;svg xmlns=&quot;http://www.w3.org/2000/svg&quot; width=&quot;24&quot; height=&quot;24&quot; viewBox=&quot;0 0 24 24&quot; fill=&quot;none&quot; stroke=&quot;currentColor&quot; stroke-width=&quot;2&quot; stroke-linecap=&quot;round&quot; stroke-linejoin=&quot;round&quot; class=&quot;lucide lucide-copy h-3 w-3&quot; aria-hidden=&quot;true&quot;&gt;&lt;rect width=&quot;14&quot; height=&quot;14&quot; x=&quot;8&quot; y=&quot;8&quot; rx=&quot;2&quot; ry=&quot;2&quot;&gt;&lt;/rect&gt;&lt;path d=&quot;M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/button&gt;&lt;code class=&quot;overflow-x-auto&quot; style=&quot;white-space:pre&quot;&gt;&lt;span class=&quot;&quot;&gt;---
        &lt;/span&gt;name: subagent-orchestrator
        &lt;!-- --&gt;description: orchestrate sub-agents to accomplish complex long-horizon tasks without losing coherency by delegating to sub-agents
        &lt;!-- --&gt;---
        &lt;!-- --&gt;
        &lt;!-- --&gt;This skill provides you with **CRITICAL** instructions that will help you to maintain coherency in long-horizon context-heavy tasks.
        &lt;!-- --&gt;
        &lt;!-- --&gt;You have a large number of tools available to you. The most important one is the one that allows you to dispatch sub-agents: either `Agent` or `Task`.
        &lt;!-- --&gt;
        &lt;!-- --&gt;All non-trivial operations should be delegated to sub-agents. You should delegate research and codebase understanding tasks to codebase-analyzer, codebase-locator and pattern-locator sub-agents.
        &lt;!-- --&gt;
        &lt;!-- --&gt;You should delegate running bash commands (particularly ones that are likely to produce lots of output) such as investigating with the `aws` CLI, using the `gh` CLI, digging through logs to `Bash` sub-agents.
        &lt;!-- --&gt;
        &lt;!-- --&gt;You should use separate sub-agents for separate tasks, and you may launch them in parallel - but do not delegate multiple tasks that are likely to have significant overlap to separate sub-agents.
        &lt;!-- --&gt;
        &lt;!-- --&gt;IMPORTANT: if the user has already given you a task, you should proceed with that task using this approach.
        &lt;!-- --&gt;
        &lt;!-- --&gt;If you have not already been explicitly given a taks, you should ask the user what task they would like for you to work on - do not assume or begin working on a ticket automatically.&lt;/code&gt;&lt;/pre&gt;&lt;/pre&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;I found that using this skill in combination with our existing workflow skills helped me keep Opus coherent at longer context lengths. Why?
        As I wrote about in that post, sub-agents encapsulate context and ensure that only highly-relevant context (the prompt, and the focused sub-agent result) end up in the context window, avoiding &lt;a href=&quot;https://research.trychroma.com/context-rot&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;context rot&lt;/a&gt;:&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;img src=&quot;https://www.humanlayer.dev/blog/skill-issue/context-firewall.png&quot; alt=&quot;context firewall&quot; class=&quot;max-w-full h-auto my-4 rounded-none&quot; style=&quot;border:1px solid var(--border-color)&quot; referrerpolicy=&quot;no-referrer&quot;&gt;&lt;/p&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;How we&#39;re incorporating this at HumanLayer&lt;/h2&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;We have had a feature in the tool for a while that warns the user when they&#39;re context is getting high. We used to set this at around %40 of sonnets 168k token window (200k - 32k reserved for output). This came out to about 100k tokens.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;To help users maximize instruction adherence and intelligence on hard codebase problems, we&#39;ve updated our context warnings for long-context models to trigger at the 100k token mark instead of 40% of the usable context. For opus 1m this is only 10% of the context window.&lt;/p&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;TL;DR&lt;/h2&gt;
        &lt;div class=&quot;pl-8 mb-6&quot;&gt;&lt;ol class=&quot;list-decimal space-y-4&quot; style=&quot;color:var(--text-secondary)&quot;&gt;
        &lt;li&gt;Long-context models degrade at all context lengths, not just long ones.&lt;/li&gt;
        &lt;li&gt;More context isn&#39;t more capability - the instruction budget doesn&#39;t scale with the context window.&lt;/li&gt;
        &lt;li&gt;Context isolation beats context expansion. Sub-agents, progressive disclosure, and &lt;a href=&quot;https://www.humanlayer.dev/blog/context-efficient-backpressure&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;context-efficient backpressure&lt;/a&gt; keep each context window small, focused, and in the smart zone.&lt;/li&gt;
        &lt;/ol&gt;&lt;/div&gt;</description>
      <link>https://www.humanlayer.dev/blog/long-context-isnt-the-answer</link>
      <guid isPermaLink="false">https://www.humanlayer.dev/blog/long-context-isnt-the-answer</guid>
      <pubDate>Sun, 22 Mar 2026 16:00:00 GMT</pubDate>
      <author>Kyle</author>
    </item>
    <item>
      <title>Getting Claude to Actually Read Your CLAUDE.md</title>
      <description>&lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Claude Code wraps your CLAUDE.md in a &lt;code class=&quot;px-2 py-0.5 rounded font-mono bg-[var(--code-bg)] text-[var(--accent)] border border-[var(--code-border)] tracking-tight shadow-sm mr-1 align-middle wrap-anywhere box-decoration-clone&quot;&gt;&amp;lt;system_reminder&amp;gt;&lt;/code&gt; that explicitly tells the model the contents &quot;may or may not be relevant.&quot; The longer your file gets, the more Claude seems to treat individual sections as optional.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;We &lt;a href=&quot;https://www.humanlayer.dev/blog/writing-a-good-claude-md&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;wrote before&lt;/a&gt; about keeping these files concise and avoiding stuff that belongs in a linter. But even after trimming things down, I kept running into cases where Claude would ignore testing instructions while writing tests, or skip our API conventions when building endpoints.&lt;/p&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;Conditional &lt;code class=&quot;px-2 py-0.5 rounded font-mono bg-[var(--code-bg)] text-[var(--accent)] border border-[var(--code-border)] tracking-tight shadow-sm mr-1 align-middle wrap-anywhere box-decoration-clone&quot;&gt;&amp;lt;important if&amp;gt;&lt;/code&gt; blocks&lt;/h2&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;What&#39;s been working for me: wrapping sections in &lt;code class=&quot;px-2 py-0.5 rounded font-mono bg-[var(--code-bg)] text-[var(--accent)] border border-[var(--code-border)] tracking-tight shadow-sm mr-1 align-middle wrap-anywhere box-decoration-clone&quot;&gt;&amp;lt;important if=&quot;condition&quot;&amp;gt;&lt;/code&gt; tags.&lt;/p&gt;
        &lt;pre class=&quot;group&quot;&gt;&lt;pre class=&quot;prismjs rsh-code-block grid relative min-w-[250px] mb-6 bg-gray-400/5 p-4&quot;&gt;&lt;button data-slot=&quot;button&quot; class=&quot;inline-flex items-center justify-center gap-2 whitespace-nowrap rounded-none text-sm font-mono font-medium cursor-pointer disabled:cursor-not-allowed disabled:pointer-events-none disabled:opacity-50 [&amp;amp;_svg]:pointer-events-none [&amp;amp;_svg:not([class*=&#39;size-&#39;])]:size-4 shrink-0 [&amp;amp;_svg]:shrink-0 outline-none focus-visible:ring-[3px] uppercase tracking-wider border bg-transparent text-[var(--accent)] border-transparent hover:bg-[var(--accent)]/10 hover:border-[var(--accent)] focus-visible:border-[var(--accent)] focus-visible:ring-[var(--accent)]/50 size-9 absolute top-2 right-1 h-6 w-6 opacity-0 group-hover:opacity-100 transition-opacity duration-200 z-10&quot; aria-label=&quot;Copy code&quot; title=&quot;Copy code&quot;&gt;&lt;svg xmlns=&quot;http://www.w3.org/2000/svg&quot; width=&quot;24&quot; height=&quot;24&quot; viewBox=&quot;0 0 24 24&quot; fill=&quot;none&quot; stroke=&quot;currentColor&quot; stroke-width=&quot;2&quot; stroke-linecap=&quot;round&quot; stroke-linejoin=&quot;round&quot; class=&quot;lucide lucide-copy h-3 w-3&quot; aria-hidden=&quot;true&quot;&gt;&lt;rect width=&quot;14&quot; height=&quot;14&quot; x=&quot;8&quot; y=&quot;8&quot; rx=&quot;2&quot; ry=&quot;2&quot;&gt;&lt;/rect&gt;&lt;path d=&quot;M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/button&gt;&lt;code class=&quot;overflow-x-auto&quot; style=&quot;white-space:pre&quot;&gt;&lt;span class=&quot;&quot;&gt;&amp;lt;important if=&quot;you are writing or modifying tests&quot;&amp;gt;
        &lt;/span&gt;- Use `createTestApp()` helper for integration tests
        &lt;!-- --&gt;- Mock database with `dbMock` from `packages/db/test`
        &lt;!-- --&gt;- Test fixtures live in `__fixtures__/` directories
        &lt;!-- --&gt;&amp;lt;/important&amp;gt;&lt;/code&gt;&lt;/pre&gt;&lt;/pre&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;We don&#39;t have a rigorous explanation for why this helps. My guess is that the explicit condition gives Claude a clearer signal about when to apply the instructions, rather than leaving it to decide relevance on its own. Whatever the mechanism, we&#39;ve seen noticeably better adherence on tasks where only some sections of my CLAUDE.md should apply.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;A few things I&#39;ve learned:&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;strong style=&quot;color:var(--text-primary)&quot;&gt;Don&#39;t wrap everything.&lt;/strong&gt; Project identity, directory structure, and tech stack are relevant to basically every task. The conditional blocks are for testing setup or deployment procedures that only matter sometimes.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;strong style=&quot;color:var(--text-primary)&quot;&gt;Make conditions specific.&lt;/strong&gt; &lt;code class=&quot;px-2 py-0.5 rounded font-mono bg-[var(--code-bg)] text-[var(--accent)] border border-[var(--code-border)] tracking-tight shadow-sm mr-1 align-middle wrap-anywhere box-decoration-clone&quot;&gt;&amp;lt;important if=&quot;you are writing code&quot;&amp;gt;&lt;/code&gt; matches almost everything and defeats the purpose. I try to make each condition narrow enough that it only fires when I actually want those rules applied.&lt;/p&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;A skill to automate this&lt;/h2&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;I did a lot of restructuring CLAUDE.md files by hand, so I made a Claude Code skill that does the rewrite. It pulls out the foundational stuff, wraps domain-specific sections in conditional blocks, and cleans up some common problems (stale code snippets, style rules that should be in a linter, vague instructions that don&#39;t actually help).&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;It&#39;s not perfect—you&#39;ll probably want to review what it produces—but it&#39;s a decent starting point.&lt;/p&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;Try it&lt;/h2&gt;
        &lt;pre class=&quot;group&quot;&gt;&lt;pre class=&quot;prismjs rsh-code-block grid relative min-w-[250px] mb-6 bg-gray-400/5 p-4&quot;&gt;&lt;button data-slot=&quot;button&quot; class=&quot;inline-flex items-center justify-center gap-2 whitespace-nowrap rounded-none text-sm font-mono font-medium cursor-pointer disabled:cursor-not-allowed disabled:pointer-events-none disabled:opacity-50 [&amp;amp;_svg]:pointer-events-none [&amp;amp;_svg:not([class*=&#39;size-&#39;])]:size-4 shrink-0 [&amp;amp;_svg]:shrink-0 outline-none focus-visible:ring-[3px] uppercase tracking-wider border bg-transparent text-[var(--accent)] border-transparent hover:bg-[var(--accent)]/10 hover:border-[var(--accent)] focus-visible:border-[var(--accent)] focus-visible:ring-[var(--accent)]/50 size-9 absolute top-2 right-1 h-6 w-6 opacity-0 group-hover:opacity-100 transition-opacity duration-200 z-10&quot; aria-label=&quot;Copy code&quot; title=&quot;Copy code&quot;&gt;&lt;svg xmlns=&quot;http://www.w3.org/2000/svg&quot; width=&quot;24&quot; height=&quot;24&quot; viewBox=&quot;0 0 24 24&quot; fill=&quot;none&quot; stroke=&quot;currentColor&quot; stroke-width=&quot;2&quot; stroke-linecap=&quot;round&quot; stroke-linejoin=&quot;round&quot; class=&quot;lucide lucide-copy h-3 w-3&quot; aria-hidden=&quot;true&quot;&gt;&lt;rect width=&quot;14&quot; height=&quot;14&quot; x=&quot;8&quot; y=&quot;8&quot; rx=&quot;2&quot; ry=&quot;2&quot;&gt;&lt;/rect&gt;&lt;path d=&quot;M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/button&gt;&lt;code class=&quot;overflow-x-auto&quot; style=&quot;white-space:pre&quot;&gt;&lt;span class=&quot;&quot;&gt;npx skills add humanlayer/skills --skill improve-claude-md&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/pre&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Then in your project:&lt;/p&gt;
        &lt;pre class=&quot;group&quot;&gt;&lt;pre class=&quot;prismjs rsh-code-block grid relative min-w-[250px] mb-6 bg-gray-400/5 p-4&quot;&gt;&lt;button data-slot=&quot;button&quot; class=&quot;inline-flex items-center justify-center gap-2 whitespace-nowrap rounded-none text-sm font-mono font-medium cursor-pointer disabled:cursor-not-allowed disabled:pointer-events-none disabled:opacity-50 [&amp;amp;_svg]:pointer-events-none [&amp;amp;_svg:not([class*=&#39;size-&#39;])]:size-4 shrink-0 [&amp;amp;_svg]:shrink-0 outline-none focus-visible:ring-[3px] uppercase tracking-wider border bg-transparent text-[var(--accent)] border-transparent hover:bg-[var(--accent)]/10 hover:border-[var(--accent)] focus-visible:border-[var(--accent)] focus-visible:ring-[var(--accent)]/50 size-9 absolute top-2 right-1 h-6 w-6 opacity-0 group-hover:opacity-100 transition-opacity duration-200 z-10&quot; aria-label=&quot;Copy code&quot; title=&quot;Copy code&quot;&gt;&lt;svg xmlns=&quot;http://www.w3.org/2000/svg&quot; width=&quot;24&quot; height=&quot;24&quot; viewBox=&quot;0 0 24 24&quot; fill=&quot;none&quot; stroke=&quot;currentColor&quot; stroke-width=&quot;2&quot; stroke-linecap=&quot;round&quot; stroke-linejoin=&quot;round&quot; class=&quot;lucide lucide-copy h-3 w-3&quot; aria-hidden=&quot;true&quot;&gt;&lt;rect width=&quot;14&quot; height=&quot;14&quot; x=&quot;8&quot; y=&quot;8&quot; rx=&quot;2&quot; ry=&quot;2&quot;&gt;&lt;/rect&gt;&lt;path d=&quot;M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/button&gt;&lt;code class=&quot;overflow-x-auto&quot; style=&quot;white-space:pre&quot;&gt;&lt;span class=&quot;&quot;&gt;/improve-claude-md&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/pre&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;It reads your existing CLAUDE.md and outputs a rewritten version. Source is at &lt;a href=&quot;https://github.com/humanlayer/skills/tree/main/plugins/improve-claude-md&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;github.com/humanlayer/skills&lt;/a&gt;.&lt;/p&gt;</description>
      <link>https://www.humanlayer.dev/blog/stop-claude-from-ignoring-your-claude-md</link>
      <guid isPermaLink="false">https://www.humanlayer.dev/blog/stop-claude-from-ignoring-your-claude-md</guid>
      <pubDate>Mon, 16 Mar 2026 16:00:00 GMT</pubDate>
      <author>Dex</author>
    </item>
    <item>
      <title>Skill Issue: Harness Engineering for Coding Agents</title>
      <description>&lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;We&#39;ve spent the past year watching coding agents fail in every conceivable way: ignoring instructions, executing dangerous commands un-prompted, and going in circles on the simplest of tasks.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;We&#39;ve seen teams ship immense amounts of slop. We&#39;ve even &lt;a href=&quot;https://www.youtube.com/live/99Kxkemj1g8?si=tVsK88M3XPyo0urD&amp;amp;t=20967&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;shipped a little bit of slop ourselves&lt;/a&gt;.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Every time, the instinct was the same:&lt;/p&gt;
        &lt;div class=&quot;pl-8 mb-6&quot;&gt;&lt;ul class=&quot;list-disc space-y-4&quot; style=&quot;color:var(--text-secondary)&quot;&gt;
        &lt;li&gt;&quot;We just need better models, GPT-6 will fix it&quot;&lt;/li&gt;
        &lt;li&gt;&quot;We just need better instruction-following&quot;&lt;/li&gt;
        &lt;li&gt;&quot;It&#39;ll work once [niche library I&#39;m using] is in the training data&quot;&lt;/li&gt;
        &lt;/ul&gt;&lt;/div&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;But over the course of dozens of projects and hundreds of agent sessions, we kept arriving at the same conclusion: it&#39;s not a model problem. It&#39;s a &lt;strong style=&quot;color:var(--text-primary)&quot;&gt;configuration problem&lt;/strong&gt;.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Yes, models will get smarter, and some existing failure modes will disappear. And then because they are smarter, we will give them new problems which are bigger and harder, and they will &lt;strong style=&quot;color:var(--text-primary)&quot;&gt;continue to fail in unexpected ways&lt;/strong&gt;. Unexpected failures modes are a fundamental problem for non-deterministic systems.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;img src=&quot;https://www.humanlayer.dev/blog/skill-issue/gpt-6.png&quot; alt=&quot;gpt-6&quot; class=&quot;max-w-full h-auto my-4 rounded-none&quot; style=&quot;border:1px solid var(--border-color)&quot; referrerpolicy=&quot;no-referrer&quot;&gt;&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;So instead of praying for &lt;code class=&quot;px-2 py-0.5 rounded font-mono bg-[var(--code-bg)] text-[var(--accent)] border border-[var(--code-border)] tracking-tight shadow-sm mr-1 align-middle wrap-anywhere box-decoration-clone&quot;&gt;gpt-6.4-codex-ultrahigh_extended&lt;/code&gt; to save us all, we try to focus on answering the question of &lt;em style=&quot;color:var(--text-secondary)&quot;&gt;&quot;how do we get the most out of &lt;strong style=&quot;color:var(--text-primary)&quot;&gt;today&#39;s&lt;/strong&gt; models?&quot;&lt;/em&gt;&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;There are lots of ways to get better performance out of your coding agent. If you use coding agents for moderately hard tasks, you&#39;ve probably configured your coding agent a bit. Have you used skills? MCP servers? Sub-agents? Memory? AGENTS.md files?&lt;/p&gt;
        &lt;pre class=&quot;group&quot;&gt;&lt;pre class=&quot;prismjs rsh-code-block grid relative min-w-[250px] mb-6 bg-gray-400/5 p-4&quot;&gt;&lt;button data-slot=&quot;button&quot; class=&quot;inline-flex items-center justify-center gap-2 whitespace-nowrap rounded-none text-sm font-mono font-medium cursor-pointer disabled:cursor-not-allowed disabled:pointer-events-none disabled:opacity-50 [&amp;amp;_svg]:pointer-events-none [&amp;amp;_svg:not([class*=&#39;size-&#39;])]:size-4 shrink-0 [&amp;amp;_svg]:shrink-0 outline-none focus-visible:ring-[3px] uppercase tracking-wider border bg-transparent text-[var(--accent)] border-transparent hover:bg-[var(--accent)]/10 hover:border-[var(--accent)] focus-visible:border-[var(--accent)] focus-visible:ring-[var(--accent)]/50 size-9 absolute top-2 right-1 h-6 w-6 opacity-0 group-hover:opacity-100 transition-opacity duration-200 z-10&quot; aria-label=&quot;Copy code&quot; title=&quot;Copy code&quot;&gt;&lt;svg xmlns=&quot;http://www.w3.org/2000/svg&quot; width=&quot;24&quot; height=&quot;24&quot; viewBox=&quot;0 0 24 24&quot; fill=&quot;none&quot; stroke=&quot;currentColor&quot; stroke-width=&quot;2&quot; stroke-linecap=&quot;round&quot; stroke-linejoin=&quot;round&quot; class=&quot;lucide lucide-copy h-3 w-3&quot; aria-hidden=&quot;true&quot;&gt;&lt;rect width=&quot;14&quot; height=&quot;14&quot; x=&quot;8&quot; y=&quot;8&quot; rx=&quot;2&quot; ry=&quot;2&quot;&gt;&lt;/rect&gt;&lt;path d=&quot;M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/button&gt;&lt;code class=&quot;overflow-x-auto&quot; style=&quot;white-space:pre&quot;&gt;&lt;span class=&quot;&quot;&gt;coding agent = AI model(s) + harness&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/pre&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;These are all technically separate concepts, but they are all part of the coding agent&#39;s configuration surface. We call this the &lt;strong style=&quot;color:var(--text-primary)&quot;&gt;coding agent&#39;s harness&lt;/strong&gt;, and we think of it as &lt;a href=&quot;https://www.vtrivedy.com/posts/claude-code-sdk-haas-harness-as-a-service&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;the agent’s runtime&lt;/a&gt;, or as its peripherals: what does the model use to interact with its environment?&lt;/p&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;Harness Engineering&lt;/h2&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;strong style=&quot;color:var(--text-primary)&quot;&gt;Harness engineering&lt;/strong&gt;, coined by &lt;a href=&quot;https://x.com/Vtrivedy10&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;Viv&lt;/a&gt;, describes the practice of leveraging these configuration points to customize and improve your coding agent&#39;s output quality and reliability.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;img src=&quot;https://www.humanlayer.dev/blog/skill-issue/harness-components.png&quot; alt=&quot;harness components&quot; class=&quot;max-w-full h-auto my-4 rounded-none&quot; style=&quot;border:1px solid var(--border-color)&quot; referrerpolicy=&quot;no-referrer&quot;&gt;
        &lt;em style=&quot;color:var(--text-secondary)&quot;&gt;Image from &lt;a href=&quot;https://blog.langchain.com/the-anatomy-of-an-agent-harness/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;Viv&#39;s post&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;As &lt;a href=&quot;https://mitchellh.com/writing/my-ai-adoption-journey#step-5-engineer-the-harness&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;Mitchell Hashimoto put it&lt;/a&gt;, harness engineering&lt;/p&gt;
        &lt;blockquote class=&quot;border-l-4 pl-4 my-4 italic&quot; style=&quot;border-color:var(--accent);color:var(--text-muted)&quot;&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;[...] is the idea that anytime you find an agent makes a mistake, you take the time to engineer a solution such that the agent never makes that mistake again.&lt;/p&gt;
        &lt;/blockquote&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;... as a Subset of Context Engineering&lt;/h2&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;We view harness engineering as a subset of &lt;a href=&quot;https://www.humanlayer.dev/blog/advanced-context-engineering&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;context engineering&lt;/a&gt;. Coined by my cofounder &lt;a href=&quot;https://x.com/dexhorthy&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;Dex&lt;/a&gt; in &lt;a href=&quot;https://github.com/humanlayer/12-factor-agents&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;12-factor agents&lt;/a&gt;, context engineering is a superset of “prompt engineering” and a variety of other techniques for systematically improving AI agents’ reliability.&amp;nbsp;You can find the original talk on &lt;a href=&quot;https://www.youtube.com/watch?v=IS_y40zY-hc&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Harness engineering then is the subset of context engineering which primarily involves leveraging harness configuration points to carefully manage the context windows of coding agents.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;img src=&quot;https://www.humanlayer.dev/blog/skill-issue/harness-engineering.png&quot; alt=&quot;harness engineering as context engineering&quot; class=&quot;max-w-full h-auto my-4 rounded-none&quot; style=&quot;border:1px solid var(--border-color)&quot; referrerpolicy=&quot;no-referrer&quot;&gt;&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;It answers:&lt;/p&gt;
        &lt;div class=&quot;pl-8 mb-6&quot;&gt;&lt;ul class=&quot;list-disc space-y-4&quot; style=&quot;color:var(--text-secondary)&quot;&gt;
        &lt;li&gt;How do we give our coding agent new capabilities?&lt;/li&gt;
        &lt;li&gt;How do we teach it things about our codebase that aren’t in the training data?&lt;/li&gt;
        &lt;li&gt;How do we add determinism beyond &lt;code class=&quot;px-2 py-0.5 rounded font-mono bg-[var(--code-bg)] text-[var(--accent)] border border-[var(--code-border)] tracking-tight shadow-sm mr-1 align-middle wrap-anywhere box-decoration-clone&quot;&gt;CRITICAL: always do XYZ&lt;/code&gt; in the system message?&lt;/li&gt;
        &lt;li&gt;How do we adapt the agent’s behavior for our specific codebase?&lt;/li&gt;
        &lt;li&gt;How do we increase task success rates beyond “magic prompts”?&lt;/li&gt;
        &lt;li&gt;How do we prevent our context window from inflating too rapidly, or with too much bad context?&lt;/li&gt;
        &lt;/ul&gt;&lt;/div&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Skills, MCP servers, sub-agents, hooks, and back-pressure mechanisms are all tactical solutions we’ve arrived at.&lt;/p&gt;
        &lt;h3 class=&quot;text-2xl mb-2 mt-16 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;### &lt;!-- --&gt;Views on Harness Engineering&lt;/h3&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Viv’s posts on harness engineering are worth reading alongside this one — &lt;a href=&quot;https://www.vtrivedy.com/posts/claude-code-sdk-haas-harness-as-a-service&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;the first&lt;/a&gt; frames the four customization levers (system prompt, tools/MCPs, context, sub-agents), and &lt;a href=&quot;https://blog.langchain.com/the-anatomy-of-an-agent-harness/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;the second&lt;/a&gt; works backwards from what models &lt;em style=&quot;color:var(--text-secondary)&quot;&gt;can’t&lt;/em&gt; do natively to derive why each harness component exists.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;img src=&quot;https://www.humanlayer.dev/blog/skill-issue/backwards.png&quot; alt=&quot;working backwards from what models can&#39;t do natively&quot; class=&quot;max-w-full h-auto my-4 rounded-none&quot; style=&quot;border:1px solid var(--border-color)&quot; referrerpolicy=&quot;no-referrer&quot;&gt;
        &lt;em style=&quot;color:var(--text-secondary)&quot;&gt;Image from &lt;a href=&quot;https://blog.langchain.com/the-anatomy-of-an-agent-harness/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;Viv&#39;s post&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;We’d add two levers he doesn’t emphasize:&lt;/p&gt;
        &lt;div class=&quot;pl-8 mb-6&quot;&gt;&lt;ol class=&quot;list-decimal space-y-4&quot; style=&quot;color:var(--text-secondary)&quot;&gt;
        &lt;li&gt;&lt;strong style=&quot;color:var(--text-primary)&quot;&gt;hooks&lt;/strong&gt; for automated integration and deterministic control flow&lt;/li&gt;
        &lt;li&gt;&lt;strong style=&quot;color:var(--text-primary)&quot;&gt;skills&lt;/strong&gt; for progressive disclosure of knowledge. (Dex likes to refer to them as &quot;Instruction Modules&quot; - more on this in another post.)&lt;/li&gt;
        &lt;/ol&gt;&lt;/div&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;After months of solving hard problems in complex brownfield enterprise-scale codebases, we have found that sub-agents are a particularly powerful lever. When working on hard problems that require many, many context windows to solve, &lt;strong style=&quot;color:var(--text-primary)&quot;&gt;sub-agents are the key to maintaining coherency across many sessions&lt;/strong&gt;. Sub-agents &lt;strong style=&quot;color:var(--text-primary)&quot;&gt;function as a &quot;context firewall&quot;&lt;/strong&gt; that ensures discrete tasks can run in isolated context windows so none of the intermediate noise accumulates in your parent thread which is responsible for orchestration, and you can maintain coherency for much, much longer.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;img src=&quot;https://www.humanlayer.dev/blog/skill-issue/context-firewall.png&quot; alt=&quot;context firewall&quot; class=&quot;max-w-full h-auto my-4 rounded-none&quot; style=&quot;border:1px solid var(--border-color)&quot; referrerpolicy=&quot;no-referrer&quot;&gt;&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;OpenAI recently wrote a &lt;a href=&quot;https://openai.com/index/harness-engineering/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;blog post&lt;/a&gt; on the topic as well. There&#39;s some great content in there, and it seems to indicate that they view harness engineering as configuring everything &lt;em style=&quot;color:var(--text-secondary)&quot;&gt;outside&lt;/em&gt; of the agent&#39;s runtime. It&#39;s more focused on back-pressure and verification mechanisms. (Although this may be a mis-reading; the post is somewhat unclear: the word &quot;harness&quot; only appears once in the text of the post, and in reference to evals rather than harness engineering itself.)&lt;/p&gt;
        &lt;h3 class=&quot;text-2xl mb-2 mt-16 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;### &lt;!-- --&gt;But What About Post-Training?&lt;/h3&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Given that frontier coding models are post-trained on their harnesses (e.g. Claude in Claude Code, GPT-5 Codex in Codex), some will argue that the best harness and/or configuration is the one that the model was trained on.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;For example, the Codex models are so tightly coupled with the Codex harness&#39;s &lt;code class=&quot;px-2 py-0.5 rounded font-mono bg-[var(--code-bg)] text-[var(--accent)] border border-[var(--code-border)] tracking-tight shadow-sm mr-1 align-middle wrap-anywhere box-decoration-clone&quot;&gt;apply_patch&lt;/code&gt; tool that &lt;a href=&quot;https://opencode.ai/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;OpenCode&lt;/a&gt; — built as an open-source alternative to Claude Code — had to &lt;a href=&quot;https://github.com/anomalyco/opencode/pull/9127&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;add an &lt;code class=&quot;px-2 py-0.5 rounded font-mono bg-[var(--code-bg)] text-[var(--accent)] border border-[var(--code-border)] tracking-tight shadow-sm mr-1 align-middle wrap-anywhere box-decoration-clone&quot;&gt;apply_patch&lt;/code&gt; tool&lt;/a&gt; specifically for GPT/Codex models to mimic the Codex harness to improve the Codex models&#39; performance in the OpenCode harness - while Claude and other models still use normal &lt;code class=&quot;px-2 py-0.5 rounded font-mono bg-[var(--code-bg)] text-[var(--accent)] border border-[var(--code-border)] tracking-tight shadow-sm mr-1 align-middle wrap-anywhere box-decoration-clone&quot;&gt;edit&lt;/code&gt; and &lt;code class=&quot;px-2 py-0.5 rounded font-mono bg-[var(--code-bg)] text-[var(--accent)] border border-[var(--code-border)] tracking-tight shadow-sm mr-1 align-middle wrap-anywhere box-decoration-clone&quot;&gt;write&lt;/code&gt; tools.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;This &lt;em style=&quot;color:var(--text-secondary)&quot;&gt;can&lt;/em&gt; mean that a model will perform better when coupled with the harness it was post-trained on, and some might infer that this means you shouldn&#39;t customize the harness at all.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;But it cuts both ways: &lt;strong style=&quot;color:var(--text-primary)&quot;&gt;models can be over-fitted to their harness&lt;/strong&gt;. Viv cites &lt;a href=&quot;https://terminalbench.com/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;Terminal Bench 2.0&lt;/a&gt; where Opus 4.6 in Claude Code comes in position #33, but when placed in a different harness that wasn&#39;t seen during post-training, it comes in at #5 (+/- about 4 positions in either direction).&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;img src=&quot;https://www.humanlayer.dev/blog/skill-issue/terminal-bench.png&quot; alt=&quot;terminal bench&quot; class=&quot;max-w-full h-auto my-4 rounded-none&quot; style=&quot;border:1px solid var(--border-color)&quot; referrerpolicy=&quot;no-referrer&quot;&gt;&lt;/p&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;Engineering Your Harness&lt;/h2&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;With that in mind, let&#39;s walk through the configuration surfaces we&#39;ve found most impactful.&lt;/p&gt;
        &lt;h3 class=&quot;text-2xl mb-2 mt-16 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;### &lt;!-- --&gt;CLAUDE.md &amp;amp; AGENTS.md&lt;/h3&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Before touching any other harness configuration points, it&#39;s usually worth customizing your CLAUDE.md / AGENTS.md files. These are markdown files at the top-level of your repository that get deterministically injected into the agent&#39;s system prompt by the harness.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;We have already shared &lt;a href=&quot;https://www.humanlayer.dev/blog/writing-a-good-claude-md&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;some opinions about what makes a good CLAUDE.md&lt;/a&gt; file and how to use it correctly, so give that a read if you&#39;re not familiar with it. Matt Pocock also wrote a &lt;a href=&quot;https://www.aihero.dev/a-complete-guide-to-agents-md&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;great follow-up&lt;/a&gt; that more generally applies to AGENTS.md.&lt;/p&gt;
        &lt;h4 class=&quot;text-xl mb-1 mt-8 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;#### &lt;!-- --&gt;The ETH Zurich Study&lt;/h4&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;After ETH Zurich published their &lt;a href=&quot;https://arxiv.org/abs/2602.11988&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;study testing 138 agentfiles across various repos&lt;/a&gt; which indicated that most agentfiles were useless-or-worse, we got a lot of feedback on our &lt;a href=&quot;https://www.humanlayer.dev/blog/writing-a-good-claude-md&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;CLAUDE.md post&lt;/a&gt;:&lt;/p&gt;
        &lt;blockquote class=&quot;border-l-4 pl-4 my-4 italic&quot; style=&quot;border-color:var(--accent);color:var(--text-muted)&quot;&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;See, CLAUDE.md files don&#39;t even help — they&#39;re a waste of time.&lt;/p&gt;
        &lt;/blockquote&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Indeed: the study tested many agentfiles across a wide variety of repos, and found:&lt;/p&gt;
        &lt;div class=&quot;pl-8 mb-6&quot;&gt;&lt;ul class=&quot;list-disc space-y-4&quot; style=&quot;color:var(--text-secondary)&quot;&gt;
        &lt;li&gt;that LLM-generated ones actually &lt;em style=&quot;color:var(--text-secondary)&quot;&gt;hurt&lt;/em&gt; performance while costing 20%+ more&lt;/li&gt;
        &lt;li&gt;human-written ones only helped about 4%.&lt;/li&gt;
        &lt;li&gt;Agents spent 14-22% more reasoning tokens processing context file instructions, took more steps to complete tasks, and ran more tools — all without improving resolution rates.&lt;/li&gt;
        &lt;li&gt;Codebase overviews and directory listings didn&#39;t help at all; agents discover repository structure on their own just fine.&lt;/li&gt;
        &lt;/ul&gt;&lt;/div&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;A careful reading of the study &lt;strong style=&quot;color:var(--text-primary)&quot;&gt;indicates that what we said in our post was correct:&lt;/strong&gt;&lt;/p&gt;
        &lt;div class=&quot;pl-8 mb-6&quot;&gt;&lt;ul class=&quot;list-disc space-y-4&quot; style=&quot;color:var(--text-secondary)&quot;&gt;
        &lt;li&gt;&lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Agent-generated files were worse. Yes, we said:&lt;/p&gt;
        &lt;blockquote class=&quot;border-l-4 pl-4 my-4 italic&quot; style=&quot;border-color:var(--accent);color:var(--text-muted)&quot;&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;avoid auto-generating it. You should carefully craft its contents for best results.&lt;/p&gt;
        &lt;/blockquote&gt;
        &lt;/li&gt;
        &lt;li&gt;&lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Lots of files too-heavily-steered the model to use specific tools, causing worse outcomes. Yep, we said:&lt;/p&gt;
        &lt;blockquote class=&quot;border-l-4 pl-4 my-4 italic&quot; style=&quot;border-color:var(--accent);color:var(--text-muted)&quot;&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Less (instructions) is more. While you shouldn&#39;t omit necessary instructions, you should include as few instructions as reasonably possible in the file.&lt;/p&gt;
        &lt;/blockquote&gt;
        &lt;/li&gt;
        &lt;li&gt;&lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Files contained irrelevant context. Yep, we said:&lt;/p&gt;
        &lt;blockquote class=&quot;border-l-4 pl-4 my-4 italic&quot; style=&quot;border-color:var(--accent);color:var(--text-muted)&quot;&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Use Progressive Disclosure&lt;/p&gt;
        &lt;/blockquote&gt;
        &lt;/li&gt;
        &lt;li&gt;&lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;The human-written ones barely helped because of too many conditional rules. Yep, we said:&lt;/p&gt;
        &lt;blockquote class=&quot;border-l-4 pl-4 my-4 italic&quot; style=&quot;border-color:var(--accent);color:var(--text-muted)&quot;&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Keep the contents of your CLAUDE.md concise and universally applicable.&lt;/p&gt;
        &lt;/blockquote&gt;
        &lt;/li&gt;
        &lt;/ul&gt;&lt;/div&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Our CLAUDE.md is under 60 lines.&lt;/p&gt;
        &lt;h3 class=&quot;text-2xl mb-2 mt-16 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;### &lt;!-- --&gt;MCP Servers Are for Tools&lt;/h3&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;MCP servers are&amp;nbsp;primarily for plugging tools into your coding agent to extend its capabilities beyond file I/O and bash commands. The MCP specification includes additional features like resources, prompts, and elicitations, but &lt;strong style=&quot;color:var(--text-primary)&quot;&gt;these are generally not well-supported by MCP clients and coding agent harnesses&lt;/strong&gt;.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;The MCP spec supports servers that run on your (or the agent&#39;s) local machine which allow the agent to interact with its local environment, but it also supports HTTP-based MCP servers that can connect your agent with remote tools and services like Linear, Sentry, and more.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;When you plug an MCP server into your coding agent, the list of available tools, their descriptions, and the arguments needed to invoke them &lt;strong style=&quot;color:var(--text-primary)&quot;&gt;are injected into your coding agent&#39;s system prompt&lt;/strong&gt;. As a result, the MCP server can use the tool descriptions to customize your agent’s behavior by providing your agent with instructions about when to use them.&amp;nbsp;&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;strong style=&quot;color:var(--text-primary)&quot;&gt;WARNING&lt;/strong&gt;: because MCP servers’ tool descriptions are added to your coding agent’s system prompt, never connect to one you don’t trust. This can be a dangerous vector for prompt injection! STDIO servers and other servers that run client-side with npx or uvx can also execute code on your host in the absence of prompt injection.&lt;/p&gt;
        &lt;h4 class=&quot;text-xl mb-1 mt-8 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;#### &lt;!-- --&gt;Too Many Tools Is Bad&lt;/h4&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;We’ve seen this firsthand: plug too many MCP tools into your agent, and the context window fills up with tool descriptions, pushing you into &lt;a href=&quot;https://youtu.be/rmvDxxNubIg?si=O17nmS3SScaAkpp-&amp;amp;t=355&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;the dumb zone&lt;/a&gt; much faster:&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;img src=&quot;https://www.humanlayer.dev/blog/skill-issue/too-many-tools.png&quot; alt=&quot;too many tools&quot; class=&quot;max-w-full h-auto my-4 rounded-none&quot; style=&quot;border:1px solid var(--border-color)&quot; referrerpolicy=&quot;no-referrer&quot;&gt;&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;The &lt;a href=&quot;https://www.aihero.dev/a-complete-guide-to-agents-md#the-instruction-budget&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;instruction budget&lt;/a&gt; matters too — every irrelevant tool description is an instruction the agent has to process without any benefit.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;In fact, these failure modes are so common that Anthropic released &lt;a href=&quot;https://platform.claude.com/docs/en/agents-and-tools/tool-use/tool-search-tool&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;experimental support for MCP tool search&lt;/a&gt; to progressively disclose tools to Claude when the user has too many MCP tools connected. TL;DR: if you’re not actively using a server which provides a large number of tools, turn it off.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;We also found that if an MCP server duplicates functionality that’s already available as a CLI well-represented in training data, it works better to just prompt the agent to use the CLI. For things like GitHub, Docker, or most databases, your coding agent can just use the right CLIs and shell commands. The model has seen these tools enough during training that it already knows how to use them, and you gain the added benefit of composability with tools like &lt;code class=&quot;px-2 py-0.5 rounded font-mono bg-[var(--code-bg)] text-[var(--accent)] border border-[var(--code-border)] tracking-tight shadow-sm mr-1 align-middle wrap-anywhere box-decoration-clone&quot;&gt;grep&lt;/code&gt; and &lt;code class=&quot;px-2 py-0.5 rounded font-mono bg-[var(--code-bg)] text-[var(--accent)] border border-[var(--code-border)] tracking-tight shadow-sm mr-1 align-middle wrap-anywhere box-decoration-clone&quot;&gt;jq&lt;/code&gt; to enable additional context-efficiency.&lt;/p&gt;
        &lt;h4 class=&quot;text-xl mb-1 mt-8 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;#### &lt;!-- --&gt;Always Be Context-Engineering&lt;/h4&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;At HumanLayer, we used the Linear MCP server for a while before realizing that we really only used a small subset of the tools it provides - so we wrote a small CLI that wraps the Linear API and provides very context-efficient responses, and we included 6 example usages in our &lt;a href=&quot;https://www.humanlayer.dev/blog/writing-a-good-claude-md&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;CLAUDE.md&lt;/a&gt; file:&lt;/p&gt;
        &lt;pre class=&quot;group&quot;&gt;&lt;pre class=&quot;prismjs rsh-code-block grid relative min-w-[250px] mb-6 bg-gray-400/5 p-4&quot;&gt;&lt;button data-slot=&quot;button&quot; class=&quot;inline-flex items-center justify-center gap-2 whitespace-nowrap rounded-none text-sm font-mono font-medium cursor-pointer disabled:cursor-not-allowed disabled:pointer-events-none disabled:opacity-50 [&amp;amp;_svg]:pointer-events-none [&amp;amp;_svg:not([class*=&#39;size-&#39;])]:size-4 shrink-0 [&amp;amp;_svg]:shrink-0 outline-none focus-visible:ring-[3px] uppercase tracking-wider border bg-transparent text-[var(--accent)] border-transparent hover:bg-[var(--accent)]/10 hover:border-[var(--accent)] focus-visible:border-[var(--accent)] focus-visible:ring-[var(--accent)]/50 size-9 absolute top-2 right-1 h-6 w-6 opacity-0 group-

@github-actions
Copy link
Copy Markdown
Contributor

Auto Review

No clear rule violations found in the current diff.

@github-actions
Copy link
Copy Markdown
Contributor

Successfully generated as following:

http://localhost:1200/humanlayer/blog - Success ✔️
<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
  <channel>
    <title>HumanLayer Blog</title>
    <link>https://www.humanlayer.dev/blog</link>
    <atom:link href="http://localhost:1200/humanlayer/blog" rel="self" type="application/rss+xml"></atom:link>
    <description>HumanLayer Blog - Powered by RSSHub</description>
    <generator>RSSHub</generator>
    <webMaster>contact@rsshub.app (RSSHub)</webMaster>
    <language>en</language>
    <lastBuildDate>Tue, 31 Mar 2026 06:52:36 GMT</lastBuildDate>
    <ttl>5</ttl>
    <item>
      <title>Long-Context Isn&#39;t the Answer</title>
      <description>&lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Anthropic just switched the default model in Claude Code to Opus 4.6 with a 1M context window. We tried it when it launched. But now we&#39;re switching back to Opus 4.5&lt;/p&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;How about some context?&lt;/h2&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Unlike previous Claude Opus models which have a ~ 200k context window, Opus 4.6 has as 1M context window. This was very exciting for us!&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;We spend a lot of time working on hard problems in large enterprise codebases, and if the model can reliably hold more of the codebase in its head at a time, so to speak, then that creates new possibilities for the scale of problem that can be solved in a single context window.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;It would mean less compaction (whether &lt;a href=&quot;https://www.humanlayer.dev/blog/advanced-context-engineering&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;frequent intentional compaction&lt;/a&gt; or auto-compact), and less &quot;context pressure&quot; to solve the problem before your context window fills up.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;As Calvin French-Owen, who helped launch Codex last year, &lt;a href=&quot;https://youtu.be/qwmmWzPnhog?si=ReRWTavUKTItIGGr&amp;amp;t=918&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;puts it in an episode of YC&#39;s Lightcone podcast&lt;/a&gt;:&lt;/p&gt;
        &lt;blockquote class=&quot;border-l-4 pl-4 my-4 italic&quot; style=&quot;border-color:var(--accent);color:var(--text-muted)&quot;&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;...imagine you&#39;re a college student. You&#39;re taking an exam. In the first five minutes of that exam, you&#39;re like, &quot;Oh, I have all the time in the world. I&#39;ll do a great job. I&#39;ll think through each of these problems.&quot;&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Let&#39;s say you have like five minutes left and you still have half the exam left. You&#39;re like, &quot;Oh man, I just got to do whatever I can.&quot; Like, that&#39;s the LLM with a context window&lt;/p&gt;
        &lt;/blockquote&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;More context, less instruction adherence&lt;/h2&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;While the context window is dramatically larger, we noticed over the course of a couple of weeks that instruction adherence was dramatically degraded, and not just at longer context lengths.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Even well-within what we would consider the &lt;a href=&quot;https://youtu.be/rmvDxxNubIg?si=O17nmS3SScaAkpp-&amp;amp;t=355&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;smart zone&lt;/a&gt; of a 200k-context frontier model, the model was less precise. It would ignore design documents and other inputs when writing a plan file. It would make trivial mistakes, or misunderstand simple instructions - or worse, directly disobey them.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;At longer context lengths, the degradation was even steeper - like the user instructions were getting drowned out by the intermediate tool results and mass of accumulated context.&lt;/p&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;What determines instruction adherence?&lt;/h2&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;We&#39;ve written about the concept of &lt;a href=&quot;https://www.humanlayer.dev/blog/skill-issue-harness-engineering-for-coding-agents&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;the instruction budget&lt;/a&gt; before - a measurable property of LLMs which describes how many instructions they can follow reasonably well before instruction adherence drops off.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;It&#39;s different for each model, but it&#39;s a function primarily of the quality of the model&#39;s instruction tuning, and more importantly for our purposes, it is &lt;strong style=&quot;color:var(--text-primary)&quot;&gt;strongly correlated with the size of the model&lt;/strong&gt;.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;img src=&quot;https://www.humanlayer.dev/blog/writing-a-good-claude-md/instructionfollowing.png&quot; alt=&quot;Instruction following&quot; class=&quot;max-w-full h-auto my-4 rounded-none&quot; style=&quot;border:1px solid var(--border-color)&quot; referrerpolicy=&quot;no-referrer&quot;&gt;&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;You can clearly see that the larger models from the &lt;a href=&quot;https://arxiv.org/pdf/2507.11538&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;study&lt;/a&gt; can follow dramatically more instructions before adherence drops off.&lt;/p&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;Instruction adherence at long context lengths&lt;/h2&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Why does this matter?&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;When a lab offers an extended-context version of a model, you&#39;re usually not getting a bigger model with more parameters and therefore a larger &quot;instruction budget&quot; to go with the larger context window - you&#39;re likely getting the same model with some clever math (e.g. &lt;a href=&quot;https://arxiv.org/pdf/2309.00071&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;YaRN&lt;/a&gt;) to extend the sequence length the model can attend to, and probably some more post-training to stabilize the model at the longer sequence lengths.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;This means that while the context window size increases, the instruction budget remains the same. You can fit more context and more instructions in the context window, but the model isn&#39;t actually better at attending to those instructions over the context length.&lt;/p&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;Instruction-following as needle in a haystack&lt;/h2&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;You can think of your context window as a haystack where all of your tool calls and documents and files are hay - every line in your CLAUDE.md file, every instruction in your tool descriptions, every tool result, every instruction in your system prompt, and every user message.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;The quality &amp;amp; correctness of the agent&#39;s next step depends on the LLM&#39;s ability to find a needle in there: namely, the instruction(s) in the context window that are most-relevant to the current state of the context window, which give it the information it needs to make the correct decision about its next action.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Now imagine we increase the size of the haystack by 500% - but the size of the needle remains the same. Unless our ability to find the needle also increases by 500%, we will have a dramatically harder time finding it. The extra context isn&#39;t really helping us - it&#39;s just digging us deeper into the dumb zone.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;img src=&quot;https://www.humanlayer.dev/blog/skill-issue/long-context.jpg&quot; alt=&quot;long context&quot; class=&quot;max-w-full h-auto my-4 rounded-none&quot; style=&quot;border:1px solid var(--border-color)&quot; referrerpolicy=&quot;no-referrer&quot;&gt;&lt;/p&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;What Works Instead&lt;/h2&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Instead of trying to stuff as much information in a context window as possible and having the LLM reason over it, we found that the context management techniques we had been using had to be used even &lt;em style=&quot;color:var(--text-secondary)&quot;&gt;more&lt;/em&gt; aggressively to keep Opus 4.6 1M on track.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;In my post on &lt;a href=&quot;https://www.humanlayer.dev/blog/skill-issue-harness-engineering-for-coding-agents&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;harness engineering&lt;/a&gt;, I wrote about this as the limit case for using sub-agents:&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;img src=&quot;https://www.humanlayer.dev/blog/skill-issue/limit-case.png&quot; alt=&quot;limit case&quot; class=&quot;max-w-full h-auto my-4 rounded-none&quot; style=&quot;border:1px solid var(--border-color)&quot; referrerpolicy=&quot;no-referrer&quot;&gt;&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;The day after I published it, I decided maybe it wasn&#39;t such a bad idea, and sat down and wrote this skill:&lt;/p&gt;
        &lt;pre class=&quot;group&quot;&gt;&lt;pre class=&quot;prismjs rsh-code-block grid relative min-w-[250px] mb-6 bg-gray-400/5 p-4&quot;&gt;&lt;button data-slot=&quot;button&quot; class=&quot;inline-flex items-center justify-center gap-2 whitespace-nowrap rounded-none text-sm font-mono font-medium cursor-pointer disabled:cursor-not-allowed disabled:pointer-events-none disabled:opacity-50 [&amp;amp;_svg]:pointer-events-none [&amp;amp;_svg:not([class*=&#39;size-&#39;])]:size-4 shrink-0 [&amp;amp;_svg]:shrink-0 outline-none focus-visible:ring-[3px] uppercase tracking-wider border bg-transparent text-[var(--accent)] border-transparent hover:bg-[var(--accent)]/10 hover:border-[var(--accent)] focus-visible:border-[var(--accent)] focus-visible:ring-[var(--accent)]/50 size-9 absolute top-2 right-1 h-6 w-6 opacity-0 group-hover:opacity-100 transition-opacity duration-200 z-10&quot; aria-label=&quot;Copy code&quot; title=&quot;Copy code&quot;&gt;&lt;svg xmlns=&quot;http://www.w3.org/2000/svg&quot; width=&quot;24&quot; height=&quot;24&quot; viewBox=&quot;0 0 24 24&quot; fill=&quot;none&quot; stroke=&quot;currentColor&quot; stroke-width=&quot;2&quot; stroke-linecap=&quot;round&quot; stroke-linejoin=&quot;round&quot; class=&quot;lucide lucide-copy h-3 w-3&quot; aria-hidden=&quot;true&quot;&gt;&lt;rect width=&quot;14&quot; height=&quot;14&quot; x=&quot;8&quot; y=&quot;8&quot; rx=&quot;2&quot; ry=&quot;2&quot;&gt;&lt;/rect&gt;&lt;path d=&quot;M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/button&gt;&lt;code class=&quot;overflow-x-auto&quot; style=&quot;white-space:pre&quot;&gt;&lt;span class=&quot;&quot;&gt;---
        &lt;/span&gt;name: subagent-orchestrator
        &lt;!-- --&gt;description: orchestrate sub-agents to accomplish complex long-horizon tasks without losing coherency by delegating to sub-agents
        &lt;!-- --&gt;---
        &lt;!-- --&gt;
        &lt;!-- --&gt;This skill provides you with **CRITICAL** instructions that will help you to maintain coherency in long-horizon context-heavy tasks.
        &lt;!-- --&gt;
        &lt;!-- --&gt;You have a large number of tools available to you. The most important one is the one that allows you to dispatch sub-agents: either `Agent` or `Task`.
        &lt;!-- --&gt;
        &lt;!-- --&gt;All non-trivial operations should be delegated to sub-agents. You should delegate research and codebase understanding tasks to codebase-analyzer, codebase-locator and pattern-locator sub-agents.
        &lt;!-- --&gt;
        &lt;!-- --&gt;You should delegate running bash commands (particularly ones that are likely to produce lots of output) such as investigating with the `aws` CLI, using the `gh` CLI, digging through logs to `Bash` sub-agents.
        &lt;!-- --&gt;
        &lt;!-- --&gt;You should use separate sub-agents for separate tasks, and you may launch them in parallel - but do not delegate multiple tasks that are likely to have significant overlap to separate sub-agents.
        &lt;!-- --&gt;
        &lt;!-- --&gt;IMPORTANT: if the user has already given you a task, you should proceed with that task using this approach.
        &lt;!-- --&gt;
        &lt;!-- --&gt;If you have not already been explicitly given a taks, you should ask the user what task they would like for you to work on - do not assume or begin working on a ticket automatically.&lt;/code&gt;&lt;/pre&gt;&lt;/pre&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;I found that using this skill in combination with our existing workflow skills helped me keep Opus coherent at longer context lengths. Why?
        As I wrote about in that post, sub-agents encapsulate context and ensure that only highly-relevant context (the prompt, and the focused sub-agent result) end up in the context window, avoiding &lt;a href=&quot;https://research.trychroma.com/context-rot&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;context rot&lt;/a&gt;:&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;img src=&quot;https://www.humanlayer.dev/blog/skill-issue/context-firewall.png&quot; alt=&quot;context firewall&quot; class=&quot;max-w-full h-auto my-4 rounded-none&quot; style=&quot;border:1px solid var(--border-color)&quot; referrerpolicy=&quot;no-referrer&quot;&gt;&lt;/p&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;How we&#39;re incorporating this at HumanLayer&lt;/h2&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;We have had a feature in the tool for a while that warns the user when they&#39;re context is getting high. We used to set this at around %40 of sonnets 168k token window (200k - 32k reserved for output). This came out to about 100k tokens.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;To help users maximize instruction adherence and intelligence on hard codebase problems, we&#39;ve updated our context warnings for long-context models to trigger at the 100k token mark instead of 40% of the usable context. For opus 1m this is only 10% of the context window.&lt;/p&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;TL;DR&lt;/h2&gt;
        &lt;div class=&quot;pl-8 mb-6&quot;&gt;&lt;ol class=&quot;list-decimal space-y-4&quot; style=&quot;color:var(--text-secondary)&quot;&gt;
        &lt;li&gt;Long-context models degrade at all context lengths, not just long ones.&lt;/li&gt;
        &lt;li&gt;More context isn&#39;t more capability - the instruction budget doesn&#39;t scale with the context window.&lt;/li&gt;
        &lt;li&gt;Context isolation beats context expansion. Sub-agents, progressive disclosure, and &lt;a href=&quot;https://www.humanlayer.dev/blog/context-efficient-backpressure&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;context-efficient backpressure&lt;/a&gt; keep each context window small, focused, and in the smart zone.&lt;/li&gt;
        &lt;/ol&gt;&lt;/div&gt;</description>
      <link>https://www.humanlayer.dev/blog/long-context-isnt-the-answer</link>
      <guid isPermaLink="false">https://www.humanlayer.dev/blog/long-context-isnt-the-answer</guid>
      <pubDate>Sun, 22 Mar 2026 16:00:00 GMT</pubDate>
      <author>Kyle</author>
    </item>
    <item>
      <title>Getting Claude to Actually Read Your CLAUDE.md</title>
      <description>&lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Claude Code wraps your CLAUDE.md in a &lt;code class=&quot;px-2 py-0.5 rounded font-mono bg-[var(--code-bg)] text-[var(--accent)] border border-[var(--code-border)] tracking-tight shadow-sm mr-1 align-middle wrap-anywhere box-decoration-clone&quot;&gt;&amp;lt;system_reminder&amp;gt;&lt;/code&gt; that explicitly tells the model the contents &quot;may or may not be relevant.&quot; The longer your file gets, the more Claude seems to treat individual sections as optional.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;We &lt;a href=&quot;https://www.humanlayer.dev/blog/writing-a-good-claude-md&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;wrote before&lt;/a&gt; about keeping these files concise and avoiding stuff that belongs in a linter. But even after trimming things down, I kept running into cases where Claude would ignore testing instructions while writing tests, or skip our API conventions when building endpoints.&lt;/p&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;Conditional &lt;code class=&quot;px-2 py-0.5 rounded font-mono bg-[var(--code-bg)] text-[var(--accent)] border border-[var(--code-border)] tracking-tight shadow-sm mr-1 align-middle wrap-anywhere box-decoration-clone&quot;&gt;&amp;lt;important if&amp;gt;&lt;/code&gt; blocks&lt;/h2&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;What&#39;s been working for me: wrapping sections in &lt;code class=&quot;px-2 py-0.5 rounded font-mono bg-[var(--code-bg)] text-[var(--accent)] border border-[var(--code-border)] tracking-tight shadow-sm mr-1 align-middle wrap-anywhere box-decoration-clone&quot;&gt;&amp;lt;important if=&quot;condition&quot;&amp;gt;&lt;/code&gt; tags.&lt;/p&gt;
        &lt;pre class=&quot;group&quot;&gt;&lt;pre class=&quot;prismjs rsh-code-block grid relative min-w-[250px] mb-6 bg-gray-400/5 p-4&quot;&gt;&lt;button data-slot=&quot;button&quot; class=&quot;inline-flex items-center justify-center gap-2 whitespace-nowrap rounded-none text-sm font-mono font-medium cursor-pointer disabled:cursor-not-allowed disabled:pointer-events-none disabled:opacity-50 [&amp;amp;_svg]:pointer-events-none [&amp;amp;_svg:not([class*=&#39;size-&#39;])]:size-4 shrink-0 [&amp;amp;_svg]:shrink-0 outline-none focus-visible:ring-[3px] uppercase tracking-wider border bg-transparent text-[var(--accent)] border-transparent hover:bg-[var(--accent)]/10 hover:border-[var(--accent)] focus-visible:border-[var(--accent)] focus-visible:ring-[var(--accent)]/50 size-9 absolute top-2 right-1 h-6 w-6 opacity-0 group-hover:opacity-100 transition-opacity duration-200 z-10&quot; aria-label=&quot;Copy code&quot; title=&quot;Copy code&quot;&gt;&lt;svg xmlns=&quot;http://www.w3.org/2000/svg&quot; width=&quot;24&quot; height=&quot;24&quot; viewBox=&quot;0 0 24 24&quot; fill=&quot;none&quot; stroke=&quot;currentColor&quot; stroke-width=&quot;2&quot; stroke-linecap=&quot;round&quot; stroke-linejoin=&quot;round&quot; class=&quot;lucide lucide-copy h-3 w-3&quot; aria-hidden=&quot;true&quot;&gt;&lt;rect width=&quot;14&quot; height=&quot;14&quot; x=&quot;8&quot; y=&quot;8&quot; rx=&quot;2&quot; ry=&quot;2&quot;&gt;&lt;/rect&gt;&lt;path d=&quot;M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/button&gt;&lt;code class=&quot;overflow-x-auto&quot; style=&quot;white-space:pre&quot;&gt;&lt;span class=&quot;&quot;&gt;&amp;lt;important if=&quot;you are writing or modifying tests&quot;&amp;gt;
        &lt;/span&gt;- Use `createTestApp()` helper for integration tests
        &lt;!-- --&gt;- Mock database with `dbMock` from `packages/db/test`
        &lt;!-- --&gt;- Test fixtures live in `__fixtures__/` directories
        &lt;!-- --&gt;&amp;lt;/important&amp;gt;&lt;/code&gt;&lt;/pre&gt;&lt;/pre&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;We don&#39;t have a rigorous explanation for why this helps. My guess is that the explicit condition gives Claude a clearer signal about when to apply the instructions, rather than leaving it to decide relevance on its own. Whatever the mechanism, we&#39;ve seen noticeably better adherence on tasks where only some sections of my CLAUDE.md should apply.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;A few things I&#39;ve learned:&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;strong style=&quot;color:var(--text-primary)&quot;&gt;Don&#39;t wrap everything.&lt;/strong&gt; Project identity, directory structure, and tech stack are relevant to basically every task. The conditional blocks are for testing setup or deployment procedures that only matter sometimes.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;strong style=&quot;color:var(--text-primary)&quot;&gt;Make conditions specific.&lt;/strong&gt; &lt;code class=&quot;px-2 py-0.5 rounded font-mono bg-[var(--code-bg)] text-[var(--accent)] border border-[var(--code-border)] tracking-tight shadow-sm mr-1 align-middle wrap-anywhere box-decoration-clone&quot;&gt;&amp;lt;important if=&quot;you are writing code&quot;&amp;gt;&lt;/code&gt; matches almost everything and defeats the purpose. I try to make each condition narrow enough that it only fires when I actually want those rules applied.&lt;/p&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;A skill to automate this&lt;/h2&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;I did a lot of restructuring CLAUDE.md files by hand, so I made a Claude Code skill that does the rewrite. It pulls out the foundational stuff, wraps domain-specific sections in conditional blocks, and cleans up some common problems (stale code snippets, style rules that should be in a linter, vague instructions that don&#39;t actually help).&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;It&#39;s not perfect—you&#39;ll probably want to review what it produces—but it&#39;s a decent starting point.&lt;/p&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;Try it&lt;/h2&gt;
        &lt;pre class=&quot;group&quot;&gt;&lt;pre class=&quot;prismjs rsh-code-block grid relative min-w-[250px] mb-6 bg-gray-400/5 p-4&quot;&gt;&lt;button data-slot=&quot;button&quot; class=&quot;inline-flex items-center justify-center gap-2 whitespace-nowrap rounded-none text-sm font-mono font-medium cursor-pointer disabled:cursor-not-allowed disabled:pointer-events-none disabled:opacity-50 [&amp;amp;_svg]:pointer-events-none [&amp;amp;_svg:not([class*=&#39;size-&#39;])]:size-4 shrink-0 [&amp;amp;_svg]:shrink-0 outline-none focus-visible:ring-[3px] uppercase tracking-wider border bg-transparent text-[var(--accent)] border-transparent hover:bg-[var(--accent)]/10 hover:border-[var(--accent)] focus-visible:border-[var(--accent)] focus-visible:ring-[var(--accent)]/50 size-9 absolute top-2 right-1 h-6 w-6 opacity-0 group-hover:opacity-100 transition-opacity duration-200 z-10&quot; aria-label=&quot;Copy code&quot; title=&quot;Copy code&quot;&gt;&lt;svg xmlns=&quot;http://www.w3.org/2000/svg&quot; width=&quot;24&quot; height=&quot;24&quot; viewBox=&quot;0 0 24 24&quot; fill=&quot;none&quot; stroke=&quot;currentColor&quot; stroke-width=&quot;2&quot; stroke-linecap=&quot;round&quot; stroke-linejoin=&quot;round&quot; class=&quot;lucide lucide-copy h-3 w-3&quot; aria-hidden=&quot;true&quot;&gt;&lt;rect width=&quot;14&quot; height=&quot;14&quot; x=&quot;8&quot; y=&quot;8&quot; rx=&quot;2&quot; ry=&quot;2&quot;&gt;&lt;/rect&gt;&lt;path d=&quot;M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/button&gt;&lt;code class=&quot;overflow-x-auto&quot; style=&quot;white-space:pre&quot;&gt;&lt;span class=&quot;&quot;&gt;npx skills add humanlayer/skills --skill improve-claude-md&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/pre&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Then in your project:&lt;/p&gt;
        &lt;pre class=&quot;group&quot;&gt;&lt;pre class=&quot;prismjs rsh-code-block grid relative min-w-[250px] mb-6 bg-gray-400/5 p-4&quot;&gt;&lt;button data-slot=&quot;button&quot; class=&quot;inline-flex items-center justify-center gap-2 whitespace-nowrap rounded-none text-sm font-mono font-medium cursor-pointer disabled:cursor-not-allowed disabled:pointer-events-none disabled:opacity-50 [&amp;amp;_svg]:pointer-events-none [&amp;amp;_svg:not([class*=&#39;size-&#39;])]:size-4 shrink-0 [&amp;amp;_svg]:shrink-0 outline-none focus-visible:ring-[3px] uppercase tracking-wider border bg-transparent text-[var(--accent)] border-transparent hover:bg-[var(--accent)]/10 hover:border-[var(--accent)] focus-visible:border-[var(--accent)] focus-visible:ring-[var(--accent)]/50 size-9 absolute top-2 right-1 h-6 w-6 opacity-0 group-hover:opacity-100 transition-opacity duration-200 z-10&quot; aria-label=&quot;Copy code&quot; title=&quot;Copy code&quot;&gt;&lt;svg xmlns=&quot;http://www.w3.org/2000/svg&quot; width=&quot;24&quot; height=&quot;24&quot; viewBox=&quot;0 0 24 24&quot; fill=&quot;none&quot; stroke=&quot;currentColor&quot; stroke-width=&quot;2&quot; stroke-linecap=&quot;round&quot; stroke-linejoin=&quot;round&quot; class=&quot;lucide lucide-copy h-3 w-3&quot; aria-hidden=&quot;true&quot;&gt;&lt;rect width=&quot;14&quot; height=&quot;14&quot; x=&quot;8&quot; y=&quot;8&quot; rx=&quot;2&quot; ry=&quot;2&quot;&gt;&lt;/rect&gt;&lt;path d=&quot;M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/button&gt;&lt;code class=&quot;overflow-x-auto&quot; style=&quot;white-space:pre&quot;&gt;&lt;span class=&quot;&quot;&gt;/improve-claude-md&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/pre&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;It reads your existing CLAUDE.md and outputs a rewritten version. Source is at &lt;a href=&quot;https://github.com/humanlayer/skills/tree/main/plugins/improve-claude-md&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;github.com/humanlayer/skills&lt;/a&gt;.&lt;/p&gt;</description>
      <link>https://www.humanlayer.dev/blog/stop-claude-from-ignoring-your-claude-md</link>
      <guid isPermaLink="false">https://www.humanlayer.dev/blog/stop-claude-from-ignoring-your-claude-md</guid>
      <pubDate>Mon, 16 Mar 2026 16:00:00 GMT</pubDate>
      <author>Dex</author>
    </item>
    <item>
      <title>Skill Issue: Harness Engineering for Coding Agents</title>
      <description>&lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;We&#39;ve spent the past year watching coding agents fail in every conceivable way: ignoring instructions, executing dangerous commands un-prompted, and going in circles on the simplest of tasks.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;We&#39;ve seen teams ship immense amounts of slop. We&#39;ve even &lt;a href=&quot;https://www.youtube.com/live/99Kxkemj1g8?si=tVsK88M3XPyo0urD&amp;amp;t=20967&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;shipped a little bit of slop ourselves&lt;/a&gt;.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Every time, the instinct was the same:&lt;/p&gt;
        &lt;div class=&quot;pl-8 mb-6&quot;&gt;&lt;ul class=&quot;list-disc space-y-4&quot; style=&quot;color:var(--text-secondary)&quot;&gt;
        &lt;li&gt;&quot;We just need better models, GPT-6 will fix it&quot;&lt;/li&gt;
        &lt;li&gt;&quot;We just need better instruction-following&quot;&lt;/li&gt;
        &lt;li&gt;&quot;It&#39;ll work once [niche library I&#39;m using] is in the training data&quot;&lt;/li&gt;
        &lt;/ul&gt;&lt;/div&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;But over the course of dozens of projects and hundreds of agent sessions, we kept arriving at the same conclusion: it&#39;s not a model problem. It&#39;s a &lt;strong style=&quot;color:var(--text-primary)&quot;&gt;configuration problem&lt;/strong&gt;.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Yes, models will get smarter, and some existing failure modes will disappear. And then because they are smarter, we will give them new problems which are bigger and harder, and they will &lt;strong style=&quot;color:var(--text-primary)&quot;&gt;continue to fail in unexpected ways&lt;/strong&gt;. Unexpected failures modes are a fundamental problem for non-deterministic systems.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;img src=&quot;https://www.humanlayer.dev/blog/skill-issue/gpt-6.png&quot; alt=&quot;gpt-6&quot; class=&quot;max-w-full h-auto my-4 rounded-none&quot; style=&quot;border:1px solid var(--border-color)&quot; referrerpolicy=&quot;no-referrer&quot;&gt;&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;So instead of praying for &lt;code class=&quot;px-2 py-0.5 rounded font-mono bg-[var(--code-bg)] text-[var(--accent)] border border-[var(--code-border)] tracking-tight shadow-sm mr-1 align-middle wrap-anywhere box-decoration-clone&quot;&gt;gpt-6.4-codex-ultrahigh_extended&lt;/code&gt; to save us all, we try to focus on answering the question of &lt;em style=&quot;color:var(--text-secondary)&quot;&gt;&quot;how do we get the most out of &lt;strong style=&quot;color:var(--text-primary)&quot;&gt;today&#39;s&lt;/strong&gt; models?&quot;&lt;/em&gt;&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;There are lots of ways to get better performance out of your coding agent. If you use coding agents for moderately hard tasks, you&#39;ve probably configured your coding agent a bit. Have you used skills? MCP servers? Sub-agents? Memory? AGENTS.md files?&lt;/p&gt;
        &lt;pre class=&quot;group&quot;&gt;&lt;pre class=&quot;prismjs rsh-code-block grid relative min-w-[250px] mb-6 bg-gray-400/5 p-4&quot;&gt;&lt;button data-slot=&quot;button&quot; class=&quot;inline-flex items-center justify-center gap-2 whitespace-nowrap rounded-none text-sm font-mono font-medium cursor-pointer disabled:cursor-not-allowed disabled:pointer-events-none disabled:opacity-50 [&amp;amp;_svg]:pointer-events-none [&amp;amp;_svg:not([class*=&#39;size-&#39;])]:size-4 shrink-0 [&amp;amp;_svg]:shrink-0 outline-none focus-visible:ring-[3px] uppercase tracking-wider border bg-transparent text-[var(--accent)] border-transparent hover:bg-[var(--accent)]/10 hover:border-[var(--accent)] focus-visible:border-[var(--accent)] focus-visible:ring-[var(--accent)]/50 size-9 absolute top-2 right-1 h-6 w-6 opacity-0 group-hover:opacity-100 transition-opacity duration-200 z-10&quot; aria-label=&quot;Copy code&quot; title=&quot;Copy code&quot;&gt;&lt;svg xmlns=&quot;http://www.w3.org/2000/svg&quot; width=&quot;24&quot; height=&quot;24&quot; viewBox=&quot;0 0 24 24&quot; fill=&quot;none&quot; stroke=&quot;currentColor&quot; stroke-width=&quot;2&quot; stroke-linecap=&quot;round&quot; stroke-linejoin=&quot;round&quot; class=&quot;lucide lucide-copy h-3 w-3&quot; aria-hidden=&quot;true&quot;&gt;&lt;rect width=&quot;14&quot; height=&quot;14&quot; x=&quot;8&quot; y=&quot;8&quot; rx=&quot;2&quot; ry=&quot;2&quot;&gt;&lt;/rect&gt;&lt;path d=&quot;M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/button&gt;&lt;code class=&quot;overflow-x-auto&quot; style=&quot;white-space:pre&quot;&gt;&lt;span class=&quot;&quot;&gt;coding agent = AI model(s) + harness&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/pre&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;These are all technically separate concepts, but they are all part of the coding agent&#39;s configuration surface. We call this the &lt;strong style=&quot;color:var(--text-primary)&quot;&gt;coding agent&#39;s harness&lt;/strong&gt;, and we think of it as &lt;a href=&quot;https://www.vtrivedy.com/posts/claude-code-sdk-haas-harness-as-a-service&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;the agent’s runtime&lt;/a&gt;, or as its peripherals: what does the model use to interact with its environment?&lt;/p&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;Harness Engineering&lt;/h2&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;strong style=&quot;color:var(--text-primary)&quot;&gt;Harness engineering&lt;/strong&gt;, coined by &lt;a href=&quot;https://x.com/Vtrivedy10&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;Viv&lt;/a&gt;, describes the practice of leveraging these configuration points to customize and improve your coding agent&#39;s output quality and reliability.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;img src=&quot;https://www.humanlayer.dev/blog/skill-issue/harness-components.png&quot; alt=&quot;harness components&quot; class=&quot;max-w-full h-auto my-4 rounded-none&quot; style=&quot;border:1px solid var(--border-color)&quot; referrerpolicy=&quot;no-referrer&quot;&gt;
        &lt;em style=&quot;color:var(--text-secondary)&quot;&gt;Image from &lt;a href=&quot;https://blog.langchain.com/the-anatomy-of-an-agent-harness/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;Viv&#39;s post&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;As &lt;a href=&quot;https://mitchellh.com/writing/my-ai-adoption-journey#step-5-engineer-the-harness&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;Mitchell Hashimoto put it&lt;/a&gt;, harness engineering&lt;/p&gt;
        &lt;blockquote class=&quot;border-l-4 pl-4 my-4 italic&quot; style=&quot;border-color:var(--accent);color:var(--text-muted)&quot;&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;[...] is the idea that anytime you find an agent makes a mistake, you take the time to engineer a solution such that the agent never makes that mistake again.&lt;/p&gt;
        &lt;/blockquote&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;... as a Subset of Context Engineering&lt;/h2&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;We view harness engineering as a subset of &lt;a href=&quot;https://www.humanlayer.dev/blog/advanced-context-engineering&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;context engineering&lt;/a&gt;. Coined by my cofounder &lt;a href=&quot;https://x.com/dexhorthy&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;Dex&lt;/a&gt; in &lt;a href=&quot;https://github.com/humanlayer/12-factor-agents&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;12-factor agents&lt;/a&gt;, context engineering is a superset of “prompt engineering” and a variety of other techniques for systematically improving AI agents’ reliability.&amp;nbsp;You can find the original talk on &lt;a href=&quot;https://www.youtube.com/watch?v=IS_y40zY-hc&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Harness engineering then is the subset of context engineering which primarily involves leveraging harness configuration points to carefully manage the context windows of coding agents.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;img src=&quot;https://www.humanlayer.dev/blog/skill-issue/harness-engineering.png&quot; alt=&quot;harness engineering as context engineering&quot; class=&quot;max-w-full h-auto my-4 rounded-none&quot; style=&quot;border:1px solid var(--border-color)&quot; referrerpolicy=&quot;no-referrer&quot;&gt;&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;It answers:&lt;/p&gt;
        &lt;div class=&quot;pl-8 mb-6&quot;&gt;&lt;ul class=&quot;list-disc space-y-4&quot; style=&quot;color:var(--text-secondary)&quot;&gt;
        &lt;li&gt;How do we give our coding agent new capabilities?&lt;/li&gt;
        &lt;li&gt;How do we teach it things about our codebase that aren’t in the training data?&lt;/li&gt;
        &lt;li&gt;How do we add determinism beyond &lt;code class=&quot;px-2 py-0.5 rounded font-mono bg-[var(--code-bg)] text-[var(--accent)] border border-[var(--code-border)] tracking-tight shadow-sm mr-1 align-middle wrap-anywhere box-decoration-clone&quot;&gt;CRITICAL: always do XYZ&lt;/code&gt; in the system message?&lt;/li&gt;
        &lt;li&gt;How do we adapt the agent’s behavior for our specific codebase?&lt;/li&gt;
        &lt;li&gt;How do we increase task success rates beyond “magic prompts”?&lt;/li&gt;
        &lt;li&gt;How do we prevent our context window from inflating too rapidly, or with too much bad context?&lt;/li&gt;
        &lt;/ul&gt;&lt;/div&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Skills, MCP servers, sub-agents, hooks, and back-pressure mechanisms are all tactical solutions we’ve arrived at.&lt;/p&gt;
        &lt;h3 class=&quot;text-2xl mb-2 mt-16 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;### &lt;!-- --&gt;Views on Harness Engineering&lt;/h3&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Viv’s posts on harness engineering are worth reading alongside this one — &lt;a href=&quot;https://www.vtrivedy.com/posts/claude-code-sdk-haas-harness-as-a-service&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;the first&lt;/a&gt; frames the four customization levers (system prompt, tools/MCPs, context, sub-agents), and &lt;a href=&quot;https://blog.langchain.com/the-anatomy-of-an-agent-harness/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;the second&lt;/a&gt; works backwards from what models &lt;em style=&quot;color:var(--text-secondary)&quot;&gt;can’t&lt;/em&gt; do natively to derive why each harness component exists.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;img src=&quot;https://www.humanlayer.dev/blog/skill-issue/backwards.png&quot; alt=&quot;working backwards from what models can&#39;t do natively&quot; class=&quot;max-w-full h-auto my-4 rounded-none&quot; style=&quot;border:1px solid var(--border-color)&quot; referrerpolicy=&quot;no-referrer&quot;&gt;
        &lt;em style=&quot;color:var(--text-secondary)&quot;&gt;Image from &lt;a href=&quot;https://blog.langchain.com/the-anatomy-of-an-agent-harness/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;Viv&#39;s post&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;We’d add two levers he doesn’t emphasize:&lt;/p&gt;
        &lt;div class=&quot;pl-8 mb-6&quot;&gt;&lt;ol class=&quot;list-decimal space-y-4&quot; style=&quot;color:var(--text-secondary)&quot;&gt;
        &lt;li&gt;&lt;strong style=&quot;color:var(--text-primary)&quot;&gt;hooks&lt;/strong&gt; for automated integration and deterministic control flow&lt;/li&gt;
        &lt;li&gt;&lt;strong style=&quot;color:var(--text-primary)&quot;&gt;skills&lt;/strong&gt; for progressive disclosure of knowledge. (Dex likes to refer to them as &quot;Instruction Modules&quot; - more on this in another post.)&lt;/li&gt;
        &lt;/ol&gt;&lt;/div&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;After months of solving hard problems in complex brownfield enterprise-scale codebases, we have found that sub-agents are a particularly powerful lever. When working on hard problems that require many, many context windows to solve, &lt;strong style=&quot;color:var(--text-primary)&quot;&gt;sub-agents are the key to maintaining coherency across many sessions&lt;/strong&gt;. Sub-agents &lt;strong style=&quot;color:var(--text-primary)&quot;&gt;function as a &quot;context firewall&quot;&lt;/strong&gt; that ensures discrete tasks can run in isolated context windows so none of the intermediate noise accumulates in your parent thread which is responsible for orchestration, and you can maintain coherency for much, much longer.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;img src=&quot;https://www.humanlayer.dev/blog/skill-issue/context-firewall.png&quot; alt=&quot;context firewall&quot; class=&quot;max-w-full h-auto my-4 rounded-none&quot; style=&quot;border:1px solid var(--border-color)&quot; referrerpolicy=&quot;no-referrer&quot;&gt;&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;OpenAI recently wrote a &lt;a href=&quot;https://openai.com/index/harness-engineering/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;blog post&lt;/a&gt; on the topic as well. There&#39;s some great content in there, and it seems to indicate that they view harness engineering as configuring everything &lt;em style=&quot;color:var(--text-secondary)&quot;&gt;outside&lt;/em&gt; of the agent&#39;s runtime. It&#39;s more focused on back-pressure and verification mechanisms. (Although this may be a mis-reading; the post is somewhat unclear: the word &quot;harness&quot; only appears once in the text of the post, and in reference to evals rather than harness engineering itself.)&lt;/p&gt;
        &lt;h3 class=&quot;text-2xl mb-2 mt-16 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;### &lt;!-- --&gt;But What About Post-Training?&lt;/h3&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Given that frontier coding models are post-trained on their harnesses (e.g. Claude in Claude Code, GPT-5 Codex in Codex), some will argue that the best harness and/or configuration is the one that the model was trained on.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;For example, the Codex models are so tightly coupled with the Codex harness&#39;s &lt;code class=&quot;px-2 py-0.5 rounded font-mono bg-[var(--code-bg)] text-[var(--accent)] border border-[var(--code-border)] tracking-tight shadow-sm mr-1 align-middle wrap-anywhere box-decoration-clone&quot;&gt;apply_patch&lt;/code&gt; tool that &lt;a href=&quot;https://opencode.ai/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;OpenCode&lt;/a&gt; — built as an open-source alternative to Claude Code — had to &lt;a href=&quot;https://github.com/anomalyco/opencode/pull/9127&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;add an &lt;code class=&quot;px-2 py-0.5 rounded font-mono bg-[var(--code-bg)] text-[var(--accent)] border border-[var(--code-border)] tracking-tight shadow-sm mr-1 align-middle wrap-anywhere box-decoration-clone&quot;&gt;apply_patch&lt;/code&gt; tool&lt;/a&gt; specifically for GPT/Codex models to mimic the Codex harness to improve the Codex models&#39; performance in the OpenCode harness - while Claude and other models still use normal &lt;code class=&quot;px-2 py-0.5 rounded font-mono bg-[var(--code-bg)] text-[var(--accent)] border border-[var(--code-border)] tracking-tight shadow-sm mr-1 align-middle wrap-anywhere box-decoration-clone&quot;&gt;edit&lt;/code&gt; and &lt;code class=&quot;px-2 py-0.5 rounded font-mono bg-[var(--code-bg)] text-[var(--accent)] border border-[var(--code-border)] tracking-tight shadow-sm mr-1 align-middle wrap-anywhere box-decoration-clone&quot;&gt;write&lt;/code&gt; tools.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;This &lt;em style=&quot;color:var(--text-secondary)&quot;&gt;can&lt;/em&gt; mean that a model will perform better when coupled with the harness it was post-trained on, and some might infer that this means you shouldn&#39;t customize the harness at all.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;But it cuts both ways: &lt;strong style=&quot;color:var(--text-primary)&quot;&gt;models can be over-fitted to their harness&lt;/strong&gt;. Viv cites &lt;a href=&quot;https://terminalbench.com/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;Terminal Bench 2.0&lt;/a&gt; where Opus 4.6 in Claude Code comes in position #33, but when placed in a different harness that wasn&#39;t seen during post-training, it comes in at #5 (+/- about 4 positions in either direction).&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;img src=&quot;https://www.humanlayer.dev/blog/skill-issue/terminal-bench.png&quot; alt=&quot;terminal bench&quot; class=&quot;max-w-full h-auto my-4 rounded-none&quot; style=&quot;border:1px solid var(--border-color)&quot; referrerpolicy=&quot;no-referrer&quot;&gt;&lt;/p&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;Engineering Your Harness&lt;/h2&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;With that in mind, let&#39;s walk through the configuration surfaces we&#39;ve found most impactful.&lt;/p&gt;
        &lt;h3 class=&quot;text-2xl mb-2 mt-16 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;### &lt;!-- --&gt;CLAUDE.md &amp;amp; AGENTS.md&lt;/h3&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Before touching any other harness configuration points, it&#39;s usually worth customizing your CLAUDE.md / AGENTS.md files. These are markdown files at the top-level of your repository that get deterministically injected into the agent&#39;s system prompt by the harness.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;We have already shared &lt;a href=&quot;https://www.humanlayer.dev/blog/writing-a-good-claude-md&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;some opinions about what makes a good CLAUDE.md&lt;/a&gt; file and how to use it correctly, so give that a read if you&#39;re not familiar with it. Matt Pocock also wrote a &lt;a href=&quot;https://www.aihero.dev/a-complete-guide-to-agents-md&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;great follow-up&lt;/a&gt; that more generally applies to AGENTS.md.&lt;/p&gt;
        &lt;h4 class=&quot;text-xl mb-1 mt-8 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;#### &lt;!-- --&gt;The ETH Zurich Study&lt;/h4&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;After ETH Zurich published their &lt;a href=&quot;https://arxiv.org/abs/2602.11988&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;study testing 138 agentfiles across various repos&lt;/a&gt; which indicated that most agentfiles were useless-or-worse, we got a lot of feedback on our &lt;a href=&quot;https://www.humanlayer.dev/blog/writing-a-good-claude-md&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;CLAUDE.md post&lt;/a&gt;:&lt;/p&gt;
        &lt;blockquote class=&quot;border-l-4 pl-4 my-4 italic&quot; style=&quot;border-color:var(--accent);color:var(--text-muted)&quot;&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;See, CLAUDE.md files don&#39;t even help — they&#39;re a waste of time.&lt;/p&gt;
        &lt;/blockquote&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Indeed: the study tested many agentfiles across a wide variety of repos, and found:&lt;/p&gt;
        &lt;div class=&quot;pl-8 mb-6&quot;&gt;&lt;ul class=&quot;list-disc space-y-4&quot; style=&quot;color:var(--text-secondary)&quot;&gt;
        &lt;li&gt;that LLM-generated ones actually &lt;em style=&quot;color:var(--text-secondary)&quot;&gt;hurt&lt;/em&gt; performance while costing 20%+ more&lt;/li&gt;
        &lt;li&gt;human-written ones only helped about 4%.&lt;/li&gt;
        &lt;li&gt;Agents spent 14-22% more reasoning tokens processing context file instructions, took more steps to complete tasks, and ran more tools — all without improving resolution rates.&lt;/li&gt;
        &lt;li&gt;Codebase overviews and directory listings didn&#39;t help at all; agents discover repository structure on their own just fine.&lt;/li&gt;
        &lt;/ul&gt;&lt;/div&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;A careful reading of the study &lt;strong style=&quot;color:var(--text-primary)&quot;&gt;indicates that what we said in our post was correct:&lt;/strong&gt;&lt;/p&gt;
        &lt;div class=&quot;pl-8 mb-6&quot;&gt;&lt;ul class=&quot;list-disc space-y-4&quot; style=&quot;color:var(--text-secondary)&quot;&gt;
        &lt;li&gt;&lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Agent-generated files were worse. Yes, we said:&lt;/p&gt;
        &lt;blockquote class=&quot;border-l-4 pl-4 my-4 italic&quot; style=&quot;border-color:var(--accent);color:var(--text-muted)&quot;&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;avoid auto-generating it. You should carefully craft its contents for best results.&lt;/p&gt;
        &lt;/blockquote&gt;
        &lt;/li&gt;
        &lt;li&gt;&lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Lots of files too-heavily-steered the model to use specific tools, causing worse outcomes. Yep, we said:&lt;/p&gt;
        &lt;blockquote class=&quot;border-l-4 pl-4 my-4 italic&quot; style=&quot;border-color:var(--accent);color:var(--text-muted)&quot;&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Less (instructions) is more. While you shouldn&#39;t omit necessary instructions, you should include as few instructions as reasonably possible in the file.&lt;/p&gt;
        &lt;/blockquote&gt;
        &lt;/li&gt;
        &lt;li&gt;&lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Files contained irrelevant context. Yep, we said:&lt;/p&gt;
        &lt;blockquote class=&quot;border-l-4 pl-4 my-4 italic&quot; style=&quot;border-color:var(--accent);color:var(--text-muted)&quot;&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Use Progressive Disclosure&lt;/p&gt;
        &lt;/blockquote&gt;
        &lt;/li&gt;
        &lt;li&gt;&lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;The human-written ones barely helped because of too many conditional rules. Yep, we said:&lt;/p&gt;
        &lt;blockquote class=&quot;border-l-4 pl-4 my-4 italic&quot; style=&quot;border-color:var(--accent);color:var(--text-muted)&quot;&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Keep the contents of your CLAUDE.md concise and universally applicable.&lt;/p&gt;
        &lt;/blockquote&gt;
        &lt;/li&gt;
        &lt;/ul&gt;&lt;/div&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Our CLAUDE.md is under 60 lines.&lt;/p&gt;
        &lt;h3 class=&quot;text-2xl mb-2 mt-16 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;### &lt;!-- --&gt;MCP Servers Are for Tools&lt;/h3&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;MCP servers are&amp;nbsp;primarily for plugging tools into your coding agent to extend its capabilities beyond file I/O and bash commands. The MCP specification includes additional features like resources, prompts, and elicitations, but &lt;strong style=&quot;color:var(--text-primary)&quot;&gt;these are generally not well-supported by MCP clients and coding agent harnesses&lt;/strong&gt;.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;The MCP spec supports servers that run on your (or the agent&#39;s) local machine which allow the agent to interact with its local environment, but it also supports HTTP-based MCP servers that can connect your agent with remote tools and services like Linear, Sentry, and more.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;When you plug an MCP server into your coding agent, the list of available tools, their descriptions, and the arguments needed to invoke them &lt;strong style=&quot;color:var(--text-primary)&quot;&gt;are injected into your coding agent&#39;s system prompt&lt;/strong&gt;. As a result, the MCP server can use the tool descriptions to customize your agent’s behavior by providing your agent with instructions about when to use them.&amp;nbsp;&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;strong style=&quot;color:var(--text-primary)&quot;&gt;WARNING&lt;/strong&gt;: because MCP servers’ tool descriptions are added to your coding agent’s system prompt, never connect to one you don’t trust. This can be a dangerous vector for prompt injection! STDIO servers and other servers that run client-side with npx or uvx can also execute code on your host in the absence of prompt injection.&lt;/p&gt;
        &lt;h4 class=&quot;text-xl mb-1 mt-8 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;#### &lt;!-- --&gt;Too Many Tools Is Bad&lt;/h4&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;We’ve seen this firsthand: plug too many MCP tools into your agent, and the context window fills up with tool descriptions, pushing you into &lt;a href=&quot;https://youtu.be/rmvDxxNubIg?si=O17nmS3SScaAkpp-&amp;amp;t=355&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;the dumb zone&lt;/a&gt; much faster:&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;img src=&quot;https://www.humanlayer.dev/blog/skill-issue/too-many-tools.png&quot; alt=&quot;too many tools&quot; class=&quot;max-w-full h-auto my-4 rounded-none&quot; style=&quot;border:1px solid var(--border-color)&quot; referrerpolicy=&quot;no-referrer&quot;&gt;&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;The &lt;a href=&quot;https://www.aihero.dev/a-complete-guide-to-agents-md#the-instruction-budget&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;instruction budget&lt;/a&gt; matters too — every irrelevant tool description is an instruction the agent has to process without any benefit.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;In fact, these failure modes are so common that Anthropic released &lt;a href=&quot;https://platform.claude.com/docs/en/agents-and-tools/tool-use/tool-search-tool&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;experimental support for MCP tool search&lt;/a&gt; to progressively disclose tools to Claude when the user has too many MCP tools connected. TL;DR: if you’re not actively using a server which provides a large number of tools, turn it off.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;We also found that if an MCP server duplicates functionality that’s already available as a CLI well-represented in training data, it works better to just prompt the agent to use the CLI. For things like GitHub, Docker, or most databases, your coding agent can just use the right CLIs and shell commands. The model has seen these tools enough during training that it already knows how to use them, and you gain the added benefit of composability with tools like &lt;code class=&quot;px-2 py-0.5 rounded font-mono bg-[var(--code-bg)] text-[var(--accent)] border border-[var(--code-border)] tracking-tight shadow-sm mr-1 align-middle wrap-anywhere box-decoration-clone&quot;&gt;grep&lt;/code&gt; and &lt;code class=&quot;px-2 py-0.5 rounded font-mono bg-[var(--code-bg)] text-[var(--accent)] border border-[var(--code-border)] tracking-tight shadow-sm mr-1 align-middle wrap-anywhere box-decoration-clone&quot;&gt;jq&lt;/code&gt; to enable additional context-efficiency.&lt;/p&gt;
        &lt;h4 class=&quot;text-xl mb-1 mt-8 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;#### &lt;!-- --&gt;Always Be Context-Engineering&lt;/h4&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;At HumanLayer, we used the Linear MCP server for a while before realizing that we really only used a small subset of the tools it provides - so we wrote a small CLI that wraps the Linear API and provides very context-efficient responses, and we included 6 example usages in our &lt;a href=&quot;https://www.humanlayer.dev/blog/writing-a-good-claude-md&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;CLAUDE.md&lt;/a&gt; file:&lt;/p&gt;
        &lt;pre class=&quot;group&quot;&gt;&lt;pre class=&quot;prismjs rsh-code-block grid relative min-w-[250px] mb-6 bg-gray-400/5 p-4&quot;&gt;&lt;button data-slot=&quot;button&quot; class=&quot;inline-flex items-center justify-center gap-2 whitespace-nowrap rounded-none text-sm font-mono font-medium cursor-pointer disabled:cursor-not-allowed disabled:pointer-events-none disabled:opacity-50 [&amp;amp;_svg]:pointer-events-none [&amp;amp;_svg:not([class*=&#39;size-&#39;])]:size-4 shrink-0 [&amp;amp;_svg]:shrink-0 outline-none focus-visible:ring-[3px] uppercase tracking-wider border bg-transparent text-[var(--accent)] border-transparent hover:bg-[var(--accent)]/10 hover:border-[var(--accent)] focus-visible:border-[var(--accent)] focus-visible:ring-[var(--accent)]/50 size-9 absolute top-2 right-1 h-6 w-6 opacity-0 group-

@github-actions
Copy link
Copy Markdown
Contributor

Auto Review

No clear rule violations found in the current diff.

Comment on lines +43 to +46
.filter((el) => {
const href = $(el).attr('href')!;
return !href.startsWith('/blog/tags/');
})
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Useless filtering since

const list = $('a.block.py-2.group[href^="/blog/"]')
does remove href starts with /blog/tags/

…anlayer blog

Address PR review feedback: move /blog/tags/ exclusion into the CSS selector
instead of using a separate .filter() call.

via [HAPI](https://hapi.run)

Co-Authored-By: HAPI <noreply@hapi.run>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 1, 2026

Successfully generated as following:

http://localhost:1200/humanlayer/blog - Success ✔️
<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
  <channel>
    <title>HumanLayer Blog</title>
    <link>https://www.humanlayer.dev/blog</link>
    <atom:link href="http://localhost:1200/humanlayer/blog" rel="self" type="application/rss+xml"></atom:link>
    <description>HumanLayer Blog - Powered by RSSHub</description>
    <generator>RSSHub</generator>
    <webMaster>contact@rsshub.app (RSSHub)</webMaster>
    <language>en</language>
    <lastBuildDate>Wed, 01 Apr 2026 09:14:14 GMT</lastBuildDate>
    <ttl>5</ttl>
    <item>
      <title>Long-Context Isn&#39;t the Answer</title>
      <description>&lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Anthropic just switched the default model in Claude Code to Opus 4.6 with a 1M context window. We tried it when it launched. But now we&#39;re switching back to Opus 4.5&lt;/p&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;How about some context?&lt;/h2&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Unlike previous Claude Opus models which have a ~ 200k context window, Opus 4.6 has as 1M context window. This was very exciting for us!&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;We spend a lot of time working on hard problems in large enterprise codebases, and if the model can reliably hold more of the codebase in its head at a time, so to speak, then that creates new possibilities for the scale of problem that can be solved in a single context window.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;It would mean less compaction (whether &lt;a href=&quot;https://www.humanlayer.dev/blog/advanced-context-engineering&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;frequent intentional compaction&lt;/a&gt; or auto-compact), and less &quot;context pressure&quot; to solve the problem before your context window fills up.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;As Calvin French-Owen, who helped launch Codex last year, &lt;a href=&quot;https://youtu.be/qwmmWzPnhog?si=ReRWTavUKTItIGGr&amp;amp;t=918&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;puts it in an episode of YC&#39;s Lightcone podcast&lt;/a&gt;:&lt;/p&gt;
        &lt;blockquote class=&quot;border-l-4 pl-4 my-4 italic&quot; style=&quot;border-color:var(--accent);color:var(--text-muted)&quot;&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;...imagine you&#39;re a college student. You&#39;re taking an exam. In the first five minutes of that exam, you&#39;re like, &quot;Oh, I have all the time in the world. I&#39;ll do a great job. I&#39;ll think through each of these problems.&quot;&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Let&#39;s say you have like five minutes left and you still have half the exam left. You&#39;re like, &quot;Oh man, I just got to do whatever I can.&quot; Like, that&#39;s the LLM with a context window&lt;/p&gt;
        &lt;/blockquote&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;More context, less instruction adherence&lt;/h2&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;While the context window is dramatically larger, we noticed over the course of a couple of weeks that instruction adherence was dramatically degraded, and not just at longer context lengths.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Even well-within what we would consider the &lt;a href=&quot;https://youtu.be/rmvDxxNubIg?si=O17nmS3SScaAkpp-&amp;amp;t=355&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;smart zone&lt;/a&gt; of a 200k-context frontier model, the model was less precise. It would ignore design documents and other inputs when writing a plan file. It would make trivial mistakes, or misunderstand simple instructions - or worse, directly disobey them.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;At longer context lengths, the degradation was even steeper - like the user instructions were getting drowned out by the intermediate tool results and mass of accumulated context.&lt;/p&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;What determines instruction adherence?&lt;/h2&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;We&#39;ve written about the concept of &lt;a href=&quot;https://www.humanlayer.dev/blog/skill-issue-harness-engineering-for-coding-agents&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;the instruction budget&lt;/a&gt; before - a measurable property of LLMs which describes how many instructions they can follow reasonably well before instruction adherence drops off.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;It&#39;s different for each model, but it&#39;s a function primarily of the quality of the model&#39;s instruction tuning, and more importantly for our purposes, it is &lt;strong style=&quot;color:var(--text-primary)&quot;&gt;strongly correlated with the size of the model&lt;/strong&gt;.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;img src=&quot;https://www.humanlayer.dev/blog/writing-a-good-claude-md/instructionfollowing.png&quot; alt=&quot;Instruction following&quot; class=&quot;max-w-full h-auto my-4 rounded-none&quot; style=&quot;border:1px solid var(--border-color)&quot; referrerpolicy=&quot;no-referrer&quot;&gt;&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;You can clearly see that the larger models from the &lt;a href=&quot;https://arxiv.org/pdf/2507.11538&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;study&lt;/a&gt; can follow dramatically more instructions before adherence drops off.&lt;/p&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;Instruction adherence at long context lengths&lt;/h2&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Why does this matter?&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;When a lab offers an extended-context version of a model, you&#39;re usually not getting a bigger model with more parameters and therefore a larger &quot;instruction budget&quot; to go with the larger context window - you&#39;re likely getting the same model with some clever math (e.g. &lt;a href=&quot;https://arxiv.org/pdf/2309.00071&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;YaRN&lt;/a&gt;) to extend the sequence length the model can attend to, and probably some more post-training to stabilize the model at the longer sequence lengths.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;This means that while the context window size increases, the instruction budget remains the same. You can fit more context and more instructions in the context window, but the model isn&#39;t actually better at attending to those instructions over the context length.&lt;/p&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;Instruction-following as needle in a haystack&lt;/h2&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;You can think of your context window as a haystack where all of your tool calls and documents and files are hay - every line in your CLAUDE.md file, every instruction in your tool descriptions, every tool result, every instruction in your system prompt, and every user message.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;The quality &amp;amp; correctness of the agent&#39;s next step depends on the LLM&#39;s ability to find a needle in there: namely, the instruction(s) in the context window that are most-relevant to the current state of the context window, which give it the information it needs to make the correct decision about its next action.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Now imagine we increase the size of the haystack by 500% - but the size of the needle remains the same. Unless our ability to find the needle also increases by 500%, we will have a dramatically harder time finding it. The extra context isn&#39;t really helping us - it&#39;s just digging us deeper into the dumb zone.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;img src=&quot;https://www.humanlayer.dev/blog/skill-issue/long-context.jpg&quot; alt=&quot;long context&quot; class=&quot;max-w-full h-auto my-4 rounded-none&quot; style=&quot;border:1px solid var(--border-color)&quot; referrerpolicy=&quot;no-referrer&quot;&gt;&lt;/p&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;What Works Instead&lt;/h2&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Instead of trying to stuff as much information in a context window as possible and having the LLM reason over it, we found that the context management techniques we had been using had to be used even &lt;em style=&quot;color:var(--text-secondary)&quot;&gt;more&lt;/em&gt; aggressively to keep Opus 4.6 1M on track.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;In my post on &lt;a href=&quot;https://www.humanlayer.dev/blog/skill-issue-harness-engineering-for-coding-agents&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;harness engineering&lt;/a&gt;, I wrote about this as the limit case for using sub-agents:&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;img src=&quot;https://www.humanlayer.dev/blog/skill-issue/limit-case.png&quot; alt=&quot;limit case&quot; class=&quot;max-w-full h-auto my-4 rounded-none&quot; style=&quot;border:1px solid var(--border-color)&quot; referrerpolicy=&quot;no-referrer&quot;&gt;&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;The day after I published it, I decided maybe it wasn&#39;t such a bad idea, and sat down and wrote this skill:&lt;/p&gt;
        &lt;pre class=&quot;group&quot;&gt;&lt;pre class=&quot;prismjs rsh-code-block grid relative min-w-[250px] mb-6 bg-gray-400/5 p-4&quot;&gt;&lt;button data-slot=&quot;button&quot; class=&quot;inline-flex items-center justify-center gap-2 whitespace-nowrap rounded-none text-sm font-mono font-medium cursor-pointer disabled:cursor-not-allowed disabled:pointer-events-none disabled:opacity-50 [&amp;amp;_svg]:pointer-events-none [&amp;amp;_svg:not([class*=&#39;size-&#39;])]:size-4 shrink-0 [&amp;amp;_svg]:shrink-0 outline-none focus-visible:ring-[3px] uppercase tracking-wider border bg-transparent text-[var(--accent)] border-transparent hover:bg-[var(--accent)]/10 hover:border-[var(--accent)] focus-visible:border-[var(--accent)] focus-visible:ring-[var(--accent)]/50 size-9 absolute top-2 right-1 h-6 w-6 opacity-0 group-hover:opacity-100 transition-opacity duration-200 z-10&quot; aria-label=&quot;Copy code&quot; title=&quot;Copy code&quot;&gt;&lt;svg xmlns=&quot;http://www.w3.org/2000/svg&quot; width=&quot;24&quot; height=&quot;24&quot; viewBox=&quot;0 0 24 24&quot; fill=&quot;none&quot; stroke=&quot;currentColor&quot; stroke-width=&quot;2&quot; stroke-linecap=&quot;round&quot; stroke-linejoin=&quot;round&quot; class=&quot;lucide lucide-copy h-3 w-3&quot; aria-hidden=&quot;true&quot;&gt;&lt;rect width=&quot;14&quot; height=&quot;14&quot; x=&quot;8&quot; y=&quot;8&quot; rx=&quot;2&quot; ry=&quot;2&quot;&gt;&lt;/rect&gt;&lt;path d=&quot;M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/button&gt;&lt;code class=&quot;overflow-x-auto&quot; style=&quot;white-space:pre&quot;&gt;&lt;span class=&quot;&quot;&gt;---
        &lt;/span&gt;name: subagent-orchestrator
        &lt;!-- --&gt;description: orchestrate sub-agents to accomplish complex long-horizon tasks without losing coherency by delegating to sub-agents
        &lt;!-- --&gt;---
        &lt;!-- --&gt;
        &lt;!-- --&gt;This skill provides you with **CRITICAL** instructions that will help you to maintain coherency in long-horizon context-heavy tasks.
        &lt;!-- --&gt;
        &lt;!-- --&gt;You have a large number of tools available to you. The most important one is the one that allows you to dispatch sub-agents: either `Agent` or `Task`.
        &lt;!-- --&gt;
        &lt;!-- --&gt;All non-trivial operations should be delegated to sub-agents. You should delegate research and codebase understanding tasks to codebase-analyzer, codebase-locator and pattern-locator sub-agents.
        &lt;!-- --&gt;
        &lt;!-- --&gt;You should delegate running bash commands (particularly ones that are likely to produce lots of output) such as investigating with the `aws` CLI, using the `gh` CLI, digging through logs to `Bash` sub-agents.
        &lt;!-- --&gt;
        &lt;!-- --&gt;You should use separate sub-agents for separate tasks, and you may launch them in parallel - but do not delegate multiple tasks that are likely to have significant overlap to separate sub-agents.
        &lt;!-- --&gt;
        &lt;!-- --&gt;IMPORTANT: if the user has already given you a task, you should proceed with that task using this approach.
        &lt;!-- --&gt;
        &lt;!-- --&gt;If you have not already been explicitly given a taks, you should ask the user what task they would like for you to work on - do not assume or begin working on a ticket automatically.&lt;/code&gt;&lt;/pre&gt;&lt;/pre&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;I found that using this skill in combination with our existing workflow skills helped me keep Opus coherent at longer context lengths. Why?
        As I wrote about in that post, sub-agents encapsulate context and ensure that only highly-relevant context (the prompt, and the focused sub-agent result) end up in the context window, avoiding &lt;a href=&quot;https://research.trychroma.com/context-rot&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;context rot&lt;/a&gt;:&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;img src=&quot;https://www.humanlayer.dev/blog/skill-issue/context-firewall.png&quot; alt=&quot;context firewall&quot; class=&quot;max-w-full h-auto my-4 rounded-none&quot; style=&quot;border:1px solid var(--border-color)&quot; referrerpolicy=&quot;no-referrer&quot;&gt;&lt;/p&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;How we&#39;re incorporating this at HumanLayer&lt;/h2&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;We have had a feature in the tool for a while that warns the user when they&#39;re context is getting high. We used to set this at around %40 of sonnets 168k token window (200k - 32k reserved for output). This came out to about 100k tokens.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;To help users maximize instruction adherence and intelligence on hard codebase problems, we&#39;ve updated our context warnings for long-context models to trigger at the 100k token mark instead of 40% of the usable context. For opus 1m this is only 10% of the context window.&lt;/p&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;TL;DR&lt;/h2&gt;
        &lt;div class=&quot;pl-8 mb-6&quot;&gt;&lt;ol class=&quot;list-decimal space-y-4&quot; style=&quot;color:var(--text-secondary)&quot;&gt;
        &lt;li&gt;Long-context models degrade at all context lengths, not just long ones.&lt;/li&gt;
        &lt;li&gt;More context isn&#39;t more capability - the instruction budget doesn&#39;t scale with the context window.&lt;/li&gt;
        &lt;li&gt;Context isolation beats context expansion. Sub-agents, progressive disclosure, and &lt;a href=&quot;https://www.humanlayer.dev/blog/context-efficient-backpressure&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;context-efficient backpressure&lt;/a&gt; keep each context window small, focused, and in the smart zone.&lt;/li&gt;
        &lt;/ol&gt;&lt;/div&gt;</description>
      <link>https://www.humanlayer.dev/blog/long-context-isnt-the-answer</link>
      <guid isPermaLink="false">https://www.humanlayer.dev/blog/long-context-isnt-the-answer</guid>
      <pubDate>Sun, 22 Mar 2026 16:00:00 GMT</pubDate>
      <author>Kyle</author>
    </item>
    <item>
      <title>Getting Claude to Actually Read Your CLAUDE.md</title>
      <description>&lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Claude Code wraps your CLAUDE.md in a &lt;code class=&quot;px-2 py-0.5 rounded font-mono bg-[var(--code-bg)] text-[var(--accent)] border border-[var(--code-border)] tracking-tight shadow-sm mr-1 align-middle wrap-anywhere box-decoration-clone&quot;&gt;&amp;lt;system_reminder&amp;gt;&lt;/code&gt; that explicitly tells the model the contents &quot;may or may not be relevant.&quot; The longer your file gets, the more Claude seems to treat individual sections as optional.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;We &lt;a href=&quot;https://www.humanlayer.dev/blog/writing-a-good-claude-md&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;wrote before&lt;/a&gt; about keeping these files concise and avoiding stuff that belongs in a linter. But even after trimming things down, I kept running into cases where Claude would ignore testing instructions while writing tests, or skip our API conventions when building endpoints.&lt;/p&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;Conditional &lt;code class=&quot;px-2 py-0.5 rounded font-mono bg-[var(--code-bg)] text-[var(--accent)] border border-[var(--code-border)] tracking-tight shadow-sm mr-1 align-middle wrap-anywhere box-decoration-clone&quot;&gt;&amp;lt;important if&amp;gt;&lt;/code&gt; blocks&lt;/h2&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;What&#39;s been working for me: wrapping sections in &lt;code class=&quot;px-2 py-0.5 rounded font-mono bg-[var(--code-bg)] text-[var(--accent)] border border-[var(--code-border)] tracking-tight shadow-sm mr-1 align-middle wrap-anywhere box-decoration-clone&quot;&gt;&amp;lt;important if=&quot;condition&quot;&amp;gt;&lt;/code&gt; tags.&lt;/p&gt;
        &lt;pre class=&quot;group&quot;&gt;&lt;pre class=&quot;prismjs rsh-code-block grid relative min-w-[250px] mb-6 bg-gray-400/5 p-4&quot;&gt;&lt;button data-slot=&quot;button&quot; class=&quot;inline-flex items-center justify-center gap-2 whitespace-nowrap rounded-none text-sm font-mono font-medium cursor-pointer disabled:cursor-not-allowed disabled:pointer-events-none disabled:opacity-50 [&amp;amp;_svg]:pointer-events-none [&amp;amp;_svg:not([class*=&#39;size-&#39;])]:size-4 shrink-0 [&amp;amp;_svg]:shrink-0 outline-none focus-visible:ring-[3px] uppercase tracking-wider border bg-transparent text-[var(--accent)] border-transparent hover:bg-[var(--accent)]/10 hover:border-[var(--accent)] focus-visible:border-[var(--accent)] focus-visible:ring-[var(--accent)]/50 size-9 absolute top-2 right-1 h-6 w-6 opacity-0 group-hover:opacity-100 transition-opacity duration-200 z-10&quot; aria-label=&quot;Copy code&quot; title=&quot;Copy code&quot;&gt;&lt;svg xmlns=&quot;http://www.w3.org/2000/svg&quot; width=&quot;24&quot; height=&quot;24&quot; viewBox=&quot;0 0 24 24&quot; fill=&quot;none&quot; stroke=&quot;currentColor&quot; stroke-width=&quot;2&quot; stroke-linecap=&quot;round&quot; stroke-linejoin=&quot;round&quot; class=&quot;lucide lucide-copy h-3 w-3&quot; aria-hidden=&quot;true&quot;&gt;&lt;rect width=&quot;14&quot; height=&quot;14&quot; x=&quot;8&quot; y=&quot;8&quot; rx=&quot;2&quot; ry=&quot;2&quot;&gt;&lt;/rect&gt;&lt;path d=&quot;M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/button&gt;&lt;code class=&quot;overflow-x-auto&quot; style=&quot;white-space:pre&quot;&gt;&lt;span class=&quot;&quot;&gt;&amp;lt;important if=&quot;you are writing or modifying tests&quot;&amp;gt;
        &lt;/span&gt;- Use `createTestApp()` helper for integration tests
        &lt;!-- --&gt;- Mock database with `dbMock` from `packages/db/test`
        &lt;!-- --&gt;- Test fixtures live in `__fixtures__/` directories
        &lt;!-- --&gt;&amp;lt;/important&amp;gt;&lt;/code&gt;&lt;/pre&gt;&lt;/pre&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;We don&#39;t have a rigorous explanation for why this helps. My guess is that the explicit condition gives Claude a clearer signal about when to apply the instructions, rather than leaving it to decide relevance on its own. Whatever the mechanism, we&#39;ve seen noticeably better adherence on tasks where only some sections of my CLAUDE.md should apply.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;A few things I&#39;ve learned:&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;strong style=&quot;color:var(--text-primary)&quot;&gt;Don&#39;t wrap everything.&lt;/strong&gt; Project identity, directory structure, and tech stack are relevant to basically every task. The conditional blocks are for testing setup or deployment procedures that only matter sometimes.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;strong style=&quot;color:var(--text-primary)&quot;&gt;Make conditions specific.&lt;/strong&gt; &lt;code class=&quot;px-2 py-0.5 rounded font-mono bg-[var(--code-bg)] text-[var(--accent)] border border-[var(--code-border)] tracking-tight shadow-sm mr-1 align-middle wrap-anywhere box-decoration-clone&quot;&gt;&amp;lt;important if=&quot;you are writing code&quot;&amp;gt;&lt;/code&gt; matches almost everything and defeats the purpose. I try to make each condition narrow enough that it only fires when I actually want those rules applied.&lt;/p&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;A skill to automate this&lt;/h2&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;I did a lot of restructuring CLAUDE.md files by hand, so I made a Claude Code skill that does the rewrite. It pulls out the foundational stuff, wraps domain-specific sections in conditional blocks, and cleans up some common problems (stale code snippets, style rules that should be in a linter, vague instructions that don&#39;t actually help).&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;It&#39;s not perfect—you&#39;ll probably want to review what it produces—but it&#39;s a decent starting point.&lt;/p&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;Try it&lt;/h2&gt;
        &lt;pre class=&quot;group&quot;&gt;&lt;pre class=&quot;prismjs rsh-code-block grid relative min-w-[250px] mb-6 bg-gray-400/5 p-4&quot;&gt;&lt;button data-slot=&quot;button&quot; class=&quot;inline-flex items-center justify-center gap-2 whitespace-nowrap rounded-none text-sm font-mono font-medium cursor-pointer disabled:cursor-not-allowed disabled:pointer-events-none disabled:opacity-50 [&amp;amp;_svg]:pointer-events-none [&amp;amp;_svg:not([class*=&#39;size-&#39;])]:size-4 shrink-0 [&amp;amp;_svg]:shrink-0 outline-none focus-visible:ring-[3px] uppercase tracking-wider border bg-transparent text-[var(--accent)] border-transparent hover:bg-[var(--accent)]/10 hover:border-[var(--accent)] focus-visible:border-[var(--accent)] focus-visible:ring-[var(--accent)]/50 size-9 absolute top-2 right-1 h-6 w-6 opacity-0 group-hover:opacity-100 transition-opacity duration-200 z-10&quot; aria-label=&quot;Copy code&quot; title=&quot;Copy code&quot;&gt;&lt;svg xmlns=&quot;http://www.w3.org/2000/svg&quot; width=&quot;24&quot; height=&quot;24&quot; viewBox=&quot;0 0 24 24&quot; fill=&quot;none&quot; stroke=&quot;currentColor&quot; stroke-width=&quot;2&quot; stroke-linecap=&quot;round&quot; stroke-linejoin=&quot;round&quot; class=&quot;lucide lucide-copy h-3 w-3&quot; aria-hidden=&quot;true&quot;&gt;&lt;rect width=&quot;14&quot; height=&quot;14&quot; x=&quot;8&quot; y=&quot;8&quot; rx=&quot;2&quot; ry=&quot;2&quot;&gt;&lt;/rect&gt;&lt;path d=&quot;M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/button&gt;&lt;code class=&quot;overflow-x-auto&quot; style=&quot;white-space:pre&quot;&gt;&lt;span class=&quot;&quot;&gt;npx skills add humanlayer/skills --skill improve-claude-md&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/pre&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Then in your project:&lt;/p&gt;
        &lt;pre class=&quot;group&quot;&gt;&lt;pre class=&quot;prismjs rsh-code-block grid relative min-w-[250px] mb-6 bg-gray-400/5 p-4&quot;&gt;&lt;button data-slot=&quot;button&quot; class=&quot;inline-flex items-center justify-center gap-2 whitespace-nowrap rounded-none text-sm font-mono font-medium cursor-pointer disabled:cursor-not-allowed disabled:pointer-events-none disabled:opacity-50 [&amp;amp;_svg]:pointer-events-none [&amp;amp;_svg:not([class*=&#39;size-&#39;])]:size-4 shrink-0 [&amp;amp;_svg]:shrink-0 outline-none focus-visible:ring-[3px] uppercase tracking-wider border bg-transparent text-[var(--accent)] border-transparent hover:bg-[var(--accent)]/10 hover:border-[var(--accent)] focus-visible:border-[var(--accent)] focus-visible:ring-[var(--accent)]/50 size-9 absolute top-2 right-1 h-6 w-6 opacity-0 group-hover:opacity-100 transition-opacity duration-200 z-10&quot; aria-label=&quot;Copy code&quot; title=&quot;Copy code&quot;&gt;&lt;svg xmlns=&quot;http://www.w3.org/2000/svg&quot; width=&quot;24&quot; height=&quot;24&quot; viewBox=&quot;0 0 24 24&quot; fill=&quot;none&quot; stroke=&quot;currentColor&quot; stroke-width=&quot;2&quot; stroke-linecap=&quot;round&quot; stroke-linejoin=&quot;round&quot; class=&quot;lucide lucide-copy h-3 w-3&quot; aria-hidden=&quot;true&quot;&gt;&lt;rect width=&quot;14&quot; height=&quot;14&quot; x=&quot;8&quot; y=&quot;8&quot; rx=&quot;2&quot; ry=&quot;2&quot;&gt;&lt;/rect&gt;&lt;path d=&quot;M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/button&gt;&lt;code class=&quot;overflow-x-auto&quot; style=&quot;white-space:pre&quot;&gt;&lt;span class=&quot;&quot;&gt;/improve-claude-md&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/pre&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;It reads your existing CLAUDE.md and outputs a rewritten version. Source is at &lt;a href=&quot;https://github.com/humanlayer/skills/tree/main/plugins/improve-claude-md&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;github.com/humanlayer/skills&lt;/a&gt;.&lt;/p&gt;</description>
      <link>https://www.humanlayer.dev/blog/stop-claude-from-ignoring-your-claude-md</link>
      <guid isPermaLink="false">https://www.humanlayer.dev/blog/stop-claude-from-ignoring-your-claude-md</guid>
      <pubDate>Mon, 16 Mar 2026 16:00:00 GMT</pubDate>
      <author>Dex</author>
    </item>
    <item>
      <title>Skill Issue: Harness Engineering for Coding Agents</title>
      <description>&lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;We&#39;ve spent the past year watching coding agents fail in every conceivable way: ignoring instructions, executing dangerous commands un-prompted, and going in circles on the simplest of tasks.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;We&#39;ve seen teams ship immense amounts of slop. We&#39;ve even &lt;a href=&quot;https://www.youtube.com/live/99Kxkemj1g8?si=tVsK88M3XPyo0urD&amp;amp;t=20967&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;shipped a little bit of slop ourselves&lt;/a&gt;.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Every time, the instinct was the same:&lt;/p&gt;
        &lt;div class=&quot;pl-8 mb-6&quot;&gt;&lt;ul class=&quot;list-disc space-y-4&quot; style=&quot;color:var(--text-secondary)&quot;&gt;
        &lt;li&gt;&quot;We just need better models, GPT-6 will fix it&quot;&lt;/li&gt;
        &lt;li&gt;&quot;We just need better instruction-following&quot;&lt;/li&gt;
        &lt;li&gt;&quot;It&#39;ll work once [niche library I&#39;m using] is in the training data&quot;&lt;/li&gt;
        &lt;/ul&gt;&lt;/div&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;But over the course of dozens of projects and hundreds of agent sessions, we kept arriving at the same conclusion: it&#39;s not a model problem. It&#39;s a &lt;strong style=&quot;color:var(--text-primary)&quot;&gt;configuration problem&lt;/strong&gt;.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Yes, models will get smarter, and some existing failure modes will disappear. And then because they are smarter, we will give them new problems which are bigger and harder, and they will &lt;strong style=&quot;color:var(--text-primary)&quot;&gt;continue to fail in unexpected ways&lt;/strong&gt;. Unexpected failures modes are a fundamental problem for non-deterministic systems.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;img src=&quot;https://www.humanlayer.dev/blog/skill-issue/gpt-6.png&quot; alt=&quot;gpt-6&quot; class=&quot;max-w-full h-auto my-4 rounded-none&quot; style=&quot;border:1px solid var(--border-color)&quot; referrerpolicy=&quot;no-referrer&quot;&gt;&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;So instead of praying for &lt;code class=&quot;px-2 py-0.5 rounded font-mono bg-[var(--code-bg)] text-[var(--accent)] border border-[var(--code-border)] tracking-tight shadow-sm mr-1 align-middle wrap-anywhere box-decoration-clone&quot;&gt;gpt-6.4-codex-ultrahigh_extended&lt;/code&gt; to save us all, we try to focus on answering the question of &lt;em style=&quot;color:var(--text-secondary)&quot;&gt;&quot;how do we get the most out of &lt;strong style=&quot;color:var(--text-primary)&quot;&gt;today&#39;s&lt;/strong&gt; models?&quot;&lt;/em&gt;&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;There are lots of ways to get better performance out of your coding agent. If you use coding agents for moderately hard tasks, you&#39;ve probably configured your coding agent a bit. Have you used skills? MCP servers? Sub-agents? Memory? AGENTS.md files?&lt;/p&gt;
        &lt;pre class=&quot;group&quot;&gt;&lt;pre class=&quot;prismjs rsh-code-block grid relative min-w-[250px] mb-6 bg-gray-400/5 p-4&quot;&gt;&lt;button data-slot=&quot;button&quot; class=&quot;inline-flex items-center justify-center gap-2 whitespace-nowrap rounded-none text-sm font-mono font-medium cursor-pointer disabled:cursor-not-allowed disabled:pointer-events-none disabled:opacity-50 [&amp;amp;_svg]:pointer-events-none [&amp;amp;_svg:not([class*=&#39;size-&#39;])]:size-4 shrink-0 [&amp;amp;_svg]:shrink-0 outline-none focus-visible:ring-[3px] uppercase tracking-wider border bg-transparent text-[var(--accent)] border-transparent hover:bg-[var(--accent)]/10 hover:border-[var(--accent)] focus-visible:border-[var(--accent)] focus-visible:ring-[var(--accent)]/50 size-9 absolute top-2 right-1 h-6 w-6 opacity-0 group-hover:opacity-100 transition-opacity duration-200 z-10&quot; aria-label=&quot;Copy code&quot; title=&quot;Copy code&quot;&gt;&lt;svg xmlns=&quot;http://www.w3.org/2000/svg&quot; width=&quot;24&quot; height=&quot;24&quot; viewBox=&quot;0 0 24 24&quot; fill=&quot;none&quot; stroke=&quot;currentColor&quot; stroke-width=&quot;2&quot; stroke-linecap=&quot;round&quot; stroke-linejoin=&quot;round&quot; class=&quot;lucide lucide-copy h-3 w-3&quot; aria-hidden=&quot;true&quot;&gt;&lt;rect width=&quot;14&quot; height=&quot;14&quot; x=&quot;8&quot; y=&quot;8&quot; rx=&quot;2&quot; ry=&quot;2&quot;&gt;&lt;/rect&gt;&lt;path d=&quot;M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/button&gt;&lt;code class=&quot;overflow-x-auto&quot; style=&quot;white-space:pre&quot;&gt;&lt;span class=&quot;&quot;&gt;coding agent = AI model(s) + harness&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/pre&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;These are all technically separate concepts, but they are all part of the coding agent&#39;s configuration surface. We call this the &lt;strong style=&quot;color:var(--text-primary)&quot;&gt;coding agent&#39;s harness&lt;/strong&gt;, and we think of it as &lt;a href=&quot;https://www.vtrivedy.com/posts/claude-code-sdk-haas-harness-as-a-service&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;the agent’s runtime&lt;/a&gt;, or as its peripherals: what does the model use to interact with its environment?&lt;/p&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;Harness Engineering&lt;/h2&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;strong style=&quot;color:var(--text-primary)&quot;&gt;Harness engineering&lt;/strong&gt;, coined by &lt;a href=&quot;https://x.com/Vtrivedy10&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;Viv&lt;/a&gt;, describes the practice of leveraging these configuration points to customize and improve your coding agent&#39;s output quality and reliability.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;img src=&quot;https://www.humanlayer.dev/blog/skill-issue/harness-components.png&quot; alt=&quot;harness components&quot; class=&quot;max-w-full h-auto my-4 rounded-none&quot; style=&quot;border:1px solid var(--border-color)&quot; referrerpolicy=&quot;no-referrer&quot;&gt;
        &lt;em style=&quot;color:var(--text-secondary)&quot;&gt;Image from &lt;a href=&quot;https://blog.langchain.com/the-anatomy-of-an-agent-harness/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;Viv&#39;s post&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;As &lt;a href=&quot;https://mitchellh.com/writing/my-ai-adoption-journey#step-5-engineer-the-harness&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;Mitchell Hashimoto put it&lt;/a&gt;, harness engineering&lt;/p&gt;
        &lt;blockquote class=&quot;border-l-4 pl-4 my-4 italic&quot; style=&quot;border-color:var(--accent);color:var(--text-muted)&quot;&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;[...] is the idea that anytime you find an agent makes a mistake, you take the time to engineer a solution such that the agent never makes that mistake again.&lt;/p&gt;
        &lt;/blockquote&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;... as a Subset of Context Engineering&lt;/h2&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;We view harness engineering as a subset of &lt;a href=&quot;https://www.humanlayer.dev/blog/advanced-context-engineering&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;context engineering&lt;/a&gt;. Coined by my cofounder &lt;a href=&quot;https://x.com/dexhorthy&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;Dex&lt;/a&gt; in &lt;a href=&quot;https://github.com/humanlayer/12-factor-agents&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;12-factor agents&lt;/a&gt;, context engineering is a superset of “prompt engineering” and a variety of other techniques for systematically improving AI agents’ reliability.&amp;nbsp;You can find the original talk on &lt;a href=&quot;https://www.youtube.com/watch?v=IS_y40zY-hc&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Harness engineering then is the subset of context engineering which primarily involves leveraging harness configuration points to carefully manage the context windows of coding agents.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;img src=&quot;https://www.humanlayer.dev/blog/skill-issue/harness-engineering.png&quot; alt=&quot;harness engineering as context engineering&quot; class=&quot;max-w-full h-auto my-4 rounded-none&quot; style=&quot;border:1px solid var(--border-color)&quot; referrerpolicy=&quot;no-referrer&quot;&gt;&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;It answers:&lt;/p&gt;
        &lt;div class=&quot;pl-8 mb-6&quot;&gt;&lt;ul class=&quot;list-disc space-y-4&quot; style=&quot;color:var(--text-secondary)&quot;&gt;
        &lt;li&gt;How do we give our coding agent new capabilities?&lt;/li&gt;
        &lt;li&gt;How do we teach it things about our codebase that aren’t in the training data?&lt;/li&gt;
        &lt;li&gt;How do we add determinism beyond &lt;code class=&quot;px-2 py-0.5 rounded font-mono bg-[var(--code-bg)] text-[var(--accent)] border border-[var(--code-border)] tracking-tight shadow-sm mr-1 align-middle wrap-anywhere box-decoration-clone&quot;&gt;CRITICAL: always do XYZ&lt;/code&gt; in the system message?&lt;/li&gt;
        &lt;li&gt;How do we adapt the agent’s behavior for our specific codebase?&lt;/li&gt;
        &lt;li&gt;How do we increase task success rates beyond “magic prompts”?&lt;/li&gt;
        &lt;li&gt;How do we prevent our context window from inflating too rapidly, or with too much bad context?&lt;/li&gt;
        &lt;/ul&gt;&lt;/div&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Skills, MCP servers, sub-agents, hooks, and back-pressure mechanisms are all tactical solutions we’ve arrived at.&lt;/p&gt;
        &lt;h3 class=&quot;text-2xl mb-2 mt-16 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;### &lt;!-- --&gt;Views on Harness Engineering&lt;/h3&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Viv’s posts on harness engineering are worth reading alongside this one — &lt;a href=&quot;https://www.vtrivedy.com/posts/claude-code-sdk-haas-harness-as-a-service&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;the first&lt;/a&gt; frames the four customization levers (system prompt, tools/MCPs, context, sub-agents), and &lt;a href=&quot;https://blog.langchain.com/the-anatomy-of-an-agent-harness/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;the second&lt;/a&gt; works backwards from what models &lt;em style=&quot;color:var(--text-secondary)&quot;&gt;can’t&lt;/em&gt; do natively to derive why each harness component exists.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;img src=&quot;https://www.humanlayer.dev/blog/skill-issue/backwards.png&quot; alt=&quot;working backwards from what models can&#39;t do natively&quot; class=&quot;max-w-full h-auto my-4 rounded-none&quot; style=&quot;border:1px solid var(--border-color)&quot; referrerpolicy=&quot;no-referrer&quot;&gt;
        &lt;em style=&quot;color:var(--text-secondary)&quot;&gt;Image from &lt;a href=&quot;https://blog.langchain.com/the-anatomy-of-an-agent-harness/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;Viv&#39;s post&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;We’d add two levers he doesn’t emphasize:&lt;/p&gt;
        &lt;div class=&quot;pl-8 mb-6&quot;&gt;&lt;ol class=&quot;list-decimal space-y-4&quot; style=&quot;color:var(--text-secondary)&quot;&gt;
        &lt;li&gt;&lt;strong style=&quot;color:var(--text-primary)&quot;&gt;hooks&lt;/strong&gt; for automated integration and deterministic control flow&lt;/li&gt;
        &lt;li&gt;&lt;strong style=&quot;color:var(--text-primary)&quot;&gt;skills&lt;/strong&gt; for progressive disclosure of knowledge. (Dex likes to refer to them as &quot;Instruction Modules&quot; - more on this in another post.)&lt;/li&gt;
        &lt;/ol&gt;&lt;/div&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;After months of solving hard problems in complex brownfield enterprise-scale codebases, we have found that sub-agents are a particularly powerful lever. When working on hard problems that require many, many context windows to solve, &lt;strong style=&quot;color:var(--text-primary)&quot;&gt;sub-agents are the key to maintaining coherency across many sessions&lt;/strong&gt;. Sub-agents &lt;strong style=&quot;color:var(--text-primary)&quot;&gt;function as a &quot;context firewall&quot;&lt;/strong&gt; that ensures discrete tasks can run in isolated context windows so none of the intermediate noise accumulates in your parent thread which is responsible for orchestration, and you can maintain coherency for much, much longer.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;img src=&quot;https://www.humanlayer.dev/blog/skill-issue/context-firewall.png&quot; alt=&quot;context firewall&quot; class=&quot;max-w-full h-auto my-4 rounded-none&quot; style=&quot;border:1px solid var(--border-color)&quot; referrerpolicy=&quot;no-referrer&quot;&gt;&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;OpenAI recently wrote a &lt;a href=&quot;https://openai.com/index/harness-engineering/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;blog post&lt;/a&gt; on the topic as well. There&#39;s some great content in there, and it seems to indicate that they view harness engineering as configuring everything &lt;em style=&quot;color:var(--text-secondary)&quot;&gt;outside&lt;/em&gt; of the agent&#39;s runtime. It&#39;s more focused on back-pressure and verification mechanisms. (Although this may be a mis-reading; the post is somewhat unclear: the word &quot;harness&quot; only appears once in the text of the post, and in reference to evals rather than harness engineering itself.)&lt;/p&gt;
        &lt;h3 class=&quot;text-2xl mb-2 mt-16 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;### &lt;!-- --&gt;But What About Post-Training?&lt;/h3&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Given that frontier coding models are post-trained on their harnesses (e.g. Claude in Claude Code, GPT-5 Codex in Codex), some will argue that the best harness and/or configuration is the one that the model was trained on.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;For example, the Codex models are so tightly coupled with the Codex harness&#39;s &lt;code class=&quot;px-2 py-0.5 rounded font-mono bg-[var(--code-bg)] text-[var(--accent)] border border-[var(--code-border)] tracking-tight shadow-sm mr-1 align-middle wrap-anywhere box-decoration-clone&quot;&gt;apply_patch&lt;/code&gt; tool that &lt;a href=&quot;https://opencode.ai/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;OpenCode&lt;/a&gt; — built as an open-source alternative to Claude Code — had to &lt;a href=&quot;https://github.com/anomalyco/opencode/pull/9127&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;add an &lt;code class=&quot;px-2 py-0.5 rounded font-mono bg-[var(--code-bg)] text-[var(--accent)] border border-[var(--code-border)] tracking-tight shadow-sm mr-1 align-middle wrap-anywhere box-decoration-clone&quot;&gt;apply_patch&lt;/code&gt; tool&lt;/a&gt; specifically for GPT/Codex models to mimic the Codex harness to improve the Codex models&#39; performance in the OpenCode harness - while Claude and other models still use normal &lt;code class=&quot;px-2 py-0.5 rounded font-mono bg-[var(--code-bg)] text-[var(--accent)] border border-[var(--code-border)] tracking-tight shadow-sm mr-1 align-middle wrap-anywhere box-decoration-clone&quot;&gt;edit&lt;/code&gt; and &lt;code class=&quot;px-2 py-0.5 rounded font-mono bg-[var(--code-bg)] text-[var(--accent)] border border-[var(--code-border)] tracking-tight shadow-sm mr-1 align-middle wrap-anywhere box-decoration-clone&quot;&gt;write&lt;/code&gt; tools.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;This &lt;em style=&quot;color:var(--text-secondary)&quot;&gt;can&lt;/em&gt; mean that a model will perform better when coupled with the harness it was post-trained on, and some might infer that this means you shouldn&#39;t customize the harness at all.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;But it cuts both ways: &lt;strong style=&quot;color:var(--text-primary)&quot;&gt;models can be over-fitted to their harness&lt;/strong&gt;. Viv cites &lt;a href=&quot;https://terminalbench.com/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;Terminal Bench 2.0&lt;/a&gt; where Opus 4.6 in Claude Code comes in position #33, but when placed in a different harness that wasn&#39;t seen during post-training, it comes in at #5 (+/- about 4 positions in either direction).&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;img src=&quot;https://www.humanlayer.dev/blog/skill-issue/terminal-bench.png&quot; alt=&quot;terminal bench&quot; class=&quot;max-w-full h-auto my-4 rounded-none&quot; style=&quot;border:1px solid var(--border-color)&quot; referrerpolicy=&quot;no-referrer&quot;&gt;&lt;/p&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;Engineering Your Harness&lt;/h2&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;With that in mind, let&#39;s walk through the configuration surfaces we&#39;ve found most impactful.&lt;/p&gt;
        &lt;h3 class=&quot;text-2xl mb-2 mt-16 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;### &lt;!-- --&gt;CLAUDE.md &amp;amp; AGENTS.md&lt;/h3&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Before touching any other harness configuration points, it&#39;s usually worth customizing your CLAUDE.md / AGENTS.md files. These are markdown files at the top-level of your repository that get deterministically injected into the agent&#39;s system prompt by the harness.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;We have already shared &lt;a href=&quot;https://www.humanlayer.dev/blog/writing-a-good-claude-md&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;some opinions about what makes a good CLAUDE.md&lt;/a&gt; file and how to use it correctly, so give that a read if you&#39;re not familiar with it. Matt Pocock also wrote a &lt;a href=&quot;https://www.aihero.dev/a-complete-guide-to-agents-md&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;great follow-up&lt;/a&gt; that more generally applies to AGENTS.md.&lt;/p&gt;
        &lt;h4 class=&quot;text-xl mb-1 mt-8 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;#### &lt;!-- --&gt;The ETH Zurich Study&lt;/h4&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;After ETH Zurich published their &lt;a href=&quot;https://arxiv.org/abs/2602.11988&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;study testing 138 agentfiles across various repos&lt;/a&gt; which indicated that most agentfiles were useless-or-worse, we got a lot of feedback on our &lt;a href=&quot;https://www.humanlayer.dev/blog/writing-a-good-claude-md&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;CLAUDE.md post&lt;/a&gt;:&lt;/p&gt;
        &lt;blockquote class=&quot;border-l-4 pl-4 my-4 italic&quot; style=&quot;border-color:var(--accent);color:var(--text-muted)&quot;&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;See, CLAUDE.md files don&#39;t even help — they&#39;re a waste of time.&lt;/p&gt;
        &lt;/blockquote&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Indeed: the study tested many agentfiles across a wide variety of repos, and found:&lt;/p&gt;
        &lt;div class=&quot;pl-8 mb-6&quot;&gt;&lt;ul class=&quot;list-disc space-y-4&quot; style=&quot;color:var(--text-secondary)&quot;&gt;
        &lt;li&gt;that LLM-generated ones actually &lt;em style=&quot;color:var(--text-secondary)&quot;&gt;hurt&lt;/em&gt; performance while costing 20%+ more&lt;/li&gt;
        &lt;li&gt;human-written ones only helped about 4%.&lt;/li&gt;
        &lt;li&gt;Agents spent 14-22% more reasoning tokens processing context file instructions, took more steps to complete tasks, and ran more tools — all without improving resolution rates.&lt;/li&gt;
        &lt;li&gt;Codebase overviews and directory listings didn&#39;t help at all; agents discover repository structure on their own just fine.&lt;/li&gt;
        &lt;/ul&gt;&lt;/div&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;A careful reading of the study &lt;strong style=&quot;color:var(--text-primary)&quot;&gt;indicates that what we said in our post was correct:&lt;/strong&gt;&lt;/p&gt;
        &lt;div class=&quot;pl-8 mb-6&quot;&gt;&lt;ul class=&quot;list-disc space-y-4&quot; style=&quot;color:var(--text-secondary)&quot;&gt;
        &lt;li&gt;&lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Agent-generated files were worse. Yes, we said:&lt;/p&gt;
        &lt;blockquote class=&quot;border-l-4 pl-4 my-4 italic&quot; style=&quot;border-color:var(--accent);color:var(--text-muted)&quot;&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;avoid auto-generating it. You should carefully craft its contents for best results.&lt;/p&gt;
        &lt;/blockquote&gt;
        &lt;/li&gt;
        &lt;li&gt;&lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Lots of files too-heavily-steered the model to use specific tools, causing worse outcomes. Yep, we said:&lt;/p&gt;
        &lt;blockquote class=&quot;border-l-4 pl-4 my-4 italic&quot; style=&quot;border-color:var(--accent);color:var(--text-muted)&quot;&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Less (instructions) is more. While you shouldn&#39;t omit necessary instructions, you should include as few instructions as reasonably possible in the file.&lt;/p&gt;
        &lt;/blockquote&gt;
        &lt;/li&gt;
        &lt;li&gt;&lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Files contained irrelevant context. Yep, we said:&lt;/p&gt;
        &lt;blockquote class=&quot;border-l-4 pl-4 my-4 italic&quot; style=&quot;border-color:var(--accent);color:var(--text-muted)&quot;&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Use Progressive Disclosure&lt;/p&gt;
        &lt;/blockquote&gt;
        &lt;/li&gt;
        &lt;li&gt;&lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;The human-written ones barely helped because of too many conditional rules. Yep, we said:&lt;/p&gt;
        &lt;blockquote class=&quot;border-l-4 pl-4 my-4 italic&quot; style=&quot;border-color:var(--accent);color:var(--text-muted)&quot;&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Keep the contents of your CLAUDE.md concise and universally applicable.&lt;/p&gt;
        &lt;/blockquote&gt;
        &lt;/li&gt;
        &lt;/ul&gt;&lt;/div&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Our CLAUDE.md is under 60 lines.&lt;/p&gt;
        &lt;h3 class=&quot;text-2xl mb-2 mt-16 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;### &lt;!-- --&gt;MCP Servers Are for Tools&lt;/h3&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;MCP servers are&amp;nbsp;primarily for plugging tools into your coding agent to extend its capabilities beyond file I/O and bash commands. The MCP specification includes additional features like resources, prompts, and elicitations, but &lt;strong style=&quot;color:var(--text-primary)&quot;&gt;these are generally not well-supported by MCP clients and coding agent harnesses&lt;/strong&gt;.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;The MCP spec supports servers that run on your (or the agent&#39;s) local machine which allow the agent to interact with its local environment, but it also supports HTTP-based MCP servers that can connect your agent with remote tools and services like Linear, Sentry, and more.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;When you plug an MCP server into your coding agent, the list of available tools, their descriptions, and the arguments needed to invoke them &lt;strong style=&quot;color:var(--text-primary)&quot;&gt;are injected into your coding agent&#39;s system prompt&lt;/strong&gt;. As a result, the MCP server can use the tool descriptions to customize your agent’s behavior by providing your agent with instructions about when to use them.&amp;nbsp;&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;strong style=&quot;color:var(--text-primary)&quot;&gt;WARNING&lt;/strong&gt;: because MCP servers’ tool descriptions are added to your coding agent’s system prompt, never connect to one you don’t trust. This can be a dangerous vector for prompt injection! STDIO servers and other servers that run client-side with npx or uvx can also execute code on your host in the absence of prompt injection.&lt;/p&gt;
        &lt;h4 class=&quot;text-xl mb-1 mt-8 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;#### &lt;!-- --&gt;Too Many Tools Is Bad&lt;/h4&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;We’ve seen this firsthand: plug too many MCP tools into your agent, and the context window fills up with tool descriptions, pushing you into &lt;a href=&quot;https://youtu.be/rmvDxxNubIg?si=O17nmS3SScaAkpp-&amp;amp;t=355&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;the dumb zone&lt;/a&gt; much faster:&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;img src=&quot;https://www.humanlayer.dev/blog/skill-issue/too-many-tools.png&quot; alt=&quot;too many tools&quot; class=&quot;max-w-full h-auto my-4 rounded-none&quot; style=&quot;border:1px solid var(--border-color)&quot; referrerpolicy=&quot;no-referrer&quot;&gt;&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;The &lt;a href=&quot;https://www.aihero.dev/a-complete-guide-to-agents-md#the-instruction-budget&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;instruction budget&lt;/a&gt; matters too — every irrelevant tool description is an instruction the agent has to process without any benefit.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;In fact, these failure modes are so common that Anthropic released &lt;a href=&quot;https://platform.claude.com/docs/en/agents-and-tools/tool-use/tool-search-tool&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;experimental support for MCP tool search&lt;/a&gt; to progressively disclose tools to Claude when the user has too many MCP tools connected. TL;DR: if you’re not actively using a server which provides a large number of tools, turn it off.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;We also found that if an MCP server duplicates functionality that’s already available as a CLI well-represented in training data, it works better to just prompt the agent to use the CLI. For things like GitHub, Docker, or most databases, your coding agent can just use the right CLIs and shell commands. The model has seen these tools enough during training that it already knows how to use them, and you gain the added benefit of composability with tools like &lt;code class=&quot;px-2 py-0.5 rounded font-mono bg-[var(--code-bg)] text-[var(--accent)] border border-[var(--code-border)] tracking-tight shadow-sm mr-1 align-middle wrap-anywhere box-decoration-clone&quot;&gt;grep&lt;/code&gt; and &lt;code class=&quot;px-2 py-0.5 rounded font-mono bg-[var(--code-bg)] text-[var(--accent)] border border-[var(--code-border)] tracking-tight shadow-sm mr-1 align-middle wrap-anywhere box-decoration-clone&quot;&gt;jq&lt;/code&gt; to enable additional context-efficiency.&lt;/p&gt;
        &lt;h4 class=&quot;text-xl mb-1 mt-8 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;#### &lt;!-- --&gt;Always Be Context-Engineering&lt;/h4&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;At HumanLayer, we used the Linear MCP server for a while before realizing that we really only used a small subset of the tools it provides - so we wrote a small CLI that wraps the Linear API and provides very context-efficient responses, and we included 6 example usages in our &lt;a href=&quot;https://www.humanlayer.dev/blog/writing-a-good-claude-md&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;CLAUDE.md&lt;/a&gt; file:&lt;/p&gt;
        &lt;pre class=&quot;group&quot;&gt;&lt;pre class=&quot;prismjs rsh-code-block grid relative min-w-[250px] mb-6 bg-gray-400/5 p-4&quot;&gt;&lt;button data-slot=&quot;button&quot; class=&quot;inline-flex items-center justify-center gap-2 whitespace-nowrap rounded-none text-sm font-mono font-medium cursor-pointer disabled:cursor-not-allowed disabled:pointer-events-none disabled:opacity-50 [&amp;amp;_svg]:pointer-events-none [&amp;amp;_svg:not([class*=&#39;size-&#39;])]:size-4 shrink-0 [&amp;amp;_svg]:shrink-0 outline-none focus-visible:ring-[3px] uppercase tracking-wider border bg-transparent text-[var(--accent)] border-transparent hover:bg-[var(--accent)]/10 hover:border-[var(--accent)] focus-visible:border-[var(--accent)] focus-visible:ring-[var(--accent)]/50 size-9 absolute top-2 right-1 h-6 w-6 opacity-0 group-

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 1, 2026

Auto Review

No clear rule violations found in the current diff.

Parse #-prefixed tags from the meta line and populate the category field.

via [HAPI](https://hapi.run)

Co-Authored-By: HAPI <noreply@hapi.run>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 1, 2026

Successfully generated as following:

http://localhost:1200/humanlayer/blog - Success ✔️
<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
  <channel>
    <title>HumanLayer Blog</title>
    <link>https://www.humanlayer.dev/blog</link>
    <atom:link href="http://localhost:1200/humanlayer/blog" rel="self" type="application/rss+xml"></atom:link>
    <description>HumanLayer Blog - Powered by RSSHub</description>
    <generator>RSSHub</generator>
    <webMaster>contact@rsshub.app (RSSHub)</webMaster>
    <language>en</language>
    <lastBuildDate>Wed, 01 Apr 2026 09:42:06 GMT</lastBuildDate>
    <ttl>5</ttl>
    <item>
      <title>Long-Context Isn&#39;t the Answer</title>
      <description>&lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Anthropic just switched the default model in Claude Code to Opus 4.6 with a 1M context window. We tried it when it launched. But now we&#39;re switching back to Opus 4.5&lt;/p&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;How about some context?&lt;/h2&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Unlike previous Claude Opus models which have a ~ 200k context window, Opus 4.6 has as 1M context window. This was very exciting for us!&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;We spend a lot of time working on hard problems in large enterprise codebases, and if the model can reliably hold more of the codebase in its head at a time, so to speak, then that creates new possibilities for the scale of problem that can be solved in a single context window.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;It would mean less compaction (whether &lt;a href=&quot;https://www.humanlayer.dev/blog/advanced-context-engineering&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;frequent intentional compaction&lt;/a&gt; or auto-compact), and less &quot;context pressure&quot; to solve the problem before your context window fills up.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;As Calvin French-Owen, who helped launch Codex last year, &lt;a href=&quot;https://youtu.be/qwmmWzPnhog?si=ReRWTavUKTItIGGr&amp;amp;t=918&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;puts it in an episode of YC&#39;s Lightcone podcast&lt;/a&gt;:&lt;/p&gt;
        &lt;blockquote class=&quot;border-l-4 pl-4 my-4 italic&quot; style=&quot;border-color:var(--accent);color:var(--text-muted)&quot;&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;...imagine you&#39;re a college student. You&#39;re taking an exam. In the first five minutes of that exam, you&#39;re like, &quot;Oh, I have all the time in the world. I&#39;ll do a great job. I&#39;ll think through each of these problems.&quot;&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Let&#39;s say you have like five minutes left and you still have half the exam left. You&#39;re like, &quot;Oh man, I just got to do whatever I can.&quot; Like, that&#39;s the LLM with a context window&lt;/p&gt;
        &lt;/blockquote&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;More context, less instruction adherence&lt;/h2&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;While the context window is dramatically larger, we noticed over the course of a couple of weeks that instruction adherence was dramatically degraded, and not just at longer context lengths.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Even well-within what we would consider the &lt;a href=&quot;https://youtu.be/rmvDxxNubIg?si=O17nmS3SScaAkpp-&amp;amp;t=355&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;smart zone&lt;/a&gt; of a 200k-context frontier model, the model was less precise. It would ignore design documents and other inputs when writing a plan file. It would make trivial mistakes, or misunderstand simple instructions - or worse, directly disobey them.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;At longer context lengths, the degradation was even steeper - like the user instructions were getting drowned out by the intermediate tool results and mass of accumulated context.&lt;/p&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;What determines instruction adherence?&lt;/h2&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;We&#39;ve written about the concept of &lt;a href=&quot;https://www.humanlayer.dev/blog/skill-issue-harness-engineering-for-coding-agents&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;the instruction budget&lt;/a&gt; before - a measurable property of LLMs which describes how many instructions they can follow reasonably well before instruction adherence drops off.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;It&#39;s different for each model, but it&#39;s a function primarily of the quality of the model&#39;s instruction tuning, and more importantly for our purposes, it is &lt;strong style=&quot;color:var(--text-primary)&quot;&gt;strongly correlated with the size of the model&lt;/strong&gt;.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;img src=&quot;https://www.humanlayer.dev/blog/writing-a-good-claude-md/instructionfollowing.png&quot; alt=&quot;Instruction following&quot; class=&quot;max-w-full h-auto my-4 rounded-none&quot; style=&quot;border:1px solid var(--border-color)&quot; referrerpolicy=&quot;no-referrer&quot;&gt;&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;You can clearly see that the larger models from the &lt;a href=&quot;https://arxiv.org/pdf/2507.11538&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;study&lt;/a&gt; can follow dramatically more instructions before adherence drops off.&lt;/p&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;Instruction adherence at long context lengths&lt;/h2&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Why does this matter?&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;When a lab offers an extended-context version of a model, you&#39;re usually not getting a bigger model with more parameters and therefore a larger &quot;instruction budget&quot; to go with the larger context window - you&#39;re likely getting the same model with some clever math (e.g. &lt;a href=&quot;https://arxiv.org/pdf/2309.00071&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;YaRN&lt;/a&gt;) to extend the sequence length the model can attend to, and probably some more post-training to stabilize the model at the longer sequence lengths.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;This means that while the context window size increases, the instruction budget remains the same. You can fit more context and more instructions in the context window, but the model isn&#39;t actually better at attending to those instructions over the context length.&lt;/p&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;Instruction-following as needle in a haystack&lt;/h2&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;You can think of your context window as a haystack where all of your tool calls and documents and files are hay - every line in your CLAUDE.md file, every instruction in your tool descriptions, every tool result, every instruction in your system prompt, and every user message.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;The quality &amp;amp; correctness of the agent&#39;s next step depends on the LLM&#39;s ability to find a needle in there: namely, the instruction(s) in the context window that are most-relevant to the current state of the context window, which give it the information it needs to make the correct decision about its next action.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Now imagine we increase the size of the haystack by 500% - but the size of the needle remains the same. Unless our ability to find the needle also increases by 500%, we will have a dramatically harder time finding it. The extra context isn&#39;t really helping us - it&#39;s just digging us deeper into the dumb zone.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;img src=&quot;https://www.humanlayer.dev/blog/skill-issue/long-context.jpg&quot; alt=&quot;long context&quot; class=&quot;max-w-full h-auto my-4 rounded-none&quot; style=&quot;border:1px solid var(--border-color)&quot; referrerpolicy=&quot;no-referrer&quot;&gt;&lt;/p&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;What Works Instead&lt;/h2&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Instead of trying to stuff as much information in a context window as possible and having the LLM reason over it, we found that the context management techniques we had been using had to be used even &lt;em style=&quot;color:var(--text-secondary)&quot;&gt;more&lt;/em&gt; aggressively to keep Opus 4.6 1M on track.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;In my post on &lt;a href=&quot;https://www.humanlayer.dev/blog/skill-issue-harness-engineering-for-coding-agents&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;harness engineering&lt;/a&gt;, I wrote about this as the limit case for using sub-agents:&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;img src=&quot;https://www.humanlayer.dev/blog/skill-issue/limit-case.png&quot; alt=&quot;limit case&quot; class=&quot;max-w-full h-auto my-4 rounded-none&quot; style=&quot;border:1px solid var(--border-color)&quot; referrerpolicy=&quot;no-referrer&quot;&gt;&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;The day after I published it, I decided maybe it wasn&#39;t such a bad idea, and sat down and wrote this skill:&lt;/p&gt;
        &lt;pre class=&quot;group&quot;&gt;&lt;pre class=&quot;prismjs rsh-code-block grid relative min-w-[250px] mb-6 bg-gray-400/5 p-4&quot;&gt;&lt;button data-slot=&quot;button&quot; class=&quot;inline-flex items-center justify-center gap-2 whitespace-nowrap rounded-none text-sm font-mono font-medium cursor-pointer disabled:cursor-not-allowed disabled:pointer-events-none disabled:opacity-50 [&amp;amp;_svg]:pointer-events-none [&amp;amp;_svg:not([class*=&#39;size-&#39;])]:size-4 shrink-0 [&amp;amp;_svg]:shrink-0 outline-none focus-visible:ring-[3px] uppercase tracking-wider border bg-transparent text-[var(--accent)] border-transparent hover:bg-[var(--accent)]/10 hover:border-[var(--accent)] focus-visible:border-[var(--accent)] focus-visible:ring-[var(--accent)]/50 size-9 absolute top-2 right-1 h-6 w-6 opacity-0 group-hover:opacity-100 transition-opacity duration-200 z-10&quot; aria-label=&quot;Copy code&quot; title=&quot;Copy code&quot;&gt;&lt;svg xmlns=&quot;http://www.w3.org/2000/svg&quot; width=&quot;24&quot; height=&quot;24&quot; viewBox=&quot;0 0 24 24&quot; fill=&quot;none&quot; stroke=&quot;currentColor&quot; stroke-width=&quot;2&quot; stroke-linecap=&quot;round&quot; stroke-linejoin=&quot;round&quot; class=&quot;lucide lucide-copy h-3 w-3&quot; aria-hidden=&quot;true&quot;&gt;&lt;rect width=&quot;14&quot; height=&quot;14&quot; x=&quot;8&quot; y=&quot;8&quot; rx=&quot;2&quot; ry=&quot;2&quot;&gt;&lt;/rect&gt;&lt;path d=&quot;M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/button&gt;&lt;code class=&quot;overflow-x-auto&quot; style=&quot;white-space:pre&quot;&gt;&lt;span class=&quot;&quot;&gt;---
        &lt;/span&gt;name: subagent-orchestrator
        &lt;!-- --&gt;description: orchestrate sub-agents to accomplish complex long-horizon tasks without losing coherency by delegating to sub-agents
        &lt;!-- --&gt;---
        &lt;!-- --&gt;
        &lt;!-- --&gt;This skill provides you with **CRITICAL** instructions that will help you to maintain coherency in long-horizon context-heavy tasks.
        &lt;!-- --&gt;
        &lt;!-- --&gt;You have a large number of tools available to you. The most important one is the one that allows you to dispatch sub-agents: either `Agent` or `Task`.
        &lt;!-- --&gt;
        &lt;!-- --&gt;All non-trivial operations should be delegated to sub-agents. You should delegate research and codebase understanding tasks to codebase-analyzer, codebase-locator and pattern-locator sub-agents.
        &lt;!-- --&gt;
        &lt;!-- --&gt;You should delegate running bash commands (particularly ones that are likely to produce lots of output) such as investigating with the `aws` CLI, using the `gh` CLI, digging through logs to `Bash` sub-agents.
        &lt;!-- --&gt;
        &lt;!-- --&gt;You should use separate sub-agents for separate tasks, and you may launch them in parallel - but do not delegate multiple tasks that are likely to have significant overlap to separate sub-agents.
        &lt;!-- --&gt;
        &lt;!-- --&gt;IMPORTANT: if the user has already given you a task, you should proceed with that task using this approach.
        &lt;!-- --&gt;
        &lt;!-- --&gt;If you have not already been explicitly given a taks, you should ask the user what task they would like for you to work on - do not assume or begin working on a ticket automatically.&lt;/code&gt;&lt;/pre&gt;&lt;/pre&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;I found that using this skill in combination with our existing workflow skills helped me keep Opus coherent at longer context lengths. Why?
        As I wrote about in that post, sub-agents encapsulate context and ensure that only highly-relevant context (the prompt, and the focused sub-agent result) end up in the context window, avoiding &lt;a href=&quot;https://research.trychroma.com/context-rot&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;context rot&lt;/a&gt;:&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;img src=&quot;https://www.humanlayer.dev/blog/skill-issue/context-firewall.png&quot; alt=&quot;context firewall&quot; class=&quot;max-w-full h-auto my-4 rounded-none&quot; style=&quot;border:1px solid var(--border-color)&quot; referrerpolicy=&quot;no-referrer&quot;&gt;&lt;/p&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;How we&#39;re incorporating this at HumanLayer&lt;/h2&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;We have had a feature in the tool for a while that warns the user when they&#39;re context is getting high. We used to set this at around %40 of sonnets 168k token window (200k - 32k reserved for output). This came out to about 100k tokens.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;To help users maximize instruction adherence and intelligence on hard codebase problems, we&#39;ve updated our context warnings for long-context models to trigger at the 100k token mark instead of 40% of the usable context. For opus 1m this is only 10% of the context window.&lt;/p&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;TL;DR&lt;/h2&gt;
        &lt;div class=&quot;pl-8 mb-6&quot;&gt;&lt;ol class=&quot;list-decimal space-y-4&quot; style=&quot;color:var(--text-secondary)&quot;&gt;
        &lt;li&gt;Long-context models degrade at all context lengths, not just long ones.&lt;/li&gt;
        &lt;li&gt;More context isn&#39;t more capability - the instruction budget doesn&#39;t scale with the context window.&lt;/li&gt;
        &lt;li&gt;Context isolation beats context expansion. Sub-agents, progressive disclosure, and &lt;a href=&quot;https://www.humanlayer.dev/blog/context-efficient-backpressure&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;context-efficient backpressure&lt;/a&gt; keep each context window small, focused, and in the smart zone.&lt;/li&gt;
        &lt;/ol&gt;&lt;/div&gt;</description>
      <link>https://www.humanlayer.dev/blog/long-context-isnt-the-answer</link>
      <guid isPermaLink="false">https://www.humanlayer.dev/blog/long-context-isnt-the-answer</guid>
      <pubDate>Sun, 22 Mar 2026 16:00:00 GMT</pubDate>
      <author>Kyle</author>
      <category>agents</category>
      <category>claudecode</category>
      <category>best</category>
      <category>context</category>
    </item>
    <item>
      <title>Getting Claude to Actually Read Your CLAUDE.md</title>
      <description>&lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Claude Code wraps your CLAUDE.md in a &lt;code class=&quot;px-2 py-0.5 rounded font-mono bg-[var(--code-bg)] text-[var(--accent)] border border-[var(--code-border)] tracking-tight shadow-sm mr-1 align-middle wrap-anywhere box-decoration-clone&quot;&gt;&amp;lt;system_reminder&amp;gt;&lt;/code&gt; that explicitly tells the model the contents &quot;may or may not be relevant.&quot; The longer your file gets, the more Claude seems to treat individual sections as optional.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;We &lt;a href=&quot;https://www.humanlayer.dev/blog/writing-a-good-claude-md&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;wrote before&lt;/a&gt; about keeping these files concise and avoiding stuff that belongs in a linter. But even after trimming things down, I kept running into cases where Claude would ignore testing instructions while writing tests, or skip our API conventions when building endpoints.&lt;/p&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;Conditional &lt;code class=&quot;px-2 py-0.5 rounded font-mono bg-[var(--code-bg)] text-[var(--accent)] border border-[var(--code-border)] tracking-tight shadow-sm mr-1 align-middle wrap-anywhere box-decoration-clone&quot;&gt;&amp;lt;important if&amp;gt;&lt;/code&gt; blocks&lt;/h2&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;What&#39;s been working for me: wrapping sections in &lt;code class=&quot;px-2 py-0.5 rounded font-mono bg-[var(--code-bg)] text-[var(--accent)] border border-[var(--code-border)] tracking-tight shadow-sm mr-1 align-middle wrap-anywhere box-decoration-clone&quot;&gt;&amp;lt;important if=&quot;condition&quot;&amp;gt;&lt;/code&gt; tags.&lt;/p&gt;
        &lt;pre class=&quot;group&quot;&gt;&lt;pre class=&quot;prismjs rsh-code-block grid relative min-w-[250px] mb-6 bg-gray-400/5 p-4&quot;&gt;&lt;button data-slot=&quot;button&quot; class=&quot;inline-flex items-center justify-center gap-2 whitespace-nowrap rounded-none text-sm font-mono font-medium cursor-pointer disabled:cursor-not-allowed disabled:pointer-events-none disabled:opacity-50 [&amp;amp;_svg]:pointer-events-none [&amp;amp;_svg:not([class*=&#39;size-&#39;])]:size-4 shrink-0 [&amp;amp;_svg]:shrink-0 outline-none focus-visible:ring-[3px] uppercase tracking-wider border bg-transparent text-[var(--accent)] border-transparent hover:bg-[var(--accent)]/10 hover:border-[var(--accent)] focus-visible:border-[var(--accent)] focus-visible:ring-[var(--accent)]/50 size-9 absolute top-2 right-1 h-6 w-6 opacity-0 group-hover:opacity-100 transition-opacity duration-200 z-10&quot; aria-label=&quot;Copy code&quot; title=&quot;Copy code&quot;&gt;&lt;svg xmlns=&quot;http://www.w3.org/2000/svg&quot; width=&quot;24&quot; height=&quot;24&quot; viewBox=&quot;0 0 24 24&quot; fill=&quot;none&quot; stroke=&quot;currentColor&quot; stroke-width=&quot;2&quot; stroke-linecap=&quot;round&quot; stroke-linejoin=&quot;round&quot; class=&quot;lucide lucide-copy h-3 w-3&quot; aria-hidden=&quot;true&quot;&gt;&lt;rect width=&quot;14&quot; height=&quot;14&quot; x=&quot;8&quot; y=&quot;8&quot; rx=&quot;2&quot; ry=&quot;2&quot;&gt;&lt;/rect&gt;&lt;path d=&quot;M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/button&gt;&lt;code class=&quot;overflow-x-auto&quot; style=&quot;white-space:pre&quot;&gt;&lt;span class=&quot;&quot;&gt;&amp;lt;important if=&quot;you are writing or modifying tests&quot;&amp;gt;
        &lt;/span&gt;- Use `createTestApp()` helper for integration tests
        &lt;!-- --&gt;- Mock database with `dbMock` from `packages/db/test`
        &lt;!-- --&gt;- Test fixtures live in `__fixtures__/` directories
        &lt;!-- --&gt;&amp;lt;/important&amp;gt;&lt;/code&gt;&lt;/pre&gt;&lt;/pre&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;We don&#39;t have a rigorous explanation for why this helps. My guess is that the explicit condition gives Claude a clearer signal about when to apply the instructions, rather than leaving it to decide relevance on its own. Whatever the mechanism, we&#39;ve seen noticeably better adherence on tasks where only some sections of my CLAUDE.md should apply.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;A few things I&#39;ve learned:&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;strong style=&quot;color:var(--text-primary)&quot;&gt;Don&#39;t wrap everything.&lt;/strong&gt; Project identity, directory structure, and tech stack are relevant to basically every task. The conditional blocks are for testing setup or deployment procedures that only matter sometimes.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;strong style=&quot;color:var(--text-primary)&quot;&gt;Make conditions specific.&lt;/strong&gt; &lt;code class=&quot;px-2 py-0.5 rounded font-mono bg-[var(--code-bg)] text-[var(--accent)] border border-[var(--code-border)] tracking-tight shadow-sm mr-1 align-middle wrap-anywhere box-decoration-clone&quot;&gt;&amp;lt;important if=&quot;you are writing code&quot;&amp;gt;&lt;/code&gt; matches almost everything and defeats the purpose. I try to make each condition narrow enough that it only fires when I actually want those rules applied.&lt;/p&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;A skill to automate this&lt;/h2&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;I did a lot of restructuring CLAUDE.md files by hand, so I made a Claude Code skill that does the rewrite. It pulls out the foundational stuff, wraps domain-specific sections in conditional blocks, and cleans up some common problems (stale code snippets, style rules that should be in a linter, vague instructions that don&#39;t actually help).&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;It&#39;s not perfect—you&#39;ll probably want to review what it produces—but it&#39;s a decent starting point.&lt;/p&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;Try it&lt;/h2&gt;
        &lt;pre class=&quot;group&quot;&gt;&lt;pre class=&quot;prismjs rsh-code-block grid relative min-w-[250px] mb-6 bg-gray-400/5 p-4&quot;&gt;&lt;button data-slot=&quot;button&quot; class=&quot;inline-flex items-center justify-center gap-2 whitespace-nowrap rounded-none text-sm font-mono font-medium cursor-pointer disabled:cursor-not-allowed disabled:pointer-events-none disabled:opacity-50 [&amp;amp;_svg]:pointer-events-none [&amp;amp;_svg:not([class*=&#39;size-&#39;])]:size-4 shrink-0 [&amp;amp;_svg]:shrink-0 outline-none focus-visible:ring-[3px] uppercase tracking-wider border bg-transparent text-[var(--accent)] border-transparent hover:bg-[var(--accent)]/10 hover:border-[var(--accent)] focus-visible:border-[var(--accent)] focus-visible:ring-[var(--accent)]/50 size-9 absolute top-2 right-1 h-6 w-6 opacity-0 group-hover:opacity-100 transition-opacity duration-200 z-10&quot; aria-label=&quot;Copy code&quot; title=&quot;Copy code&quot;&gt;&lt;svg xmlns=&quot;http://www.w3.org/2000/svg&quot; width=&quot;24&quot; height=&quot;24&quot; viewBox=&quot;0 0 24 24&quot; fill=&quot;none&quot; stroke=&quot;currentColor&quot; stroke-width=&quot;2&quot; stroke-linecap=&quot;round&quot; stroke-linejoin=&quot;round&quot; class=&quot;lucide lucide-copy h-3 w-3&quot; aria-hidden=&quot;true&quot;&gt;&lt;rect width=&quot;14&quot; height=&quot;14&quot; x=&quot;8&quot; y=&quot;8&quot; rx=&quot;2&quot; ry=&quot;2&quot;&gt;&lt;/rect&gt;&lt;path d=&quot;M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/button&gt;&lt;code class=&quot;overflow-x-auto&quot; style=&quot;white-space:pre&quot;&gt;&lt;span class=&quot;&quot;&gt;npx skills add humanlayer/skills --skill improve-claude-md&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/pre&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Then in your project:&lt;/p&gt;
        &lt;pre class=&quot;group&quot;&gt;&lt;pre class=&quot;prismjs rsh-code-block grid relative min-w-[250px] mb-6 bg-gray-400/5 p-4&quot;&gt;&lt;button data-slot=&quot;button&quot; class=&quot;inline-flex items-center justify-center gap-2 whitespace-nowrap rounded-none text-sm font-mono font-medium cursor-pointer disabled:cursor-not-allowed disabled:pointer-events-none disabled:opacity-50 [&amp;amp;_svg]:pointer-events-none [&amp;amp;_svg:not([class*=&#39;size-&#39;])]:size-4 shrink-0 [&amp;amp;_svg]:shrink-0 outline-none focus-visible:ring-[3px] uppercase tracking-wider border bg-transparent text-[var(--accent)] border-transparent hover:bg-[var(--accent)]/10 hover:border-[var(--accent)] focus-visible:border-[var(--accent)] focus-visible:ring-[var(--accent)]/50 size-9 absolute top-2 right-1 h-6 w-6 opacity-0 group-hover:opacity-100 transition-opacity duration-200 z-10&quot; aria-label=&quot;Copy code&quot; title=&quot;Copy code&quot;&gt;&lt;svg xmlns=&quot;http://www.w3.org/2000/svg&quot; width=&quot;24&quot; height=&quot;24&quot; viewBox=&quot;0 0 24 24&quot; fill=&quot;none&quot; stroke=&quot;currentColor&quot; stroke-width=&quot;2&quot; stroke-linecap=&quot;round&quot; stroke-linejoin=&quot;round&quot; class=&quot;lucide lucide-copy h-3 w-3&quot; aria-hidden=&quot;true&quot;&gt;&lt;rect width=&quot;14&quot; height=&quot;14&quot; x=&quot;8&quot; y=&quot;8&quot; rx=&quot;2&quot; ry=&quot;2&quot;&gt;&lt;/rect&gt;&lt;path d=&quot;M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/button&gt;&lt;code class=&quot;overflow-x-auto&quot; style=&quot;white-space:pre&quot;&gt;&lt;span class=&quot;&quot;&gt;/improve-claude-md&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/pre&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;It reads your existing CLAUDE.md and outputs a rewritten version. Source is at &lt;a href=&quot;https://github.com/humanlayer/skills/tree/main/plugins/improve-claude-md&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;github.com/humanlayer/skills&lt;/a&gt;.&lt;/p&gt;</description>
      <link>https://www.humanlayer.dev/blog/stop-claude-from-ignoring-your-claude-md</link>
      <guid isPermaLink="false">https://www.humanlayer.dev/blog/stop-claude-from-ignoring-your-claude-md</guid>
      <pubDate>Mon, 16 Mar 2026 16:00:00 GMT</pubDate>
      <author>Dex</author>
      <category>agents</category>
      <category>claudecode</category>
      <category>skills</category>
    </item>
    <item>
      <title>Skill Issue: Harness Engineering for Coding Agents</title>
      <description>&lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;We&#39;ve spent the past year watching coding agents fail in every conceivable way: ignoring instructions, executing dangerous commands un-prompted, and going in circles on the simplest of tasks.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;We&#39;ve seen teams ship immense amounts of slop. We&#39;ve even &lt;a href=&quot;https://www.youtube.com/live/99Kxkemj1g8?si=tVsK88M3XPyo0urD&amp;amp;t=20967&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;shipped a little bit of slop ourselves&lt;/a&gt;.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Every time, the instinct was the same:&lt;/p&gt;
        &lt;div class=&quot;pl-8 mb-6&quot;&gt;&lt;ul class=&quot;list-disc space-y-4&quot; style=&quot;color:var(--text-secondary)&quot;&gt;
        &lt;li&gt;&quot;We just need better models, GPT-6 will fix it&quot;&lt;/li&gt;
        &lt;li&gt;&quot;We just need better instruction-following&quot;&lt;/li&gt;
        &lt;li&gt;&quot;It&#39;ll work once [niche library I&#39;m using] is in the training data&quot;&lt;/li&gt;
        &lt;/ul&gt;&lt;/div&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;But over the course of dozens of projects and hundreds of agent sessions, we kept arriving at the same conclusion: it&#39;s not a model problem. It&#39;s a &lt;strong style=&quot;color:var(--text-primary)&quot;&gt;configuration problem&lt;/strong&gt;.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Yes, models will get smarter, and some existing failure modes will disappear. And then because they are smarter, we will give them new problems which are bigger and harder, and they will &lt;strong style=&quot;color:var(--text-primary)&quot;&gt;continue to fail in unexpected ways&lt;/strong&gt;. Unexpected failures modes are a fundamental problem for non-deterministic systems.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;img src=&quot;https://www.humanlayer.dev/blog/skill-issue/gpt-6.png&quot; alt=&quot;gpt-6&quot; class=&quot;max-w-full h-auto my-4 rounded-none&quot; style=&quot;border:1px solid var(--border-color)&quot; referrerpolicy=&quot;no-referrer&quot;&gt;&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;So instead of praying for &lt;code class=&quot;px-2 py-0.5 rounded font-mono bg-[var(--code-bg)] text-[var(--accent)] border border-[var(--code-border)] tracking-tight shadow-sm mr-1 align-middle wrap-anywhere box-decoration-clone&quot;&gt;gpt-6.4-codex-ultrahigh_extended&lt;/code&gt; to save us all, we try to focus on answering the question of &lt;em style=&quot;color:var(--text-secondary)&quot;&gt;&quot;how do we get the most out of &lt;strong style=&quot;color:var(--text-primary)&quot;&gt;today&#39;s&lt;/strong&gt; models?&quot;&lt;/em&gt;&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;There are lots of ways to get better performance out of your coding agent. If you use coding agents for moderately hard tasks, you&#39;ve probably configured your coding agent a bit. Have you used skills? MCP servers? Sub-agents? Memory? AGENTS.md files?&lt;/p&gt;
        &lt;pre class=&quot;group&quot;&gt;&lt;pre class=&quot;prismjs rsh-code-block grid relative min-w-[250px] mb-6 bg-gray-400/5 p-4&quot;&gt;&lt;button data-slot=&quot;button&quot; class=&quot;inline-flex items-center justify-center gap-2 whitespace-nowrap rounded-none text-sm font-mono font-medium cursor-pointer disabled:cursor-not-allowed disabled:pointer-events-none disabled:opacity-50 [&amp;amp;_svg]:pointer-events-none [&amp;amp;_svg:not([class*=&#39;size-&#39;])]:size-4 shrink-0 [&amp;amp;_svg]:shrink-0 outline-none focus-visible:ring-[3px] uppercase tracking-wider border bg-transparent text-[var(--accent)] border-transparent hover:bg-[var(--accent)]/10 hover:border-[var(--accent)] focus-visible:border-[var(--accent)] focus-visible:ring-[var(--accent)]/50 size-9 absolute top-2 right-1 h-6 w-6 opacity-0 group-hover:opacity-100 transition-opacity duration-200 z-10&quot; aria-label=&quot;Copy code&quot; title=&quot;Copy code&quot;&gt;&lt;svg xmlns=&quot;http://www.w3.org/2000/svg&quot; width=&quot;24&quot; height=&quot;24&quot; viewBox=&quot;0 0 24 24&quot; fill=&quot;none&quot; stroke=&quot;currentColor&quot; stroke-width=&quot;2&quot; stroke-linecap=&quot;round&quot; stroke-linejoin=&quot;round&quot; class=&quot;lucide lucide-copy h-3 w-3&quot; aria-hidden=&quot;true&quot;&gt;&lt;rect width=&quot;14&quot; height=&quot;14&quot; x=&quot;8&quot; y=&quot;8&quot; rx=&quot;2&quot; ry=&quot;2&quot;&gt;&lt;/rect&gt;&lt;path d=&quot;M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/button&gt;&lt;code class=&quot;overflow-x-auto&quot; style=&quot;white-space:pre&quot;&gt;&lt;span class=&quot;&quot;&gt;coding agent = AI model(s) + harness&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/pre&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;These are all technically separate concepts, but they are all part of the coding agent&#39;s configuration surface. We call this the &lt;strong style=&quot;color:var(--text-primary)&quot;&gt;coding agent&#39;s harness&lt;/strong&gt;, and we think of it as &lt;a href=&quot;https://www.vtrivedy.com/posts/claude-code-sdk-haas-harness-as-a-service&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;the agent’s runtime&lt;/a&gt;, or as its peripherals: what does the model use to interact with its environment?&lt;/p&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;Harness Engineering&lt;/h2&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;strong style=&quot;color:var(--text-primary)&quot;&gt;Harness engineering&lt;/strong&gt;, coined by &lt;a href=&quot;https://x.com/Vtrivedy10&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;Viv&lt;/a&gt;, describes the practice of leveraging these configuration points to customize and improve your coding agent&#39;s output quality and reliability.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;img src=&quot;https://www.humanlayer.dev/blog/skill-issue/harness-components.png&quot; alt=&quot;harness components&quot; class=&quot;max-w-full h-auto my-4 rounded-none&quot; style=&quot;border:1px solid var(--border-color)&quot; referrerpolicy=&quot;no-referrer&quot;&gt;
        &lt;em style=&quot;color:var(--text-secondary)&quot;&gt;Image from &lt;a href=&quot;https://blog.langchain.com/the-anatomy-of-an-agent-harness/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;Viv&#39;s post&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;As &lt;a href=&quot;https://mitchellh.com/writing/my-ai-adoption-journey#step-5-engineer-the-harness&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;Mitchell Hashimoto put it&lt;/a&gt;, harness engineering&lt;/p&gt;
        &lt;blockquote class=&quot;border-l-4 pl-4 my-4 italic&quot; style=&quot;border-color:var(--accent);color:var(--text-muted)&quot;&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;[...] is the idea that anytime you find an agent makes a mistake, you take the time to engineer a solution such that the agent never makes that mistake again.&lt;/p&gt;
        &lt;/blockquote&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;... as a Subset of Context Engineering&lt;/h2&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;We view harness engineering as a subset of &lt;a href=&quot;https://www.humanlayer.dev/blog/advanced-context-engineering&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;context engineering&lt;/a&gt;. Coined by my cofounder &lt;a href=&quot;https://x.com/dexhorthy&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;Dex&lt;/a&gt; in &lt;a href=&quot;https://github.com/humanlayer/12-factor-agents&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;12-factor agents&lt;/a&gt;, context engineering is a superset of “prompt engineering” and a variety of other techniques for systematically improving AI agents’ reliability.&amp;nbsp;You can find the original talk on &lt;a href=&quot;https://www.youtube.com/watch?v=IS_y40zY-hc&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Harness engineering then is the subset of context engineering which primarily involves leveraging harness configuration points to carefully manage the context windows of coding agents.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;img src=&quot;https://www.humanlayer.dev/blog/skill-issue/harness-engineering.png&quot; alt=&quot;harness engineering as context engineering&quot; class=&quot;max-w-full h-auto my-4 rounded-none&quot; style=&quot;border:1px solid var(--border-color)&quot; referrerpolicy=&quot;no-referrer&quot;&gt;&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;It answers:&lt;/p&gt;
        &lt;div class=&quot;pl-8 mb-6&quot;&gt;&lt;ul class=&quot;list-disc space-y-4&quot; style=&quot;color:var(--text-secondary)&quot;&gt;
        &lt;li&gt;How do we give our coding agent new capabilities?&lt;/li&gt;
        &lt;li&gt;How do we teach it things about our codebase that aren’t in the training data?&lt;/li&gt;
        &lt;li&gt;How do we add determinism beyond &lt;code class=&quot;px-2 py-0.5 rounded font-mono bg-[var(--code-bg)] text-[var(--accent)] border border-[var(--code-border)] tracking-tight shadow-sm mr-1 align-middle wrap-anywhere box-decoration-clone&quot;&gt;CRITICAL: always do XYZ&lt;/code&gt; in the system message?&lt;/li&gt;
        &lt;li&gt;How do we adapt the agent’s behavior for our specific codebase?&lt;/li&gt;
        &lt;li&gt;How do we increase task success rates beyond “magic prompts”?&lt;/li&gt;
        &lt;li&gt;How do we prevent our context window from inflating too rapidly, or with too much bad context?&lt;/li&gt;
        &lt;/ul&gt;&lt;/div&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Skills, MCP servers, sub-agents, hooks, and back-pressure mechanisms are all tactical solutions we’ve arrived at.&lt;/p&gt;
        &lt;h3 class=&quot;text-2xl mb-2 mt-16 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;### &lt;!-- --&gt;Views on Harness Engineering&lt;/h3&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Viv’s posts on harness engineering are worth reading alongside this one — &lt;a href=&quot;https://www.vtrivedy.com/posts/claude-code-sdk-haas-harness-as-a-service&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;the first&lt;/a&gt; frames the four customization levers (system prompt, tools/MCPs, context, sub-agents), and &lt;a href=&quot;https://blog.langchain.com/the-anatomy-of-an-agent-harness/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;the second&lt;/a&gt; works backwards from what models &lt;em style=&quot;color:var(--text-secondary)&quot;&gt;can’t&lt;/em&gt; do natively to derive why each harness component exists.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;img src=&quot;https://www.humanlayer.dev/blog/skill-issue/backwards.png&quot; alt=&quot;working backwards from what models can&#39;t do natively&quot; class=&quot;max-w-full h-auto my-4 rounded-none&quot; style=&quot;border:1px solid var(--border-color)&quot; referrerpolicy=&quot;no-referrer&quot;&gt;
        &lt;em style=&quot;color:var(--text-secondary)&quot;&gt;Image from &lt;a href=&quot;https://blog.langchain.com/the-anatomy-of-an-agent-harness/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;Viv&#39;s post&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;We’d add two levers he doesn’t emphasize:&lt;/p&gt;
        &lt;div class=&quot;pl-8 mb-6&quot;&gt;&lt;ol class=&quot;list-decimal space-y-4&quot; style=&quot;color:var(--text-secondary)&quot;&gt;
        &lt;li&gt;&lt;strong style=&quot;color:var(--text-primary)&quot;&gt;hooks&lt;/strong&gt; for automated integration and deterministic control flow&lt;/li&gt;
        &lt;li&gt;&lt;strong style=&quot;color:var(--text-primary)&quot;&gt;skills&lt;/strong&gt; for progressive disclosure of knowledge. (Dex likes to refer to them as &quot;Instruction Modules&quot; - more on this in another post.)&lt;/li&gt;
        &lt;/ol&gt;&lt;/div&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;After months of solving hard problems in complex brownfield enterprise-scale codebases, we have found that sub-agents are a particularly powerful lever. When working on hard problems that require many, many context windows to solve, &lt;strong style=&quot;color:var(--text-primary)&quot;&gt;sub-agents are the key to maintaining coherency across many sessions&lt;/strong&gt;. Sub-agents &lt;strong style=&quot;color:var(--text-primary)&quot;&gt;function as a &quot;context firewall&quot;&lt;/strong&gt; that ensures discrete tasks can run in isolated context windows so none of the intermediate noise accumulates in your parent thread which is responsible for orchestration, and you can maintain coherency for much, much longer.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;img src=&quot;https://www.humanlayer.dev/blog/skill-issue/context-firewall.png&quot; alt=&quot;context firewall&quot; class=&quot;max-w-full h-auto my-4 rounded-none&quot; style=&quot;border:1px solid var(--border-color)&quot; referrerpolicy=&quot;no-referrer&quot;&gt;&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;OpenAI recently wrote a &lt;a href=&quot;https://openai.com/index/harness-engineering/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;blog post&lt;/a&gt; on the topic as well. There&#39;s some great content in there, and it seems to indicate that they view harness engineering as configuring everything &lt;em style=&quot;color:var(--text-secondary)&quot;&gt;outside&lt;/em&gt; of the agent&#39;s runtime. It&#39;s more focused on back-pressure and verification mechanisms. (Although this may be a mis-reading; the post is somewhat unclear: the word &quot;harness&quot; only appears once in the text of the post, and in reference to evals rather than harness engineering itself.)&lt;/p&gt;
        &lt;h3 class=&quot;text-2xl mb-2 mt-16 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;### &lt;!-- --&gt;But What About Post-Training?&lt;/h3&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Given that frontier coding models are post-trained on their harnesses (e.g. Claude in Claude Code, GPT-5 Codex in Codex), some will argue that the best harness and/or configuration is the one that the model was trained on.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;For example, the Codex models are so tightly coupled with the Codex harness&#39;s &lt;code class=&quot;px-2 py-0.5 rounded font-mono bg-[var(--code-bg)] text-[var(--accent)] border border-[var(--code-border)] tracking-tight shadow-sm mr-1 align-middle wrap-anywhere box-decoration-clone&quot;&gt;apply_patch&lt;/code&gt; tool that &lt;a href=&quot;https://opencode.ai/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;OpenCode&lt;/a&gt; — built as an open-source alternative to Claude Code — had to &lt;a href=&quot;https://github.com/anomalyco/opencode/pull/9127&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;add an &lt;code class=&quot;px-2 py-0.5 rounded font-mono bg-[var(--code-bg)] text-[var(--accent)] border border-[var(--code-border)] tracking-tight shadow-sm mr-1 align-middle wrap-anywhere box-decoration-clone&quot;&gt;apply_patch&lt;/code&gt; tool&lt;/a&gt; specifically for GPT/Codex models to mimic the Codex harness to improve the Codex models&#39; performance in the OpenCode harness - while Claude and other models still use normal &lt;code class=&quot;px-2 py-0.5 rounded font-mono bg-[var(--code-bg)] text-[var(--accent)] border border-[var(--code-border)] tracking-tight shadow-sm mr-1 align-middle wrap-anywhere box-decoration-clone&quot;&gt;edit&lt;/code&gt; and &lt;code class=&quot;px-2 py-0.5 rounded font-mono bg-[var(--code-bg)] text-[var(--accent)] border border-[var(--code-border)] tracking-tight shadow-sm mr-1 align-middle wrap-anywhere box-decoration-clone&quot;&gt;write&lt;/code&gt; tools.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;This &lt;em style=&quot;color:var(--text-secondary)&quot;&gt;can&lt;/em&gt; mean that a model will perform better when coupled with the harness it was post-trained on, and some might infer that this means you shouldn&#39;t customize the harness at all.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;But it cuts both ways: &lt;strong style=&quot;color:var(--text-primary)&quot;&gt;models can be over-fitted to their harness&lt;/strong&gt;. Viv cites &lt;a href=&quot;https://terminalbench.com/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;Terminal Bench 2.0&lt;/a&gt; where Opus 4.6 in Claude Code comes in position #33, but when placed in a different harness that wasn&#39;t seen during post-training, it comes in at #5 (+/- about 4 positions in either direction).&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;img src=&quot;https://www.humanlayer.dev/blog/skill-issue/terminal-bench.png&quot; alt=&quot;terminal bench&quot; class=&quot;max-w-full h-auto my-4 rounded-none&quot; style=&quot;border:1px solid var(--border-color)&quot; referrerpolicy=&quot;no-referrer&quot;&gt;&lt;/p&gt;
        &lt;h2 class=&quot;text-3xl mb-3 mt-20 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;## &lt;!-- --&gt;Engineering Your Harness&lt;/h2&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;With that in mind, let&#39;s walk through the configuration surfaces we&#39;ve found most impactful.&lt;/p&gt;
        &lt;h3 class=&quot;text-2xl mb-2 mt-16 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;### &lt;!-- --&gt;CLAUDE.md &amp;amp; AGENTS.md&lt;/h3&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Before touching any other harness configuration points, it&#39;s usually worth customizing your CLAUDE.md / AGENTS.md files. These are markdown files at the top-level of your repository that get deterministically injected into the agent&#39;s system prompt by the harness.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;We have already shared &lt;a href=&quot;https://www.humanlayer.dev/blog/writing-a-good-claude-md&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;some opinions about what makes a good CLAUDE.md&lt;/a&gt; file and how to use it correctly, so give that a read if you&#39;re not familiar with it. Matt Pocock also wrote a &lt;a href=&quot;https://www.aihero.dev/a-complete-guide-to-agents-md&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;great follow-up&lt;/a&gt; that more generally applies to AGENTS.md.&lt;/p&gt;
        &lt;h4 class=&quot;text-xl mb-1 mt-8 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;#### &lt;!-- --&gt;The ETH Zurich Study&lt;/h4&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;After ETH Zurich published their &lt;a href=&quot;https://arxiv.org/abs/2602.11988&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;study testing 138 agentfiles across various repos&lt;/a&gt; which indicated that most agentfiles were useless-or-worse, we got a lot of feedback on our &lt;a href=&quot;https://www.humanlayer.dev/blog/writing-a-good-claude-md&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;CLAUDE.md post&lt;/a&gt;:&lt;/p&gt;
        &lt;blockquote class=&quot;border-l-4 pl-4 my-4 italic&quot; style=&quot;border-color:var(--accent);color:var(--text-muted)&quot;&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;See, CLAUDE.md files don&#39;t even help — they&#39;re a waste of time.&lt;/p&gt;
        &lt;/blockquote&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Indeed: the study tested many agentfiles across a wide variety of repos, and found:&lt;/p&gt;
        &lt;div class=&quot;pl-8 mb-6&quot;&gt;&lt;ul class=&quot;list-disc space-y-4&quot; style=&quot;color:var(--text-secondary)&quot;&gt;
        &lt;li&gt;that LLM-generated ones actually &lt;em style=&quot;color:var(--text-secondary)&quot;&gt;hurt&lt;/em&gt; performance while costing 20%+ more&lt;/li&gt;
        &lt;li&gt;human-written ones only helped about 4%.&lt;/li&gt;
        &lt;li&gt;Agents spent 14-22% more reasoning tokens processing context file instructions, took more steps to complete tasks, and ran more tools — all without improving resolution rates.&lt;/li&gt;
        &lt;li&gt;Codebase overviews and directory listings didn&#39;t help at all; agents discover repository structure on their own just fine.&lt;/li&gt;
        &lt;/ul&gt;&lt;/div&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;A careful reading of the study &lt;strong style=&quot;color:var(--text-primary)&quot;&gt;indicates that what we said in our post was correct:&lt;/strong&gt;&lt;/p&gt;
        &lt;div class=&quot;pl-8 mb-6&quot;&gt;&lt;ul class=&quot;list-disc space-y-4&quot; style=&quot;color:var(--text-secondary)&quot;&gt;
        &lt;li&gt;&lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Agent-generated files were worse. Yes, we said:&lt;/p&gt;
        &lt;blockquote class=&quot;border-l-4 pl-4 my-4 italic&quot; style=&quot;border-color:var(--accent);color:var(--text-muted)&quot;&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;avoid auto-generating it. You should carefully craft its contents for best results.&lt;/p&gt;
        &lt;/blockquote&gt;
        &lt;/li&gt;
        &lt;li&gt;&lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Lots of files too-heavily-steered the model to use specific tools, causing worse outcomes. Yep, we said:&lt;/p&gt;
        &lt;blockquote class=&quot;border-l-4 pl-4 my-4 italic&quot; style=&quot;border-color:var(--accent);color:var(--text-muted)&quot;&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Less (instructions) is more. While you shouldn&#39;t omit necessary instructions, you should include as few instructions as reasonably possible in the file.&lt;/p&gt;
        &lt;/blockquote&gt;
        &lt;/li&gt;
        &lt;li&gt;&lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Files contained irrelevant context. Yep, we said:&lt;/p&gt;
        &lt;blockquote class=&quot;border-l-4 pl-4 my-4 italic&quot; style=&quot;border-color:var(--accent);color:var(--text-muted)&quot;&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Use Progressive Disclosure&lt;/p&gt;
        &lt;/blockquote&gt;
        &lt;/li&gt;
        &lt;li&gt;&lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;The human-written ones barely helped because of too many conditional rules. Yep, we said:&lt;/p&gt;
        &lt;blockquote class=&quot;border-l-4 pl-4 my-4 italic&quot; style=&quot;border-color:var(--accent);color:var(--text-muted)&quot;&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Keep the contents of your CLAUDE.md concise and universally applicable.&lt;/p&gt;
        &lt;/blockquote&gt;
        &lt;/li&gt;
        &lt;/ul&gt;&lt;/div&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;Our CLAUDE.md is under 60 lines.&lt;/p&gt;
        &lt;h3 class=&quot;text-2xl mb-2 mt-16 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;### &lt;!-- --&gt;MCP Servers Are for Tools&lt;/h3&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;MCP servers are&amp;nbsp;primarily for plugging tools into your coding agent to extend its capabilities beyond file I/O and bash commands. The MCP specification includes additional features like resources, prompts, and elicitations, but &lt;strong style=&quot;color:var(--text-primary)&quot;&gt;these are generally not well-supported by MCP clients and coding agent harnesses&lt;/strong&gt;.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;The MCP spec supports servers that run on your (or the agent&#39;s) local machine which allow the agent to interact with its local environment, but it also supports HTTP-based MCP servers that can connect your agent with remote tools and services like Linear, Sentry, and more.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;When you plug an MCP server into your coding agent, the list of available tools, their descriptions, and the arguments needed to invoke them &lt;strong style=&quot;color:var(--text-primary)&quot;&gt;are injected into your coding agent&#39;s system prompt&lt;/strong&gt;. As a result, the MCP server can use the tool descriptions to customize your agent’s behavior by providing your agent with instructions about when to use them.&amp;nbsp;&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;strong style=&quot;color:var(--text-primary)&quot;&gt;WARNING&lt;/strong&gt;: because MCP servers’ tool descriptions are added to your coding agent’s system prompt, never connect to one you don’t trust. This can be a dangerous vector for prompt injection! STDIO servers and other servers that run client-side with npx or uvx can also execute code on your host in the absence of prompt injection.&lt;/p&gt;
        &lt;h4 class=&quot;text-xl mb-1 mt-8 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;#### &lt;!-- --&gt;Too Many Tools Is Bad&lt;/h4&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;We’ve seen this firsthand: plug too many MCP tools into your agent, and the context window fills up with tool descriptions, pushing you into &lt;a href=&quot;https://youtu.be/rmvDxxNubIg?si=O17nmS3SScaAkpp-&amp;amp;t=355&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;the dumb zone&lt;/a&gt; much faster:&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;&lt;img src=&quot;https://www.humanlayer.dev/blog/skill-issue/too-many-tools.png&quot; alt=&quot;too many tools&quot; class=&quot;max-w-full h-auto my-4 rounded-none&quot; style=&quot;border:1px solid var(--border-color)&quot; referrerpolicy=&quot;no-referrer&quot;&gt;&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;The &lt;a href=&quot;https://www.aihero.dev/a-complete-guide-to-agents-md#the-instruction-budget&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;instruction budget&lt;/a&gt; matters too — every irrelevant tool description is an instruction the agent has to process without any benefit.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;In fact, these failure modes are so common that Anthropic released &lt;a href=&quot;https://platform.claude.com/docs/en/agents-and-tools/tool-use/tool-search-tool&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;experimental support for MCP tool search&lt;/a&gt; to progressively disclose tools to Claude when the user has too many MCP tools connected. TL;DR: if you’re not actively using a server which provides a large number of tools, turn it off.&lt;/p&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;We also found that if an MCP server duplicates functionality that’s already available as a CLI well-represented in training data, it works better to just prompt the agent to use the CLI. For things like GitHub, Docker, or most databases, your coding agent can just use the right CLIs and shell commands. The model has seen these tools enough during training that it already knows how to use them, and you gain the added benefit of composability with tools like &lt;code class=&quot;px-2 py-0.5 rounded font-mono bg-[var(--code-bg)] text-[var(--accent)] border border-[var(--code-border)] tracking-tight shadow-sm mr-1 align-middle wrap-anywhere box-decoration-clone&quot;&gt;grep&lt;/code&gt; and &lt;code class=&quot;px-2 py-0.5 rounded font-mono bg-[var(--code-bg)] text-[var(--accent)] border border-[var(--code-border)] tracking-tight shadow-sm mr-1 align-middle wrap-anywhere box-decoration-clone&quot;&gt;jq&lt;/code&gt; to enable additional context-efficiency.&lt;/p&gt;
        &lt;h4 class=&quot;text-xl mb-1 mt-8 font-bold&quot; style=&quot;color:var(--text-primary)&quot;&gt;#### &lt;!-- --&gt;Always Be Context-Engineering&lt;/h4&gt;
        &lt;p style=&quot;display:block;color:var(--text-secondary)&quot; class=&quot;mb-4&quot;&gt;At HumanLayer, we used the Linear MCP server for a while before realizing that we really only used a small subset of the tools it provides - so we wrote a small CLI that wraps the Linear API and provides very context-efficient responses, and we included 6 example usages in our &lt;a href=&quot;https://www.humanlayer.dev/blog/writing-a-good-claude-md&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; class=&quot;underline hover:opacity-80 transition-opacity&quot; style=&quot;color:var(--accent)&quot;&gt;CLAUDE.md&lt;/a&gt; file:&lt;/p&gt;
        &lt;pre class=&quot;group&quot;&gt;&lt;pre class=&quot;prismjs rsh-code-block grid relative min-w-[250px] mb-6 bg-gray-400/5 p-4&quot;&gt;&lt;button data-slot=&quot;button&quot; class=&quot;inline-flex items-center justify-center gap-2 whitespace-nowrap rounded-none text-sm font-mono font-medium cursor-pointer disabled:cursor-not-allowed disabled:pointer-events-none disabled:opacity-50 [&amp;amp;_svg]:pointer-events-none [&amp;amp;_svg:not([class*=&#39;size-&#39;])]:size-4 shrink-0 [&amp;amp;_svg]:shrink-0 outline-none focus-visible:ring-[3px] uppercase tracking-wider bor

const response = await ofetch(listUrl);
const $ = load(response);

const list = $('a.block.py-2.group[href^="/blog/"]:not([href^="/blog/tags/"])')
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once again redundant filtering using not(). Provide a screenshot if the site does return a[href^="/blog/tags/"] to you with class name block py-2 group.

Here's what I'm seeing Image

Comment on lines +55 to +56
const category = parts
.slice(3)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use parts[3] instead of copying it again.

const category = parts
.slice(3)
.join(' ')
.match(/#\w+/g)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/#\w+/ cannot match words after - which you can see from the results.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants