Prompts Alone Are Not Enough — The Complete 4-Layer Harness Guide for Claude Code
The real competitive edge of an AI agent comes from its harness, not the model. A complete breakdown of the CLAUDE.md · Skills · Hooks · Subagents four-layer architecture for running Claude Code reliably in production, with step-by-step examples.
AI-assisted draft · Editorially reviewed. This blog content may use AI tools for drafting and structuring, and is published after editorial review by the RanketAI Editorial Team.
Prologue: The Harness Wins, Not the Model
There is a pattern that keeps repeating in AI coding. Swapping in a stronger model rarely changes team productivity by much. But adding structure around the same model? The results can be dramatic.
The clearest example is LangChain. Without touching the underlying model, they improved only their harness structure and lifted their score on Terminal Bench 2.0 from 52.8% to 66.5% — jumping from Top 30 to Top 5. Claude Code crossing $1 billion in annualized revenue within six months of launch was not solely because Anthropic built a better model. It was because Anthropic built the right harness: a streaming agent loop, a permission-governed tool dispatch system, and a context management layer that keeps the model focused across arbitrarily long sessions.
The formula is simple.
Agent = Model + Harness
The harness is everything except the model: rules, procedures, validation gates, and isolated execution environments. In Claude Code, those four things are CLAUDE.md, Skills, Hooks, and Subagents.
This guide explains what each of the four layers does, when to use which, and how to integrate them into a production-grade operating structure — with concrete examples throughout.
1. Overview: The Four Layers
Claude Code's harness breaks down into four distinct layers. Each has a different role and a different scope of responsibility. They cannot be collapsed into one.
| Layer | Component | Core Role | Persistence |
|---|---|---|---|
| Layer 1 | CLAUDE.md | Rules · Context · Long-term memory | Permanent across sessions |
| Layer 2 | Skills | Procedure automation · Domain expertise | Active when invoked |
| Layer 3 | Hooks | Deterministic execution gate | Fires on every matching event |
| Layer 4 | Subagents | Isolated execution · Parallel work | Independent per request |
Why four layers?
Trying to pack everything into a single prompt inevitably breaks down at scale. When rules and procedures share the same space, the model struggles to decide which takes priority. When validation depends on prompts, the model's mistakes take the validation down with them. Without isolation, a single session accumulates context pollution over time.
The four layers solve each of these problems independently.
2. Layer 1: CLAUDE.md — The Agent's Long-Term Memory
What CLAUDE.md Does
Every time a session starts, Claude Code automatically reads CLAUDE.md from the project root. This is the agent's cross-session memory — the equivalent of the onboarding document a new team member receives on day one.
Without a CLAUDE.md, the agent starts every session as if it just walked in the door. It has no idea what the tech stack is, which conventions to follow, or which files it should never touch.
What to Include — and What to Leave Out
Include:
- Tech stack and versions (package manager, framework)
- Code conventions (variable naming, file structure rules)
- Prohibited actions (files that must never be modified, patterns to avoid)
- Core architecture overview (which layer does what)
- Commit and PR message formats
- How to run tests
Leave out:
- One-off instructions valid only for the current session (handle those in the conversation)
- Long code blocks (reference a separate file instead)
- Frequently changing business logic (hardcoding it increases maintenance cost)
- Procedures that belong in Skills or Hooks (avoid mixing layers)
Real-World Example: CLAUDE.md for a Claude Code Project
# Project Overview
Next.js 15 + TypeScript. App Router only.
Package manager: npm (never use yarn or pnpm)
# Core Architecture
- app/ : routes (server components preferred)
- components/ : reusable UI
- lib/ : business logic (server-only)
- hooks/ : client hooks
# Code Rules
- Components: PascalCase, file name matches
- Variables/functions: camelCase
- Constants: UPPER_SNAKE_CASE
- Import order: external packages → internal absolute → relative
# Hard Prohibitions
- No useState/useEffect inside lib/
- No new `any` types
- No console.log in commits (console.error is allowed)
- Never modify node_modules or .env
# Tests
npm run test:unit # unit tests
npm run test:e2e # E2E (local only)
# Commit Format
feat(scope): description
fix(scope): description
refactor(scope): description
Layered CLAUDE.md: Subdirectory Overrides
As a project grows, a single root CLAUDE.md is not enough. Claude Code supports placing CLAUDE.md files in subdirectories. The agent loads the relevant one when working in that directory.
project/
├── CLAUDE.md # project-wide common rules
├── app/
│ └── CLAUDE.md # route-specific rules
├── lib/
│ └── CLAUDE.md # server logic rules
└── tests/
└── CLAUDE.md # test writing rules
For example, lib/CLAUDE.md might contain:
# lib/ Rules
All files in this directory are server-only.
- Never import React hooks
- Always use lib/fetcher.ts for external API calls
- Re-export new files from lib/index.ts
- Error handling: use throw new AppError(...) — never throw raw Error
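Conceptually, the layering behaves like a root-to-leaf lookup: rules accumulate from the project root down to the directory being worked on, with deeper files refining the ones above them. A sketch of that resolution order (illustrative only, not Claude Code's actual loader):

```python
from pathlib import Path

def collect_claude_md(project_root: str, working_dir: str) -> list[Path]:
    """Return CLAUDE.md files from the project root down to working_dir.

    Root-level rules come first so that deeper files can refine them.
    Conceptual sketch only -- not Claude Code's actual loading logic.
    """
    root = Path(project_root).resolve()
    current = Path(working_dir).resolve()
    relative = current.relative_to(root)  # raises ValueError outside the project
    # Build the chain of directories from root down to the working directory
    levels = [root]
    for part in relative.parts:
        levels.append(levels[-1] / part)
    return [d / "CLAUDE.md" for d in levels if (d / "CLAUDE.md").is_file()]
```

For the tree above, working inside lib/ would yield the root CLAUDE.md followed by lib/CLAUDE.md, which is why subdirectory files act as overrides rather than replacements.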
Anti-Pattern: CLAUDE.md Bloat
A CLAUDE.md that grows past roughly 200 lines is a warning sign. The model has trouble deciding which rule takes precedence, and compliance actually drops. The key is to separate concerns cleanly:
- Rules: CLAUDE.md (what must be true)
- Procedures: Skills (how to do it)
- Enforced checks: Hooks (must run regardless)
3. Layer 2: Skills — The Agent's Domain Expertise
What Skills Do
Skills package repeatable work procedures in a form the agent can pull up at any time, including via /slash-commands. Think of them as the company's standard operating procedures (SOPs).
If CLAUDE.md answers "what does the agent need to know?", Skills answer "how should it do it?"
Skill File Structure
Skills live as markdown files in .claude/skills/. (In current Claude Code releases each Skill is typically a folder containing a SKILL.md; the flat layout below is a simplified view.)
.claude/
└── skills/
├── commit.md
├── review-pr.md
├── create-component.md
└── deploy-check.md
Every Skill file has two parts:
- YAML frontmatter: metadata that tells Claude when to use this Skill
- Markdown body: step-by-step instructions Claude follows when the Skill is active
Example 1: Commit Skill
---
name: commit
description: Analyze staged changes and create a conventional commit message
triggers:
- "commit"
- "save changes"
- "commit this"
---
## Commit Procedure
1. Run `git diff --staged` to understand all staged changes
2. Classify the change type (feat/fix/refactor/chore/docs/test)
3. Determine scope — use the name of the primary module changed
4. Write a description under 72 characters
5. If there is a BREAKING CHANGE, note it in the commit body
6. Always show the drafted message to the user for confirmation before committing
### Commit Message Format
{type}({scope}): {description}
[optional body]

[optional footer]
### Prohibitions
- Never use `git add -A` or `git add .`
- Never commit with failing tests
- If .env appears in the staging area, stop immediately and alert the user
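Steps 2 to 4 of the procedure are mechanical enough to double-check in code. A hypothetical helper that validates a drafted subject line against the format above (the type list and the 72-character limit mirror this Skill; nothing here is part of Claude Code itself):

```python
import re

# Mirrors the Skill above: {type}({scope}): {description}
COMMIT_RE = re.compile(r"^(feat|fix|refactor|chore|docs|test)\(([a-z0-9-]+)\): (.+)$")

def check_commit_subject(subject: str) -> list[str]:
    """Return the list of rule violations for a drafted commit subject."""
    match = COMMIT_RE.match(subject)
    if not match:
        return ["subject must match {type}({scope}): {description}"]
    problems = []
    if len(match.group(3)) >= 72:  # step 4: description under 72 characters
        problems.append("description must stay under 72 characters")
    return problems
```

A check like this could even run as a PostToolUse hook on git commits, turning step 4 from a recommendation into an enforced rule.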
Example 2: PR Review Skill
---
name: review-pr
description: Self-review code quality and safety before submitting a PR
triggers:
- "pr review"
- "review before merge"
- "pre-submit check"
---
## PR Review Checklist
### 1. Scope Check
- List changed files: `git diff --name-only origin/main`
- Confirm no unintended files are included
### 2. Code Quality
- TypeScript errors: `tsc --noEmit`
- Lint: `npm run lint`
- Verify new functions have corresponding tests
### 3. Security Check
- No hardcoded secrets (API keys, passwords)
- External inputs are validated
### 4. PR Description Draft
Write a PR description using this structure:
- **What changed:** 1–3 bullet points
- **Why:** business or technical reason
- **How to test:** verification steps
- **Screenshots:** mark as needed for UI changes
Auto-Trigger vs Manual Invocation
When triggers are defined in a Skill's frontmatter, Claude activates the Skill automatically when it detects a matching task. Alternatively, Skills can be called explicitly via slash commands like /commit.
Use auto-trigger when:
- The work always follows the same procedure
- The trigger context is clear and unambiguous
Use manual invocation when:
- The user must explicitly initiate the action (deployment, large refactors)
- Accidental triggering would be costly
Making Skills a Team Asset
The biggest advantage of Skills is shareability. Personal prompt know-how saved as files in .claude/skills/ becomes available to every team member who clones the repo.
# Recommended team-shared Skills
.claude/skills/
├── commit.md # commit message standard
├── review-pr.md # PR review checklist
├── create-component.md # component creation template
├── write-test.md # test writing guide
├── debug-error.md # debugging procedure
└── deploy-check.md # pre-deploy final check
4. Layer 3: Hooks — The Deterministic Execution Gate
The Critical Difference Between Hooks and Skills
Skills guide Claude to follow a procedure. But models make mistakes, and they may skip steps when context runs thin.
Hooks are different. Hooks execute regardless of model behavior. They are system-level enforcement, not prompts. The concept is analogous to git hooks, but applied across Claude Code's entire agent lifecycle.
An analogy:
- Skills: "Please follow this checklist before you start" (recommendation)
- Hooks: "The door won't open without an access card" (enforcement)
When Hooks Fire
Claude Code Hooks can fire at several points in the agent lifecycle; the two most commonly used are:
| Trigger | Purpose |
|---|---|
| PreToolUse | Before a tool runs — can block or modify |
| PostToolUse | After a tool runs — validate results or post-process |
Example 1: Secret Scan Hook (PreToolUse)
Checks for API keys or passwords before any file is saved.
{
"hooks": {
"PreToolUse": [
{
"matcher": "Write|Edit|MultiEdit",
"hooks": [
{
"type": "command",
"command": "bash -c 'echo \"$CLAUDE_TOOL_INPUT\" | python3 scripts/secret-scan.py'"
}
]
}
]
}
}
If secret-scan.py detects a secret pattern, Claude Code halts the file write. Even if the model accidentally tries to hardcode an API key, it is physically blocked.
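The hook above assumes a scripts/secret-scan.py that reads the tool input on stdin and exits non-zero when something secret-like appears. A minimal sketch of what that script might look like (the file name comes from the hook config above; the patterns are illustrative, not exhaustive):

```python
import re
import sys

# Illustrative patterns only -- a real deployment should use a maintained
# scanner (gitleaks, trufflehog) with a far larger rule set.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),                    # OpenAI-style API key
    re.compile(r"AKIA[0-9A-Z]{16}"),                       # AWS access key ID
    re.compile(r"-----BEGIN (RSA |EC )?PRIVATE KEY-----"), # private key block
    re.compile(r"(password|secret|token)\s*[:=]\s*['\"][^'\"]{8,}", re.I),
]

def scan(text: str) -> list[str]:
    """Return every secret-like fragment found in the text."""
    return [m.group(0) for p in SECRET_PATTERNS for m in p.finditer(text)]

def main() -> int:
    """Read tool input from stdin; a non-zero exit tells the hook to block."""
    hits = scan(sys.stdin.read())
    if hits:
        print(f"BLOCKED: {len(hits)} secret-like pattern(s) detected")
        return 1
    return 0
```

Wire it up by calling sys.exit(main()) under an if __name__ == "__main__" guard. The exit code is the entire contract: anything non-zero halts the write.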
Example 2: Auto-Lint Hook (PostToolUse)
Runs lint automatically after every file modification.
{
"hooks": {
"PostToolUse": [
{
"matcher": "Write|Edit",
"hooks": [
{
"type": "command",
"command": "npm run lint -- --fix $CLAUDE_TOOL_INPUT_PATH 2>&1 | tail -5"
}
]
}
]
}
}
Lint runs every time the model touches a file, and fixable issues are resolved automatically. "Oh, there's a lint error" discoveries disappear.
Example 3: Protected File Access Block
Forces the agent to never modify certain paths.
{
"hooks": {
"PreToolUse": [
{
"matcher": "Write|Edit|MultiEdit|Bash",
"hooks": [
{
"type": "command",
"command": "python3 -c \"\nimport json, sys, os\ntool_input = json.loads(os.environ.get('CLAUDE_TOOL_INPUT', '{}'))\npath = tool_input.get('file_path', '') or tool_input.get('path', '')\nprotected = ['.env', 'secrets/', 'migrations/']\nif any(p in path for p in protected):\n print(f'BLOCKED: {path} is a protected file')\n sys.exit(1)\n\""
}
]
}
]
}
}
Where to Place Hook Config
Hooks settings live in two places:
- Project-level: .claude/settings.json, committed to the repo and applied to the whole team
- User-level: ~/.claude/settings.json, applied only to the individual's machine
.claude/
├── settings.json # team Hooks (committed)
└── settings.local.json # personal Hooks (gitignored)
When Hooks Become a Problem
Hooks are powerful, but heavy Hooks slow things down. Running expensive checks on every single file save significantly reduces agent speed. Use this filter:
- Only put things in Hooks that must always run
- Recommendations go in Skills
- Expensive checks go in CI/CD
5. Layer 4: Subagents — Isolation and Parallel Work
Why a Single Agent Session Is Not Enough
Handling research, implementation, and review all in one Claude Code session creates two problems.
Context pollution: The noise accumulated during exploration — files read, intermediate notes — clouds the judgment needed during implementation.
No independent review: In the review phase, the model is evaluating code it wrote itself. It tends to be lenient on its own work.
Subagents solve both problems. They run in independent context windows with independent goals.
Three Core Subagent Patterns
Pattern 1: Role Separation (Research → Implement → Review)
The most fundamental and effective pattern.
Main Agent
├── [Subagent A: Research]
│ Goal: explore codebase, analyze requirements
│ Output: context summary for implementation
│
├── [Subagent B: Implement]
│ Goal: write code based on context summary
│ Input: A's summary only (not the full exploration log)
│
└── [Subagent C: Review]
Goal: independently evaluate the implementation
Input: implemented code + original requirements
(B's reasoning process is NOT passed here)
Using subagents explicitly in Claude Code:
/fork research
Analyze how UserAuthService works in this repo.
Summarize session token handling, expiry logic, and related test files.
After receiving the research summary:
/fork implement
[paste research summary here]
Based on the analysis above, add refresh token rotation.
Do not change any existing interfaces — add new methods only.
After implementation:
/fork review
[list the implemented file paths]
Independently verify whether these files meet the following requirements:
1. No changes to existing interfaces
2. Edge cases for token expiry handled
3. No security vulnerabilities
Pattern 2: Parallel Independent Work
Process tasks with no dependencies simultaneously. For example, building three independent components at the same time, or writing tests for multiple files in parallel.
# Parallel subagent example
Main Agent → [Subagent A: implement Header component]
→ [Subagent B: implement Footer component]
→ [Subagent C: implement Sidebar component]
Since all three run at once, elapsed time drops significantly versus sequential processing.
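The wall-clock effect can be illustrated with an ordinary thread pool standing in for parallel subagents (an analogy only: subagents are dispatched by Claude Code itself, and the 0.1-second sleep is a stand-in for real work):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def build_component(name: str) -> str:
    """Stand-in for one subagent's independent task."""
    time.sleep(0.1)  # simulated work
    return f"{name} done"

components = ["Header", "Footer", "Sidebar"]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(build_component, components))
parallel_elapsed = time.perf_counter() - start

# The three tasks overlap, so elapsed time stays close to one task's
# duration (~0.1 s) rather than the ~0.3 s a sequential run would take.
```

The same shape applies to subagents: total elapsed time approaches the longest single task, while total token cost is still the sum of all three.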
Pattern 3: Specialized Agent Team
Create domain-specific subagents and reuse them. Examples: a security-review-only agent, a performance-analysis-only agent.
Custom agent configurations go in .claude/agents/.
---
name: security-reviewer
description: Dedicated security vulnerability code review agent. Auto-invoked when new code is added.
model: claude-sonnet-4-6
tools: Read, Grep
---
You are a security-focused code reviewer.
Always check for:
1. SQL injection (external input directly in queries)
2. XSS vulnerabilities (unescaped DOM manipulation)
3. Auth bypass (missing permission checks)
4. Hardcoded secrets
5. Sensitive data in logs
For each finding, report severity (High/Medium/Low) and a remediation path.
When Subagents Become Expensive
Subagents carry a cost. Each one consumes an independent context window, so token usage multiplies. General guidance:
- Use only for work where context pollution is a real problem at the current scale
- Prioritize for high-stakes tasks requiring independent validation
- Keep simple file edits and small tasks in a single session
6. Integration: A Full PR Workflow
Here is how all four layers work together in a real task.
Scenario: Add a New Feature → Submit a PR
[Session Start]
└─ CLAUDE.md auto-loaded
├─ Tech stack (Next.js 15, TypeScript)
├─ Code conventions (import order, naming)
└─ Prohibitions (no any, no console.log)
[Research Phase: Subagent A]
└─ Explore existing code relevant to requirements
└─ Output: context summary document
[Implementation Phase: Subagent B]
└─ Write code from context summary
├─ On file save → PreToolUse Hook fires
│ └─ Secret scan check
└─ After file save → PostToolUse Hook fires
└─ Auto-lint + auto-fix
[Review Phase: Subagent C]
└─ Independent evaluation of implementation
└─ Checklist-based report
[PR Preparation: /review-pr Skill]
└─ Auto-generate PR description draft
├─ Type check: tsc --noEmit
├─ Change scope verification
└─ PR description template complete
[Commit: /commit Skill]
└─ Conventional commit message auto-generated
└─ User confirms before execution
The human makes exactly two judgment calls in this entire flow:
- Review the subagent's report
- Confirm the final commit message
The harness handles the rest.
Decision Framework: Which Layer Gets What?
Three questions determine placement:
Q1: Does the agent need to know this at all times?
→ YES: CLAUDE.md
Q2: Is this a repeatable procedure?
→ YES: Skills
Q3: Must this always execute? (model mistakes not acceptable)
→ YES: Hooks
Q4: Does this require an independent context window?
→ YES: Subagents
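The four questions translate into a small decision helper, checked in the same order as above, with the first yes winning (a sketch for illustration, not an official tool):

```python
def choose_layer(always_needed: bool, repeatable_procedure: bool,
                 must_always_run: bool, needs_isolation: bool) -> str:
    """Map the four placement questions to a harness layer; first yes wins."""
    if always_needed:            # Q1: agent must know this at all times
        return "CLAUDE.md"
    if repeatable_procedure:     # Q2: repeatable procedure
        return "Skills"
    if must_always_run:          # Q3: model mistakes not acceptable
        return "Hooks"
    if needs_isolation:          # Q4: independent context window
        return "Subagents"
    return "handle it in the conversation"
```

If none of the four questions gets a yes, the instruction is session-scoped and belongs in the conversation, not the harness.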
7. Team Adoption Roadmap: 4 Stages
Trying to adopt all four layers at once is overwhelming. Build incrementally in this order.
Week 1: Write CLAUDE.md
The easiest start with the most immediate payoff.
Actions:
- Document the tech stack, conventions, and prohibitions your team currently uses
- Collect what team members find themselves repeatedly explaining to the agent
- Draft a root CLAUDE.md and refine it with team feedback over the week
Success signal: Team members stop repeating the same explanations for the same tasks
Weeks 2–3: Implement Core Hooks
Start with the minimum security and quality guardrails.
Actions:
- Secret scan Hook (before file save)
- Auto-lint Hook (after file modification)
- Protected file access block
Success signal: No more incidents where AI accidentally modifies .env or hardcodes an API key
Week 4: Standardize Skills
Turn the 3–5 most-repeated procedures into Skills.
Actions:
- Commit message Skill
- PR review checklist Skill
- Skills for the most common code patterns (component creation, API endpoint scaffolding, etc.)
Success signal: The whole team produces commit messages and PRs at a consistent quality level
Month 2: Introduce Subagents
Establish the research/implement/review separation pattern as a team standard.
Actions:
- Pilot the three-stage subagent pattern on one complex feature
- Write a dedicated security review agent config
- Identify which types of work benefit most from parallelism
Success signal: Fewer issues discovered during code review on complex feature PRs
8. Risks: Four Common Anti-Patterns
Anti-Pattern 1: Everything in CLAUDE.md
Mixing rules, procedures, and automation checks into a single CLAUDE.md causes compliance to drop as the file passes ~200 lines. Doubling the length does not double the effectiveness.
Fix: Move procedures to Skills, enforced checks to Hooks
Anti-Pattern 2: Using Skills Instead of Hooks
"Please run a thorough security check" as a Skill is a recommendation. If the model makes a mistake or runs low on context, the check gets skipped.
Fix: Anything that must always run goes in Hooks, no exceptions
Anti-Pattern 3: Overusing Subagents
Spawning subagents for simple file edits inflates token cost and actually slows things down.
Fix: Use subagents only when context pollution is a genuine concern at the current work scale
Anti-Pattern 4: Keeping the Harness Personal
A harness that only works on one developer's local machine never becomes an organizational asset. If team members produce inconsistent AI output quality, a non-shared harness is likely the cause.
Fix: Commit the .claude/ directory to the repository so the whole team runs the same harness
9. Practical Decision Guide
Quick Diagnosis: Which Layer Do You Need Right Now?
| Symptom | Root cause | Prescription |
|---|---|---|
| Agent keeps violating team conventions | CLAUDE.md missing or thin | Strengthen Layer 1 |
| Same procedure explained repeatedly | Skills missing | Introduce Layer 2 |
| AI accidentally modifies dangerous files | Hooks missing | Implement Layer 3 |
| Many unexpected bugs after complex features | Context pollution | Apply Layer 4 |
| Output quality varies by team member | Harness is still personal | Move to shared repo structure |
Quick Checklists by Role
Individual Developer
- CLAUDE.md exists at the project root
- Most-repeated procedures are captured as Skills
- Secret scan Hook is configured
- Important tasks use research/implement/review separation
Team Lead
- .claude/ directory is committed to the repository
- A shared team CLAUDE.md exists
- Essential Hooks (secret scan, lint) are in the team settings
- PR review Skill is used as the team standard
10. FAQ
Q1. What is the difference between CLAUDE.md and .cursorrules?
Functionally similar — both give the agent project context. The difference is that Claude Code's CLAUDE.md supports subdirectory layering and integrates natively with the Skills and Hooks system. If you are using Claude Code, committing CLAUDE.md to the repository is the recommended approach.
Q2. Are Hook settings JSON or YAML?
JSON, in .claude/settings.json. The file can be committed to the repository for team-wide application, or placed in ~/.claude/settings.json for personal machine use only.
Q3. How much more do subagents cost?
Each subagent uses an independent context window, so token consumption multiplies. Rough estimate: 1.5–3× the tokens of the same work done in a single session. The offset is faster elapsed time through parallelism, higher review quality, and reduced rework cost downstream.
Q4. What is the difference between Skills and MCP servers?
Skills teach the agent a procedure. MCP (Model Context Protocol) servers give the agent the ability to access external tools. "The procedure for creating a GitHub PR" fits a Skill. "The ability to call the GitHub API in real time to read PR status" fits an MCP server.
Q5. Do small teams (1–3 people) really need this structure?
They need it even more. Small teams have one person wearing multiple hats — the harness maintains consistency where human bandwidth cannot. When the team grows, onboarding cost drops dramatically. Start with at minimum a CLAUDE.md and a secret scan Hook.
Epilogue: Building the Harness Is Building the Team's Moat
As Claude Code improves, and as competing tools converge on similar model capability, the differentiator will increasingly live in the harness. What rules did you make explicit? What procedures did you automate? What validation gates did you design? What work did you isolate?
These are not things that disappear when the model is upgraded. If anything, a better model amplifies the effect of a well-designed harness.
Build the four-layer harness, layer by layer. It is the most important asset a team can create right now in the age of AI coding.
Key Action Summary
| Layer | Component | Right now | Within 1 month |
|---|---|---|---|
| Layer 1 | CLAUDE.md | Write root CLAUDE.md | Add subdirectory layering |
| Layer 2 | Skills | Create commit + PR review Skills | Standardize 5 team Skills |
| Layer 3 | Hooks | Add secret scan Hook | Add auto-lint + protected file block |
| Layer 4 | Subagents | Pilot on one complex task | Establish role-separation pattern as team standard |
Further Reading
- Why AI Coding's Decisive Battleground Became Verification, Not Generation
- Claude Code Advanced Patterns: How to Connect Skills, Fork, and Subagents
- Cursor vs Claude Code vs GitHub Copilot: Real-World AI Coding Tool Comparison, March 2026
Update History
- Content as of: 2026-04-11 (KST)
- Review cycle: monthly
- Next scheduled review: 2026-05-12
Execution Summary
| Item | Practical guideline |
|---|---|
| Core topic | Prompts Alone Are Not Enough — The Complete 4-Layer Harness Guide for Claude Code |
| Best fit | Prioritize for AI Productivity & Collaboration workflows |
| Primary action | Identify your highest-repetition task and pilot AI assistance there first |
| Risk check | Measure output quality before and after AI augmentation to detect accuracy trade-offs |
| Next step | Document time saved and error-rate changes after the first 30-day trial |
Data Basis
- Scope: Anthropic official docs, Claude Code public architecture analyses, and verified community production cases from March–April 2026
- Evaluation axis: role-separation criteria for each of the four layers (CLAUDE.md · Skills · Hooks · Subagents) and production operation patterns
- Verification standard: only official documentation and validated community cases — repeatable patterns, not one-off demos
Key Claims and Sources
This section maps key claims to their supporting sources one by one for fast verification. Review each claim together with its original reference link below.
- Claim: Agent = Model + Harness. Everything outside the model is the harness; LangChain improved only the harness with no model swap and jumped from Top 30 to Top 5 on a coding benchmark.
  Source: WaveSpeedAI: Claude Code Agent Harness Architecture Breakdown
- Claim: Claude Code Hooks guarantee execution regardless of model behavior, unlike prompts, which the model can skip.
  Source: Claude Code Docs: Extend Claude with Skills
- Claim: Harness engineering is the practice of making AI success or failure explicit through tests, acceptance criteria, and sample inputs.
  Source: Martin Fowler: Harness Engineering for Coding Agent Users
- Claim: CLAUDE.md and Skills form the context layer (long-term memory + procedures) while Hooks serve as the deterministic execution gate.
  Source: Paradime: Claude Code Skills & Harness Engineering Complete Guide
External References
The links below are original sources directly used for the claims and numbers in this post. Checking source context reduces interpretation gaps and speeds up re-validation.
- Martin Fowler: Harness Engineering for Coding Agent Users
- Claude Code Docs: Create Custom Subagents
- Claude Code Docs: Extend Claude with Skills
- Anthropic Engineering: Demystifying Evals for AI Agents
- WaveSpeedAI: Claude Code Agent Harness Architecture Breakdown
- Paradime: Claude Code Skills & Harness Engineering Complete Guide
- Level Up Coding: Building Claude Code with Harness Engineering