💰 Token Savings

claude-atelier is designed from the ground up to cut Claude API costs. Here are the concrete numbers.

Summary

Up to 90% cost reduction compared to an unstructured Claude Code setup.

Technique	Savings	Mechanism
Model routing (Haiku/Sonnet vs always-Opus)	~80%	`routing-check.sh` auto-routes by task type
`/compact` context compression	60–80% per session	Built-in hook, triggered after explore + feature
Conditional stack loading	~30%	§10 loads only the relevant stack docs
QMD-first (search before Read)	~20%	`mcp__qmd__query` replaces `Read` on `.md` files
`maxBudgetUsd` hard cap	100% runaway prevention	Session killed if budget exceeded

Model Routing

The biggest lever. Claude Opus costs ~15× more than Haiku per token.

Haiku   →  exploration, search, lint, quick lookups
Sonnet  →  standard dev, bug fixes, features
Opus    →  architecture, migrations, complex decisions

routing-check.sh injects the current model into every message. CLAUDE.md §15 enforces routing rules. Result: ~80% cost reduction on a typical session vs always-Opus.

# What gets routed to Haiku (cheapest)
- Codebase exploration
- File search / grep
- Running tests
- Commit messages

# What stays on Sonnet (standard)
- Feature implementation
- Bug fixes
- Code review

# What escalates to Opus (most expensive)
- Architecture decisions
- Database schema design
- Multi-file migrations

`/compact` — Context Compression

Context window fills up fast. Each message on a 200k-token context costs proportionally more.

/compact compresses prior conversation by 60–80% while preserving semantic continuity.

When to run:

After an explore phase
After each completed feature
Before switching tasks
When the session hook warns about context size

The session hook (routing-check.sh) automatically alerts at 2MB+ context.

Conditional Stack Loading

CLAUDE.md §10 loads stack-specific standards only when relevant (defined in §0).

Full load (all stacks):  ~8,000 tokens on every message
Conditional load:        ~1,500 tokens — only the active stack

~30% reduction on sessions involving multiple stack references.

QMD-First Search

Instead of Read (loads full file into context), mcp__qmd__query returns only the relevant snippets.

Read on a 500-line .md file:   ~2,000 tokens
QMD query (same file):         ~200 tokens

~20% reduction on sessions with heavy documentation lookup.

`maxBudgetUsd` Hard Cap

Set in .claude/settings.json:

{
  "maxBudgetUsd": 5.00
}

Prevents runaway autonomous sessions from burning unlimited budget. Essential for night mode and long-running loops.

Combined Effect

On a typical 2-hour dev session:

Setup	Estimated cost
Claude Code, no framework, always-Opus	~$8–12
claude-atelier with routing + compact + QMD	~$0.80–1.50
Reduction	~85–90%

Quick Setup

npx claude-atelier init

All savings mechanisms are enabled by default after init.

Summary​

Model Routing​

/compact — Context Compression​

Conditional Stack Loading​

QMD-First Search​

maxBudgetUsd Hard Cap​

Combined Effect​

Quick Setup​