Issue #12 · November 27, 2025 · dev-tools · 5 min read

Developer Productivity Beyond the Hype

The productivity gains are real—but they didn't come from where you'd expect.

We saved significant hours on real projects. Here's what actually worked.

Not "could save" or "might save"—actual hours on a real project. A technical book with 25 chapters, 100+ notebooks, and complex codebase. Dozens of Claude Code sessions on complex projects.

The productivity gains were real. But they didn't come from where you'd expect.


What Didn't Work: Ad-Hoc Prompting

The intuitive approach to AI-assisted development: describe what you want, let the model generate code. Iterate until it works.

This fails on anything non-trivial.

Context degradation compounds. Early decisions affect later code. If the model loses track of architectural choices made in minute 20, the code generated in minute 90 contradicts them. You don't notice until integration, when things break in subtle ways.

Scope creep is invisible. Ask for one feature, get three. The model "helpfully" adds related functionality that wasn't requested. Each addition seems reasonable. The cumulative effect is architectural drift.

Progress isn't preserved. End the session, start fresh tomorrow. The model doesn't remember yesterday's decisions. You re-explain project context, coding conventions, architectural patterns. Every session begins at zero.

We tried this approach for the first few months. Productivity gains were modest and inconsistent. Some sessions were great. Others produced code that had to be rewritten.


What Worked: Systematic Workflows

The breakthrough came from treating Claude Code like a team member who needs explicit process, not a magic box that reads minds.

Explore before coding. Before implementing anything, we run explicit exploration: What exists? What are the integration points? What patterns does the codebase follow? The exploration output gets written to a file. It persists across sessions.

Plan before implementing. Based on exploration, we create explicit task breakdowns. Not "add backtesting feature" but twelve specific tasks with dependencies and success criteria. The plan is a document, not a conversation.
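
As a rough illustration, here is one way a couple of entries from such a breakdown could be captured in structured form alongside plan.md. The schema, task titles, and criteria are hypothetical, not the project's actual plan; only "task 003: position tracking" comes from the example later in this issue.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    """One entry in the plan: small enough to finish in a single session."""
    id: str
    title: str
    depends_on: list[str] = field(default_factory=list)
    success_criteria: list[str] = field(default_factory=list)

# Hypothetical excerpt of a backtesting plan -- illustrative only.
PLAN = [
    Task("001", "Define trade and position data structures",
         success_criteria=["Unit tests pass for position math"]),
    Task("003", "Implement position tracking",
         depends_on=["001"],
         success_criteria=["Open/close positions reconciled against fills",
                           "Existing src/backtesting/ tests still pass"]),
]
```

Keeping the breakdown in a structured form like this makes dependencies and success criteria checkable without re-reading any conversation.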

Execute one task at a time. Each session focuses on specific tasks from the plan. Not "work on the project"—"complete task 003." The narrow scope prevents drift. The explicit boundary prevents helpful over-reaching.
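
A sketch of how that boundary can be made mechanical: pick exactly one runnable task from the plan and hand only that to the session. The file names (plan.json as a machine-readable companion to plan.md) and the schema are illustrative assumptions, not the project's actual tooling.

```python
import json
from pathlib import Path

def next_task(plan_path: str = "plan.json", state_path: str = "state.json") -> dict | None:
    """Return the first unfinished task whose dependencies are all complete."""
    tasks = json.loads(Path(plan_path).read_text())   # [{"id": "003", "title": ..., "depends_on": [...]}, ...]
    done = set(json.loads(Path(state_path).read_text()).get("completed", []))
    for task in tasks:
        if task["id"] not in done and set(task.get("depends_on", [])) <= done:
            return task
    return None  # everything in the plan is finished

if __name__ == "__main__":
    task = next_task()
    if task:
        print(f"Next up: task {task['id']} - {task['title']}")
```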

Persist everything important. Decisions, patterns, progress—all written to files. Tomorrow's session reads yesterday's decisions. Context is rebuilt from persistent state, not conversation history.
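
A minimal sketch of that persistence, assuming a state.json that records completed tasks and the decisions later sessions must honor. The schema is ours for illustration, not a standard.

```python
import json
from datetime import date
from pathlib import Path

STATE_FILE = Path("state.json")

def load_state() -> dict:
    """Rebuild context from the persistent file, not from conversation history."""
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text())
    return {"completed": [], "decisions": []}

def record(task_id: str, decision: str) -> None:
    """Mark a task done and capture the decision tomorrow's session needs to know."""
    state = load_state()
    state["completed"].append(task_id)
    state["decisions"].append({"date": date.today().isoformat(), "note": decision})
    STATE_FILE.write_text(json.dumps(state, indent=2))

# e.g. record("003", "Positions are tracked per symbol; PnL is computed on close only.")
```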

The workflow crystallized as: explore → plan → next → ship.


The Pattern in Practice

Chapter 12 of the book: Backtesting Strategies. A substantial addition requiring new modules, integration with existing code, and documentation.

Explore (Session 1, 30 minutes):

  • Analyzed existing backtesting code in `src/backtesting/`
  • Identified related chapters (4, 8) with potential conflicts
  • Documented integration points and constraints
  • Output: exploration.md with findings and recommendations

Plan (Session 1, 45 minutes):

  • Created 12-task breakdown based on exploration
  • Each task had success criteria and dependencies
  • Output: plan.md with sequenced implementation

Execute (Sessions 2-8, varying duration):

  • Each session tackled 1-3 tasks
  • State tracked in state.json
  • Progress resumed seamlessly across sessions
  • No re-explanation of context required

Ship (Session 9, 30 minutes):

  • Final validation and integration testing
  • Documentation updates
  • PR preparation

The workflow felt natural after a few sessions. Each step had clear boundaries and outputs.


What Made the Difference

Explicit boundaries prevent drift. When the instruction is "implement task 003: position tracking," the model doesn't add a performance analyzer that seems related. The task boundary constrains output.

File-based memory enables continuity. Session 4 doesn't need to re-learn what session 3 decided. The plan is in a file. The state is in a file. Context rebuilds from artifacts, not conversation replay.

Context management preserves quality. We hand off at 80% context utilization, before degradation becomes obvious. Fresh sessions with file-based state maintain quality throughout the project.
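
The utilization check is rough, not precise. A sketch of the kind of heuristic we mean, assuming a known context budget and the common ~4-characters-per-token approximation (both numbers are assumptions, not tool output):

```python
CONTEXT_BUDGET_TOKENS = 200_000   # assumed context window for the session
HANDOFF_THRESHOLD = 0.80          # hand off before degradation is obvious

def should_hand_off(transcript: str) -> bool:
    """Rough check: estimate tokens from characters and compare to the budget."""
    estimated_tokens = len(transcript) / 4   # crude chars-per-token approximation
    return estimated_tokens / CONTEXT_BUDGET_TOKENS >= HANDOFF_THRESHOLD
```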

Validation is built in. Each task has success criteria. Ship includes explicit checks. Quality gates are part of the workflow, not afterthoughts.


What We Learned About AI Productivity

The productivity gains aren't in "AI writes code faster." They're in:

Reduced context switching. The exploration and planning phases surface integration requirements upfront. Fewer surprises during implementation means fewer interruptions to research existing code.

Maintained consistency. File-based persistence keeps architectural decisions stable across sessions. The code added in month 6 follows the same patterns as month 1.

Focused effort. Task-by-task execution prevents the "helpful extras" that create technical debt. Each addition is intentional.

Recoverable progress. If a session goes wrong, we revert to the last checkpoint. Progress is explicit and recoverable, not implicit in conversation history.


Implications

If you're using AI coding assistants for anything beyond quick scripts:

Build persistence into your workflow. Important decisions should live in files, not conversations. Treat AI sessions as stateless even when they claim memory.

Constrain scope explicitly. Narrow tasks produce better results than broad instructions. "Implement the authentication middleware" beats "add authentication to the app."

Validate before trusting. AI-generated code needs the same review as human code—maybe more. The model doesn't know your edge cases.

Design for handoffs. Sessions should have clear entry and exit points. What state did we start in? What state are we leaving? Document both.
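
A lightweight sketch of one way to document both, assuming a simple append-only log file; the file name and fields are illustrative, and the example note is hypothetical.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def write_handoff(entry_state: str, exit_state: str, open_questions: list[str]) -> None:
    """Append a session record so the next session starts from facts, not memory."""
    log = Path("handoffs.json")
    entries = json.loads(log.read_text()) if log.exists() else []
    entries.append({
        "ended_at": datetime.now(timezone.utc).isoformat(),
        "entry_state": entry_state,       # what was true when the session began
        "exit_state": exit_state,         # what is true now
        "open_questions": open_questions, # what the next session should resolve first
    })
    log.write_text(json.dumps(entries, indent=2))

# e.g. write_handoff("tasks 001-002 done", "task 003 done; tests green",
#                    ["Do the chapter 8 examples need to switch to the new position API?"])
```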


Developer productivity with AI isn't about prompting technique or model selection. It's about workflow design.

The tools are powerful. The magic is in how you use them.

Real productivity gains. Not from better prompts. From better process.


Using AI for development work? Reply with what's working—and what isn't. We're always learning.