Dec 11, 2024
12 min read

The 8-Agent SDLC: A Complete Walkthrough

I built an AI engineering team. Eight specialised agents, each handling a different phase of the software development lifecycle. Here's exactly how it works.

This isn’t a theoretical framework or a pitch deck concept. It’s the actual system I use to ship software—the same one that produced the “sprint in half an hour” moment I described in my previous post.

By the end of this piece, you’ll understand the full architecture: what each agent does, how they work together, where the human stays in the loop, and what it takes to build something like this yourself.


Why Multi-Agent?

The obvious question: why not just use one AI? Why build eight specialised agents when you could give a single powerful model all the context and let it handle everything?

I tried that first. It doesn’t work—or at least, it doesn’t work well enough.

Single prompts hit limits. Ask an AI to “build me a todo app” and you get… something. But it’s missing requirements clarity, architectural decisions, test coverage, deployment configuration, and security review. The AI optimises for answering your immediate request, not for building production-quality software.

Specialisation beats generalisation. A single prompt trying to handle requirements AND architecture AND implementation AND testing will do all of them poorly. Each phase has its own best practices, quality standards, and failure modes. An agent specialised for testing thinks about edge cases, coverage strategy, and test pyramid design in a way that a generalist agent doesn’t.

It mirrors how real teams work. Software teams have specialists for a reason: PMs focus on requirements, architects focus on system design, developers focus on implementation, QA focuses on testing. The separation of concerns isn’t just organisational convenience—it produces better outcomes because each person can go deep in their domain.

The multi-agent approach brings this same principle to AI-assisted development. Each agent goes deep in its domain instead of being stretched thin across everything.


The Architecture

Here’s the system:

SDLC-07: Chief of Staff (Orchestration)

    │   "The consigliere - coordinates everything"

    ├── SDLC-01: Product Requirements
    │   "What are we building and why?"

    ├── SDLC-02: Design & Architecture
    │   "How should it be structured?"

    ├── SDLC-02b: Frontend Design
    │   "What should it look like?"

    ├── SDLC-03: Build
    │   "Let's write the code"

    ├── SDLC-04: Quality & Testing
    │   "Does it work correctly?"

    ├── SDLC-05: Ship & Operate
    │   "Get it running in production"

    └── SDLC-06: Security
        "Is it safe?"

Eight agents in total. Seven specialists, one orchestrator.

The key insight: SDLC-07 (Chief of Staff) isn’t just another specialist. It’s the coordination layer that makes the whole system work. Without it, you just have seven disconnected tools.


SDLC-07: The Chief of Staff

I call it the “consigliere” because that captures its role better than “orchestrator” or “coordinator.” It’s not just dispatching tasks—it’s providing judgment, catching problems, and making sure nothing falls through the cracks.

What the Chief of Staff does:

  1. Dispatches specialised agents. Based on what phase of work you’re in, it invokes the right specialist. “We need a PRD” → SDLC-01. “Let’s think through the architecture” → SDLC-02. “Time to implement” → SDLC-03.

  2. Reviews what comes back. AI agents don’t always complete tasks fully. They miss edge cases, make assumptions you didn’t want, or generate outputs that don’t quite fit. The COS reviews specialist outputs and flags gaps.

  3. Manages handoffs. When SDLC-01 (Requirements) finishes, that output needs to flow into SDLC-02 (Architecture). When SDLC-02 finishes, SDLC-03 (Build) needs the architecture decisions as context. The COS ensures each phase has what it needs from previous phases.

  4. Maintains project state. There’s a single file—PROJECT-STATE.md—that captures the current status of the project: what’s been decided, what’s been built, what’s still pending. The COS keeps this updated as work progresses (a sketch of the file follows this list).

  5. Works with the human. The COS is my primary interface to the system. I tell it what I want to accomplish, it figures out which specialists to invoke, and it brings me back outputs for review. When there are decisions to make or ambiguities to resolve, it escalates to me.
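
To make that concrete, here’s an illustrative sketch of the kind of content PROJECT-STATE.md holds. The entries are hypothetical; the point is that decisions, completed work, pending items, and open questions all live in one place.

    # PROJECT-STATE.md

    ## Decided
    - PRD approved for the user data export feature (SDLC-01)
    - ADR-007: exports are streamed CSV, not buffered in memory (SDLC-02)

    ## Built
    - Export API endpoint and CSV serialiser (SDLC-03)

    ## Pending
    - Integration tests for large exports (SDLC-04)
    - Deploy to staging and verify health check (SDLC-05)

    ## Open questions (escalated to the human)
    - Should exports include soft-deleted records?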

Why this matters: Without the coordination layer, you’re manually switching between different AI “modes,” copying context between conversations, and tracking progress yourself. The COS handles that cognitive overhead, letting you focus on the actual decisions.


The Specialists

SDLC-01: Product Requirements

Before any code gets written, you need to know what you’re building and why.

SDLC-01 handles:

  • PRFAQ (Press Release / FAQ): Amazon-style “working backwards” document that clarifies the customer problem and proposed solution before any implementation.
  • PRD (Product Requirements Document): Detailed requirements with goals, non-goals, success metrics, and scope boundaries.
  • User Stories: Individual features broken down with acceptance criteria.
  • Scope Management: Distinguishing MVP requirements from future enhancements.

Output: Clear requirements documents that serve as the spec for everything downstream.

The quality of requirements determines everything else. Vague requirements produce vague implementations. This is the highest-leverage phase in the whole system.
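
As a rough illustration, a PRD skeleton coming out of this phase might look like the following. The feature and numbers are made up; what matters is that goals, non-goals, success metrics, and scope are explicit.

    # PRD: User Data Export

    ## Goals
    - Any user can export their own data as a single downloadable file
    - Exports complete within minutes, not hours

    ## Non-goals
    - Scheduled or recurring exports (future enhancement)
    - Admin-initiated exports on behalf of other users

    ## Success metrics
    - Export success rate above 99%
    - Support tickets asking "how do I get my data out" drop to near zero

    ## Scope (MVP)
    - CSV export of core records, triggered from account settings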


SDLC-02: Design & Architecture

Once you know what you’re building, you need to figure out how it should be structured.

SDLC-02 handles:

  • System Architecture: High-level design with component diagrams (usually as Mermaid diagrams that render in markdown).
  • ADRs (Architecture Decision Records): Documenting key technical decisions with context, alternatives considered, and rationale. These are gold for future maintenance.
  • Technical Specifications: Detailed specs for complex features—API contracts, data models, integration points.
  • Data Models: Schema design, entity relationships, data flow documentation.

Output: Technical design docs that guide implementation.

I’ve found ADRs particularly valuable. Six months from now, when you’re wondering “why did we do it this way?”, the ADR has the answer.
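
A typical ADR is short. Here’s a sketch of the shape, with illustrative content:

    # ADR-007: Stream CSV exports instead of buffering in memory

    ## Status
    Accepted

    ## Context
    Exports can contain hundreds of thousands of rows. Buffering whole
    files in memory risks exhausting the app server under concurrent exports.

    ## Decision
    Generate exports as a streamed CSV response, row by row.

    ## Alternatives considered
    - Buffer in memory: simplest, but fails at scale
    - Background job plus emailed download link: better for very large
      exports, deferred to a later iteration

    ## Consequences
    Response handling is slightly more complex; memory use stays flat
    regardless of export size.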


SDLC-02b: Frontend Design

This handles the user-facing design work—separate from backend architecture because it requires different thinking.

SDLC-02b handles:

  • User Flows: How users move through the application to accomplish their goals.
  • Wireframes & Mockups: Visual representations of screens and interactions.
  • Design System: Tokens (colours, typography, spacing), component definitions, patterns for consistency.
  • Developer Handoff: Specifications that translate design intent into implementable requirements.

Output: Design specs that the build phase can work from.

Not every project needs heavy frontend design work. For backend-only projects, this agent is mostly dormant.


SDLC-03: Build

The implementation phase. Actual code.

SDLC-03 handles:

  • Implementation: Writing code that follows the requirements and architecture decisions.
  • Code Quality: Following SOLID principles, appropriate patterns, clean structure.
  • Security During Build: The security checklist runs here too—no secrets in code, input validation, proper authentication patterns.
  • PR Descriptions: When code is ready for review, generating clear descriptions of what changed and why.

Output: Working code plus documentation.

This is the phase most people think about when they hear “AI-assisted development.” But it’s only one of seven pieces—and arguably not the most important one.


SDLC-04: Quality & Testing

Does the code actually work?

SDLC-04 handles:

  • Test Pyramid: Unit tests (lots of them, fast, focused), integration tests (API + database together), E2E tests (critical user flows only).
  • Risk-Based Testing: Critical paths (auth, payments, data exports) get thorough testing. Supporting features get lighter coverage. Not everything deserves the same investment.
  • AI-Assisted Test Generation: The agent can generate tests, but I review them to make sure they’re actually validating what matters.
  • Quality Reports: Client-facing documentation of test coverage and results.

Output: Test suites plus quality documentation.

The test pyramid is real. I’ve seen too many projects with E2E-heavy, unit-light test suites that are slow, flaky, and don’t catch bugs. The pyramid gets it right: lots of fast unit tests, fewer integration tests, even fewer E2E tests.
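
For a single feature, the split might look something like this (the proportions are illustrative, not a rule):

    Test plan: user data export

    - Unit (~70% of tests): date-range parsing, CSV row serialisation,
      permission checks. No I/O, runs in milliseconds.
    - Integration (~25%): export endpoint against a real test database,
      with auth middleware in place.
    - E2E (~5%): one critical flow only. Log in, request an export,
      download the file.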


SDLC-05: Ship & Operate

Getting code into production and keeping it running.

SDLC-05 handles:

  • Deploy Scripts: The actual mechanism for shipping code. For my setup: test.sh (run before any deploy) → build → push → deploy → verify health check.
  • Client Version Registry: Tracking which version is deployed where. Essential when you have multiple environments or clients.
  • Rollback Procedures: When something goes wrong, how to get back to the previous working state.
  • Deployment Reports: Client-facing documentation of what was deployed and when.

Output: Deployment automation and operational documentation.

My deploy philosophy: test before deploy, rollback first if something breaks (investigate later), log everything.
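
As a sketch, that pipeline and philosophy turn into a script along these lines. The command and script names are placeholders rather than my exact setup; substitute whatever your stack uses.

    #!/usr/bin/env bash
    # deploy.sh (illustrative sketch; placeholder names throughout)
    set -euo pipefail

    VERSION="$(git rev-parse --short HEAD)"

    ./test.sh                               # 1. test before any deploy

    docker build -t myapp:"$VERSION" .      # 2. build the release artefact
    docker push myapp:"$VERSION"            # 3. push it to the registry

    ./scripts/deploy.sh "$VERSION"          # 4. roll it out

    # 5. verify the health check; if it fails, roll back first, investigate later
    if ! curl -fsS --retry 5 --retry-delay 3 https://example.com/healthz > /dev/null; then
      ./scripts/rollback.sh
      exit 1
    fi

    echo "$(date -u +%FT%TZ) deployed $VERSION" >> deploy.log   # log everything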


SDLC-06: Security

Security isn’t a phase at the end. It runs parallel to everything.

SDLC-06 handles:

  • Threat Modelling: STRIDE analysis to identify potential attack vectors.
  • OWASP Top 10 Review: Checking code against common web security vulnerabilities.
  • Security Requirements: Defining what security controls are needed based on the data and operations involved.
  • Security Assessment: Output documentation of security posture.

Output: Security analysis and requirements.

Security runs throughout the lifecycle, not just at the end. The build phase includes security checks. The architecture phase includes security considerations. SDLC-06 provides the security-specific perspective that reinforces these checks.
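
To make that concrete, a slice of a STRIDE pass for a data export feature might look like this (illustrative, not exhaustive):

    Spoofing: someone requests another user’s export
      → authenticate and scope every export to the session user
    Tampering: export parameters altered to widen the query
      → validate filters server-side
    Repudiation: “I never requested that export”
      → audit-log every export request
    Information disclosure: download link guessable or shared
      → short-lived, signed download URLs
    Denial of service: huge exports exhaust the server
      → rate limits and streamed generation
    Elevation of privilege: regular user triggers an admin-only export
      → role checks on the export endpoint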


How They Work Together

The typical flow:

Human: "I want to build a feature that lets users export their data"

    ├── COS invokes SDLC-01 → PRD with export requirements, success criteria

    ├── COS invokes SDLC-06 (parallel) → Security requirements for data export

    ├── COS invokes SDLC-02 → Architecture: API design, file format decisions

    ├── COS invokes SDLC-02b (if UI needed) → Export button placement, flow design

    ├── COS invokes SDLC-03 → Implementation of export feature

    ├── COS invokes SDLC-04 → Tests for export functionality

    └── COS invokes SDLC-05 → Deploy and verify

Key points:

  • Sequential where dependencies exist. Can’t build without requirements. Can’t test without code.
  • Parallel where possible. Security analysis can run alongside requirements gathering.
  • Iteration loops. If SDLC-04 (testing) reveals a bug, back to SDLC-03 (build). If implementation reveals a requirements gap, back to SDLC-01.
  • Human touchpoints. I review outputs at each phase. The COS escalates decisions to me. I’m not just watching—I’m actively steering.

What This Enables

Sprint-level work in hours. A feature that would take a small team a week or two—requirements, design, implementation, testing, deployment—can be executed in hours.

Full documentation as byproduct. The system produces PRDs, ADRs, technical specs, and test documentation as part of the natural workflow. Documentation isn’t a separate task that gets skipped—it’s built into the process.

Consistent quality. Each phase has its standards. The testing agent applies the same test pyramid thinking every time. The architecture agent considers the same concerns. Quality becomes systematic rather than dependent on who’s working that day.

One person, full SDLC coverage. A single person can maintain oversight of the entire lifecycle without being a specialist in every area. The specialists handle the depth; the human provides the judgment and direction.


Limitations and Lessons

This system isn’t magic. Some honest notes on what doesn’t work:

Agents don’t always complete tasks. They make assumptions, skip edge cases, produce outputs that aren’t quite right. The COS catches some of this, but human review is essential. I don’t trust any output without reviewing it.

Context limits require chunking. Large features need to be broken into smaller pieces that fit within context windows. The COS helps manage this, but you can’t feed the entire codebase and expect coherent output.

Human review is still essential. The system accelerates my work; it doesn’t replace my judgment. Every PRD gets reviewed. Every architecture decision gets validated. Every piece of code gets inspected. The AI proposes; I decide.

What I got wrong in v1: My first version tried to have the COS do too much. It was both coordinating AND doing specialist work. Separating the orchestration role from the specialist roles made everything cleaner.


Can You Replicate This?

Yes. The components are:

  • Claude Code as the runtime environment
  • Claude Opus 4.5 as the underlying model
  • Skills files that define each agent’s role, capabilities, and standards
  • A project structure with clear documentation conventions

The skills files are essentially detailed prompts that tell each agent what it’s responsible for, how it should approach its work, and what quality standards apply.
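
A skills file doesn’t need to be exotic. Here’s a trimmed, illustrative sketch for the testing agent; the real files are longer and project-specific.

    # SDLC-04: Quality & Testing

    ## Role
    You are the quality and testing specialist. You design and generate
    tests for code produced in the build phase.

    ## Approach
    - Follow the test pyramid: many fast unit tests, fewer integration
      tests, a handful of E2E tests for critical flows only.
    - Apply risk-based testing: critical paths (auth, payments, data
      exports) get thorough coverage; supporting features get lighter coverage.

    ## Outputs
    - Test suites alongside the code they cover
    - A client-facing quality report summarising coverage and results

    ## Standards
    - Tests assert behaviour, not implementation details
    - Flaky tests are treated as bugs: fix or delete them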

Building your first version might take a few weeks of iteration. The concepts aren’t complicated—the work is in defining the right standards and patterns for each phase.


What’s Next

This post is the overview. The system is more nuanced than what fits in one article.

Coming up: deep dives into specific agents—starting with the four levels of AI-assisted development, because understanding where you are on the maturity curve determines how you should be working.


This is Part 2 of a series on AI-assisted software development. Previously: The Technical PM’s Moment. Next: The 4 Levels of AI-Assisted Development.