BMAD vs. OpenSpec vs. Spec Kit: A CxO's Field Guide to Spec-Driven AI Development
An independent perspective — June 2026.
The 60-Second Version (for the executive who only reads this)
Through 2025, "vibe coding" — letting an AI agent improvise code from a chat prompt — produced fast demos and slow disasters. The fix that the industry converged on in 2026 is Spec-Driven Development (SDD): you write a durable specification first, and the spec — not the chat history — becomes the source of truth the AI builds against.
Three frameworks dominate serious engineering conversations: BMAD, OpenSpec, and GitHub Spec Kit. A fourth, GSD ("Get Shit Done"), sits one notch above raw vibe coding as a lightweight, disciplined alternative. None of them is "best." Each optimizes for a different point on the trade-off curve between rigor and speed.
The CxO takeaway: don't standardize your whole organization on one framework. Match the framework to the project's risk, team size, and compliance exposure — and expect to run two or three across your portfolio.
What "Spec-Driven" Actually Buys You
It is tempting to frame Spec-Driven Development as a fix for one problem: the AI forgetting what you told it. That undersells it badly. A written, versioned specification is a durable asset in a way the chat window can never be, and once you have that asset, a long list of organizational benefits follows, most of which have little to do with the AI at all.
Start with the three failures that sank early AI coding, each of which SDD addresses directly:
- Context survives. The spec carries intent across sessions, across models, and across people. Swap Claude for GPT, or rotate a developer off the project, and the source of truth is untouched.
- Drift disappears. Every change traces back to a versioned document instead of a forgotten Slack thread or a half-remembered hallway conversation, so the system that ships is the system you agreed to build.
- Decisions are auditable. When QA, security, or a regulator asks why something was built a certain way, the spec is the paper trail — not a reconstruction after the fact.
But the deeper value shows up across the whole delivery organization:
- Governance and risk control. AI-generated code stops being an ungovernable black box and takes on the same review, sign-off, and traceability posture as traditional software delivery — the single biggest reason a CxO can let agents anywhere near production.
- Predictability and estimation. A spec broken into discrete, scoped tasks is something you can size, sequence, and forecast. "We don't know how long the AI will take" becomes a planned backlog.
- Quality by construction. Acceptance criteria written before code exists give both the agent and your QA team an objective definition of done, which cuts the rework loop that quietly devours AI-coding budgets.
- Parallelism and throughput. Once work is decomposed into well-specified units, multiple agents — or multiple engineers — can execute in parallel without stepping on each other, compressing delivery timelines.
- Knowledge retention and lower key-person risk. The reasoning behind the system lives in documents, not in one senior engineer's head. New hires onboard against the spec, and a departure stops being a crisis.
- Vendor and model independence. Because specs are just markdown and structured data, they are portable. You are never locked into a single IDE, model provider, or framework — you can move the spec and re-run it elsewhere.
- Cost discipline. Knowing scope up front lets you choose the right model for each task and avoid the open-ended token burn of an agent improvising its way through an ambiguous request.
For a CxO, the governance and risk point is the one that unlocks everything else — but the lasting payoff is that SDD converts AI coding from a clever individual productivity trick into a repeatable, auditable, team-scale engineering discipline.
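The predictability and parallelism points above can be made concrete with a small scheduling sketch. It assumes a spec has already been decomposed into tasks with declared dependencies; the task names and the `parallel_waves` helper are hypothetical and are not part of any of the frameworks discussed here.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the tasks it depends on.
TASKS = {
    "schema": set(),
    "api": {"schema"},
    "ui": {"schema"},
    "integration-tests": {"api", "ui"},
}


def parallel_waves(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; every task in a wave can run concurrently."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = parallel_waves(TASKS)
print(waves)
```

Each inner list is a batch that separate agents or engineers could pick up at the same time. The number of waves, rather than the number of tasks, is what bounds the delivery timeline.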
The Four Approaches at a Glance
Read the four approaches as a dial that runs from "maximum speed, minimum control" to "maximum control, maximum overhead": GSD at the fast end, then OpenSpec, then Spec Kit, with BMAD at the rigorous end. Most organizations need more than one setting.
Framework Profiles
BMAD — the full simulated software team
BMAD (Breakthrough Method for Agile AI-Driven Development) is the most architecturally ambitious option. It simulates an entire agile team using named, role-scoped AI personas — Analyst, Product Manager, Architect, Product Owner, Developer, QA — each producing a versioned artifact (PRD, architecture doc, sprint stories) before the next picks up the work. As of mid-2026 it sits near 49,000 GitHub stars, is MIT-licensed and free, and ships near-daily, with a V6 line that runs across Claude Code, Cursor, Codex, Copilot, and Windsurf and a new "Skills" module architecture.
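The handoff chain described above can be pictured as a pipeline in which each role consumes the previous artifact and emits its own. The sketch below is illustrative only and is not BMAD's actual implementation: the `Artifact` class, the `run_pipeline` function, and the sample brief are hypothetical, and the work each persona does is a stub where a real setup would call a model.

```python
from dataclasses import dataclass

# Role order taken from the persona list above; everything else is hypothetical.
ROLES = ["Analyst", "Product Manager", "Architect", "Product Owner", "Developer", "QA"]


@dataclass(frozen=True)
class Artifact:
    role: str      # persona that produced this artifact
    version: int   # position in the handoff chain, starting at 1
    based_on: str  # role whose artifact was the input ("brief" for the first)
    body: str      # stand-in for the PRD, architecture doc, stories, and so on


def run_pipeline(brief: str) -> list[Artifact]:
    """Pass work down the chain; each role only sees the previous artifact."""
    artifacts: list[Artifact] = []
    previous_role, previous_body = "brief", brief
    for version, role in enumerate(ROLES, start=1):
        # Stub: a real persona would generate a document here.
        body = f"{role} output derived from: {previous_body[:40]}"
        artifacts.append(Artifact(role, version, previous_role, body))
        previous_role, previous_body = role, body
    return artifacts


chain = run_pipeline("Build a loan-origination portal")
for a in chain:
    print(f"v{a.version} {a.role} <- {a.based_on}")
```

Because every artifact records which role it was based on, the chain doubles as a simple traceability record. That is the property the compliance argument relies on.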
Best at: complex, net-new ("greenfield") platforms where being almost right is expensive; teams scaling from a handful to dozens of engineers who benefit from documentation-as-onboarding; and regulated work, where the PRDs and architecture docs double as compliance evidence.
Where it breaks: small jobs. A four-hour bug fix should not generate a PRD and a sprint story. It is the most token-hungry option (reported real-world ranges of roughly $800–$2,000+ per developer per month in frontier-model API costs, with outlier weeks far higher), and it shows friction on messy legacy codebases despite a dedicated brownfield mode.
OpenSpec — the minimalist for legacy systems
OpenSpec is the lean option, sitting around 52,000 GitHub stars by mid-2026. Instead of documenting an entire system up front, it uses delta specs — you describe only what's changing. Completed changes archive into a growing source-of-truth document, so the spec evolves alongside the code. A strict three-phase state machine (propose → apply → archive) keeps it disciplined without ceremony, and an AGENTS.md "README for robots" lets even AI tools without native OpenSpec support follow the workflow.
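A minimal model of the propose, apply, archive lifecycle might look like the following. It is a sketch of the idea rather than OpenSpec's real data model or CLI: the `Change` and `SourceOfTruth` classes, the phase names as strings, and the sample requirement are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Allowed transitions for the three-phase lifecycle described above.
NEXT_PHASE = {"proposed": "applied", "applied": "archived"}


@dataclass
class Change:
    name: str
    delta: dict[str, str]  # only the requirements that change
    phase: str = "proposed"

    def advance(self) -> None:
        if self.phase not in NEXT_PHASE:
            raise ValueError(f"{self.name} is already {self.phase}")
        self.phase = NEXT_PHASE[self.phase]


@dataclass
class SourceOfTruth:
    requirements: dict[str, str] = field(default_factory=dict)

    def archive(self, change: Change) -> None:
        """Fold an applied change into the living spec."""
        if change.phase != "applied":
            raise ValueError("only applied changes can be archived")
        self.requirements.update(change.delta)
        change.advance()


spec = SourceOfTruth({"login": "password only"})
change = Change("add-2fa", {"login": "password plus one-time code"})
change.advance()      # proposed -> applied
spec.archive(change)  # applied -> archived, delta merged into the spec
print(change.phase, spec.requirements)
```

The point of the delta is visible in the example: the change describes one requirement, and the rest of the system never has to be documented to make it.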
Best at: brownfield modernization and legacy refactors, where heavyweight frameworks bog down trying to document a ten-year-old monolith; and speed-first teams that want governance without overhead.
Where it breaks: when you genuinely need explicit role handoffs (PM → Architect → Dev), OpenSpec's leanness feels like missing scaffolding rather than welcome simplicity.
GitHub Spec Kit — the safe default for a scaling team
Spec Kit has by far the strongest distribution story — over 110,000 GitHub stars by mid-2026, helped by GitHub's reach, and templates for 30+ AI agents. Its workflow is a four-phase loop (specify → plan → tasks → implement), and its signature feature is the constitution: a project-wide ruleset every spec inherits, so conventions are written once instead of re-typed into every prompt.
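The constitution idea can be sketched as a shared rule list that is prepended to every phase of the loop. This is an illustration of the inheritance concept, not Spec Kit's actual file format or commands: the rules, the `build_prompt` function, and the sample feature are hypothetical.

```python
# Hypothetical project-wide rules; a real constitution would be a document.
CONSTITUTION = [
    "All public functions have type hints.",
    "Every feature ships with automated tests.",
]

# Phase names taken from the four-phase loop described above.
PHASES = ("specify", "plan", "tasks", "implement")


def build_prompt(phase: str, feature: str, constitution: list[str] = CONSTITUTION) -> str:
    """Prefix the shared rules to a phase-specific request."""
    if phase not in PHASES:
        raise ValueError(f"unknown phase: {phase}")
    rules = "\n".join(f"- {rule}" for rule in constitution)
    return f"Project rules:\n{rules}\n\nPhase: {phase}\nFeature: {feature}"


prompts = [build_prompt(phase, "CSV export") for phase in PHASES]
print(prompts[0])
```

Changing a rule in one place changes it for every later spec. This is why time invested in the constitution pays back across the team.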
Best at: standardizing AI output quality across an existing team, and medium-sized features where rigor matters but you don't need a full simulated org chart.
Where it breaks: setup is opinionated and front-loaded ("a lot of questions"), and it rewards investing real time in a strong constitution. It also moves fast — a recent release removed an entire flag family, breaking older tutorials and scripts, so teams must track upstream changes.
GSD — disciplined speed, one rung above vibe coding
GSD ("Get Shit Done") is a lightweight meta-prompting and context-engineering system built primarily on Claude Code, with around 59,000 GitHub stars by mid-2026. Its core trick addresses context rot, the quality decay that sets in as an agent fills its context window, by spawning a fresh subagent with a clean context for each task, so task 50 is as sharp as task 1. The workflow is essentially two prompts: a planning prompt that does gap analysis and builds a prioritized TODO list, and a building prompt that implements and runs tests in a loop until green.
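The fresh-context idea can be sketched in a few lines. The code below is a toy model, not GSD's implementation: `run_task` stands in for a subagent run, `tests_pass` for a real test suite, and the retry limit is an assumption added so the loop cannot run forever.

```python
from typing import Callable


def run_task(task: str, context: list[str]) -> str:
    """Stand-in for one subagent run; it may append freely to its own context."""
    context.append(f"working notes for {task}")
    return f"done: {task} (context size {len(context)})"


def build(tasks: list[str], plan: str, tests_pass: Callable[[str], bool],
          max_attempts: int = 3) -> list[str]:
    results = []
    for task in tasks:
        # Fresh context per task: only the plan carries over,
        # never the notes accumulated by earlier tasks.
        context = [plan]
        for _ in range(max_attempts):
            result = run_task(task, context)
            if tests_pass(result):
                results.append(result)
                break
        else:
            raise RuntimeError(f"{task} still failing after {max_attempts} attempts")
    return results


todo = [f"task {n}" for n in range(1, 51)]
results = build(todo, plan="prioritized TODO list",
                tests_pass=lambda r: r.startswith("done"))
print(results[0])
print(results[-1])
```

Task 1 and task 50 both report the same context size, which is the whole point: the fiftieth task starts from the plan alone, not from forty-nine tasks' worth of accumulated notes.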
Best at: solo developers and small teams already living inside Claude Code; fast iteration where requirements are still fluid; and maintenance-mode work after a big build ships.
Where it breaks: limited multi-agent orchestration. If you need real role specialization, GSD's leanness becomes a ceiling.
Comparison Table: The Three Spec Frameworks
| Dimension | BMAD | OpenSpec | GitHub Spec Kit |
|---|---|---|---|
| Core model | Simulated 12+ agent agile team | Delta specs (only what changes) | Constitution + 4-phase loop |
| Workflow | Analyst → PM → Architect → Dev → QA | Propose → Apply → Archive | Specify → Plan → Tasks → Implement |
| Maturity (GitHub stars, mid-2026) | ~49k | ~52k | ~111k |
| Sweet spot | Complex greenfield, regulated | Brownfield / legacy refactor | Scaling team standardization |
| Documentation output | Heavy (audit-grade) | Light, evolves with code | Medium, convention-driven |
| Relative running cost | High ($800–$2,000+/dev/mo) | Low | Low–Medium |
| Speed on small tasks | Slow | Fast | Medium |
| Brownfield fit | Fair (dedicated mode, some friction) | Excellent | Good |
| Compliance evidence | Strongest | Weak by default | Moderate |
| Setup effort | High | Low | Medium–High |
| Tooling portability | Multi-IDE / multi-model | CLI-agnostic | 30+ agents |
A note on the numbers: GitHub star counts and cost figures move fast and vary by project size and model pricing. Treat them as directional, not precise. The relative ordering — BMAD heaviest and priciest, OpenSpec and GSD lightest and cheapest — has held steady across sources.
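To turn the per-developer figure in the table into a budget line, the arithmetic is simple multiplication. The sketch below uses the quoted $800 to $2,000 monthly range for BMAD; the ten-developer team is an assumed example, and because the source range is open-ended ("$2,000+"), the upper figure is a planning number rather than a cap.

```python
# Monthly per-developer range quoted above for BMAD (USD, frontier-model API costs).
LOW_PER_DEV_MONTH = 800
HIGH_PER_DEV_MONTH = 2_000


def annual_range(developers: int, months: int = 12) -> tuple[int, int]:
    """Return the (low, high) spend for a team over the given number of months."""
    return (LOW_PER_DEV_MONTH * developers * months,
            HIGH_PER_DEV_MONTH * developers * months)


low, high = annual_range(developers=10)
print(f"10 developers: ${low:,} to ${high:,} per year")
# prints: 10 developers: $96,000 to $240,000 per year
```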
Spec-Driven vs. Vibe Coding vs. GSD
This is the comparison most CxOs actually need, because it frames the real decision: how much process is worth it?
| Dimension | Vibe Coding | GSD | Spec-Driven (BMAD / OpenSpec / Spec Kit) |
|---|---|---|---|
| Philosophy | Improvise from a prompt | Light spec + fresh-context discipline | Spec is the contract |
| Speed to first output | Fastest | Fast | Slower (planning up front) |
| Speed at week 6+ | Collapses | Sustains | Sustains |
| Code consistency | Low | Medium–High | High |
| Auditability | None | Limited | Strong |
| Onboarding new devs | Painful | Moderate | Documentation does the work |
| Token / API cost | Lowest | Low | Medium–High (BMAD highest) |
| Best use | Throwaway prototypes, spikes | Solo/small, fluid requirements | Production, teams, compliance |
| Main risk | Drift, rework, no paper trail | Limited role specialization | Overhead on small work |
The honest pros and cons of vibe coding: it is unmatched for a weekend prototype, a proof-of-concept to win buy-in, or exploring an unfamiliar API. Its cost is that everything it produces is effectively disposable — projects start fast and stall within weeks as inconsistent code and fragile architecture accumulate, with no audit trail when something breaks. GSD is the pragmatic middle: most of vibe coding's speed, but with enough structure (a plan, a test loop, fresh contexts) to keep quality from rotting. For anything headed to production with more than one engineer touching it, full SDD pays for itself.
The Decision: Which Framework, When
| Scenario | Team size | Project type | Compliance | Recommended |
|---|---|---|---|---|
| Solo Developer | 1 | Anything small | Negligible | GSD — the lean two-prompt loop keeps you moving |
| Regulated (banks, FinTech, healthcare) | Any | Either | Heavily governed (SOC 2, HIPAA, HKMA, MAS & APRA banking supervision) | BMAD — its artifacts stand in as your audit trail |
| Startup to MVP | 1–3 | Greenfield | Light | OpenSpec, or GSD for the quickest start |
| Funded startup, scaling | 4–15 | Greenfield product | Moderate | GitHub Spec Kit — locks in consistent quality as the team grows |
| Complex enterprise platform | 10+ | New big-bet build | Elevated | BMAD — full planning rigor for high-stakes builds |
| Brownfield modernization | Any | Legacy refactor | Moderate | OpenSpec lead, with BMAD brownfield mode where role handoffs matter |
Five Questions to Ask Before You Choose
- Greenfield or brownfield? Legacy tilts to OpenSpec; net-new opens up BMAD and Spec Kit.
- How big is the team, and does it need explicit roles? Solo → GSD/OpenSpec. Scaling → Spec Kit. Multi-team → BMAD.
- What are the compliance requirements? SOC 2, HIPAA, or the EU AI Act push you toward BMAD's audit-friendly artifacts.
- What's the token budget? BMAD is the most expensive to run; OpenSpec and GSD the cheapest. If $2,000/dev/month is a non-starter, that alone filters the list.
- How locked-in are you to one IDE or cloud? Multi-IDE/multi-model favors BMAD, Spec Kit, or OpenSpec, which all stay portable.
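One way to make the five questions repeatable across a portfolio is to encode them as a simple vote tally. The sketch below is one possible encoding, not a validated scoring model. It assumes every question carries equal weight and that the budget question acts as a hard filter on BMAD. The `Project` fields and the `shortlist` function are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Project:
    brownfield: bool
    team_shape: str          # "solo", "scaling", or "multi-team"
    heavy_compliance: bool   # e.g. SOC 2, HIPAA, or EU AI Act exposure
    bmad_budget_ok: bool     # can the budget absorb roughly $2,000/dev/month?
    needs_portability: bool  # must work across several IDEs or models


# Question 2, as stated in the list above.
TEAM_VOTES = {
    "solo": ["GSD", "OpenSpec"],
    "scaling": ["Spec Kit"],
    "multi-team": ["BMAD"],
}


def shortlist(p: Project) -> list[tuple[str, int]]:
    """Tally one vote per question; return frameworks ranked by votes."""
    votes = Counter()
    votes.update(["OpenSpec"] if p.brownfield else ["BMAD", "Spec Kit"])
    votes.update(TEAM_VOTES[p.team_shape])
    if p.heavy_compliance:
        votes.update(["BMAD"])
    if p.needs_portability:
        votes.update(["BMAD", "Spec Kit", "OpenSpec"])
    if not p.bmad_budget_ok:
        votes.pop("BMAD", None)  # the budget question filters BMAD out entirely
    # Highest vote count first; ties broken alphabetically for stable output.
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))


bank = Project(brownfield=False, team_shape="multi-team", heavy_compliance=True,
               bmad_budget_ok=True, needs_portability=True)
legacy = Project(brownfield=True, team_shape="solo", heavy_compliance=False,
                 bmad_budget_ok=False, needs_portability=False)
print(shortlist(bank))
print(shortlist(legacy))
```

The output is a ranked shortlist rather than a single answer, which fits the guidance to expect two or three frameworks across a portfolio. Teams that weight compliance more heavily than speed would adjust the tally accordingly.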
Migration Is Cheaper Than You Think
Because these are methodologies producing markdown and JSON — not deeply coupled platforms — specs port across tools with light reformatting. The common paths:
| Starting framework | Move to | What triggers the move | How the spec carries over |
|---|---|---|---|
| OpenSpec | BMAD | A brownfield project succeeds and net-new feature work begins | The archived spec becomes BMAD's Architect input document |
| Spec Kit | BMAD | A startup adds its first PM and needs explicit role separation | The Spec Kit constitution maps onto BMAD's master prompts |
| BMAD | GSD | A shipped project enters maintenance mode | Lean two-prompt loop takes over the bug-fix cadence; artifacts stay as reference |
The strategic implication for a CxO: you are not making a one-way door decision. Start lighter than you think you need, and graduate frameworks as the project's risk profile changes.
Bottom Line
The teams shipping best in 2026 are not the ones who picked the "right" framework. They're the ones who picked the appropriate framework for each project — and changed it when the project changed. BMAD for complex, regulated, greenfield work. OpenSpec for brownfield. Spec Kit as the safe default for a scaling team. GSD as the disciplined floor above vibe coding. And vibe coding itself reserved for the throwaway prototypes it's actually good at.