BMAD vs. OpenSpec vs. Spec Kit: A CxO's Field Guide to Spec-Driven AI Development

June 10, 2026 · Portia Canlas, FDE · 10 min read · Methodology

An independent perspective — June 2026.

The 60-Second Version (for the executive who only reads this)

Through 2025, "vibe coding" — letting an AI agent improvise code from a chat prompt — produced fast demos and slow disasters. The fix that the industry converged on in 2026 is Spec-Driven Development (SDD): you write a durable specification first, and the spec — not the chat history — becomes the source of truth the AI builds against.

Three frameworks dominate serious engineering conversations: BMAD, OpenSpec, and GitHub Spec Kit. A fourth category — the lightweight Claude Code methodologies GSD ("Get Shit Done") and Superpowers — sits one notch above raw vibe coding as a disciplined alternative. None of them is "best." Each optimizes for a different point on the trade-off curve between rigor and speed.

The CxO takeaway: don't standardize your whole organization on one framework. Match the framework to the project's risk, team size, and compliance exposure — and expect to run two or three across your portfolio.

What "Spec-Driven" Actually Buys You

It is tempting to frame Spec-Driven Development as a fix for one problem: the AI forgetting what you told it. That undersells it badly. A written, versioned specification becomes a durable asset that the chat window can never be, and once you have that asset, a long list of organizational benefits follows that has little to do with the AI at all.

Start with the three failures that sank early AI coding, all of which SDD closes outright:

Context survives. The spec carries intent across sessions, across models, and across people. Swap Claude for GPT, or rotate a developer off the project, and the source of truth is untouched.
Drift disappears. Every change traces back to a versioned document instead of a forgotten Slack thread or a half-remembered hallway conversation, so the system that ships is the system you agreed to build.
Decisions are auditable. When QA, security, or a regulator asks why something was built a certain way, the spec is the paper trail — not a reconstruction after the fact.

But the deeper value shows up across the whole delivery organization:

Governance and risk control. AI-generated code stops being an ungovernable black box and takes on the same review, sign-off, and traceability posture as traditional software delivery — the single biggest reason a CxO can let agents anywhere near production.
Predictability and estimation. A spec broken into discrete, scoped tasks is something you can size, sequence, and forecast. "We don't know how long the AI will take" becomes a planned backlog.
Quality by construction. Acceptance criteria written before code exists give both the agent and your QA team an objective definition of done, which cuts the rework loop that quietly devours AI-coding budgets.
Parallelism and throughput. Once work is decomposed into well-specified units, multiple agents — or multiple engineers — can execute in parallel without stepping on each other, compressing delivery timelines.
Knowledge retention and lower key-person risk. The reasoning behind the system lives in documents, not in one senior engineer's head. New hires onboard against the spec, and a departure stops being a crisis.
Vendor and model independence. Because specs are just markdown and structured data, they are portable. You are never locked into a single IDE, model provider, or framework — you can move the spec and re-run it elsewhere.
Cost discipline. Knowing scope up front lets you choose the right model for each task and avoid the open-ended token burn of an agent improvising its way through an ambiguous request.

For a CxO, the governance and risk point is the one that unlocks everything else — but the lasting payoff is that SDD converts AI coding from a clever individual productivity trick into a repeatable, auditable, team-scale engineering discipline.

The Four Approaches at a Glance

[Figure: a spectrum of AI coding approaches arranged from maximum speed to maximum control. Vibe coding (no spec) is the baseline on the far left, followed by the four approaches covered here: GSD and Superpowers (light spec), OpenSpec (delta specs), Spec Kit (constitution plus four phases), and BMAD (simulated agile team).]

Read left to right as a dial from "maximum speed, minimum control" to "maximum control, maximum overhead." Most organizations need more than one setting.

Framework Profiles

BMAD — the full simulated software team

BMAD (Breakthrough Method for Agile AI-Driven Development) is the most architecturally ambitious option. It simulates an entire agile team using named, role-scoped AI personas — Analyst, Product Manager, Architect, Product Owner, Developer, QA — each producing a versioned artifact (PRD, architecture doc, sprint stories) before the next picks up the work. As of mid-2026 it sits near 49,000 GitHub stars, is MIT-licensed and free, and ships near-daily, with a V6 line that runs across Claude Code, Cursor, Codex, Copilot, and Windsurf and a new "Skills" module architecture.
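
To make the handoff model concrete, here is a minimal Python sketch of a role-by-role pipeline in which each persona produces a versioned artifact before the next one starts. The role order comes from the description above. The `Artifact` class, the `run_pipeline` function, and the pairing of an artifact name with each role are illustrative assumptions, not BMAD's actual implementation or file layout.

```python
from dataclasses import dataclass

# Role order follows the article; the artifact name paired with each role
# is an illustrative assumption, not BMAD's real file layout.
PIPELINE = [
    ("Analyst", "project-brief"),
    ("Product Manager", "prd"),
    ("Architect", "architecture"),
    ("Product Owner", "sprint-stories"),
    ("Developer", "implementation-notes"),
    ("QA", "qa-report"),
]


@dataclass(frozen=True)
class Artifact:
    role: str
    name: str
    version: int
    inputs: tuple  # names of the upstream artifacts this one was built from


def run_pipeline(version=1):
    """Run each role in order; every role sees all earlier artifacts."""
    produced = []
    for role, name in PIPELINE:
        upstream = tuple(a.name for a in produced)
        # A real system would invoke an AI persona here. This sketch only
        # records the handoff so the traceability chain is visible.
        produced.append(Artifact(role, name, version, upstream))
    return produced


artifacts = run_pipeline()
for a in artifacts:
    print(f"{a.role}: {a.name} v{a.version} <- {list(a.inputs)}")
```

The point of the structure is that every artifact records what it was built from, which is what lets the documents double as a paper trail later.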

Best at: complex, net-new ("greenfield") platforms where being almost right is expensive; teams scaling from a handful to dozens of engineers who benefit from documentation-as-onboarding; and regulated work, where the PRDs and architecture docs double as compliance evidence.

Where it breaks: small jobs. A four-hour bug fix should not generate a PRD and a sprint story. It is the most token-hungry option (reported real-world ranges of roughly $800–$2,000+ per developer per month in frontier-model API costs, with outlier weeks far higher), and it shows friction on messy legacy codebases despite a dedicated brownfield mode.

OpenSpec — the minimalist for legacy systems

OpenSpec is the lean option, sitting around 52,000 GitHub stars by mid-2026. Instead of documenting an entire system up front, it uses delta specs — you describe only what's changing. Completed changes archive into a growing source-of-truth document, so the spec evolves alongside the code. A strict three-phase state machine (propose → apply → archive) keeps it disciplined without ceremony, and an AGENTS.md "README for robots" lets even AI tools without native OpenSpec support follow the workflow.
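
The propose → apply → archive flow can be pictured as a small state machine. The Python sketch below is an illustration under stated assumptions: the `Change` and `SourceOfTruth` classes, the phase labels, and the dictionary-based spec are hypothetical stand-ins, not OpenSpec's own data model or CLI.

```python
from dataclasses import dataclass, field

# Allowed transitions for the propose -> apply -> archive flow.
NEXT_PHASE = {"proposed": "applied", "applied": "archived"}


@dataclass
class Change:
    name: str
    delta: dict            # only the requirements this change adds or modifies
    phase: str = "proposed"

    def advance(self):
        if self.phase not in NEXT_PHASE:
            raise ValueError(f"{self.name} is already {self.phase}")
        self.phase = NEXT_PHASE[self.phase]


@dataclass
class SourceOfTruth:
    requirements: dict = field(default_factory=dict)

    def archive(self, change):
        """Fold an applied change into the long-lived spec."""
        if change.phase != "applied":
            raise ValueError("only applied changes can be archived")
        self.requirements.update(change.delta)
        change.advance()


spec = SourceOfTruth({"login": "password only"})
change = Change("add-2fa", {"login": "password plus one-time code"})
change.advance()        # proposed -> applied
spec.archive(change)    # applied -> archived, delta merged into the spec
print(change.phase, spec.requirements)
```

Because a change carries only its delta, nobody has to document the rest of the system before touching it, which is the property that makes this style attractive on legacy code.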

Best at: brownfield modernization and legacy refactors, where heavyweight frameworks bog down trying to document a ten-year-old monolith; and speed-first teams that want governance without overhead.

Where it breaks: when you genuinely need explicit role handoffs (PM → Architect → Dev), OpenSpec's leanness feels like missing scaffolding rather than welcome simplicity.

GitHub Spec Kit — the safe default for a scaling team

Spec Kit has by far the strongest distribution story — over 110,000 GitHub stars by mid-2026, helped by GitHub's reach, and templates for 30+ AI agents. Its workflow is a four-phase loop (specify → plan → tasks → implement), and its signature feature is the constitution: a project-wide ruleset every spec inherits, so conventions are written once instead of re-typed into every prompt.
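
A rough way to picture the constitution is as a set of rules that is merged into every feature spec automatically. The Python sketch below assumes a hypothetical dictionary format, hypothetical rule names, and a hypothetical "constitution wins on conflict" policy. The four phase names follow the loop described above, but none of this is Spec Kit's real file format or command set.

```python
# Hypothetical project-wide rules, written once (values are made up).
CONSTITUTION = {
    "language": "TypeScript",
    "tests_required": True,
    "max_function_lines": 40,
}

# Phase names follow the four-phase loop described in the article.
PHASES = ("specify", "plan", "tasks", "implement")


def build_spec(feature, feature_rules=None):
    """Create a feature spec that inherits every constitution rule.

    Assumption for this sketch: constitution rules win on conflict.
    """
    rules = {**(feature_rules or {}), **CONSTITUTION}
    return {"feature": feature, "rules": rules, "phase": PHASES[0]}


def next_phase(spec):
    """Return a copy of the spec moved one step along the loop."""
    index = PHASES.index(spec["phase"])
    if index == len(PHASES) - 1:
        raise ValueError("spec is already in the final phase")
    return {**spec, "phase": PHASES[index + 1]}


spec = build_spec("export-to-csv", {"max_function_lines": 200, "owner": "billing"})
spec = next_phase(spec)
print(spec["phase"], spec["rules"]["max_function_lines"], spec["rules"]["owner"])
```

The feature-level attempt to raise the function-length limit is ignored, while the extra `owner` field is kept. That is the "written once, inherited everywhere" behavior in miniature.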

Best at: standardizing AI output quality across an existing team, and medium-sized features where rigor matters but you don't need a full simulated org chart.

Where it breaks: setup is opinionated and front-loaded ("a lot of questions"), and it rewards investing real time in a strong constitution. It also moves fast — a recent release removed an entire flag family, breaking older tutorials and scripts, so teams must track upstream changes.

GSD & Superpowers — disciplined speed, one rung above vibe coding

These two are the lightweight, Claude Code–native end of the spectrum: not full specs, but enough structure to keep a fast workflow from rotting. We treat them as a single line item because they solve the same problem from slightly different angles, and plenty of teams run them together.

GSD ("Get Shit Done") is a meta-prompting and context-engineering system, around 59,000 GitHub stars by mid-2026. Its core trick addresses context rot — the quality decay as an agent fills its context window — by spawning a fresh subagent with a clean context for each task, so task 50 is as sharp as task 1. The workflow is essentially two prompts: a planning prompt that does gap analysis and builds a prioritized TODO list, and a building prompt that implements and runs tests in a loop until green.
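
The fresh-context idea is easy to see in a toy model. The Python sketch below contrasts a clean context per task with one long-lived context that keeps growing. The function names and the list-as-context simplification are assumptions made for illustration. It does not call any model and is not GSD's actual code.

```python
def run_task_fresh(task, plan_summary):
    """Simulate one subagent that starts with a clean context."""
    context = [plan_summary, task]   # new list each call, nothing carried over
    # A real subagent would call a model and run the test loop here.
    return {"task": task, "context_size": len(context)}


def run_all_fresh(tasks, plan_summary):
    """Fresh-context pattern: one clean subagent per task."""
    return [run_task_fresh(t, plan_summary) for t in tasks]


def run_all_shared(tasks, plan_summary):
    """Contrast case: a single context that accumulates every task."""
    context = [plan_summary]
    results = []
    for t in tasks:
        context.append(t)
        results.append({"task": t, "context_size": len(context)})
    return results


tasks = [f"task-{i}" for i in range(1, 51)]
fresh = run_all_fresh(tasks, "prioritized TODO list")
shared = run_all_shared(tasks, "prioritized TODO list")
print(fresh[0]["context_size"], fresh[-1]["context_size"])    # 2 2
print(shared[0]["context_size"], shared[-1]["context_size"])  # 2 51
```

In the fresh pattern, task 50 sees exactly as much context as task 1. In the shared pattern the context has grown from 2 entries to 51 by the last task, which is the decay the article calls context rot.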

Superpowers (by Jesse Vincent) is an agentic skills framework — among the most-installed plugins in the Claude Code ecosystem, near 93,000 stars by mid-2026. Rather than jump straight to code, it teases a spec out of the conversation, shows you the design in digestible chunks to sign off on, then drives an implementation plan built on real red/green TDD, YAGNI, and DRY, using subagents and a code-review pass. It is the most opinionated way to make a vanilla Claude Code agent behave like a disciplined engineer without standing up a full spec framework.

Best at: solo developers and small teams already living inside Claude Code; fast iteration where requirements are still fluid; and maintenance-mode work after a big build ships. Superpowers leans toward enforced test-and-review discipline; GSD toward raw throughput on a clean context.

Where it breaks: limited multi-agent orchestration and role specialization. If you need explicit PM → Architect → Dev handoffs or audit-grade artifacts, their leanness becomes a ceiling — that's where Spec Kit and BMAD take over.

Comparison Table: The Three Spec Frameworks

Dimension	BMAD	OpenSpec	GitHub Spec Kit
Core model	Simulated 12+ agent agile team	Delta specs (only what changes)	Constitution + 4-phase loop
Workflow	Analyst → PM → Architect → Dev → QA	Propose → Apply → Archive	Specify → Plan → Tasks → Implement
Maturity (GitHub stars, mid-2026)	~49k	~52k	~111k
Sweet spot	Complex greenfield, regulated	Brownfield / legacy refactor	Scaling team standardization
Documentation output	Heavy (audit-grade)	Light, evolves with code	Medium, convention-driven
Relative running cost	High ($800–$2,000+/dev/mo)	Low	Low–Medium
Speed on small tasks	Slow	Fast	Medium
Brownfield fit	Fair (dedicated mode, some friction)	Excellent	Good
Compliance evidence	Strongest	Weak by default	Moderate
Setup effort	High	Low	Medium–High
Tooling portability	Multi-IDE / multi-model	CLI-agnostic	30+ agents

A note on the numbers: GitHub star counts and cost figures move fast and vary by project size and model pricing. Treat them as directional, not precise. The relative ordering — BMAD heaviest and priciest, OpenSpec, GSD, and Superpowers lightest and cheapest — has held steady across sources.

Spec-Driven vs. Vibe Coding vs. GSD & Superpowers

This is the comparison most CxOs actually need, because it frames the real decision: how much process is worth it?

	Vibe Coding	GSD & Superpowers	Spec-Driven (BMAD / OpenSpec / Spec Kit)
Philosophy	Improvise from a prompt	Light spec + fresh-context discipline	Spec is the contract
Speed to first output	Fastest	Fast	Slower (planning up front)
Speed at week 6+	Collapses	Sustains	Sustains
Code consistency	Low	Medium–High	High
Auditability	None	Limited	Strong
Onboarding new devs	Painful	Moderate	Documentation does the work
Token / API cost	Lowest	Low	Medium–High (BMAD highest)
Best use	Throwaway prototypes, spikes	Solo/small, fluid requirements	Production, teams, compliance
Main risk	Drift, rework, no paper trail	Limited role specialization	Overhead on small work

The honest pros and cons of vibe coding: it is unmatched for a weekend prototype, a proof-of-concept to win buy-in, or exploring an unfamiliar API. Its cost is that everything it produces is effectively disposable — projects start fast and stall within weeks as inconsistent code and fragile architecture accumulate, with no audit trail when something breaks. GSD and Superpowers are the pragmatic middle: most of vibe coding's speed, but with enough structure (a plan, a test loop, fresh contexts, enforced TDD and review) to keep quality from rotting. For anything headed to production with more than one engineer touching it, full SDD pays for itself.

The Decision: Which Framework, When

Scenario	Team size	Project type	Compliance	Recommended
Solo Developer	1	Anything small	Negligible	GSD or Superpowers — lean Claude Code workflows keep you moving
Regulated (banks, FinTech, healthcare)	Any	Either	Heavily governed (SOC 2, HIPAA, HKMA, MAS & APRA banking supervision)	BMAD — its artifacts stand in as your audit trail
Startup to MVP	1–3	Greenfield	Light	OpenSpec, or GSD / Superpowers for the quickest start
Funded startup, scaling	4–15	Greenfield product	Moderate	GitHub Spec Kit — locks in consistent quality as the team grows
Complex enterprise platform	10+	New big-bet build	Elevated	BMAD — full planning rigor for high-stakes builds
Brownfield modernization	Any	Legacy refactor	Moderate	OpenSpec lead, with BMAD brownfield mode where role handoffs matter
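
For teams that want to encode the table above in an intake form or a project template, it can be approximated as a short rule chain. This Python sketch is one possible reading, not a definitive policy: the rows overlap (a 12-person team fits both the scaling and the enterprise row), so the rule order, the `high_stakes` flag, and the fallback for large low-stakes teams are assumptions.

```python
def recommend(team_size, brownfield, regulated, high_stakes=False):
    """Map a project profile to a framework, following the decision table.

    Rule order is an assumption: compliance is checked first, then legacy
    status, then team size.
    """
    if regulated:
        return "BMAD"
    if brownfield:
        return "OpenSpec (BMAD brownfield mode where role handoffs matter)"
    if team_size >= 10 and high_stakes:
        return "BMAD"
    if team_size == 1:
        return "GSD or Superpowers"
    if team_size <= 3:
        return "OpenSpec, or GSD / Superpowers"
    # The table covers 4-15 here; larger low-stakes teams fall through
    # to the same answer by assumption.
    return "GitHub Spec Kit"


print(recommend(1, brownfield=False, regulated=False))
print(recommend(8, brownfield=False, regulated=False))
print(recommend(12, brownfield=False, regulated=False, high_stakes=True))
print(recommend(5, brownfield=True, regulated=False))
```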

Five Questions to Ask Before You Choose

Greenfield or brownfield? Legacy tilts to OpenSpec; net-new opens up BMAD and Spec Kit.
How big is the team, and does it need explicit roles? Solo → GSD / Superpowers / OpenSpec. Scaling → Spec Kit. Multi-team → BMAD.
What are the compliance requirements? SOC 2, HIPAA, or the EU AI Act push you toward BMAD's audit-friendly artifacts.
What's the token budget? BMAD is the most expensive to run; OpenSpec, GSD, and Superpowers the cheapest. If $2,000/dev/month is a non-starter, that alone filters the list.
How locked-in are you to one IDE or cloud? Multi-IDE/multi-model favors BMAD, Spec Kit, or OpenSpec, which all stay portable.
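
The token-budget question is simple arithmetic once you multiply by headcount. The Python sketch below uses the $800–$2,000+ per developer per month range quoted earlier for BMAD. The team size of 10 and the $15,000 budget are made-up inputs, and because the article's upper figure is "$2,000+", the high end here should be read as a floor rather than a cap.

```python
# Per-developer monthly range quoted in the article for BMAD (US dollars).
# The upper figure is a floor for the high end, since the article says "$2,000+".
LOW_PER_DEV = 800
HIGH_PER_DEV = 2000


def monthly_range(developers):
    """Return the (low, high) monthly spend for a team of this size."""
    return developers * LOW_PER_DEV, developers * HIGH_PER_DEV


def within_budget(developers, monthly_budget):
    """True only if even the high end of the range fits the budget."""
    _, high = monthly_range(developers)
    return high <= monthly_budget


low, high = monthly_range(10)
print(low, high)            # 8000 20000 per month
print(low * 12, high * 12)  # 96000 240000 per year
print(within_budget(10, 15000))  # False
```

A ten-person team therefore lands somewhere between roughly $96,000 and $240,000 a year before outlier weeks, which is usually enough to settle the question one way or the other.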

Migration Is Cheaper Than You Think

Because these are methodologies producing markdown and JSON — not deeply coupled platforms — specs port across tools with light reformatting. The common paths:

Starting framework	Move to	What triggers the move	How the spec carries over
OpenSpec	BMAD	A brownfield project succeeds and net-new feature work begins	The archived spec becomes BMAD's Architect input document
Spec Kit	BMAD	A startup adds its first PM and needs explicit role separation	The Spec Kit constitution maps onto BMAD's master prompts
BMAD	GSD / Superpowers	A shipped project enters maintenance mode	A lean Claude Code workflow takes over the bug-fix cadence; artifacts stay as reference

The strategic implication for a CxO: you are not making a one-way-door decision. Start lighter than you think you need, and graduate to a heavier framework as the project's risk profile changes.

Bottom Line

The teams shipping best in 2026 are not the ones who picked the "right" framework. They're the ones who picked the appropriate framework for each project — and changed it when the project changed. BMAD for complex, regulated, greenfield work. OpenSpec for brownfield. Spec Kit as the safe default for a scaling team. GSD and Superpowers as the disciplined floor above vibe coding. And vibe coding itself reserved for the throwaway prototypes it's actually good at.