From Vague Prompt to Executable Spec: BDD and TDD in the Age of AI-Driven Development

TL;DR — Generative AI produces code that does exactly what you ask. The problem is that what you ask is rarely what you need. Vague instructions work for most cases — simple modules, isolated scopes, obvious behavior. But when complexity involves state interactions, boundary conditions, and temporal behaviors, natural language ambiguity takes its toll. BDD (Given/When/Then) and TDD aren’t overhead when working with AI. They’re the difference between generating code fast and generating correct code fast.


The Promise and the Trap

Generative AI tools have made it possible to produce hundreds — sometimes thousands — of lines of functional code in minutes. And most of the time, it works. Isolated modules, simple logic, CRUD: AI delivers fast and well.

The problem appears when complexity is subtle. When behavior depends on state, on timing, on boundary conditions that don’t fit in a two-line instruction. In these cases, the AI doesn’t get it wrong — it implements exactly what you asked. And what you asked was incomplete.

This post is about how BDD and TDD transform AI code generation results — not as theoretical practices, but as practical tools that change output quality.


The Easy 80%

When the instruction is clear and the scope is limited, AI works surprisingly well. Modules with single responsibility, well-defined interfaces, and predictable behavior come out nearly ready on the first attempt.

Examples of what worked with simple instructions:

  • “Create a cache module with TTL and eviction” — clean implementation, worked first try
  • “Add retry with exponential backoff” — correct logic, no bugs
  • “Implement user settings persistence” — correct and idiomatic code
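The second of those instructions really is small enough that a one-line prompt covers it. A minimal sketch of what "retry with exponential backoff" amounts to (function and parameter names are mine, not from any project; the injectable `sleep` is there only to make the behavior testable):

```python
import time

def retry_with_backoff(fn, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call fn(); on failure, retry with an exponentially growing delay."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))  # 0.1s, 0.2s, 0.4s, ...
```

The scope is small and the behavior is obvious, which is exactly why a vague prompt is enough here.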

In these cases, natural language description was sufficient because the scope was small, the behavior was obvious, and there was no complex interaction between components.

AI generates code that does exactly what you ask. The problem is that what you ask is rarely what you need.


The 20% That Costs 80% of the Time

Problems started when complexity involved state interactions, boundary conditions, and temporal behaviors. These are exactly the scenarios where natural language is ambiguous — and where AI interprets ambiguity as literally as possible.

Case 1: Time-windowed processing

I asked for “time-windowed processing” and the code did exactly that — but recalculated the window on every execution cycle, instead of respecting the current phase. Result: unstable behavior. The behavior I wanted was:

GIVEN the process has been running for X seconds in the current phase
WHEN the system recalculates the duty cycle
THEN the process is only interrupted IF the execution time exceeded the new calculated value
AND once interrupted in this phase, it does NOT restart until the next phase

This specification would have eliminated the ambiguity. Without it, the AI implemented the most literal — and technically correct — interpretation of what I asked.
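The key behavior the scenario pins down is latching: once interrupted within a phase, the interruption holds until the next phase begins. A toy sketch of that invariant (all names are hypothetical; the real system is more involved):

```python
class DutyCycleController:
    """Interrupt latches within a phase; cleared only when the phase changes."""

    def __init__(self):
        self.interrupted = False
        self.elapsed = 0.0  # seconds spent in the current phase

    def recalculate(self, new_limit):
        # THEN: interrupt only if execution time exceeded the new value
        if not self.interrupted and self.elapsed > new_limit:
            self.interrupted = True
        # AND: once interrupted, stay interrupted for the rest of the phase
        return self.interrupted

    def next_phase(self):
        # Only a phase transition clears the latch
        self.interrupted = False
        self.elapsed = 0.0
```

The literal reading of the original prompt recomputes everything on each cycle; the latch is the part only the Gherkin scenario makes explicit.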

Case 2: Invalid state before initialization

A verification function returned true when configuredTime > 0 && remainingTime == 0 && !running. This was true before the system was started — the user had configured a value but hadn’t pressed Start. Result: infinite deactivation loop.

A test written before implementation would have caught it:

GIVEN the process was configured for 01:30
BUT the user has not started execution
WHEN I check if the cycle has expired
THEN it should return false
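The fix the test forces is one extra piece of state: whether execution was ever started. A minimal sketch with hypothetical names:

```python
class Cycle:
    """Configured time counts down only after start; expiry requires a start."""

    def __init__(self):
        self.configured_time = 0
        self.remaining_time = 0
        self.running = False
        self.started = False  # the missing piece of state in the buggy version

    def configure(self, seconds):
        self.configured_time = seconds
        self.remaining_time = seconds

    def start(self):
        self.started = True
        self.running = True

    def tick(self, seconds):
        if self.running:
            self.remaining_time = max(0, self.remaining_time - seconds)
            if self.remaining_time == 0:
                self.running = False

    def expired(self):
        # GIVEN configured BUT not started -> THEN expired() is False
        return (self.started and self.configured_time > 0
                and self.remaining_time == 0 and not self.running)
```

Without the `started` guard, the original predicate is true the moment the user configures a value, which is what produced the deactivation loop.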

Case 3: State recovery after restart

State was saved periodically, but when restarting in less time than the save interval, nothing had been persisted. Test:

GIVEN the system was just activated
WHEN there is an immediate interruption (crash, restart)
THEN the previous state should be recoverable on restart

In all these cases, the bug wasn’t the AI’s fault. The bug was in the specification — or rather, the lack of one.


BDD as a Specification Language for AI

The pattern that emerged was clear: the parts of the project where I used Given/When/Then to describe behavior were the ones that caused the fewest problems. And that’s no coincidence.

BDD closes this specification gap with “structured intent” — and the syntax that makes it possible is Gherkin. “Time-windowed processing” can mean three different things to three different engineers. But:

GIVEN [initial state]
WHEN [event or condition]
THEN [expected behavior]

…has a single interpretation. And AI respects that uniqueness.

Gherkin works here for the same reason it works across teams: it’s a ubiquitous language. Developers, product, QA — and now AI — read the same specification and understand the same thing. It’s not code, it’s not free-form natural language. It’s a middle ground structured enough to be precise, yet readable enough to be validated by anyone involved in the problem. When the specification is shared without ambiguity across all parties, alignment doesn’t depend on meetings — it depends on the artifact.

More importantly: BDD specifications in Gherkin allow you to test business logic before the AI generates code. You write the scenario, mentally validate whether it covers the correct behavior, and only then request the implementation. This inverts the feedback cycle — instead of generating code, testing, finding bugs, requesting fixes, you specify, validate, and generate correct code on the first attempt.
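The mapping from a validated scenario to an executable test is mechanical: each Gherkin clause becomes roughly one line of test code, with the keywords surviving as comments. A toy example (the `remaining` helper is hypothetical, made up here for illustration):

```python
def remaining(configured, elapsed):
    """Seconds left in a configured time window (never negative)."""
    return max(0, configured - elapsed)

def test_window_not_expired_before_elapsed_reaches_configured():
    # GIVEN a window configured for 90 seconds
    configured = 90
    # WHEN 30 seconds have elapsed
    left = remaining(configured, 30)
    # THEN 60 seconds remain and the window has not expired
    assert left == 60
```

Because the comments and the code say the same thing, anyone who validated the scenario can validate the test — and vice versa.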

It’s a “hidden superpower”: the ability to define the WHAT and the WHY before the AI solves the HOW. Specifications serve as living documentation — and as a contract between human and machine.


TDD as Validation of AI Understanding

If BDD is the specification language, TDD is the feedback loop that guarantees correctness.

AI output is non-deterministic. The same prompt can generate different implementations. Tests are the anchor that guarantees that, regardless of how the AI solved the problem, the behavior is correct.

The workflow that works best in practice is:

  1. Write the test first — it’s the executable specification of the desired behavior
  2. Validate the test — if the test looks right, the specification is right
  3. Request the implementation — the AI generates code to pass the test
  4. Run the test — if it passes, the behavior is correct
  5. Refactor — request improvements while keeping tests green
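The first three steps compress into a single file in the simplest case. A toy example (`backoff_delay` is hypothetical; in the real workflow the test is written and reviewed before the function below it exists):

```python
# Steps 1-2: the test is the executable spec, written and validated first.
def test_backoff_doubles_each_attempt():
    assert [backoff_delay(i) for i in range(4)] == [1.0, 2.0, 4.0, 8.0]

# Step 3: only then is the implementation requested from the AI.
def backoff_delay(attempt, base=1.0):
    """Delay before the given retry attempt, doubling each time."""
    return base * (2 ** attempt)
```

If the generated implementation fails the test (step 4), the error message is a far more precise correction request than another paragraph of prose.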

The key point: writing the test first lets you use the test to understand what the AI understood from your request, before it generates the implementation. If the test doesn’t make sense, the problem is in the specification — and you fix it before generating wrong code.

In practice, the test-first workflow produces significantly fewer bugs than test-after. Tests are executable specifications — more precise than natural language prompts.


“Explain Before Implementing”

Beyond BDD and TDD, the most valuable habit I discovered was asking the AI to explain what it’s going to do before doing it.

In one case, I needed an optimization algorithm. Instead of requesting the implementation directly, I asked the AI to explain the approach it would use. In the explanation, I identified that the generated parameters would be too aggressive for the context. We changed the strategy without generating a single line of wrong code.

In another case, I requested an audit of which variables weren’t syncing between the local system and the remote service. The AI found that none of the local changes were being propagated. We fixed it before it became a bug in production.

This pattern — explain, question, implement — isn’t intuitive. The natural tendency is to request code directly. But AI is a better analyst than implementer when you give it the right direction.


The Pattern That Emerged

Looking at the practice as a whole, the workflow that produces the best results is:

  1. Explain — Ask the AI to explain the approach before implementing
  2. Specify — Describe the behavior with Given/When/Then
  3. Test — Write (or request) the test before the implementation
  4. Implement — Request the implementation with the test as reference
  5. Feel — Test in practice, feel the friction, observe edge cases
  6. Iterate — Adjust the specification and repeat

In practice, the portion of code that receives structured specification (BDD/TDD) consumes more preparation time — but prevents the vast majority of bugs. The rest — generated with vague instructions — works, but produces most of the problems that need fixing.

The disproportion is revealing: investing time in specification is the most efficient way to use AI for code generation.


Delivering Fast vs. Sustaining Long-Term

AI doesn’t replace software engineering — it amplifies it. The same practices that make an engineer effective without AI — problem decomposition, clear specification, testing before implementation, questioning assumptions — are exactly what make AI usage dramatically more efficient. BDD and TDD aren’t overhead. They’re the difference between “generating code fast” and “generating correct code fast”.

But the question goes beyond code quality. Any combination of engineer and AI can deliver working software. The real difference shows up after — when the code needs to be maintained, evolved, operated. That’s the distinction that matters: delivering software vs. delivering software with long-term operations in mind. Those who specify before implementing aren’t being slower — they’re avoiding the technical debt that turns initial velocity into permanent friction.

The engineer’s repertoire — knowing what to ask, noticing when something is heading in the wrong direction, sensing that an architectural decision will cost you later — doesn’t come from the tool. It comes from experience. AI is a clear multiplier. But without the repertoire to question what it delivers, it becomes a faster way to make mistakes.

At CERC, this is how we’ve been scaling AI usage in engineering. BDD, TDD, and the habit of specifying before generating code aren’t practices we adopted despite AI — they’re practices we adopted because of it. The result has been consistent: more efficiency, higher quality, and a team that trusts what it delivers.


At CERC, AI isn’t a side tool — it’s part of how we build software. If you want to work in an environment where engineering practices matter and cutting-edge technology solves real problems — we’re hiring.


This post was written by: Vitor Melon | Head of Engineering — Payment Arrangements Platform.