Vibe Workflows — Spec · Agents · TDD · Git

45 min·theory

🎯 After reading this lesson

After completing this lesson, you will be confident doing the following three things.

  • ✅ The SPEC → Plan → Tasks → Implement flow
  • ✅ Safe use of Agent mode (dry-run · approval gates)
  • ✅ Git commit units + how to write PR descriptions for AI-generated code

Keep the learning objectives as a checklist and close the lesson once you can answer every item.

SPEC vs Vibe — When to spec, when to vibe?

One line: Simple tasks = quick vibe / Complex tasks = SPEC first.

| Aspect | Vibe Coding | Spec-Driven |
| --- | --- | --- |
| Start | "Build me a Login API" | spec.md · plan.md · tasks.md |
| Speed | Start immediately | 5–15 min spec writing |
| Suited for | Prototype · MVP · experiments | Production · multi-person collaboration |
| Docs | Code only | Spec + code |
| Changes | Intuitive | Update spec first |

When SPEC comes first:

  • 🟢 Large features (1 week+ of work)
  • 🟢 Multi-person collaboration (PR reviewers required)
  • 🟢 Security · payments · authentication
  • 🟢 Backend API contracts
  • 🟢 Data schema changes

When Vibe comes first:

  • 🟢 UI components (boilerplate)
  • 🟢 Small bug fixes
  • 🟢 Refactoring · renaming
  • 🟢 Writing tests
  • 🟢 Documentation · comments

Spec-Kit workflow (GitHub, 2025):

code
/specify  → spec.md  (requirements)
/plan     → plan.md  (implementation strategy)
/tasks    → tasks.md (checklist)
/implement→ actual code writing

Can be automated when combined with Anthropic Skills + Claude Code.
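
To make the hand-off from /tasks to /implement concrete, here is a minimal sketch that reads a checklist and picks the next open task. It assumes tasks.md uses GitHub-style checkboxes (`- [ ]` / `- [x]`); that layout and the names `SpecTask`, `parseTasks`, and `nextOpenTask` are illustrative assumptions, not a documented Spec-Kit API.

```typescript
// One checklist entry; the checkbox layout of tasks.md is an assumption.
interface SpecTask {
  title: string;
  done: boolean;
}

// Parse lines such as "- [ ] Implement endpoint" or "- [x] Write spec".
// Lines that are not checklist items (headings, prose) are ignored.
function parseTasks(markdown: string): SpecTask[] {
  const tasks: SpecTask[] = [];
  for (const line of markdown.split("\n")) {
    const match = /^\s*[-*]\s+\[([ xX])\]\s+(.+)$/.exec(line);
    if (match !== null) {
      const mark = match[1] ?? " ";
      const title = (match[2] ?? "").trim();
      tasks.push({ title, done: mark !== " " });
    }
  }
  return tasks;
}

// The next unchecked task to hand to the agent, or undefined when all are done.
function nextOpenTask(tasks: SpecTask[]): SpecTask | undefined {
  return tasks.filter((task) => !task.done)[0];
}

// Hypothetical tasks.md content for the payment example.
const sampleTasksMd = [
  "# Tasks",
  "- [x] Write spec.md",
  "- [ ] Implement POST /api/payments/init",
  "- [ ] Implement POST /api/payments/refund",
].join("\n");

const parsedTasks = parseTasks(sampleTasksMd);
```

Feeding one open task at a time to the agent keeps each change small, which lines up with the small-commit advice later in this lesson.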

Agentic Workflows — "AI that works while you sleep"

Agent = an LLM that autonomously repeats the cycle of calling tools · checking results · deciding the next action.

Basic loop:

code
1. Receive goal
2. Plan (break into subtasks)
3. Call tools (edit · run · test)
4. Evaluate results
5. Decide next action → back to 2 (or terminate)
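
The loop above can be sketched in a few lines. This is a toy model, not how any particular agent is implemented: the `ToolCall`, `ToolResult`, `Planner`, and `ToolRunner` types and the stubbed planner and tool are all hypothetical, and a real agent would call an LLM where the planner sits. The `maxSteps` limit is an added assumption so the loop always terminates.

```typescript
// Hypothetical types: a tool call proposed by the planner, and its result.
interface ToolCall {
  tool: string;
  input: string;
}

interface ToolResult {
  ok: boolean;
  output: string;
}

// The planner sees the goal and past results; null means "terminate".
type Planner = (goal: string, results: ToolResult[]) => ToolCall | null;
type ToolRunner = (call: ToolCall) => ToolResult;

// Plan, call a tool, record the result, decide again. The hard step
// limit guarantees the loop cannot run forever.
function runAgent(
  goal: string,
  plan: Planner,
  runTool: ToolRunner,
  maxSteps: number
): ToolResult[] {
  const results: ToolResult[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const call = plan(goal, results); // steps 2 and 5: plan or decide
    if (call === null) break;         // terminate
    results.push(runTool(call));      // steps 3 and 4: call and evaluate
  }
  return results;
}

// Stub tool for illustration: the test run fails once, then passes.
let attempts = 0;
const stubRunner: ToolRunner = (call) => {
  attempts += 1;
  return attempts < 2
    ? { ok: false, output: `${call.tool} failed` }
    : { ok: true, output: `${call.tool} passed` };
};

// Stub planner: keep running tests until the last result is ok.
const stubPlanner: Planner = (_goal, results) => {
  const last = results[results.length - 1];
  return last !== undefined && last.ok ? null : { tool: "run-tests", input: "" };
};

const agentResults = runAgent("make tests pass", stubPlanner, stubRunner, 10);
```

With these stubs the loop stops after two tool calls, one failing and one passing, well under the step limit.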

Tasks agents excel at:

  • 🟢 Repetitive tasks: bulk changes across 100 files
  • 🟢 Exploration: understanding a codebase · validating debugging hypotheses
  • 🟢 Tedious tasks: migrations · writing tests · updating documentation
  • 🟢 Overnight / weekend work: auto-generating PRs while the user sleeps

Limitations:

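  • 🔴 Not fully autonomous: user approval every 5–10 minutes recommended
  • 🔴 Expensive calls: long agent runs on top-tier models with large context windows (e.g. Opus) are billed per token and add up quickly, so check current pricing first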
  • 🔴 Dangerous actions (rm -rf · DB DROP) risk being executed directly
  • 🔴 Hallucinations — fabricated function calls · generated docs

Safety measures:

  • Permission modes: plan (no modifications) → ask (approve every action) → acceptEdits (file edits auto-approved)
  • Sandbox: Docker · worktree isolation
  • Checkpoints: review after every commit
  • Rollback ready: recoverable via git reset

> 💡 2025 trend: Combining GitHub Actions with Claude Code to auto-generate PRs and run tests overnight is a spreading pattern.

TDD with AI

TDD cycle: Red (failing test) → Green (passing code) → Refactor (clean up)

AI's role:

| Stage | AI assistance |
| --- | --- |
| Red | Generate test cases ("add these input and edge cases too") |
| Green | Pass with minimal code (boilerplate quickly) |
| Refactor | Suggest code cleanup and name improvements |

Tests AI excels at:

  • 🟢 Unit tests (simple input/output)
  • 🟢 Edge cases (null · empty arrays · very large numbers)
  • 🟢 Boilerplate (jest · vitest · pytest setup)

Tests AI struggles with:

  • 🔴 Complex integration tests (database · external API · authentication flows)
  • 🔴 Business requirements (domain knowledge required)
  • 🔴 Performance tests (real environment required)
  • 🔴 Diagnosing flaky tests (timing · environment dependencies)

Using Cursor · Claude Code:

code
"Write vitest tests for this function (/src/auth.ts):
- Success case (valid email · password)
- Failure (empty input · short password · invalid email)
- Edge (email 100 chars · password 1000 chars · SQL injection attempt)
- Mocking: db, bcrypt"

> 💡 Iron Law: Tests generated by AI must also be read and understood by you. Never trust them based on pass results alone.
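
As a rough idea of what the prompt above might produce, the sketch below lists the requested cases as a plain table and runs them against a stand-in validator. Everything here is hypothetical: `validateLogin` is not the real function in /src/auth.ts, and its rules (8-character minimum password, 254-character email cap) are assumptions. In a real project each row would become a vitest `it(...)` case; plain data keeps the block self-contained.

```typescript
// Hypothetical validator standing in for the function under test.
// The rules (8-character minimum, 254-character email cap) are assumptions.
type LoginCheck = "ok" | "empty" | "invalid-email" | "short-password";

function validateLogin(email: string, password: string): LoginCheck {
  if (email.length === 0 || password.length === 0) return "empty";
  if (email.length > 254 || !/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email)) {
    return "invalid-email";
  }
  if (password.length < 8) return "short-password";
  return "ok";
}

// A 1000-character password for the edge case.
const longPassword = new Array(1001).join("x");

// Cases mirroring the prompt: success, failure, and edge inputs.
// Each row is [email, password, expected result].
const loginCases: Array<[string, string, LoginCheck]> = [
  ["user@example.com", "correct-horse", "ok"],
  ["", "correct-horse", "empty"],
  ["user@example.com", "short", "short-password"],
  ["not-an-email", "correct-horse", "invalid-email"],
  ["user@example.com", longPassword, "ok"],
  ["' OR '1'='1", "correct-horse", "invalid-email"], // injection-style input
];

// Rows whose actual result differs from the expected one.
const failedCases = loginCases.filter(
  ([email, password, expected]) => validateLogin(email, password) !== expected
);
```

Reading the table row by row is the review step the Iron Law asks for: each expected value is a claim about the requirements that only you can confirm.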

Git in the AI Era — Small commits + automated reviews

Principles:
1. Small commits — 1 commit = 1 change intent. Makes AI reviews effective
2. Clear messages — Conventional Commits (feat · fix · refactor)
3. PR automation — Copilot Review · CodeRabbit and similar tools for automated first-pass review
4. Automated tests — CI must pass before merging

AI Git tools:

| Tool | Function |
| --- | --- |
| Aider | Auto-commit on every change · easy rollback |
| Claude Code | Git strategy based on CLAUDE.md |
| GitHub Copilot | PR summaries · review comments |
| CodeRabbit | Automated PR reviews (free for open source) |
| Linear ↔ Cursor | Issues → code auto-linking |

AI-era PR template:

markdown
## Changes
- What · where · why

## AI collaboration
- Tools used: Cursor Composer / Claude Code
- AI-generated ratio: ~60% (boilerplate), 40% manual
- Verification: integration tests passed · manual dogfood

## Tests
- vitest pass: 12/12
- Manual: verified inputs and exceptions on dev server

Common pitfalls:

  • ❌ Committing all AI-generated code in a single commit — impossible to review
  • ❌ AI making arbitrary changes elsewhere — surprises at review time
  • ❌ Merging just because the code compiles: business logic is still unverified
  • ✅ Start in Plan mode → small incremental changes → commit immediately

SPEC-Driven — *The token-saving secret*

Why writing a SPEC saves tokens

Old way: every conversation repeats the loop of "build me this feature → AI asks questions → you answer → AI implements", so the same context is sent N times.

SPEC-Driven: write spec.md once upfront → all subsequent conversations reference that spec. Context written once + short follow-up messages.

Practical flow

Step 1: Write spec.md together with AI

code
Me: I want to build a payment system. Let's write the spec first.
   - Payment gateway: Toss
   - Products: one-time purchase + recurring subscription
   - Need refund and settlement logic
   Please write the spec.

AI: (writes spec.md)
  # Payment System Specification

  ## Requirements
  1. Payment gateway integration (Toss)
  2. Product types: SINGLE / SUBSCRIPTION
  3. Refund processing (within 7 days)
  4. Settlement (1st of every month)

  ## API Endpoints
  - POST /api/payments/init
  - POST /api/payments/confirm
  - POST /api/payments/refund
  ...

This spec.md is saved at the project root.

Step 2: Implementation — without repeating explanations

code
Me: Implement POST /api/payments/init from spec.md.

AI: (reads spec and implements precisely)

No need to repeat "Toss gateway · subscription · refund policy" every time. Spec written once + short requests only.

Step 3: Code review — against the spec

code
Me: Does the implementation match spec.md? Tell me if there are differences.

AI: (compares spec with code)
  ✅ /init endpoint — matches spec
  ⚠️ /refund — spec says 7 days but code has 14 days
  ❌ /confirm — missing webhook verification step from spec
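
The review in Step 3 is done by the AI reading prose and code. Once a few spec values are captured as data, part of the same check can also be run mechanically. The sketch below is one way to do that under stated assumptions: the `PaymentRules` shape and `diffAgainstSpec` are hypothetical, and the sample values simply mirror the example findings above.

```typescript
// Hypothetical, simplified view of what spec.md promises and what the code does.
interface PaymentRules {
  refundWindowDays: number;
  endpoints: string[];
  verifiesWebhook: boolean;
}

// Returns one human-readable finding per mismatch; an empty list means
// the implementation matches the spec on the fields we track.
function diffAgainstSpec(spec: PaymentRules, impl: PaymentRules): string[] {
  const findings: string[] = [];
  if (spec.refundWindowDays !== impl.refundWindowDays) {
    findings.push(
      `refund window: spec says ${spec.refundWindowDays} days, code has ${impl.refundWindowDays} days`
    );
  }
  for (const endpoint of spec.endpoints) {
    if (impl.endpoints.indexOf(endpoint) === -1) {
      findings.push(`missing endpoint: ${endpoint}`);
    }
  }
  if (spec.verifiesWebhook && !impl.verifiesWebhook) {
    findings.push("webhook verification required by spec but not implemented");
  }
  return findings;
}

const allEndpoints = [
  "POST /api/payments/init",
  "POST /api/payments/confirm",
  "POST /api/payments/refund",
];

// Values taken from the example review: 7 vs 14 days, missing webhook check.
const specRules: PaymentRules = {
  refundWindowDays: 7,
  endpoints: allEndpoints,
  verifiesWebhook: true,
};
const implRules: PaymentRules = {
  refundWindowDays: 14,
  endpoints: allEndpoints,
  verifiesWebhook: false,
};

const specFindings = diffAgainstSpec(specRules, implRules);
```

For these sample values `specFindings` holds two entries, the refund window mismatch and the missing webhook verification.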

Spec-Kit (GitHub, 2025)

An open-source SPEC-Driven toolkit built by GitHub. It automates four steps:

bash
/specify  → auto-generate spec.md (requirements)
/plan     → auto-generate plan.md (implementation strategy)
/tasks    → auto-generate tasks.md (checklist)
/implement → actual code writing

Combined with Claude Code · Cursor, it creates one unified flow from spec to code.

SPEC vs Vibe — Summary

| Aspect | Vibe (improvised) | SPEC-Driven |
| --- | --- | --- |
| Start | "Build me a payment system" | spec.md first |
| Tokens | Context repeated every time | Written once + short requests |
| Consistency | May vary per response | Consistent per spec |
| Collaboration | Difficult (words only) | Shared via spec |
| Tracking | Difficult | Verified against spec |
| Suited for | Prototypes · 1-hour tasks | Production · multi-person collaboration |

Summary

  • Large tasks → SPEC first, then implement = token savings + consistency
  • Small tasks → plain Vibe is fine too
  • Automate spec → code with Spec-Kit

Agent mode *safe* usage guide (must read)

Core one-liner

Agent = AI that autonomously modifies files and executes commands. Used incorrectly, it can destroy your code · database · production environment. The five principles below keep it safe.

⚠️ Never do these

1. Do not use --dangerously-skip-permissions

Among Claude Code options, there is a mode that auto-approves all permissions. It is fast, but the AI can execute dangerous commands like rm -rf without asking. This can destroy production code and databases.

✅ Always start in default mode (Ask or Plan).

2. Permission mode levels — grant gradually

| Mode | Meaning | When to use |
| --- | --- | --- |
| Plan | Shows plan only, no execution | Initial review · complex tasks |
| Ask (default) | Requests approval for every action | Everyday work |
| AcceptEdits | File edits auto-approved, commands require approval | Familiar tasks |
| BypassPermissions | Everything automatic | ⚠️ Dangerous. Isolated environments only |

Recommended flow: Start with Plan → review → step-by-step with Ask → AcceptEdits when comfortable
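
The mode table can be read as a small decision function. The sketch below models only what the table says; it is not Claude Code's implementation, and the type names, the `gate` function, and the choice to always allow read-only actions are assumptions made for illustration.

```typescript
// Names follow the table above; the types themselves are hypothetical.
type AgentPermissionMode = "plan" | "ask" | "acceptEdits" | "bypassPermissions";
type AgentActionKind = "read" | "edit" | "command";
type GateDecision = "allow" | "ask-user" | "deny";

// Decide what happens when the agent wants to perform an action.
function gate(mode: AgentPermissionMode, action: AgentActionKind): GateDecision {
  if (action === "read") return "allow";             // assumption: reads are always fine
  if (mode === "plan") return "deny";                // plan only, nothing is executed
  if (mode === "bypassPermissions") return "allow";  // everything automatic
  if (mode === "acceptEdits" && action === "edit") return "allow";
  return "ask-user";                                 // ask mode, or a command in acceptEdits
}
```

For example, `gate("acceptEdits", "command")` yields `"ask-user"`: even in the more permissive everyday mode, shell commands still stop at an approval gate.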

3. Commit to Git frequently — rollback points

bash
# Before starting work
git add -A && git commit -m "checkpoint: before agent work"

# After agent work
git diff   # review changes
git add -p # partial staging (exclude unintended changes)
git commit -m "feat(auth): add JWT refresh"

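# If things go wrong
git reset --hard HEAD     # not committed yet: discard the agent's uncommitted changes
git reset --hard HEAD~1   # already committed: drop that commit and return to the checkpoint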

Commit after every meaningful unit. Enables partial rollback of only the broken parts.

4. Isolate with a separate worktree

bash
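# checks out the existing branch my-feature in a new folder
# (to create the branch at the same time: git worktree add -b my-feature ../myapp-experiment)
git worktree add ../myapp-experiment my-feature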
cd ../myapp-experiment
claude    # work in isolated folder

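The original working tree stays untouched, because the agent edits files only in the separate folder. Commands it runs can still reach outside that folder, so this isolates edits, not the machine. If things break, just run git worktree remove ../myapp-experiment (add --force if the worktree has uncommitted changes).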

5. Docker container — true isolation

bash
docker run --rm -it -v "$(pwd)":/app -w /app node:20 bash
# run claude inside
# commands execute only inside the container; the mounted project folder (/app) is the only part of the host they can change

Dangerous operations (DB migrations · system commands) must be run inside a container.

Dangerous commands blacklist

Double-check before the agent runs these:

  • rm -rf (especially with * or /)
  • git reset --hard (discards uncommitted changes; dropped commits survive only in the reflog)
  • git push --force (overwrites remote)
  • DROP TABLE · TRUNCATE (deletes database data)
  • npm publish (accidental publish)
  • curl ... | sh (executes external scripts)
  • Any command with sudo
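
A list like this can back a simple pre-execution check. The sketch below flags commands that match the patterns above so a human can take a second look. It is a conservative illustration, not a complete or bypass-proof filter: the regular expressions, the labels, and the `flagDangerous` name are all assumptions, and a determined command can evade string matching.

```typescript
// Patterns drawn from the list above; illustrative, not exhaustive.
const dangerPatterns: Array<{ label: string; pattern: RegExp }> = [
  { label: "recursive delete", pattern: /\brm\s+-[a-z]*r/i },
  { label: "hard reset", pattern: /\bgit\s+reset\s+--hard\b/ },
  { label: "force push", pattern: /\bgit\s+push\b.*(--force\b|\s-f\b)/ },
  { label: "destructive SQL", pattern: /\b(drop\s+table|truncate)\b/i },
  { label: "package publish", pattern: /\bnpm\s+publish\b/ },
  { label: "pipe to shell", pattern: /\bcurl\b[^|]*\|\s*(ba)?sh\b/ },
  { label: "sudo", pattern: /\bsudo\b/ },
];

// Returns the labels of every pattern the command matches.
// An empty array means nothing on the list was detected.
function flagDangerous(command: string): string[] {
  return dangerPatterns
    .filter((entry) => entry.pattern.test(command))
    .map((entry) => entry.label);
}
```

An empty result only means nothing on this list matched, so it complements the approval gate rather than replacing it.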

When something goes wrong — recover fast

1. Stop immediately: Ctrl+C
2. Check changes: git status · git diff
3. Roll back: git reset --hard HEAD or git stash
4. If it is the DB: check the last backup
5. Post-mortem: review transcript in ~/.claude/projects/

Summary

  • Never auto-approve all permissions
  • Gradual steps: Plan → Ask → AcceptEdits
  • Frequent git commits + worktree isolation + Docker are your safety net
  • Double-check dangerous commands

Conventional Commits — *5 real-world examples*

Why follow the convention

code
fix: bug fixed

100 messages like this and you cannot find anything later. Following standard conventions lets you:

  • git log --grep "^feat(auth)" — only new features in the auth area
  • semantic-release — automatic versioning + CHANGELOG
  • CI validation — commitlint blocks PRs with invalid messages
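
To show why structured messages enable automatic versioning, here is a simplified sketch of the usual mapping from commit types to a version bump: a breaking change gives a major bump, feat a minor one, fix or perf a patch. Real tools such as semantic-release are configurable and handle more cases; the `bumpFor` function and `ReleaseBump` type are hypothetical names for this illustration.

```typescript
type ReleaseBump = "major" | "minor" | "patch" | "none";

// Pick the largest bump implied by a list of commit messages.
function bumpFor(messages: string[]): ReleaseBump {
  let bump: ReleaseBump = "none";
  for (const message of messages) {
    const header = message.split("\n")[0] ?? "";
    // "type!:" in the header or a "BREAKING CHANGE:" footer means major.
    if (/^\w+(\([^)]*\))?!:/.test(header) || /^BREAKING CHANGE:/m.test(message)) {
      return "major";
    }
    if (/^feat(\([^)]*\))?:/.test(header)) {
      bump = "minor";
    } else if (/^(fix|perf)(\([^)]*\))?:/.test(header) && bump === "none") {
      bump = "patch";
    }
  }
  return bump;
}
```

Types such as docs, test, and chore leave the result at `"none"` in this sketch, and a free-form message like "bug fixed" gives the tool nothing to work with.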

Format

code
<type>(<scope>): <description>

[optional body]

[optional footer]

  • type — feat · fix · refactor · docs · test · chore · perf · style
  • scope — area (auth · api · ui, etc.)
  • description — imperative mood, within 50 characters
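
The header format is regular enough to check with one expression. The sketch below parses the first line of a commit message and applies the rules listed above (known type, optional scope, description of at most 50 characters). It is an illustration only: commitlint's real rule set is richer and configurable, and `CommitHeader`, `allowedTypes`, and `parseHeader` are hypothetical names.

```typescript
interface CommitHeader {
  commitType: string;
  scope: string | null;
  description: string;
}

// Types listed in this lesson; teams often extend the list.
const allowedTypes = ["feat", "fix", "refactor", "docs", "test", "chore", "perf", "style"];

// Parse "<type>(<scope>): <description>"; returns null when the line
// does not follow the format or breaks one of the rules.
function parseHeader(line: string): CommitHeader | null {
  const match = /^([a-z]+)(?:\(([^)]+)\))?: (.+)$/.exec(line);
  if (match === null) return null;
  const commitType = match[1] ?? "";
  const description = match[3] ?? "";
  if (allowedTypes.indexOf(commitType) === -1) return null; // unknown type
  if (description.length > 50) return null;                 // too long for one line
  return { commitType, scope: match[2] ?? null, description };
}
```

For instance, `parseHeader("feat(auth): add JWT refresh token rotation")` returns the type, the scope `auth`, and the description, while `parseHeader("Fixed stuff")` returns null.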

5 real-world examples

1. feat — new feature

code
feat(auth): add JWT refresh token rotation

- Access token TTL shortened (60min → 15min)
- Refresh token rotation introduced (revoke all on reuse)
- httpOnly + Secure + SameSite=Lax cookie
- Reflects RFC 9700 (OAuth 2.0 Security Best Current Practice) recommendations

Closes #142

2. fix — bug fix

code
fix(api): handle null user in /api/me

500 error occurred when calling /api/me while not logged in.
Changed to return explicit 401 when session validation result is null.

Before: TypeError: Cannot read properties of null (reading 'id') → 500
After:  { error: 'NOT_AUTHENTICATED' } → 401

Fixes #198

3. refactor — refactoring (no behavior change)

code
refactor(db): migrate from Prisma to Drizzle

- Build time 8s → 2s (removed Prisma generate)
- Bundle size 250KB → 50KB
- Improved TypeScript inference accuracy

Migration guide: docs/db-migration.md

4. test — adding/modifying tests

code
test(auth): add edge cases for login endpoint

- Empty password
- SQL injection attempt (`' OR '1'='1`)
- Input over 1000 characters
- Invalid email format (RFC 5322)

Coverage: 78% → 91%

5. chore — build · dependencies · tooling

code
chore(deps): bump next from 14.1 to 14.2

Security: includes patch for CVE-2024-XXXX
Breaking: middleware matcher regex changed

Migration:
- middleware.ts:12 update matcher pattern
- /api/auth/* → /api/auth/(.*)

Team convention examples

Additional types (varies by team):

  • perf — performance improvements
  • style — formatting · semicolons (not CSS)
  • ci — GitHub Actions · workflows
  • build — webpack · vite configuration
  • revert — reverting a previous commit

Automated enforcement with commitlint

bash
npm install --save-dev @commitlint/cli @commitlint/config-conventional
echo "module.exports = { extends: ['@commitlint/config-conventional'] };" > commitlint.config.js

GitHub Actions:

yaml
- uses: wagoid/commitlint-github-action@v6

→ Automatically validates every commit message in a PR. Invalid formats fail the check, which blocks the merge when branch protection requires it.

Summary

  • Standard type + scope + description
  • One concise line (under 50 chars) + details in body
  • Payoff: semantic-release versioning · automated CHANGELOG
  • Enforce team-wide with commitlint

🤖 Try asking AI like this

Knowing the concepts in this lesson lets you give AI specific instructions. Instead of a vague "fix it," you make requests with precise vocabulary, and that is the starting point of token savings.

  • "Based on this SPEC, create the Plan → Tasks → Implement flow"
  • "Explain the dry-run + approval gate pattern for safely using AI agent mode"

Why this reduces tokens

Without the concepts, even after receiving an AI response you have to ask "what does that mean?" again. That follow-up question is what consumes tokens. Learn the concepts once and the conversation ends in a single round.
