
Vibe Coding Fundamentals — Prompts, Pairing, Context & Tokens

45 min·theory

🎯 After reading this lesson

After finishing this lesson, you will be able to confidently do the following three things.

  • ✅ Explain vibe coding as AI pair programming
  • ✅ Save tokens by writing CLAUDE.md / .cursorrules
  • ✅ Apply a 5-item checklist for handling hallucinations

Keep the learning objectives as a checklist and close the lesson once you can answer all of them.

What is Vibe Coding — AI Pair Programming

One line: Vibe coding = the practice of building code together with AI. Code is the output; humans handle intent, validation, and architecture.

3-Role Model:

| Role | Human | AI |
|---|---|---|
| Direction | ✅ Decides what and why | |
| Implementation | △ Core and complex parts | ✅ Repetitive, boilerplate |
| Validation | ✅ Behavior, security, performance | △ First-pass review |
| Decisions | ✅ Trade-offs | |

Where AI excels:

  • ✅ Boilerplate (REST API CRUD, test cases)
  • ✅ Transformation and refactoring (TS↔JS, class↔hook)
  • ✅ Writing docs, comments, and CHANGELOG
  • ✅ Interpreting error messages and forming debugging hypotheses

Where AI falls short:

  • ❌ Business logic outside its domain knowledge
  • ❌ Large-scale architecture decisions
  • ❌ Performance and security evaluation (hypotheses OK, verification requires a human)
  • ❌ Staying on intent: it can generate plausible-looking code that misses the point (hallucination)

> 💡 Iron Law: Code produced by AI is still your responsibility. Code review skills become more important than ever.

Saving Tokens — Affects *Cost, Speed, and Accuracy* Alike

What is a Token — A Quick Recap

Token = the unit AI uses to read text. One English word ≈ 1.3 tokens; one Korean character ≈ 1–2 tokens. Every request is billed for both its input tokens and its output tokens.

Output Tokens Cost More

| Model | Input ($/M tokens) | Output ($/M tokens) | Ratio |
|---|---|---|---|
| Claude Haiku 4.5 | $1 | $5 | 5× |
| Claude Sonnet 4 | $3 | $15 | 5× |
| GPT-4o | $2.5 | $10 | 4× |

The longer the AI's response, the faster costs grow. A vague question → AI plays it safe and explains every possibility → output token explosion.
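To make the asymmetry concrete, here is a minimal sketch that prices a single request using the per-million rates from the table above. The helper name `request_cost` and the 500-token input size are assumptions chosen for illustration, not part of any provider SDK.

```python
# Prices in USD per million tokens, copied from the table above.
PRICES = {
    "Claude Haiku 4.5": (1.00, 5.00),
    "Claude Sonnet 4": (3.00, 15.00),
    "GPT-4o": (2.50, 10.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request (hypothetical helper)."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Same assumed 500-token input; only the length of the answer changes.
short_answer = request_cost("Claude Sonnet 4", 500, 100)
long_answer = request_cost("Claude Sonnet 4", 500, 2_000)
print(f"short answer: ${short_answer:.4f}")  # $0.0030
print(f"long answer:  ${long_answer:.4f}")   # $0.0315
```

With the input held constant, the 2,000-token answer costs roughly ten times as much as the 100-token one, and almost all of that difference comes from output tokens.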

❌ Bad vs ✅ Good Prompts — Token Difference

Example 1: Code Fix

Bad (estimated output: 2,000 tokens):
> "Take a look at this code"

→ AI tries to rewrite and show the entire file, including parts that didn't need to change.

Good (estimated output: 100 tokens):
> "Fix only the type error on line 47 of auth.ts. Don't touch any other code. Show only the changed lines."

→ AI responds with just that one line. 20× fewer output tokens.

Example 2: Adding a Feature

Bad (estimated output: 3,000 tokens):
> "Build a login feature"

→ AI explains every option (OAuth, JWT, sessions, password hashing, email verification, etc.) and produces a full implementation.

Good (estimated output: 800 tokens):
> "Based on the stack in @CLAUDE.md. POST /api/auth/login.
> Input: zod schema (email, password).
> Processing: bcrypt compare → JWT access 15 min + refresh 7 days.
> Response: httpOnly cookie.
> Include Vitest tests."

→ AI implements exactly the stated requirements.

CLAUDE.md / .cursorrules — Save Repeated Context

You no longer need to repeat "I'm using Next.js 15, TypeScript, and Tailwind" at the start of every conversation.

Create CLAUDE.md or .cursorrules in the project root:

```markdown
# Project Guide

## Stack
- Next.js 15 (App Router)
- TypeScript strict mode
- Tailwind CSS + shadcn/ui
- Drizzle ORM + PostgreSQL

## Coding Conventions
- Components: function declaration (no arrow)
- Import order: external → internal → relative
- Korean comments recommended

## Forbidden Patterns
- Do not use var (const/let)
- Do not use any type
- Do not create new functions without tests
- Do not leave "TODO" comments (use issues instead)
```

Tools that support these files (Claude Code for CLAUDE.md, Cursor for .cursorrules) load them automatically at the start of a session and apply them to responses. Zero repeated explanations.

7 Practical Token-Saving Tips

1. Name specific files: Don't say "look at everything" → say "src/auth.ts only"
2. Limit the change scope: "Edit only this part"
3. Specify the output format: "Show diff only" · "Code only, no explanation"
4. Use CLAUDE.md: Handle repeated context in one shot
5. Use prompt caching: Reuse the same system prompt via caching (Anthropic bills cached reads at roughly 10% of the normal input price, a 90% discount)
6. Try smaller models first: Start with Haiku → upgrade to Sonnet if insufficient
7. Prune context: Carry only a summary into the next conversation after a long chat
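Tip 6 can be sketched as a small escalation loop: try the cheapest model first and move up only when the answer fails a check. This is a sketch under stated assumptions, not a real client. `ask_with_escalation`, `call_model`, and `is_good_enough` are hypothetical names, and the model call is injected as a plain function so no provider API is assumed.

```python
from typing import Callable, Sequence


def ask_with_escalation(
    prompt: str,
    models: Sequence[str],
    call_model: Callable[[str, str], str],
    is_good_enough: Callable[[str], bool],
) -> tuple[str, str]:
    """Try each model in order (cheapest first) and return (model, answer)
    for the first answer that passes the check. If none passes, return the
    last model's answer so the caller can still inspect it."""
    if not models:
        raise ValueError("models must not be empty")
    answer = ""
    for model in models:
        answer = call_model(model, prompt)
        if is_good_enough(answer):
            return model, answer
    return models[-1], answer


# Stand-in for a real API call: the small model returns nothing useful here.
def fake_call(model: str, prompt: str) -> str:
    return "" if model == "haiku" else f"{model}: patched line 47"


model, answer = ask_with_escalation(
    "Fix only the type error on line 47 of auth.ts.",
    ["haiku", "sonnet"],
    fake_call,
    is_good_enough=lambda text: bool(text.strip()),
)
print(model, "->", answer)  # sonnet -> sonnet: patched line 47
```

In practice the check would be something you can run mechanically, such as a compile step or a test suite, so that escalation is not decided by the model's own confidence.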

Summary

  • Vague prompt = token bomb
  • Specific prompt = AI responds briefly and accurately
  • CLAUDE.md is the foundation of all token savings

Hallucination — *AI Doesn't Admit When It Doesn't Know*

Core Takeaway

LLMs do not honestly admit ignorance. They make up the most plausible-sounding answer. Function names, library versions, and API responses can all be fabricated.

Why This Happens — The Probability Prediction Mechanism

An LLM predicts "the most likely next token." The concept of genuinely not knowing doesn't exist. It operates on the principle of "if it sounds right, that's the answer."

Example: "What is React's useSnapshot hook?"

  • Fact: React has no such hook (Valtio, a separate state library, does have a useSnapshot)
  • AI: "React's useSnapshot is a hook that saves a snapshot of component state. Usage is..."
  • Confidently lying

Checklist for When You Encounter One

✅ 1. Verify the Function or Library Exists

  • grep or Ctrl+F the function name in official documentation
  • Check the version with npm view <pkg> or pip show <pkg>
  • Click the link — if the URL AI gave returns 404, it's fake
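The existence check can also be automated. The lesson's examples use npm, so the sketch below is only a Python analogue of the same idea, using the standard library's `importlib`. The helper name `symbol_exists` and the module name `jsonx_magic` are made up for illustration.

```python
import importlib


def symbol_exists(module_name: str, attr_name: str) -> bool:
    """Return True only if the module imports and exposes the attribute."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Covers ModuleNotFoundError too: the package is not installed.
        return False
    return hasattr(module, attr_name)


print(symbol_exists("json", "loads"))         # real function -> True
print(symbol_exists("json", "parse"))         # plausible name from JavaScript -> False
print(symbol_exists("jsonx_magic", "loads"))  # invented module, assumed not installed
```

A check like this only proves that a name exists in the version you have installed. It says nothing about whether the function behaves the way the AI described, which is why step 2 still matters.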

✅ 2. Actually Run It

  • Immediately run the received code
  • For TypeScript, check for compile errors
  • A runtime error exposes the fabrication immediately

✅ 3. Ask AI to Verify

> "Does this function really exist? Give me a link to the official docs."

The AI will often backtrack ("Upon checking, it does not exist"), but it can also double down, so treat the reply as a hint and confirm it with steps 1 and 2.

✅ 4. When in Doubt, Cross-Check with Another Model

  • Ask the same question to GPT after Claude
  • If the answers differ, at least one of them is wrong

✅ 5. Specify the Version

> ❌ "In Next.js..." → AI guesses which version

> ✅ "In Next.js 15 App Router..." → explicit context

Top 5 Things AI Frequently Fabricates

1. Non-existent npm packages (plausible-sounding names like react-magic-form)
2. Incorrect import paths (non-existent paths like from 'next/legacy')
3. Non-existent options (options like { strictMode: 'super-strict' } that don't exist)
4. API response fields (absent fields like response.data.user.premiumLevel)
5. Version confusion (calling a Tailwind v3 option a v4 feature)
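Item 4 (fabricated response fields) is cheap to guard against in code. The following is a minimal sketch, assuming the API response has already been parsed into nested dictionaries. The helper `has_path` and the sample payload are hypothetical.

```python
from typing import Any


def has_path(data: Any, path: str) -> bool:
    """Return True if every key in a dotted path exists in nested dicts."""
    current = data
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return False
        current = current[key]
    return True


# Made-up payload standing in for a real API response.
response = {"data": {"user": {"id": 7, "email": "a@example.com"}}}

print(has_path(response, "data.user.email"))         # True
print(has_path(response, "data.user.premiumLevel"))  # False: the field was invented
```

Running a check like this against one real response, before building on AI-generated code, catches absent fields earlier than a failure in production would.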

Real-World Example — Common in Interviews

> Q: "Have you experienced AI hallucination?"
>
> A: "In code generated by v0, a non-existent shadcn/ui component (Slider3D) was used. I caught it by cross-referencing the official docs, and since then I always specify which components are available in the prompt.
>
> Also, Claude confidently recommended npm install zod-extras, a package that doesn't exist. After confirming with npm view, I replaced it with zod's actual superRefine."

Summary

  • AI makes things up when it doesn't know (a limitation of the probability mechanism)
  • Validate with official docs, actual execution, and cross-checking with another model
  • Specify version and exact names to reduce room for guessing

5 Principles of Prompt Engineering

The 5 elements of a good prompt (CRISP):

| Principle | Meaning | Example |
|---|---|---|
| Context | Background, tech stack, constraints | "Using Next.js 14, TypeScript, Tailwind, PostgreSQL" |
| Role | AI's role and perspective | "From the perspective of a backend developer" |
| Intent | What and why | "User authentication API. JWT + refresh token" |
| Specifics | Exact inputs/outputs and edge cases | "Email format validation, duplicate check, bcrypt cost 12" |
| Preferences | Code style and testing | "Functional, explicit types, include Vitest tests" |

Bad prompt vs Good prompt:

❌ "Build a login API"

✅ "Next.js 14 App Router, TypeScript, Drizzle ORM environment.

POST /api/auth/login endpoint:

  • Input: {email: string, password: string} (zod validation)
  • Processing: query users table → bcrypt.compare → issue JWT (15 min) + refresh (7 days)
  • Response: 200 success + httpOnly cookie, 401 failure
  • Include Vitest tests (success, failure, and invalid input cases)"
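One way to keep all five CRISP parts present is to build prompts from a small template that refuses to render when a part is missing. This is a sketch of that idea, not an established tool. The class name `CrispPrompt` and its `render` method are assumptions, and the field values are the examples from the CRISP table.

```python
from dataclasses import dataclass


@dataclass
class CrispPrompt:
    context: str
    role: str
    intent: str
    specifics: str
    preferences: str

    def render(self) -> str:
        """Join the five parts into one prompt, one labeled line each."""
        parts = [
            ("Context", self.context),
            ("Role", self.role),
            ("Intent", self.intent),
            ("Specifics", self.specifics),
            ("Preferences", self.preferences),
        ]
        missing = [label for label, text in parts if not text.strip()]
        if missing:
            raise ValueError(f"missing CRISP parts: {', '.join(missing)}")
        return "\n".join(f"{label}: {text.strip()}" for label, text in parts)


prompt = CrispPrompt(
    context="Using Next.js 14, TypeScript, Tailwind, PostgreSQL",
    role="From the perspective of a backend developer",
    intent="User authentication API. JWT + refresh token",
    specifics="Email format validation, duplicate check, bcrypt cost 12",
    preferences="Functional, explicit types, include Vitest tests",
)
print(prompt.render())
```

Because the template lives in a file, it can be reviewed and versioned like any other code.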

Additional techniques:

  • One-shot example: "Respond in this format: [example]"
  • Chain-of-thought: "Explain step by step, then write the code"
  • Reasoning: "Analyze the trade-offs of this approach first, then decide"
  • Constraint: "No external libraries · under 50 lines"
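  • Verification: "After writing the code, run it yourself and tell me the result" (only meaningful when the AI has a code-execution tool; otherwise the reported result may itself be made up)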

> 💡 A prompt is code. It's worth version-controlling like a PR (Cursor Rules, CLAUDE.md).

Context Window + Token Economics

Context window = the amount of text (in tokens) an LLM can read in one go.

Context by model:

| Model | Context | What it means |
|---|---|---|
| GPT-3.5 | 16K | Short conversations |
| GPT-4o | 128K | Mid-size codebase |
| Claude Sonnet 4 | 200K → 1M | A small project in full |
| Claude Opus 4 | 1M (extended) | Large-scale codebase |
| Gemini 2.5 | 2M | An entire book |

1 token ≈ 0.75 English words; one Korean character ≈ 1–2 tokens. 1,000 lines of code ≈ roughly 4–8K tokens.
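The rule of thumb above (roughly 4 to 8 tokens per line of code) is enough for a quick check of whether a file set fits a given window. The sketch below is a heuristic only. A real tokenizer will give different counts, and the 4,000-token output reserve is an assumption picked for illustration.

```python
def estimate_code_tokens(line_count: int) -> tuple[int, int]:
    """Return a (low, high) token estimate using 4 to 8 tokens per line."""
    return line_count * 4, line_count * 8


def fits_in_context(
    line_count: int,
    context_tokens: int,
    reserve_for_output: int = 4_000,  # assumed headroom for the model's answer
) -> bool:
    """Conservative check: the high estimate plus the reserve must fit."""
    _, high = estimate_code_tokens(line_count)
    return high + reserve_for_output <= context_tokens


print(estimate_code_tokens(1_000))       # (4000, 8000)
print(fits_in_context(12_000, 128_000))  # True: 96,000 + 4,000 fits in 128K
print(fits_in_context(20_000, 128_000))  # False: 160,000 + 4,000 does not
```

Even when the code fits, sending only the relevant files usually remains the cheaper choice.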

4 Principles of Token Economics:

| Principle | Meaning |
|---|---|
| Input < Output | Output tokens cost 5× as much as input tokens (Anthropic pricing) |
| Use caching | Repeated identical context: save up to 90% on the cached input via prompt caching |
| Compress | Send only the relevant portion of long code, not the entire codebase |
| Accumulate summaries | In long conversations, carry a summary forward instead of the full history |
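The caching principle is easiest to see with arithmetic. This sketch assumes cached input tokens are billed at 10% of the normal input price (the 90% saving mentioned above) and ignores any surcharge for writing to the cache. The function name `input_cost` and the workload numbers are illustrative assumptions.

```python
def input_cost(
    tokens_per_request: int,
    requests: int,
    price_per_million: float,
    cached_fraction: float = 0.0,
    cache_read_multiplier: float = 0.10,  # assumption: cached reads cost 10%
) -> float:
    """Return the total USD input cost for a batch of similar requests."""
    cached = tokens_per_request * cached_fraction
    fresh = tokens_per_request - cached
    billable_per_request = fresh + cached * cache_read_multiplier
    return billable_per_request * requests * price_per_million / 1_000_000


# Assumed workload: 100 requests, each with a 10,000-token prompt at $3/M,
# where 90% of every prompt is the same project context.
without_cache = input_cost(10_000, 100, 3.00)
with_cache = input_cost(10_000, 100, 3.00, cached_fraction=0.9)
print(f"without caching: ${without_cache:.2f}")  # $3.00
print(f"with caching:    ${with_cache:.2f}")     # $0.57
```

The saving applies only to the repeated portion of the input, so the overall reduction here is about 81%, not the full 90%.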

Price comparison (2025) (1M input + 1M output):

| Model | Input ($/M) | Output ($/M) | Total |
|---|---|---|---|
| Haiku 4.5 | $1 | $5 | $6 |
| Sonnet 4 | $3 | $15 | $18 |
| Opus 4 | $15 | $75 | $90 |
| GPT-4o | $2.5 | $10 | $12.5 |

> 💡 Fast and cheap: Haiku → Balanced: Sonnet → Quality: Opus. Division of labor is the right approach.

How LLMs Work — Why You Should Know

One line: LLMs predict the probability of the next token. They don't think — they generate the most plausible answer.

4 Fundamentals:
1. Hallucination: makes up plausible answers rather than admitting ignorance
   - Verify function existence (grep in ./docs/api.md), validate doc links, run actual tests
2. Context length limits: even with 1M tokens, later information takes priority
   - Place important information at the end of the prompt
3. Probabilistic responses: the same question can yield different answers
   - For more consistent output, use temperature=0 plus a fixed seed where the API offers one
4. Training data cutoff: Claude Opus 4.7 = 2026-01 cutoff
   - For recent information, web search is required (WebSearch tool, Perplexity)
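Fundamentals 1 and 3 both follow from next-token sampling, which a toy example can show. The sketch below is not how any real model is implemented. The scores are invented and `sample_token` is a hypothetical helper. It only illustrates why temperature 0 gives the same pick every time and why a fixed seed makes sampled output repeatable.

```python
import math
import random


def sample_token(logits: dict[str, float], temperature: float, rng: random.Random) -> str:
    """Pick one token from toy scores. Temperature 0 means greedy (argmax)."""
    if temperature <= 0:
        return max(logits, key=logits.get)
    # Softmax with temperature; subtract the max for numerical stability.
    top = max(logits.values())
    weights = [math.exp((value - top) / temperature) for value in logits.values()]
    return rng.choices(list(logits), weights=weights, k=1)[0]


# Invented scores: the fabricated hook is only slightly less "likely".
logits = {"useState": 2.0, "useSnapshot": 1.5, "useMemo": 0.5}

greedy = [sample_token(logits, 0.0, random.Random()) for _ in range(5)]
print(greedy)  # always the top-scoring token

rng_a, rng_b = random.Random(42), random.Random(42)
first_run = [sample_token(logits, 1.0, rng_a) for _ in range(5)]
second_run = [sample_token(logits, 1.0, rng_b) for _ in range(5)]
print(first_run == second_run)  # True: same seed, same sequence
```

Hosted APIs can still vary slightly at temperature 0, so treat these settings as reducing variation, not eliminating it.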

Strengths and Weaknesses of LLMs:

| Strengths | Weaknesses |
|---|---|
| Language patterns | Math and logic |
| Code boilerplate | Complex algorithms |
| Summarization and translation | Factual accuracy |
| Writing | Vision and audio (partial support in some multimodal models) |
| Brainstorming | Tasks requiring a definitive answer |

> 💡 LLMs are fast, smart tools — but they require supervision.

🤖 Try Asking AI Like This

Understanding the concepts in this lesson lets you give AI specific instructions. Not a vague "fix this" but a request with vocabulary — that's where token savings begin.

  • "Rewrite this vague prompt using the 4 elements: scope, context, constraints, and output format"
  • "This prompt has a high hallucination risk — add an evidence requirement to it"

Why This Reduces Tokens

If you don't understand the concepts, you end up asking "What does that mean?" after each AI response, and every follow-up question costs tokens. Learn the concepts once and the conversation can end in one shot.
