C
Database/SQL Basics/Lesson 02

SQL Essentials — RDBMS · SELECT · WHERE · GROUP BY · JOIN

60 min·theory
This chapter
1/2

SQL Essentials — RDBMS · SELECT · WHERE · GROUP BY · JOIN

🎯 By the end of this lesson

After reading this lesson, you will be able to confidently do the following three things.

  • ✅ Use SELECT · WHERE · GROUP BY · HAVING · ORDER BY clauses in the correct order
  • ✅ Understand the difference between INNER JOIN and LEFT JOIN, and choose the right one
  • ✅ Read a query execution plan using EXPLAIN (the type and rows columns)

Keep these learning goals as a checklist, and close the lesson once you can answer all of them.

6 Core RDBMS Concepts

In one line: Relational DB = table + PK/FK + normalization + ACID + index + SQL. A 50-year standard.

6 Key Concepts

ConceptMeaningOne-line example
Table · Row · Column2D grid of dataCREATE TABLE users (id, name, email)
PK · FKUnique identification + relationshipsorders.user_id REFERENCES users(id)
Normalization 1·2·3NFEliminate redundancy → prevent anomalies1NF (atomic values) → 2NF (partial dependency) → 3NF (transitive dependency)
ACID4 transaction guaranteesAtomic · Consistent · Isolated · Durable
IndexO(log n) lookup via B+TreeReads ↑, writes slightly ↓
SQL 4 partsLanguage classificationDDL · DML · DCL · TCL

Quick Guide to Normalization Levels

LevelRuleViolation example → Fix
1NFAtomic values only (no arrays or repeating groups)hobbies='reading,coding' → separate table
2NF1NF + every non-key column depends on the entire primary keyproduct name depends on (order_id, product_id) PK → split into product table
3NF2NF + no transitive dependency between non-key columnsuser_id → city → zip_code → split into zip code table
BCNFEvery determinant is a candidate keySimilar to 3NF but stricter

> 💡 In practice: 3NF is typically sufficient. Intentional denormalization is applied for performance in frequently queried areas.

SQL 4 Parts

CategoryPurposeCommands
DDL (Definition)SchemaCREATE · ALTER · DROP · TRUNCATE
DML (Manipulation)DataSELECT · INSERT · UPDATE · DELETE
DCL (Control)PermissionsGRANT · REVOKE
TCL (Transaction)TransactionsBEGIN · COMMIT · ROLLBACK · SAVEPOINT

SELECT: 7 Clauses and Their Execution Order

Writing Order vs. Execution Order

code
Write:   SELECT → FROM → WHERE → GROUP BY → HAVING → ORDER BY → LIMIT
Execute: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT
         ──────────────────────────────────────────────────────
         Data is *prepared* first, then columns are *selected*, *sorted*, and *sliced*

Each Clause — Role and Common Pitfalls

ClauseRoleCommon pitfallFix
FROM · JOINSpecify tablesLarge tables first → high memoryPut small tables / filters first
WHERERow filterWHERE UPPER(col)='X' → no index usedWHERE col='X'
WHERERow filterWHERE col LIKE '%abc' → no index usedLIKE 'abc%'
GROUP BYGroupingColumns in SELECT other than group keys or aggregate functions → errorUse group keys or aggregates only
HAVINGGroup filterConfused with WHEREHAVING SUM>1000 (after aggregation)
SELECTColumn selectionSELECT * → unnecessary data transferSelect only needed columns
ORDER BYSortingWithout an index, full sort (O(n log n))Add index on sort column
LIMIT · OFFSETPaginationOFFSET 100000 → very slowUse keyset pagination (WHERE id > last_id)

Quick Memorization Tip

  • FWG·HSO·L — From-Where-Group / Having-Select-Order / Limit
  • Reduce data early — the earlier WHERE runs, the lighter the subsequent steps
  • 5 patterns that skip the index: functions, LIKE with leading %, type casting, NOT, OR

JOIN + GROUP BY Patterns

Comparing 5 JOIN Types

TypeMeaningUse case
INNEROnly matching rows on both sidesOrders + payments (where both exist)
LEFTAll left rows + matches (NULL if none)All users + orders (including those with no orders)
RIGHTOpposite of LEFTRarely used (flip LEFT instead)
FULLAll rows from both sides (UNION effect)Finding unique rows from both sides
CROSSCartesian product (M × N)Usually a mistake; produces all combinations

JOIN Anti-Patterns

Anti-join (rows that exist on one side only):

sql
SELECT u.* FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE o.id IS NULL;          -- users who have never placed an order

Self-join (self-referencing):

sql
SELECT e.name, m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;

6 GROUP BY Patterns

#PatternExample
1Simple groupGROUP BY user_id + COUNT(*)
2Multi-columnGROUP BY user_id, DATE(created_at)
3HAVING (group filter)HAVING COUNT(*) > 5
4Time unitGROUP BY DATE_TRUNC('day', created_at)
5CASE WHEN aggregationSUM(CASE WHEN status='paid' THEN 1 ELSE 0 END)
6ROLLUP · CUBEAutomatic subtotals and grand totals (GROUP BY ROLLUP(year, month))

WHERE vs. HAVING

sql
SELECT user_id, COUNT(*) AS cnt
FROM orders
WHERE status = 'PAID'        -- row filter (step 1)
GROUP BY user_id             -- grouping
HAVING COUNT(*) > 5;         -- group filter (after aggregation)

💾 Practice 1 — Order Statistics per User (LEFT JOIN + GROUP BY)

Run the SQL below and check the results yourself. Click ▶ Run to expand the result table.
📥 입력 DB 서버로 전송되는 쿼리
SELECT FROM: users length: 542 chars
-- Input: users table + orders table
-- Process: LEFT JOIN to include users with no orders + GROUP BY to aggregate per user
-- Output: order count and total payment amount per user (0 for users with 0 orders)

SELECT
  u.id,
  u.name,
  COUNT(o.id) AS order_count,           -- number of orders
  COALESCE(SUM(o.amount), 0) AS total   -- total payment amount (NULL → 0)
FROM users u
LEFT JOIN orders o
  ON o.user_id = u.id
  AND o.status = 'PAID'                  -- completed payments only
GROUP BY u.id, u.name
ORDER BY total DESC
LIMIT 5;
⚙️ 처리 단계 DB 엔진 내부 동작
📋
1. 파싱
SQL 문법 분석 → AST 생성
10ms
🧠
2. 옵티마이저
인덱스·조인 순서 최적화 (통계 기반)
25ms
⚙️
3. 실행
Storage Engine이 데이터 읽기
30ms
📊
4. 결과 반환
버퍼에 모아서 클라이언트에 전달
8ms
총 73ms
📤 출력
✓ 5 rows 4 cols 73ms
idnameorder_counttotal
42Hong Gildong15450000
41Kim Cheolsu12380000
37Lee Younghee8215000
55Park Minsu389000
12Choi Jiwon00
💡 학습 포인트: 실제 DB는 파싱 → 옵티마이저 → 실행 → 반환 4단계를 거칩니다. 인덱스가 있으면 '옵티마이저' 단계가 수천배 빨라집니다.
🌐 본인 환경(PostgreSQL·MySQL 등)에서 직접 실행해 ms 차이를 체감해보세요.

💾 Practice 2 — Monthly Revenue (DATE_TRUNC + HAVING)

Time-based aggregation + HAVING filter pattern. Displays only months with revenue exceeding 1,000,000 KRW.
📥 입력 DB 서버로 전송되는 쿼리
SELECT FROM: orders length: 416 chars
-- Input: orders (created_at, total, status)
-- Process: group by month + filter for completed payments + only months with 1,000,000 or more
-- Output: order count and revenue per month

SELECT
  DATE_TRUNC('month', created_at)::DATE AS month,
  COUNT(*) AS orders,
  SUM(total) AS revenue,
  ROUND(AVG(total)) AS avg_order
FROM orders
WHERE status = 'PAID'
GROUP BY month
HAVING SUM(total) > 1000000
ORDER BY month;
⚙️ 처리 단계 DB 엔진 내부 동작
📋
1. 파싱
SQL 문법 분석 → AST 생성
10ms
🧠
2. 옵티마이저
인덱스·조인 순서 최적화 (통계 기반)
25ms
⚙️
3. 실행
Storage Engine이 데이터 읽기
27ms
📊
4. 결과 반환
버퍼에 모아서 클라이언트에 전달
8ms
총 70ms
📤 출력
✓ 4 rows 4 cols 70ms
monthordersrevenueavg_order
2025-01-01142182000012817
2025-02-01188245000013031
2025-03-01215312000014511
2025-04-01276418000015145
💡 학습 포인트: 실제 DB는 파싱 → 옵티마이저 → 실행 → 반환 4단계를 거칩니다. 인덱스가 있으면 '옵티마이저' 단계가 수천배 빨라집니다.
🌐 본인 환경(PostgreSQL·MySQL 등)에서 직접 실행해 ms 차이를 체감해보세요.

💾 Practice 3 — Window Functions (Top 3 Recent Orders per User)

Advanced pattern — use ROW_NUMBER + PARTITION BY to extract the top N rows per user.
📥 입력 DB 서버로 전송되는 쿼리
SELECT FROM: orders length: 371 chars
-- Input: orders
-- Process: sort per user via PARTITION BY user_id → top 3
-- Output: the 3 most recent orders for each user

WITH ranked AS (
  SELECT
    o.*,
    ROW_NUMBER() OVER (
      PARTITION BY user_id
      ORDER BY created_at DESC
    ) AS rn
  FROM orders o
)
SELECT user_id, id AS order_id, total, created_at
FROM ranked
WHERE rn <= 3
ORDER BY user_id, rn;
⚙️ 처리 단계 DB 엔진 내부 동작
📋
1. 파싱
SQL 문법 분석 → AST 생성
10ms
🧠
2. 옵티마이저
인덱스·조인 순서 최적화 (통계 기반)
25ms
⚙️
3. 실행
Storage Engine이 데이터 읽기
30ms
📊
4. 결과 반환
버퍼에 모아서 클라이언트에 전달
8ms
총 73ms
📤 출력
✓ 5 rows 4 cols 73ms
user_idorder_idtotalcreated_at
421832450002025-04-25
421801320002025-04-22
421789280002025-04-19
411820180002025-04-24
411798550002025-04-21
💡 학습 포인트: 실제 DB는 파싱 → 옵티마이저 → 실행 → 반환 4단계를 거칩니다. 인덱스가 있으면 '옵티마이저' 단계가 수천배 빨라집니다.
🌐 본인 환경(PostgreSQL·MySQL 등)에서 직접 실행해 ms 차이를 체감해보세요.
💻 📌 Practical SQL Pattern Collection (copy-ready cheatsheet)
-- ============================================
-- 8 Common SQL Patterns (PostgreSQL Standard)
-- ============================================

-- 1) UPSERT (UPDATE if exists, INSERT if not)
INSERT INTO users (email, name) VALUES ('[email protected]', 'Hong Gil-dong')
ON CONFLICT (email) DO UPDATE SET name = EXCLUDED.name;

-- 2) keyset pagination (instead of large OFFSET)
-- ❌ Slow: SELECT * FROM posts ORDER BY id DESC OFFSET 100000 LIMIT 10;
-- ✅ Fast:
SELECT * FROM posts WHERE id < :last_id ORDER BY id DESC LIMIT 10;

-- 3) Deduplicate + only the latest 1 record
SELECT DISTINCT ON (user_id) user_id, id, created_at
FROM orders ORDER BY user_id, created_at DESC;

-- 4) JSON column query (Postgres jsonb)
SELECT id, metadata->>'theme' AS theme
FROM users WHERE metadata @> '{"premium": true}';

-- 5) Full-text search
SELECT * FROM articles
WHERE to_tsvector('korean', title || ' ' || body) @@ plainto_tsquery('Spring Boot');

-- 6) Safe count (exact vs. estimate)
SELECT COUNT(*) FROM huge_table;                     -- Exact, slow
SELECT reltuples FROM pg_class WHERE relname='huge_table';  -- Estimate, immediate

-- 7) Window function — cumulative sum·moving average
SELECT date, sales,
       SUM(sales) OVER (ORDER BY date) AS cumulative,
       AVG(sales) OVER (ORDER BY date ROWS 6 PRECEDING) AS ma7
FROM daily_sales;

-- 8) Recursive CTE — hierarchical structure (organization chart·category tree)
WITH RECURSIVE org AS (
  SELECT id, name, manager_id, 1 AS depth
  FROM employees WHERE manager_id IS NULL
  UNION ALL
  SELECT e.id, e.name, e.manager_id, o.depth + 1
  FROM employees e JOIN org o ON e.manager_id = o.id
)
SELECT * FROM org ORDER BY depth, name;

EXPLAIN — Reading Query Execution Plans

Why You Need EXPLAIN

Two queries that return the same result can differ in speed by tens or hundreds of times depending on how the database executes them internally. EXPLAIN shows you the decisions made by the MySQL optimizer.

Usage

sql
EXPLAIN SELECT * FROM users WHERE email = '[email protected]';

Result:

code
+----+--------+-------+------+---------------+---------+---------+-------+------+
| id | select | table | type | possible_keys | key     | rows    | Extra |
+----+--------+-------+------+---------------+---------+---------+-------+------+
|  1 | SIMPLE | users | ref  | idx_email     | idx_email | 1     | NULL  |
+----+--------+-------+------+---------------+---------+---------+-------+------+

The Most Important Columns — type and rows

type levels — best to worst

1. const / system — exactly 1 row (PK lookup). Best.
2. eq_ref — PK or UNIQUE index used in a JOIN. Very good.
3. ref — index used. Good.
4. range — range index scan (WHERE age > 20). Acceptable.
5. index — full index scan. Bad.
6. ALLfull table scan. Worst — no index in use.

If type shows ALL, add an index or rewrite the query.

rows — estimated number of rows to examine

On a table with 1 million rows, rows: 1000000 means a full scan. rows: 5 means the index was hit. Fewer rows = faster.

EXPLAIN ANALYZE — Including Actual Execution Time (MySQL 8+)

sql
EXPLAIN ANALYZE SELECT ...;

Executes the query and prints the actual time spent at each step. The essential tool for debugging slow queries.

Practical Workflow

1. Collect queries taking over 1 second from the slow query log (slow_query_log)
2. Run EXPLAIN on each query
3. Queries with type=ALL or large rows → consider adding an index
4. Use EXPLAIN ANALYZE to compare before and after the improvement

🤖 Try asking AI like this

Knowing the concepts in this lesson lets you give specific instructions to AI. Instead of a vague 'fix this,' you can make requests with vocabulary — and that's where token savings begin.

  • 'Design a 3-table schema for users · orders · products, complete with FK and INDEX'
  • 'Write a query that aggregates the number of users who signed up in the last 7 days, broken down by day'
  • 'Add EXPLAIN to this query and interpret the execution plan'

Why This Reduces Tokens

Without knowing the concepts, even after receiving an AI answer, you have to ask 'what does that mean?' again. That follow-up question is what burns tokens. Learn the concepts once, and the conversation ends in a single exchange.

SQL Essentials — RDBMS · SELECT · WHERE · GROUP BY · JOIN - Database