Transactions + Indexes — ACID, Isolation Levels, MVCC, B+Tree
Transactions + Indexes — ACID, Isolation Levels, MVCC, B+Tree
🎯 What you'll be able to do after this lesson
By the end of this lesson, you will be able to confidently do the following three things.
- ▸✅ ACID's 4 properties + the 4 isolation levels (RU/RC/RR/Serializable)
- ▸✅ How B-Tree indexes work + the Leftmost Prefix rule for composite indexes
- ▸✅ When to choose optimistic locking vs. pessimistic locking
Keep these learning goals as a checklist, and close the lesson only when you can answer all of them.
What Is a Transaction — *All or Nothing*
The One-Liner
Transaction = a unit of work that is logically grouped together. Every operation inside must succeed for the changes to be committed; if even one fails, everything is rolled back.
Why It Matters — A Wire Transfer Example
Imagine you're sending a friend $100. Here's what the database needs to do:
1. Deduct $100 from your account
2. Add $100 to your friend's account
What if step 1 succeeds but a system failure causes step 2 to fail? Your $100 vanishes into thin air — your friend never received it and you've already lost it.
With a transaction, both steps must succeed before a commit (finalization) happens. If either fails, a rollback occurs — as if nothing ever happened.
ACID — The 4 Guarantees of a Transaction
These are worth memorizing. They come up constantly in interviews.
A — Atomicity: All or nothing. As in the wire transfer example — partial success does not exist.
C — Consistency: The database's rules (foreign keys, UNIQUE, CHECK constraints) are never broken before or after a transaction. A constraint like "balance cannot be negative" is always enforced.
I — Isolation: Concurrent transactions do not interfere with each other. Even if two people buy the same item at the same time, inventory is decremented exactly right.
D — Durability: Once committed, data is never lost — not even if the power goes out. A success response is only sent after data has been reliably written to disk.
All four properties are required to build systems that handle money. Payments, inventory, reservations — all of these are risky without ACID.
Isolation Levels — How Strictly Should We Isolate?
Perfect isolation (the I in ACID) is the safest but also the slowest. That's why four levels let you choose how strict to be.
- ▸READ UNCOMMITTED: Almost no isolation. Can read data that hasn't been committed yet. Rarely used.
- ▸READ COMMITTED: Default in PostgreSQL and Oracle. Only committed data is visible. The most common choice.
- ▸REPEATABLE READ: Default in MySQL. The same query within a transaction always returns the same result.
- ▸SERIALIZABLE: The strictest level. Transactions behave as if they ran one after another. Safe but slow.
Higher levels mean more safety, less concurrency. READ COMMITTED is sufficient for most web applications. For strict consistency requirements like banking or payments, SERIALIZABLE may be necessary.
MVCC — The Secret Behind Modern Database Concurrency
Older databases blocked reads when writes were happening. If someone was writing a row, nobody else could even read it.
Modern databases — PostgreSQL, Oracle, MySQL InnoDB — use MVCC (Multi-Version Concurrency Control). Each row can have multiple versions:
- ▸Writers → add a new version
- ▸Readers → see the old version (unaffected by the write)
They don't block each other. Concurrency improves dramatically. The downside: old versions accumulate, wasting disk space — which is why background cleanup jobs like autovacuum are necessary.
Common Pitfalls
1. Keeping transactions open too long: Holding a lock for 10 seconds after BEGIN means everyone else has to wait. Keep transactions short.
2. Committing outside a transaction: In Spring, applying @Transactional to an internally called method won't work (proxy limitation). The annotated method must be called from outside the class.
3. Infinite lock waits: Deadlocks or slow queries can cause indefinite waits. Always set a timeout: SET LOCAL lock_timeout = '3s';
Summary
Transactions are foundational for any system that handles money. Know the 4 ACID properties. READ COMMITTED is the default isolation level; MVCC is what makes concurrency possible. In practice, one line — Spring's @Transactional — does the work, but understanding the underlying principles makes a huge difference when debugging.
Indexes — *Like a Book's Index*
What Is an Index?
Imagine searching for a single book in a library by going shelf by shelf from start to finish — 10,000 shelves means 10,000 steps. But with a card catalog, you find the location instantly.
Database indexes work on the same principle. Create an index on a column, and searches on that column become dramatically faster. The difference between searching 1 million rows can be 0.001 seconds vs. 10 seconds — a 10,000x speedup.
B+Tree — The Most Common Index Structure
Most databases use a data structure called B+Tree. It's a balanced tree — every search traverses the same depth, giving a consistent O(log n) time.
Leaf nodes are linked to each other, so range queries (e.g., 25 to 75) are also fast. Find the starting position, then walk sideways along the linked leaves.
When Should You Add an Index?
- ▸✅ Columns that appear frequently in WHERE clauses —
WHERE email = ? - ▸✅ JOIN conditions —
ON orders.user_id = users.id - ▸✅ ORDER BY columns — the index already provides sorted order
- ▸❌ Tables with very frequent writes — every write must update the index too (slower)
- ▸❌ Low-cardinality columns (e.g., gender) — index benefit is minimal
Composite Indexes — Order Matters
Indexing multiple columns together:
This index is matched from left to right (Leftmost Prefix rule).
- ▸
WHERE user_id = ?✅ uses index - ▸
WHERE user_id = ? AND created_at > ?✅ uses index - ▸
WHERE created_at > ?❌ skips index (user_id missing)
Think of a phone book sorted by last name, then first name. You need the last name to look someone up quickly. Knowing only the first name means scanning from the beginning.
5 Pitfalls That Prevent Index Use
Ignore these and your index does nothing, even if it exists.
1. Using a function — WHERE UPPER(email) = '[email protected]'. The index is sorted on the original value — it doesn't know the function's result.
2. Left-side wildcard — WHERE name LIKE '%Hong%'. Can't match from the front, so the index is useless. LIKE 'Hong%' is fine.
3. Type mismatch — WHERE user_id = '42' (string) when the column is numeric. Implicit type conversion breaks index usage.
4. NOT / != — Negation conditions generally can't use an index. By definition, most rows match.
5. OR clauses — WHERE a = 1 OR b = 2. Even if both have indexes, one or both may not be used. Splitting with UNION can help.
> 💡 Making a habit of checking with EXPLAIN ANALYZE is the starting point for query performance debugging.
EXPLAIN — Viewing the Query Execution Plan
This shows you how the database plans to execute your query before it runs.
Things to look for in the output:
- ▸Seq Scan = full table scan. A red flag for large tables.
- ▸Index Scan = index is being used. ✅
- ▸rows=100 estimated vs actual rows=10000 = stale statistics. Run
ANALYZE <table>;to refresh.
The ability to read EXPLAIN output is the foundation of SQL performance debugging.
The N+1 Problem — The Classic ORM Trap
With 100 users, that's 101 queries. Pages become tens of times slower.
Solution: Use the ORM's eager loading feature:
- ▸JPA:
@Query+JOIN FETCH - ▸Django:
prefetch_related - ▸SQLAlchemy:
joinedload - ▸Prisma:
include
A single JOIN query fetches all the data — N+1 becomes 1.
Summary
Indexes are the key to fast reads. But adding them carelessly degrades write performance. Always verify with EXPLAIN, and only add them where they're truly needed. ORM traps like N+1 also require looking at actual queries to detect.
💾 Live Demo 1 — Index Effect (EXPLAIN ANALYZE)
-- Search by email in a table with 1 million rows
-- ❌ No index (Seq Scan)
EXPLAIN ANALYZE
SELECT * FROM users WHERE email = '[email protected]';
-- Seq Scan on users (cost=0..18334)
-- Execution Time: 850 ms
-- ✅ After adding an index
CREATE INDEX idx_users_email ON users(email);
EXPLAIN ANALYZE
SELECT * FROM users WHERE email = '[email protected]';
-- Index Scan using idx_users_email (cost=0..8)
-- Execution Time: 0.5 ms
-- → 1,700x faster
💾 Live Demo 2 — Transaction & Rollback (Wire Transfer Scenario)
-- Initial state: account 1 = 50,000 KRW, account 2 = 0 KRW
BEGIN;
-- Deduct
UPDATE accounts SET balance = balance - 100000 WHERE id = 1;
-- Validate non-negative balance (CHECK constraint)
-- balance >= 0 violation → error
-- → automatic ROLLBACK
-- If it had passed, credit the amount
UPDATE accounts SET balance = balance + 100000 WHERE id = 2;
COMMIT;
-- Verify result
SELECT * FROM accounts WHERE id IN (1, 2);
💾 Live Demo 3 — N+1 vs. JOIN FETCH (Performance Comparison)
-- ❌ N+1 pattern
-- 1 query: SELECT * FROM users;
-- N queries: SELECT COUNT(*) FROM orders WHERE user_id = ? (100 times)
-- → 101 queries, 5 seconds
-- ✅ Single JOIN
SELECT
u.id,
u.name,
COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.name;
-- → 1 query, 30ms
Index Design in Practice — Cardinality & Composite Indexes
Cardinality — Number of Distinct Values
- ▸High cardinality:
email(almost all unique). Index benefit is enormous. - ▸Low cardinality:
gender(only M/F). Index benefit is negligible. Better not to create one. - ▸Medium:
country(hundreds of values). Depends on the case.
Creating an index on a low-cardinality column just wastes disk space with no benefit.
5 Principles of Index Design
1. Columns frequently used in WHERE clauses
2. JOIN keys (FKs usually get one automatically)
3. ORDER BY and GROUP BY columns
With an index, the sorting step can be skipped.
4. Composite indexes — order matters
Put the most frequently queried column first. Queries this index can serve:
- ▸
WHERE user_id = ?✅ - ▸
WHERE user_id = ? AND status = ?✅ - ▸
WHERE user_id = ? AND status = ? AND created_at > ?✅ - ▸
WHERE status = ?❌ (leading column missing) - ▸
WHERE created_at > ?❌
Conditions must be present from left to right in order to use the index. This is called the Leftmost Prefix rule.
5. Don't create too many
Too many indexes means INSERT/UPDATE/DELETE slows down — every index must be updated. Balance read frequency against write frequency.
5 Bad Index Patterns
1. Low-cardinality columns — is_deleted, gender alone
2. Very long columns — direct indexing of TEXT (consider a Prefix Index or Full-text search)
3. Rarely queried columns — validate with EXPLAIN before creating
4. Not using function-based indexes — WHERE YEAR(date) = 2026 won't use an index; rewrite as WHERE date >= '2026-01-01'
5. Duplicate indexes — if you have (a) and (a, b), the (a) index is redundant
Checking Index Statistics
The Cardinality column value — the closer it is to the total row count, the better the index.
🤖 Try Asking AI This Way
Knowing the concepts from this lesson lets you give AI specific, precise instructions. Instead of a vague "fix this," you can make vocabulary-rich requests — and that's where token savings begin.
- ▸"Create a composite index for this query (WHERE user_id + status + created_at)"
- ▸"Wrap this operation in a transaction with READ_COMMITTED isolation level"
- ▸"There's a potential deadlock — analyze it from the perspective of INDEX and lock ordering"
Why This Reduces Tokens
Without the concepts, you receive an AI response and have to follow up with "What does that mean?" — and that follow-up question is what burns tokens. Learn the concept once, and the conversation ends in one exchange.