C
Database/Advanced/Lesson 06

DB Advanced — Replication, Sharding, Cache, Connection Pool, Partitioning

45 min·theory

DB Advanced — Replication, Sharding, Cache, Connection Pool, Partitioning

🎯 After Reading This Lesson

Once you finish this lesson, you will be able to confidently do the following 3 things.

  • ✅ Criteria for choosing Range / Hash / List partitioning
  • ✅ Difference between sharding vs. partitioning
  • ✅ Explaining DB backup strategies (Logical vs Physical)

Keep the learning objectives as a checklist and close the lesson once you can answer them all.

Replication — *Scaling Out Your DB*

What Is Replication?

When a web service takes off, read requests explode. A pattern where 10,000 people read a single post is common. What do you do when a single DB can't keep up?

Replication = scaling out your DB across multiple servers to share the load. One server acts as the Primary (writes only), and multiple Replicas (read-only) copy changes from the Primary.

code
   Writes          Read Requests
     │              │  │  │
     ▼              ▼  ▼  ▼
  ┌──────┐      ┌──────┐ ┌──────┐ ┌──────┐
  │Primary├─────►Replica1│ │R 2  │ │R 3  │
  └──────┘ WAL  └──────┘ └──────┘ └──────┘
           stream

Synchronous vs Asynchronous Replication

Synchronous replication — The Primary waits for acknowledgment from the Replica before sending a success response to the client. Zero data loss, but slow.

Asynchronous replication — The Primary responds immediately, and the Replica catches up later. Faster, but the Replica lags slightly behind. If the Primary crashes suddenly, recent data can be lost.

Most services use asynchronous replication. In 99% of cases the lag is in the milliseconds, so it's not a problem, and the performance gains are significant.

Lag — The Most Important Monitoring Metric

How far behind the Replica is is what matters most. If lag reaches 1 second, users will encounter a situation where a post they just wrote doesn't appear.

sql
-- PostgreSQL — run on primary
SELECT client_addr, state,
       pg_wal_lsn_diff(sent_lsn, replay_lsn) AS lag_bytes
FROM pg_stat_replication;

If lag keeps growing → there's a bottleneck somewhere in network, disk, or CPU. Setting alerts is essential.

Application Side — Read/Write Splitting

code
WRITE → Primary
READ  → Replica (load-balanced)

Most ORMs and DataSource routers support this. Read-only queries are automatically routed to Replicas, and writes go to the Primary.

> ⚠️ Pitfall: Reading immediately after a write may return stale data due to lag. In such cases, you need an exception like reading from the Primary for that specific request.

Summary

Replication is the standard approach for read scaling + disaster recovery. Asynchronous is the default, and lag monitoring is the core of operations. By splitting reads and writes, you can significantly reduce the load on the Primary.

Sharding — *Splitting Your Data*

What Is Sharding?

Replication scales reads, but writes still go to a single server. What if writes also explode? — Like Twitter receiving hundreds of thousands of tweets per second, a single server just can't handle it.

Sharding = distributing data across multiple DBs. User IDs 1–1M go to DB1, 1M–2M go to DB2 — this kind of horizontal partitioning.

code
                ┌────────────┐
   User 42 →    │  Shard 1   │  (1–1M)
                ├────────────┤
   User 2M →    │  Shard 2   │  (1M–2M)
                ├────────────┤
   User 5M →    │  Shard 3   │  (2M+)
                └────────────┘

Each shard is an independent DB, so write load is divided by N. The downside is explosive complexity.

Shard Key — The Most Important Decision

Which column you partition on determines your fate.

Good shard keys:

  • user_id — Even distribution per user, few cross-shard queries
  • tenant_id (B2B SaaS) — Isolation per company

Bad shard keys:

  • created_atHot spot on recent data. Old shards sit idle while new shards explode.
  • auto_increment id — Same hot spot issue with range sharding.

Shard keys are nearly impossible to change later. You must decide carefully upfront.

Cross-shard Queries — The Biggest Pitfall

sql
-- User A sends a message to friend B
-- A is on shard 1, B is on shard 2?
SELECT * FROM messages WHERE sender = A AND receiver = B;

Both shards must be queried. JOINs are impossible. Transactions become distributed transactions, which are extremely expensive.

That is why designing your data model and shard key to minimize cross-shard operations is critical.

Resharding — An Operational Nightmare

What if your data grows and you need to expand from 5 shards to 10? You have to redistribute all the data. Doing this without impacting the service is extremely difficult.

That is why it is common to start with enough shards upfront (e.g., start with 64 and distribute across nodes), or use techniques like consistent hashing to reduce the burden.

Managed Options — Best Not to Do It Yourself

Most companies do not implement sharding themselves. It is too hard and risky.

  • DynamoDB / MongoDB Atlas — Auto-sharding NoSQL
  • Vitess — MySQL sharding (built by YouTube, graduated from CNCF)
  • Citus — PostgreSQL extension (Microsoft)

Unless you have a specific reason, replication is usually sufficient. The point where sharding is truly needed is at the scale of hundreds of millions of rows and tens of thousands of writes per second.

Summary

Sharding is a last resort. Try replication → cache → partitioning first, and only turn to sharding if that is still not enough. The shard key decision at the start determines your fate, and avoiding cross-shard queries is the key to performance.

Cache + Connection Pool — *Two Ways to Reduce DB Load*

What Is a Cache?

DB queries are slow (milliseconds). Memory is fast (microseconds). If you are running the same query over and over, storing the result in memory and reusing it makes things 1,000x faster. That is the core idea behind caching.

Redis is the de facto standard — a memory-based key-value store. Extremely fast, with extra features like TTL (auto-expiry) and pub/sub.

The Cache-Aside Pattern

The most common pattern:

code
1. Check cache
2-A. hit → return immediately (fast)
2-B. miss → query DB → store in cache → return

On write: update DB + invalidate cache (or refresh)
python
def get_user(id):
    cached = redis.get(f"user:{id}")
    if cached:
        return cached         # cache hit (μs)

    user = db.query("SELECT * FROM users WHERE id=?", id)  # ms
    redis.setex(f"user:{id}", 300, user)   # 5-min TTL
    return user

For 5 minutes, responses come straight from Redis without touching the DB. The load on the DB drops dramatically.

Cache Invalidation — One of the Two Hard Problems in Computer Science

> "There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton

When data changes, how do you handle stale cache entries?

1. TTL — Auto-expiry after a set time. The simplest approach, but stale data for up to N seconds is possible.

2. Explicit invalidation — Call redis.del('user:42') when data changes. Immediate effect. But you must remember to call it in every change path.

3. Event-based — DB change → publish event → cache subscriber invalidates. Useful in distributed systems.

4. Versioned keys — Change the key itself, like user:v2:42. Old keys expire naturally. Clean when combined with TTL.

> 💡 In practice: TTL + explicit invalidation is 90% of the story. Perfect consistency is expensive and difficult. Allowing a short stale period is the norm.

Connection Pool — Reusing DB Connections

Creating a new DB connection every time is expensive. TCP handshake, authentication, and TLS negotiation happen on every request — around 50ms spent just on the connection.

Connection Pool = pre-creating N connections and reusing them. When a request comes in, it borrows one from the pool, uses it, and returns it. The cost of new connections disappears.

code
┌──── Connection Pool ────┐
│  [conn1][conn2]...[connN]│
└──┬──┬──┬─────────────────┘
   │  │  │
   ▼  ▼  ▼
  Return after handling request

Key Pool Configuration

The most important thing is pool size. Too small → waiting; too large → DB under strain.

Formula: app servers × pool size ≤ DB max_connections

Example: 4 app servers × pool 20 = 80 ≤ Postgres max=100 ✅

HikariCP (Java standard) handles this intelligently. min = max is recommended (predictable behavior).

Common Pitfalls

1. Connection leak — Forgetting conn.close() drains the pool. try-with-resources / context manager is essential.

2. Pool too largeExceeds DB max_connections. Explodes with "too many connections" errors.

3. Idle timeout too short — In environments where firewalls or load balancers cut idle connections, you frequently get new connections. Use max-lifetime to periodically recycle connections (around 30 minutes is recommended).

Summary

Cache + Connection Pool are two standard tools for reducing DB load. Cache cuts repeated query cost, and Connection Pool cuts connection overhead. With both properly configured, you can handle 10x more traffic on the same DB.

Partitioning — *Breaking a Large Table Into Smaller Pieces*

What Is Partitioning?

When a single table grows beyond 100 GB, multiple problems arise. Indexes also become huge, deleting old data is slow, and queries that only touch a subset of data still scan the whole table.

Partitioning = splitting a large table into multiple logical pieces, while making it appear as a single table to the user. Similar to sharding, but the difference is that the split happens within the same DB.

RANGE Partitioning — Best for Time-Series

The most common pattern — partitioned by date range.

sql
CREATE TABLE logs (
    id BIGSERIAL,
    created_at TIMESTAMP NOT NULL,
    data JSONB
) PARTITION BY RANGE (created_at);

CREATE TABLE logs_2025_01 PARTITION OF logs
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');

CREATE TABLE logs_2025_02 PARTITION OF logs
    FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');

An INSERT INTO logs automatically routes the row to the correct partition based on the created_at value.

The Real Benefit — Partition Pruning

sql
SELECT * FROM logs WHERE created_at >= '2025-03-01';

This query only scans the logs_2025_03 partition. All other partitions are skipped. Since you only need to look at 1 GB out of 100 GB, it is 100x faster.

The DB automatically infers which partition to look at. This is pruning.

Deleting Old Data Is Fast

Say you want to delete logs that are more than 3 years old.

sql
-- ❌ Very slow (can take hours)
DELETE FROM logs WHERE created_at < '2022-01-01';

-- ✅ Instant
DROP TABLE logs_2021_12;
DROP TABLE logs_2021_11;
-- ...

Dropping a partition is like detaching a chunk of disk, so it is very fast. This is overwhelmingly better for log and event processing.

Automation — pg_partman

Manually creating a new partition every month is an operational burden. pg_partman (a Postgres extension) handles this automatically.

sql
SELECT partman.create_parent(
    p_parent_table => 'public.logs',
    p_control => 'created_at',
    p_type => 'native',
    p_interval => 'monthly',
    p_premake => 3   -- pre-create 3 months ahead
);

TimescaleDB is a time-series-specialized extension on top of PostgreSQL. It manages partitioning automatically and even handles compression.

LIST and HASH Partitioning

LIST — For clear categories like region or category:

sql
CREATE TABLE users (...) PARTITION BY LIST (country);
CREATE TABLE users_kr PARTITION OF users FOR VALUES IN ('KR');
CREATE TABLE users_jp PARTITION OF users FOR VALUES IN ('JP');

HASH — Even distribution (Postgres 11+):

sql
CREATE TABLE orders (...) PARTITION BY HASH (user_id);
CREATE TABLE orders_p0 PARTITION OF orders FOR VALUES WITH (modulus 4, remainder 0);
-- ...

When to Use Partitioning?

  • ✅ Single table 100 GB+
  • Time-based data (logs, metrics, events)
  • Bulk deletion of old data required
  • ❌ Small tables — only adds overhead

Summary

Partitioning dramatically reduces the performance and operational burden of large tables. For time-series data, it is the default option. Automating with pg_partman or TimescaleDB also makes operations easy. It is a much simpler and safer option than sharding, and well worth considering first.

💾 Live Demo — Cache Hit vs Miss Performance Difference

A comparison of fetching the same data via *direct DB query* vs *Redis cache*.
📥 입력 DB 서버로 전송되는 쿼리
SELECT FROM: users length: 362 chars
-- Measurement results (average in production environment)

-- Direct DB query (using index)
SELECT * FROM users WHERE id = 42;
-- Average: 8 ms

-- Redis cache hit
-- GET user:42
-- Average: 0.3 ms

-- 1000 iterations:
-- DB only:    8 ms × 1000 = 8,000 ms (8 seconds)
-- Cache+DB:   1st request 8ms + 999 requests × 0.3ms = 308 ms (0.3 seconds)
-- → 26x faster
⚙️ 처리 단계 DB 엔진 내부 동작
📋
1. 파싱
SQL 문법 분석 → AST 생성
10ms
🧠
2. 옵티마이저
인덱스·조인 순서 최적화 (통계 기반)
25ms
⚙️
3. 실행
Storage Engine이 데이터 읽기
24ms
📊
4. 결과 반환
버퍼에 모아서 클라이언트에 전달
8ms
총 67ms
📤 출력
✓ 3 rows 3 cols 67ms
sourceavg_ms1000 times
DB (index)88,000 ms
Redis hit0.3300 ms
1 miss + 999 hits-308 ms
💡 학습 포인트: 실제 DB는 파싱 → 옵티마이저 → 실행 → 반환 4단계를 거칩니다. 인덱스가 있으면 '옵티마이저' 단계가 수천배 빨라집니다.
🌐 본인 환경(PostgreSQL·MySQL 등)에서 직접 실행해 ms 차이를 체감해보세요.
💻 📌 Cache / Pool / Partition Settings (Reference)
# ============================================
# 1) HikariCP — Spring Standard Connection Pool
# ============================================
spring:
  datasource:
    hikari:
      maximum-pool-size: 20         # Based on CPU·DB max
      minimum-idle: 20               # = max recommended (predictable)
      connection-timeout: 30000      # 30 seconds
      idle-timeout: 600000           # 10 minutes
      max-lifetime: 1800000          # 30 minutes (avoid DB·LB idle disconnections)
      leak-detection-threshold: 60000 # Warning for connections not closed within 60 seconds

# ============================================
# 2) Redis (Spring Boot)
# ============================================
spring:
  data:
    redis:
      host: localhost
      port: 6379
      timeout: 2000
      lettuce:
        pool:
          max-active: 8
          max-idle: 8
          min-idle: 0

# Code:
# @Cacheable(value = "user", key = "#id")
# public User findById(Long id) { return userRepo.findById(id); }
#
# @CacheEvict(value = "user", key = "#user.id")
# public void update(User user) { userRepo.save(user); }

# ============================================
# 3) PostgreSQL Partitioning + pg_partman
# ============================================
-- Install extension
CREATE EXTENSION pg_partman;

-- Parent table + partitioning
CREATE TABLE logs (
    id BIGSERIAL,
    created_at TIMESTAMP NOT NULL,
    level VARCHAR(10),
    message TEXT
) PARTITION BY RANGE (created_at);

-- Automatic management
SELECT partman.create_parent(
    p_parent_table => 'public.logs',
    p_control => 'created_at',
    p_type => 'native',
    p_interval => 'monthly',
    p_premake => 4
);

-- Automatically delete partitions older than 30 days (daily via cron)
-- SELECT partman.run_maintenance();

🤖 How to Ask AI About This

Once you understand the concepts in this lesson, you can give AI specific instructions. Not a vague "fix this," but a request that uses the right vocabulary — that is where token savings start.

  • "Convert this logs table to a YEAR Range partition by created_at"
  • "Rewrite this SELECT query to only select the necessary columns so a covering index kicks in"*
  • "Explain the difference between sharding vs. partitioning and diagnose which one our case needs"

Why Does This Save Tokens?

Without knowing the concepts, even after receiving an AI response you have to ask "what does that mean?" again. That follow-up question is what burns tokens. Learn the concepts once and the conversation ends in a single exchange.

DB Advanced — Replication, Sharding, Cache, Connection Pool, Partitioning - Database