
JVM — Architecture, GC, and String Pool

60 min·theory


🎯 After reading this lesson

After finishing this lesson, you will be able to confidently handle the following three topics.

  • ✅ JVM memory structure (Heap / Stack / Metaspace) + GC algorithms
  • ✅ Five recommended production JVM options (-Xms/-Xmx/-XX:+UseG1GC)
  • ✅ Heap dump analysis workflow when an OOM occurs

Keep the learning objectives as a checklist and close the lesson once you can answer all of them.

What is the JVM — *Write Once, Run Anywhere*

Core Idea in One Line

JVM (Java Virtual Machine) = a virtual machine that executes Java code. "Write Once, Run Anywhere" — Java code written once runs identically on Windows, macOS, and Linux.

How Is That Possible?

Java code does not run directly. It goes through two stages:

1. Compilation: .java file → .class file (bytecode). An intermediate language independent of the OS.
2. Execution: The JVM loads the .class files and runs the bytecode on the host OS, interpreting it at first and compiling frequently used parts into native machine code.

This bytecode + JVM combination is the heart of Java. At write time you don't need to worry about Windows or Mac; at run time the JVM for that OS takes care of everything.

JVM Memory — What Is Stored Where?

The JVM divides memory into several regions. The four important ones:

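  • Heap: all objects and arrays. The largest region and the GC's main area of work. Shared by all threads.
  • Stack: method call frames and local variables. One per thread.
  • Method Area: class metadata (field and method info). Shared across all threads. In HotSpot since Java 8 it is implemented as Metaspace, in native memory.
  • PC Register: the address of the bytecode instruction currently being executed. One per thread.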
java
public void run() {
    int x = 5;                    // x → Stack
    User u = new User("Hong");      // User object → Heap, u (reference) → Stack
}

Local primitive values like x live in the method's stack frame. Objects created with new live on the heap; only the reference pointing to that object stays on the stack. This distinction is the basic picture of Java memory.
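The split between stack and heap shows up directly when a method receives a primitive and an object. The sketch below uses a hypothetical Box class: the primitive is copied into the callee's frame, while the copied reference still points to the one object on the heap. This is the conceptual model; in practice the JIT may keep locals in registers or even avoid some heap allocations.

```java
public class StackVsHeap {
    // A tiny mutable holder (hypothetical); instances live on the heap.
    static final class Box {
        int value;
    }

    static void bump(int n, Box b) {
        n++;          // changes only this method's own copy of the primitive
        b.value++;    // follows the copied reference to the one shared Box on the heap
    }

    static String demo() {
        int x = 5;                 // local primitive: lives in this method's stack frame
        Box box = new Box();       // Box object on the heap; the reference 'box' is a local
        box.value = 5;
        bump(x, box);              // both arguments are passed by value: a copy of 5, a copy of the reference
        return "x=" + x + " box.value=" + box.value;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // x=5 box.value=6
    }
}
```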

JIT Compiler — Faster for Frequently Used Code

Initially the JVM interprets bytecode, reading and executing it one instruction at a time. This is slow.

However, it tracks which methods are called frequently, and once a threshold is crossed it compiles that method into native code. This is called JIT (Just-In-Time) compilation.

Because JIT optimizes based on profiling, a JVM that has been running longer is faster (warm-up). This is why Java servers start out slow but get faster over time.

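Ahead-of-time compiled code such as C++ is also fast, but a JIT compiler can use profile data gathered at run time (which branches are taken, which concrete types reach a call site) to inline and specialize code. In some workloads this lets JIT-compiled Java match or even outperform ahead-of-time compiled code.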
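You can watch the JIT at work with HotSpot's -XX:+PrintCompilation flag. The class below (HotLoop is a made-up name) calls a small method a million times so it becomes hot; running it as `java -XX:+PrintCompilation HotLoop` should print compilation lines mentioning HotLoop::square and HotLoop::sumOfSquares. The exact log format varies between JDK versions.

```java
public class HotLoop {
    // A small method that becomes "hot" once it is called many times.
    static long square(long n) {
        return n * n;
    }

    static long sumOfSquares(int iterations) {
        long total = 0;
        for (int i = 0; i < iterations; i++) {
            total += square(i); // repeated calls push square() past the JIT threshold
        }
        return total;
    }

    public static void main(String[] args) {
        // Run with: java -XX:+PrintCompilation HotLoop
        System.out.println(sumOfSquares(1_000_000));
    }
}
```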

Summary

The JVM is not a simple translator. It also manages memory, optimizes hot code, and runs the GC on its own. That is why Java developers never have to free memory manually.

Garbage Collection — *Automatic Memory Cleanup*

What GC Does

In C you must free memory manually with free() (delete in C++). Forgetting causes a memory leak; freeing the same memory twice is undefined behavior and often crashes the program.

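Java's Garbage Collector handles this automatically. It finds objects that are no longer reachable from any live reference (local variables, static fields, and so on) and reclaims their memory. Developers only create objects and the GC frees them. The one catch: an object you still reference, even by accident, is never collected.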

The Generational Hypothesis — Most Objects Die Young

A key reason the JVM's GC is efficient is the generational hypothesis.

> Most objects live briefly and die. Results of new or String.split() inside a method, for example — once the method returns, they are no longer used.

Using this hypothesis, the heap is split into two regions:

  • Young Generation — newly created objects. Most die here.
  • Old Generation — objects that survived Young. More likely to live longer.
code
Young Generation              Old Generation
┌────┬──────┬─────┐          ┌─────────────┐
│Eden│  S0  │ S1  │          │  Tenured    │
└────┴──────┴─────┘          └─────────────┘

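New objects are allocated in Eden. A Minor GC copies the objects that are still alive from Eden (and from the survivor space currently in use) into the other survivor space, so S0 and S1 swap roles at each collection. Objects that survive enough Minor GCs (the tenuring threshold, controlled by -XX:MaxTenuringThreshold) are promoted to the Old Generation.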
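The generational model pays off for code that allocates many temporary objects. The sketch below (ShortLived is a hypothetical name) creates a short-lived String and array on every iteration. Running it as `java -Xlog:gc ShortLived` on a G1 JVM will likely show several young collections (log lines containing "Pause Young") and little or no Old Generation activity, although the exact count depends on heap size and on JIT optimizations.

```java
public class ShortLived {
    // Each call creates a temporary String array that is garbage as soon as the method returns.
    static int tokenCount(String line) {
        return line.split(",").length;
    }

    public static void main(String[] args) {
        long total = 0;
        for (int i = 0; i < 5_000_000; i++) {
            // "a,b," + i also builds a fresh, short-lived String on every iteration
            total += tokenCount("a,b," + i);
        }
        System.out.println(total); // 15000000: every line has 3 tokens
    }
}
```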

Minor GC vs Full GC

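  • Minor GC: cleans only the Young region. Frequent and usually fast (typically a few to tens of ms). In most collectors, application threads are briefly paused.
  • Full GC: cleans the Old region as well. Rare but slow (hundreds of ms to seconds). All application threads pause (Stop-The-World).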

In production, frequent Full GC is a serious problem. Response times spike suddenly and users notice the slowdown. It may signal that the Old Gen fills up often or that there is a memory leak.
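A common shape of such a leak is a collection held in a static field that only ever grows. The sketch below is hypothetical (LeakDemo, CACHE and handleRequest are illustrative names): every buffer stays reachable from a static field, so no GC can reclaim it. With an unbounded loop and a small -Xmx, the Old Generation fills up, Full GCs become frequent, and the program eventually fails with OutOfMemoryError.

```java
import java.util.ArrayList;
import java.util.List;

public class LeakDemo {
    // Hypothetical "cache" that is never cleared: everything added stays reachable
    // from a static field, so the GC can never reclaim it.
    static final List<byte[]> CACHE = new ArrayList<>();

    static void handleRequest() {
        byte[] buffer = new byte[1024]; // meant to be short-lived...
        CACHE.add(buffer);              // ...but kept alive forever by the static list
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10_000; i++) {
            handleRequest();
        }
        System.out.println(CACHE.size() + " buffers still reachable");
    }
}
```

The usual fix is to bound the collection (an eviction policy or a size limit) or to remove entries once they are no longer needed.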

GC Algorithms — Configurable

The JVM offers several GC algorithms. Which one to use is decided through tuning.

  • Serial GC — single-threaded. Small apps and embedded systems.
  • Parallel GC — multi-threaded. Throughput-first (batch workloads).
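  • G1 GC: the default since Java 9 on most machines. Predictable pause times. Good for most cases.
  • ZGC: available since Java 11, production-ready since Java 15. Pauses typically under 1 ms in recent versions, even on very large heaps. For large heaps and low-latency requirements.
  • Shenandoah: a low-pause collector similar to ZGC, developed by Red Hat.

You select one with a JVM option: -XX:+UseSerialGC, -XX:+UseParallelGC, -XX:+UseG1GC, -XX:+UseZGC, or -XX:+UseShenandoahGC (Shenandoah is not included in every JDK build). The basic guideline: G1 for most applications, ZGC for very large heaps or strict low-latency requirements.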
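To confirm which collector a running JVM actually uses, you can ask the standard management API. The sketch below (WhichGc is a hypothetical class name) lists the active collectors with their collection counts and accumulated time. Under G1 you should see names such as "G1 Young Generation" and "G1 Old Generation"; with -XX:+UseParallelGC the names change to "PS Scavenge" and "PS MarkSweep".

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class WhichGc {
    // Names of the collectors the current JVM is running with.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Try: java -XX:+UseG1GC WhichGc   and   java -XX:+UseParallelGC WhichGc
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```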

Summary

GC offers the huge convenience of automatic memory management. But it is not free — frequent Full GC causes response latency. Designing memory usage patterns well and tuning JVM options are key practical skills.

String — *The Archetype of Immutable Objects*

String Does Not Change

Java's String is an immutable object. Once created, its contents cannot be changed.

java
String s = "hello";
s.concat(" world");      // returns a new object; s is unchanged
System.out.println(s);   // "hello"

Methods like concat, replace, and toUpperCase all create and return a new String. The original is never touched.

Why Immutable?

Immutability brings significant benefits:

  • Thread safety: multiple threads can access it simultaneously with no risk of modification
  • Safe as a HashMap key: usable as a key because its hash value never changes
  • Security: file paths and URLs cannot be tampered with
  • String Pool optimization: identical values can be shared
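The HashMap point is easiest to see by contrast with a mutable key. In the sketch below (MutableKeyDemo is a hypothetical name), a List key changes its hashCode() after insertion, so the lookup searches the wrong bucket and returns null even though the entry is still in the map. A String key cannot change, so its lookup keeps working.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MutableKeyDemo {
    // Uses a mutable List as a HashMap key, then mutates it after insertion.
    static String lookupAfterMutation() {
        List<Integer> key = new ArrayList<>(List.of(1));
        Map<List<Integer>, String> map = new HashMap<>();
        map.put(key, "found");
        key.add(2);            // hashCode() changes, but the entry stays in its old bucket
        return map.get(key);   // null: the lookup uses the new hash
    }

    // A String key is immutable, so the same lookup always succeeds.
    static String lookupWithStringKey() {
        String key = "user:1";
        Map<String, String> map = new HashMap<>();
        map.put(key, "found");
        key.concat(":changed"); // returns a new String; key itself is untouched
        return map.get(key);
    }

    public static void main(String[] args) {
        System.out.println(lookupAfterMutation()); // null
        System.out.println(lookupWithStringKey()); // found
    }
}
```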

String Pool — Sharing Identical Characters

The JVM maintains a special space for Strings — the String Pool. Strings with the same content are stored only once and shared.

java
String a = "hello";              // stored in pool
String b = "hello";              // reused from pool: same object as a!
String c = new String("hello");  // new bypasses the pool: a different object

System.out.println(a == b);       // true  (same reference)
System.out.println(a == c);       // false (different reference)
System.out.println(a.equals(c));  // true  (same value)

This is one of the most common sources of confusion in Java.

== vs equals():

  • == is a reference comparison (is it the same object?)
  • equals() is a value comparison (does it have the same content?)

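Always use equals() to compare Strings. == only returns true when both references happen to point to the same object (for example, two pooled literals), so a comparison that passes in a quick test can fail once one of the strings is built at run time.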

StringBuilder — The Trap of String Concatenation

java
// ❌ using + inside a loop — very slow
String result = "";
for (int i = 0; i < 1000; i++) {
    result += i;   // creates a new String object every iteration. O(n²)
}

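result += i creates a new String and reassigns it to result, copying every character accumulated so far. Over 1,000 iterations that produces 1,000 discarded intermediate Strings, and the total copying work grows quadratically with the length of the result, which also keeps the GC busy.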

Using StringBuilder accumulates data in an internal buffer and converts to a String only once at the end.

java
// ✅ StringBuilder
StringBuilder sb = new StringBuilder();
for (int i = 0; i < 1000; i++) {
    sb.append(i);
}
String result = sb.toString();    // O(n)

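> 💡 A single expression like a + b + c is compiled efficiently on its own (via StringBuilder up to Java 8, via invokedynamic and StringConcatFactory since Java 9). The problem is only repeated += across loop iterations, so use StringBuilder explicitly inside loops.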

Text Block (Java 15+) — Multi-line Strings

java
String html = """
    <html>
        <body>
            <p>Hello</p>
        </body>
    </html>
    """;

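Line breaks and relative indentation are preserved, while the indentation shared by all lines (including the line with the closing """) is stripped as incidental whitespace. The result is clean code, which is especially convenient for writing JSON, SQL, and HTML.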
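A quick way to see exactly what a text block produces is to compare it with an ordinary string literal. In this sketch (TextBlockDemo is a hypothetical name; requires Java 15+), the closing """ sits at the same indentation as the content, so that common prefix is removed, the nested tags keep their extra indentation, and the closing delimiter on its own line adds a trailing newline.

```java
public class TextBlockDemo {
    static String html() {
        // The closing delimiter is indented like the content lines, so that shared
        // prefix counts as incidental whitespace and is stripped.
        return """
            <html>
                <body>
                    <p>Hello</p>
                </body>
            </html>
            """;
    }

    public static void main(String[] args) {
        String expected = "<html>\n"
                + "    <body>\n"
                + "        <p>Hello</p>\n"
                + "    </body>\n"
                + "</html>\n";
        System.out.println(html().equals(expected)); // true
    }
}
```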

Summary

String is immutable. Use equals() for comparison, StringBuilder for repeated concatenation, and Text Block for multi-line strings. These three habits cover most everyday String code safely.

Java 17–21 Modern Features — *Less Code to Write*

record (preview in Java 14, final in Java 16): Immutable Data Class

Writing a simple class that only holds data (constructor, getters, equals, hashCode, toString) easily takes dozens of lines every time. A record does it in one line.

java
public record User(Long id, String name, String email) { }

This single line automatically generates:

  • Constructor: new User(1L, "Hong", "hong@example.com")
  • Getters: u.id(), u.name() (note: the accessor is id(), not getId())
  • equals(), hashCode(), toString()

Perfect for simple data objects like DTOs, events, and value objects. It provides the same role as Lombok's @Value but at the language level.
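The generated members can be checked directly. This sketch (RecordDemo and the sample values are hypothetical; requires Java 16+) shows that two records with equal components are equal and hash alike, while still being distinct objects, which is what makes them safe to use in sets and as map keys.

```java
import java.util.HashSet;
import java.util.Set;

public class RecordDemo {
    // One line replaces a constructor, accessors, equals(), hashCode() and toString().
    record User(Long id, String name, String email) { }

    public static void main(String[] args) {
        User u1 = new User(1L, "Hong", "hong@example.com");
        User u2 = new User(1L, "Hong", "hong@example.com");

        System.out.println(u1.name());            // Hong (accessor is name(), not getName())
        System.out.println(u1.equals(u2));        // true: component-wise equality
        System.out.println(u1 == u2);             // false: still two distinct objects
        Set<User> users = new HashSet<>(Set.of(u1));
        System.out.println(users.contains(u2));   // true: hashCode() is consistent with equals()
        System.out.println(u1);                   // e.g. User[id=1, name=Hong, email=hong@example.com]
    }
}
```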

sealed class (Java 17) — Only Permitted Subclasses

java
public sealed interface Shape permits Circle, Square, Triangle { }
final class Circle implements Shape { }
final class Square implements Shape { }
final class Triangle implements Shape { }

The classes that can implement Shape are restricted to Circle, Square, and Triangle. Attempting to implement Shape elsewhere causes a compile error.

Why is this useful? Combined with switch pattern matching, the compiler verifies all cases are handled. If a new shape is added, every switch must be updated or it will not compile — mistakes of omission disappear.
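Here is a minimal sketch of that exhaustiveness check, assuming Java 21 and using records as the permitted subtypes (the Shapes class and the area formulas are illustrative). The switch has no default branch: because Shape is sealed, the compiler knows these three cases are complete, and adding a fourth permitted type would make this method fail to compile until a case is added.

```java
public class Shapes {
    sealed interface Shape permits Circle, Square, Triangle { }
    record Circle(double radius) implements Shape { }
    record Square(double side) implements Shape { }
    record Triangle(double base, double height) implements Shape { }

    // Exhaustive over the sealed hierarchy: no default branch needed.
    static double area(Shape shape) {
        return switch (shape) {
            case Circle c -> Math.PI * c.radius() * c.radius();
            case Square s -> s.side() * s.side();
            case Triangle t -> 0.5 * t.base() * t.height();
        };
    }

    public static void main(String[] args) {
        System.out.println(area(new Square(3)));      // 9.0
        System.out.println(area(new Triangle(4, 5))); // 10.0
    }
}
```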

switch Expression (Java 14) — Returning a Value

The old switch was a statement and could not return a value. The modern switch is an expression and can:

java
String type = switch (status) {
    case PENDING -> "Pending";
    case PAID, REFUNDED -> "Processed";
    default -> "Unknown";
};

-> arrow syntax + no break needed + returns a value + multiple cases can be grouped. The old switch's fall-through trap is also gone.
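The status snippet above needs an enum to compile. This runnable sketch assumes a hypothetical Status enum with one extra constant. Because the switch expression covers every constant, no default is needed, and if a new constant is added to Status, this class stops compiling until the new case is handled. Each arrow branch runs on its own, with no fall-through.

```java
public class SwitchDemo {
    // Hypothetical order status enum used only for this example.
    enum Status { PENDING, PAID, REFUNDED, CANCELLED }

    static String label(Status status) {
        // All constants are covered, so the compiler accepts this without a default branch.
        return switch (status) {
            case PENDING -> "Pending";
            case PAID, REFUNDED -> "Processed"; // several constants grouped in one case
            case CANCELLED -> "Cancelled";
        };
    }

    public static void main(String[] args) {
        for (Status s : Status.values()) {
            System.out.println(s + " -> " + label(s));
        }
    }
}
```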

Pattern Matching (instanceof: Java 16, switch: Java 21): Goodbye if-instanceof

java
// Old way
if (obj instanceof String) {
    String s = (String) obj;
    if (s.length() > 0) ...
}

// Modern way
if (obj instanceof String s && s.length() > 0) ...

instanceof and variable declaration combined in one line. Even more powerful when combined with switch:

java
String describe(Object o) {
    return switch (o) {
        case Integer i -> "int " + i;
        case String s when s.length() > 0 -> "str " + s;
        case null -> "null";
        default -> "other";
    };
}

Combined with sealed classes, all cases are enforced by the compiler.

Optional (Java 8) — Explicit Representation of null

When you call User getUser(), how would you know it might return null? Nothing in the signature says so, hence the NullPointerException explosions.

java
Optional<User> getUser(Long id) { ... }

With Optional in the return type, callers are explicitly made aware that the result may be empty.

java
Optional<User> u = userRepo.findById(id);
u.map(User::getEmail)
 .filter(e -> e.endsWith("@company.com"))
 .ifPresent(this::sendEmail);

String email = u.map(User::getEmail).orElse("unknown");

> 💡 Use Optional only as a return type. Using it for fields or parameters is an anti-pattern.
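A self-contained version of the chain above, assuming a hypothetical in-memory map in place of a real userRepo and a User record with an email() accessor. The lookup returns Optional.empty() instead of null when the id is unknown, so callers handle the missing case with map, filter, and orElse rather than null checks.

```java
import java.util.Map;
import java.util.Optional;

public class OptionalDemo {
    record User(Long id, String email) { }

    // Hypothetical in-memory stand-in for a repository.
    static final Map<Long, User> USERS = Map.of(
            1L, new User(1L, "hong@company.com"),
            2L, new User(2L, "kim@gmail.com"));

    static Optional<User> findById(Long id) {
        return Optional.ofNullable(USERS.get(id)); // empty instead of null when absent
    }

    static String emailOrUnknown(Long id) {
        return findById(id).map(User::email).orElse("unknown");
    }

    static boolean isCompanyUser(Long id) {
        return findById(id)
                .map(User::email)
                .filter(e -> e.endsWith("@company.com"))
                .isPresent();
    }

    public static void main(String[] args) {
        System.out.println(emailOrUnknown(1L));  // hong@company.com
        System.out.println(emailOrUnknown(99L)); // unknown
        System.out.println(isCompanyUser(2L));   // false
    }
}
```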

Virtual Thread (Java 21) — A Revolution in Concurrency

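This is the same feature covered in the Collections + Functional lesson. Virtual threads are lightweight threads scheduled by the JVM rather than the OS, so a program can run 10,000+ concurrent blocking tasks without paying for one OS thread each. When a virtual thread blocks on I/O, it automatically yields its carrier OS thread to other virtual threads. This gives Java a concurrency model comparable to Go's goroutines or Kotlin's coroutines.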
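A minimal sketch of that idea, assuming Java 21 (VirtualThreadsDemo is a hypothetical name, and Thread.sleep stands in for a blocking I/O call). Executors.newVirtualThreadPerTaskExecutor() starts one virtual thread per task; closing the executor in try-with-resources waits for all of them to finish.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class VirtualThreadsDemo {
    // Runs taskCount blocking tasks, one virtual thread each, and counts completions.
    static int runTasks(int taskCount) {
        List<Future<Integer>> futures = new ArrayList<>();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < taskCount; i++) {
                futures.add(executor.submit(() -> {
                    Thread.sleep(10); // blocking call: the virtual thread unmounts while it waits
                    return 1;
                }));
            }
        } // close() waits until every submitted task has finished
        int completed = 0;
        for (Future<Integer> f : futures) {
            try {
                completed += f.get();
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException(e);
            }
        }
        return completed;
    }

    public static void main(String[] args) {
        System.out.println(runTasks(10_000)); // 10000
    }
}
```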

Summary

The new features in Java 17–21 make code shorter and allow the compiler to verify more. records, sealed classes, switch pattern matching, Optional, and Virtual Threads — strongly recommended for new projects. For legacy projects, adopt incrementally.

🎮 JVM Memory and GC Visualization

Walk through each step: class loading → object creation → GC flow.
📝 Hello.java: written by the developer

java
public class Hello {
    public static void main(String[] args) {
        System.out.println("Hello, Java!");
    }
}

💡 A .java file is human-readable source code.

⚙️ javac Hello.java → Hello.class (bytecode)

Hello.java 📄 (code people read) → javac → Hello.class 💾 (bytecode the JVM reads)

💡 Bytecode runs on any OS that has a JVM: "Write Once, Run Anywhere".

📦 ClassLoader: loads .class files into memory

  • Bootstrap ClassLoader → java.lang.* and other core JDK classes
  • Extension ClassLoader (replaced by the Platform ClassLoader in Java 9+) → extension libraries
  • Application ClassLoader → Hello.class ← our code!

💡 JVM memory: Method Area (class info) → Heap (objects) → Stack (method calls)

🚀 JIT compiler: bytecode → native code (very fast)

Bytecode (slow) → JIT compilation → native code (fast ⚡)

💡 The JIT detects frequently executed code (hot spots) and compiles it to native code, which is why Java starts slow and gets faster later.

✅ Execution result

$ java Hello
Hello, Java!

.java (source code) → .class (bytecode) → output (result)
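You can inspect the class loader hierarchy from code. In this sketch (ClassLoaders is a hypothetical name; ClassLoader.getName() requires Java 9+), core classes such as String report a null loader, which is how Java represents the bootstrap loader. Walking up from the loader of your own class typically prints "app" and then "platform", though the exact chain can differ when the class is launched in other ways (for example, as a single source file).

```java
public class ClassLoaders {
    // Core classes are loaded by the bootstrap loader, which Java exposes as null.
    static String loaderName(Class<?> type) {
        ClassLoader loader = type.getClassLoader();
        return loader == null ? "bootstrap" : loader.getName();
    }

    public static void main(String[] args) {
        System.out.println("String       -> " + loaderName(String.class));
        System.out.println("ClassLoaders -> " + loaderName(ClassLoaders.class));
        // Walk up the parent chain; the loop ends when it reaches the bootstrap loader (null).
        for (ClassLoader cl = ClassLoaders.class.getClassLoader(); cl != null; cl = cl.getParent()) {
            System.out.println("loader: " + cl.getName());
        }
    }
}
```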

GC Tuning Options — 5 Commonly Used in Production

Why GC Options Are Necessary

The JVM's defaults are chosen for general workloads (for example, the default maximum heap is typically a quarter of the available memory). Real services often need:

  • A larger or fixed heap, e.g., 8 GB instead of 1 GB
  • Shorter Stop-the-World pauses when response latency matters

These are adjusted with JVM options (-X, -XX).

The 5 Most Commonly Used

1. -Xms / -Xmx — Heap Memory Size

bash
java -Xms512m -Xmx2g -jar app.jar

  • -Xms = initial heap size
  • -Xmx = maximum heap size

Practical tip: set -Xms and -Xmx to the same value. This avoids the cost of resizing the heap at run time and gives predictable performance. It is common practice in AWS and Kubernetes deployments, where the container's memory is fixed anyway.
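To check that the heap options took effect, the standard Runtime API reports the current limits. In this sketch (HeapSizeDemo is a hypothetical name), running `java -Xms512m -Xmx512m HeapSizeDemo` should show max and total close to 512 MiB; maxMemory() can report slightly less than -Xmx with some collectors, so treat the numbers as approximate.

```java
public class HeapSizeDemo {
    static long toMiB(long bytes) {
        return bytes / (1024 * 1024);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory():   upper limit the heap may grow to (roughly -Xmx)
        // totalMemory(): heap currently reserved by the JVM (starts near -Xms)
        // freeMemory():  unused part of totalMemory()
        System.out.println("max   = " + toMiB(rt.maxMemory()) + " MiB");
        System.out.println("total = " + toMiB(rt.totalMemory()) + " MiB");
        System.out.println("free  = " + toMiB(rt.freeMemory()) + " MiB");
    }
}
```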

2. -XX:+UseG1GC — G1 Garbage Collector

bash
-XX:+UseG1GC

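G1 has been the default collector since Java 9 (and still is in Java 17) on server-class machines. On very small machines or containers (for example, a single CPU or less than about 1.8 GB of memory) the JVM falls back to Serial GC. G1 aims for short, predictable pauses on multi-gigabyte heaps. Writing the option explicitly documents the choice and avoids that silent fallback.

ZGC and Shenandoah (both production-ready since Java 15) provide even shorter pauses, but G1 is usually sufficient unless you run very large heaps or have strict latency targets.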

3. -XX:MaxGCPauseMillis=200 — Maximum GC Pause Time Goal

bash
-XX:MaxGCPauseMillis=200

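A hint to the JVM: "try to keep each GC pause under 200 ms." This is a target, not a guarantee. 200 ms is already G1's default, so writing it mainly documents the goal; a lower value shortens pauses at some cost in throughput. Useful for services with response-latency SLAs.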

4. -XX:+HeapDumpOnOutOfMemoryError — Automatic Heap Dump on OOM

bash
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/var/log/app-heapdump.hprof

When an OOM occurs, a heap dump is automatically written. For post-mortem analysis — a mandatory option for production.
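The same kind of .hprof file can also be captured on demand, for example with `jcmd <pid> GC.heap_dump <file>.hprof` or from inside the process through the HotSpot diagnostic MXBean, as sketched below (HeapDumpDemo and the file name are hypothetical; com.sun.management is HotSpot-specific). The resulting file is what you would open in a heap analyzer such as Eclipse MAT or VisualVM to find which objects dominate the heap.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

public class HeapDumpDemo {
    // Writes an .hprof heap dump of the current JVM, similar to the file that
    // -XX:+HeapDumpOnOutOfMemoryError produces automatically on OOM.
    static Path dumpHeap() {
        try {
            Path dir = Files.createTempDirectory("heapdump");
            Path file = dir.resolve("manual-dump.hprof"); // must end in .hprof and must not exist yet
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            bean.dumpHeap(file.toString(), true); // true = dump only live (reachable) objects
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = dumpHeap();
        System.out.println("Heap dump written to " + file + " (" + Files.size(file) + " bytes)");
    }
}
```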

5. -XX:+PrintGCDetails (Java 8) / -Xlog:gc* (Java 9+) — GC Logging

bash
# Java 17
-Xlog:gc*:file=/var/log/gc.log:time,uptime:filecount=10,filesize=10M

Records when, how often, and why GC occurred. Essential during load testing.

Standard Production Combination (Spring Boot)

bash
java \
  -Xms2g -Xmx2g \
  -XX:+UseG1GC \
  -XX:MaxGCPauseMillis=200 \
  -XX:+HeapDumpOnOutOfMemoryError \
  -XX:HeapDumpPath=/var/log/heapdump.hprof \
  -Xlog:gc*:file=/var/log/gc.log:time,uptime:filecount=10,filesize=10M \
  -jar app.jar

You do not need to memorize all of these options. What matters is knowing that these options exist and why they are used — if an interviewer asks "Have you done GC tuning?" you should be able to mention these five.

☕ Try It Yourself — String Pool and == vs equals

The traps of String. Verify the difference between a literal and `new String()` using `==` and `equals`.
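A runnable version of this experiment (StringPoolDemo is a hypothetical class name). Besides the literal and new String() cases, it shows two more traps: intern() returns the pooled instance, and a concatenation is pooled only when it is a compile-time constant expression, while one computed at run time produces a new String.

```java
public class StringPoolDemo {
    static String results() {
        String a = "hello";               // literal: stored in (or taken from) the String Pool
        String b = "hello";               // same literal: the same pooled object as a
        String c = new String("hello");   // new: always creates a distinct object
        String d = c.intern();            // intern(): returns the pooled "hello"
        String part = "hel";              // not final, so part + "lo" is not a constant
        String e = part + "lo";           // concatenation at run time: a new, unpooled String
        String f = "hel" + "lo";          // constant expression: folded by javac, so pooled

        return (a == b) + " " + (a == c) + " " + a.equals(c) + " "
                + (a == d) + " " + (a == e) + " " + (a == f);
    }

    public static void main(String[] args) {
        // a==b, a==c, a.equals(c), a==d, a==e, a==f
        System.out.println(results()); // true false true true false true
    }
}
```

Try changing `String part = "hel";` to `final String part = "hel";`: part then becomes a constant variable, part + "lo" becomes a constant expression, and a == e flips to true.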

🤖 Try Asking AI Like This

Knowing the concepts from this lesson lets you give AI specific instructions. Instead of a vague 'fix this', you can make a request that uses the right vocabulary, and that is where token savings begin.

  • 'Recommend production JVM options (Xms/Xmx/G1GC/HeapDump) for this Spring Boot app'
  • 'Diagnose the cause of this OutOfMemoryError from a heap dump analysis perspective'
  • 'Add GC logging options in Java 17 format'

Why This Reduces Token Usage

Without the underlying concepts, even after receiving an AI response you have to ask 'What does that mean?' again. Those follow-up questions consume tokens. Learn the concepts once and the conversation ends in one pass.
