
OS Fundamentals + Processes & Threads

45 min·theory


🎯 After Reading This Lesson

By the end of this lesson, you will be able to confidently explain the following three things.

  • ✅ The four core responsibilities of an OS (process, memory, file system, I/O)
  • ✅ User Mode vs Kernel Mode + syscall
  • ✅ Why Node.js is fast despite being single-threaded

Keep the learning goals as a checklist and close the lesson only once you can answer all of them.

4 Things an OS Does

In one line: Operating System (OS) = a software layer that abstracts processes, memory, files, and I/O on top of hardware.

4 Core Responsibilities of an OS:

| Domain | Meaning |
| --- | --- |
| Process Management | Creating, switching, and terminating running programs. fork()·exec()·exit() |
| Memory Management | Virtual memory, paging, and allocation. Each process has the illusion of independent memory |
| File System | Disk abstraction, directories, and permissions. inode-based |
| I/O Management | Communication with keyboard, network, and disk. Device drivers |

Boot → Launch KakaoTalk flow (6 steps):
1. Power ON — BIOS/UEFI → bootloader → kernel load
2. Kernel initialization — memory mapping, driver load, scheduler start
3. init process — PID 1, parent of all user processes (systemd / launchd)
4. Login shell — bash/zsh starts, environment variables loaded
5. App launch — fork() + exec("/Applications/KakaoTalk.app/...") → new process
6. Event loop — waiting for and handling keyboard/network input
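
Step 5 (fork() + exec()) can be sketched in Python, whose os module exposes thin wrappers over these calls on Unix-like systems. The helper name spawn_and_wait and the child command are illustrative assumptions, not something the lesson prescribes.

```python
import os
import sys


def spawn_and_wait(argv: list[str]) -> int:
    """Fork a child, replace it with argv via exec, and return its exit code."""
    pid = os.fork()                     # duplicate the current process
    if pid == 0:
        # Child: replace this process image with the requested program.
        try:
            os.execvp(argv[0], argv)    # does not return on success
        except OSError:
            os._exit(127)               # exec failed; exit without cleanup handlers
    # Parent: block until the child terminates, which also reaps it (no zombie).
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)


if __name__ == "__main__":
    code = spawn_and_wait([sys.executable, "-c", "raise SystemExit(7)"])
    print("child exit code:", code)
```

A shell launching an app follows the same pattern: fork a copy of itself, exec the app in the copy, and optionally wait for it.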

> 💡 Launching KakaoTalk once = the OS silently performs the 6 steps above. Every time, in under 0.5 seconds.

Process vs Thread — Two Models of Concurrency

| Item | Process | Thread |
| --- | --- | --- |
| Memory | Completely independent | Shared within the same process |
| Creation cost | Heavy (several ms) | Lightweight (several μs) |
| Communication | IPC (pipes, sockets, shared memory) | Shared memory + synchronization |
| Isolation | Strong (one process dying doesn't affect others) | Weak (one thread crash → entire process dies) |
| Context switch cost | High (memory map, cache invalidation) | Low |
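
The Memory row can be observed directly. The following is a minimal sketch for Unix-like systems (it relies on os.fork); both function names are made up for this illustration.

```python
import os
import threading


def thread_shares_memory() -> int:
    """A thread writes to the same list object the caller sees."""
    counter = [0]

    def bump() -> None:
        counter[0] += 1

    worker = threading.Thread(target=bump)
    worker.start()
    worker.join()
    return counter[0]


def process_has_private_memory() -> int:
    """A forked child modifies only its own copy of the list."""
    counter = [0]
    pid = os.fork()
    if pid == 0:
        counter[0] += 1      # changes the child's private copy
        os._exit(0)
    os.waitpid(pid, 0)       # wait for the child so it does not linger as a zombie
    return counter[0]        # the parent's copy is untouched


if __name__ == "__main__":
    print("after thread: ", thread_shares_memory())
    print("after process:", process_has_private_memory())
```

To get the child's result back to the parent, one of the IPC mechanisms from the Communication row would be needed.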

When to use what:

  • Process isolation: stability first (Chrome per-tab process), security isolation (Docker container), inter-language integration
  • Thread usage: lightweight concurrency (per-request thread in web servers), shared data processing (GUI event loop)
  • Modern trends: virtual threads (Java 21), goroutines (Go), async/await (Python, Node) — lightweight concurrency even lighter than threads

Common mistakes:

  • ❌ Spawning 1,000 threads → context switch cost explodes
  • ❌ Modifying shared data without a lock → race condition
  • ✅ Use thread pool (ThreadPoolExecutor) + lock or async/await
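
As a rough sketch of the recommended pattern, the snippet below combines Python's ThreadPoolExecutor with a lock around the shared value. SafeCounter and count_with_pool are hypothetical names chosen for this example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SafeCounter:
    """Shared counter whose updates are serialized by a lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.value = 0

    def add(self, n: int) -> None:
        with self._lock:          # only one thread may update at a time
            self.value += n


def count_with_pool(tasks: int, workers: int = 4) -> int:
    """Run `tasks` increments on a fixed pool of `workers` threads."""
    counter = SafeCounter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(tasks):
            pool.submit(counter.add, 1)
    # Leaving the with-block waits for every submitted task to finish.
    return counter.value


if __name__ == "__main__":
    print(count_with_pool(1000))
```

The pool keeps the thread count fixed no matter how many tasks are submitted, and the lock removes the race on the shared value.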

Context Switching — The *Real Cost* of Concurrency

6 Steps of Context Switching (CPU switching from process A → B):
1. Interrupt — timer, I/O completion, or system call
2. Save A's state — registers, PC, stack pointer into the PCB (Process Control Block)
3. Invoke scheduler — select the next process to run (CFS, O(1), real-time)
4. Load B's PCB — restore registers and MMU mappings
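5. Cache and TLB effects — TLB (Translation Lookaside Buffer) entries may be flushed unless they are tagged per address space, and the L1/L2 caches still hold A's data, so B starts with cache misses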
6. Resume B — continue from where it was paused

Cost:

| Resource | Cost |
| --- | --- |
| Time | ~1–10 μs (simple switch) |
| Cache miss | +several μs when TLB is flushed |
| Memory bandwidth | PCB save and load |
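
On Unix-like systems a process can ask the kernel how many context switches it has undergone. The sketch below uses Python's resource module for that. The function names are illustrative, and some sandboxed or virtualized environments may report zeros, so treat the numbers as indicative only.

```python
import resource
import time


def context_switches() -> tuple[int, int]:
    """Return (voluntary, involuntary) context switch counts for this process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_nvcsw, usage.ru_nivcsw


def switches_during_sleeps(n: int = 20) -> int:
    """Count voluntary switches recorded while sleeping n times."""
    before, _ = context_switches()
    for _ in range(n):
        time.sleep(0.001)    # sleeping gives up the CPU voluntarily
    after, _ = context_switches()
    return after - before


if __name__ == "__main__":
    print("voluntary switches during sleeps:", switches_during_sleeps())
```

Voluntary switches come from blocking (sleep, I/O wait), while involuntary ones come from the scheduler preempting the process.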

Overhead pitfalls:

  • ❌ Too many threads (1000+) → CPU spends more time switching than doing real work
  • ❌ CPU-bound work inside async code → blocks the main thread
  • ✅ For CPU-bound work, recommended thread count ≈ number of CPU cores (roughly N + 1); pools that mostly wait on I/O can be larger
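
The N + 1 rule of thumb might be applied like this. recommended_threads and make_cpu_pool are hypothetical helpers; only os.cpu_count and ThreadPoolExecutor come from the standard library.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Optional


def recommended_threads(cores: Optional[int] = None) -> int:
    """Rule of thumb from the lesson: about one thread per core, plus one."""
    n = cores if cores is not None else (os.cpu_count() or 1)
    return n + 1


def make_cpu_pool() -> ThreadPoolExecutor:
    """Create a pool sized for this machine instead of a hard-coded 1000."""
    return ThreadPoolExecutor(max_workers=recommended_threads())


if __name__ == "__main__":
    print("threads to use:", recommended_threads())
```

In CPython specifically, the global interpreter lock serializes CPU-bound Python code, so the same sizing rule would usually be applied to a process pool there.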

> 💡 Why is async fast? = Multiple tasks on a single thread with no context switching (event loop).

💻 📌 Learning Process & Thread Commands Through Scenarios
# ============================================================
# Scenario 1: "Huh? The server is slow. What's eating up the CPU?"
# ============================================================
# 1) Find the *CPU-intensive* process with a real-time monitor
htop                          # Color interactive (advanced top, recommended)
top -o %CPU                   # When htop is not available

# 2) Find the PID of the suspicious process
pgrep -af node                # Output PID + full command line of processes matching 'node' (Linux procps)
# Result: 28391 /usr/bin/node /app/server.js

# 3) Investigate in detail — memory · status · start time
ps -p 28391 -o pid,ppid,user,stat,%cpu,%mem,start,cmd
# STAT column:
#   R = Running   S = Sleeping   D = Uninterruptible sleep (cannot be killed)
#   Z = Zombie      T = Stopped    < = High priority

# 4) How many *threads* a process is using
ps -T -p 28391                # List of threads
# Or: cat /proc/28391/status | grep Threads

# ============================================================
# Scenario 2: "A process is stuck. How do I kill it?"
# ============================================================
# *Always* try graceful shutdown first — gives a chance to save data and clean up
kill 28391                    # = kill -TERM (SIGTERM, 15)
                              # App can perform graceful shutdown (clean up DB connections, complete ongoing requests)

# If it doesn't die after 5 seconds, force it
kill -9 28391                 # SIGKILL — OS *immediately* removes it from memory
                              # ⚠️ Risk of data corruption. Last resort

# Reload configuration (not restart)
kill -HUP 28391               # SIGHUP — standard for nginx · systemd service reload

# Batch by name
killall -TERM nginx           # Gracefully terminate all nginx processes
pkill -f 'node.*server.js'    # Pattern matching

# ============================================================
# Scenario 3: "A heavy background task is interfering with *other tasks*"
# ============================================================
# Adjust priority — nice value -20 (high) ~ +19 (low). Default 0.
nice -n 10 python heavy_batch.py    # Run new process with lower priority
                                     # Yield to other tasks, run slowly itself

renice -n 5 -p 28391                # Change priority of *running* process
                                     # Positive value = yield. ↑ value = ↓ priority (opposite of intuition!)

# ============================================================
# Scenario 4: "A process died but remains as Z (zombie) in ps"
# ============================================================
# Zombie = child terminated → parent didn't call wait() → only PCB remains
ps aux | awk '$8 ~ /^Z/'      # Extract only zombies (STAT may read Z, Z+, Zs, ...)
# Or: ps -eo stat,pid,ppid,cmd | awk '$1 ~ /^Z/'

# A zombie cannot be killed directly (it is already dead); its *parent* must reap it
# After confirming parent PID (PPID column)
kill -CHLD <parentPID>          # Nudge parent to call wait(); works only if it handles SIGCHLD
# If not, restart the parent: its zombies are then *adopted by init (PID 1)* and cleaned up

# ============================================================
# Scenario 5: "I want to see what files a process has open"
# ============================================================
lsof -p 28391                 # All files · sockets · libraries opened by that PID
lsof -i :3000                 # Process occupying *port 3000* (reverse direction)
lsof -i TCP:443 -sTCP:LISTEN  # Only those listening on 443

# Check fd limits (frequently encountered 'too many open files')
cat /proc/28391/limits | grep 'open files'
ulimit -n                     # Current shell limit
ulimit -n 65536               # Increase (for permanent, /etc/security/limits.conf)

# ============================================================
# Scenario 6: "Run in background and keep running after logout (ignore SIGHUP)"
# ============================================================
nohup python server.py &      # Doesn't die on logout, output to nohup.out
disown -h %1                  # nohup effect on already running job
# Or register as a systemd service (standard for long-term operation)

System Calls · Interrupts · Context Switching — One Page

User Space vs Kernel Space

OS memory has 2 regions:

  • User Space — where our app runs. Limited privileges. No direct access to physical memory, disk, or network hardware.
  • Kernel Space — where the OS itself lives. Full hardware privileges.
code
[User Space]    [My App]
     ↓ syscall
[Kernel Space]  [OS Kernel]
     ↓
[Hardware]      [Disk, NIC, ...]

For a user app to open a file — it cannot access the disk directly. It must request the OS to do it. That request is a system call.

System Call

c
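// Open, read, and close a file in C (needs <fcntl.h> and <unistd.h>)
char buf[100];
int fd = open("data.txt", O_RDONLY);      // syscall; returns -1 on failure
ssize_t n = read(fd, buf, sizeof buf);    // syscall; n = bytes actually read
close(fd);                                // syscall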

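open / read / write / close / fork / exec are a few of the more than 300 syscalls on Linux (the exact count depends on architecture and kernel version). Every I/O operation or process creation goes through a syscall.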

All file/network calls in high-level languages (Python, Java) internally invoke syscalls.
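
To make that concrete, Python's os module exposes thin wrappers that map closely onto the C calls above. The sketch below writes a temporary file with the high-level API and reads it back with the low-level one. read_first_bytes and demo are illustrative names.

```python
import os
import tempfile


def read_first_bytes(path: str, count: int = 100) -> bytes:
    """Read up to `count` bytes using the low-level wrappers."""
    fd = os.open(path, os.O_RDONLY)     # wraps open(2)
    try:
        return os.read(fd, count)       # wraps read(2); may return fewer bytes
    finally:
        os.close(fd)                    # wraps close(2)


def demo() -> bytes:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.txt")
        with open(path, "wb") as f:     # high-level API; ends in the same syscalls
            f.write(b"hello syscall")
        return read_first_bytes(path)


if __name__ == "__main__":
    print(demo())
```

On Linux, running a script like this under strace would show the underlying calls, though the exact trace varies by platform and Python version.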

The Cost of a syscall

The transition from user space → kernel space is itself expensive: typically hundreds of nanoseconds to a few microseconds per call, far more than an ordinary function call. That is why:

  • Buffering — instead of a syscall every time, accumulate and flush at once. BufferedReader, the internal buffer of console.log.
  • Async I/O — Node.js, async/await — process other work while waiting for a syscall.
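
The effect of buffering can be sketched without touching a real disk. CountingSink below is a hypothetical stand-in for a device: each call to its write() represents one syscall. Wrapping it in io.BufferedWriter shows how many of those calls buffering saves.

```python
import io


class CountingSink(io.RawIOBase):
    """Fake raw device that counts how often write() is reached."""

    def __init__(self) -> None:
        super().__init__()
        self.calls = 0

    def writable(self) -> bool:
        return True

    def write(self, data) -> int:
        self.calls += 1          # one call here stands for one write syscall
        return len(data)


def raw_write_calls(chunks: list[bytes], buffered: bool) -> int:
    """Write all chunks and return how many times the sink was hit."""
    sink = CountingSink()
    writer = io.BufferedWriter(sink, buffer_size=4096) if buffered else sink
    for chunk in chunks:
        writer.write(chunk)
    writer.flush()               # push out whatever is still in the buffer
    return sink.calls


if __name__ == "__main__":
    chunks = [b"0123456789"] * 1000
    print("unbuffered:", raw_write_calls(chunks, buffered=False))
    print("buffered:  ", raw_write_calls(chunks, buffered=True))
```

Unbuffered, every small write reaches the sink. Buffered, the same data arrives in a handful of larger writes.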

Interrupt vs Polling

Polling — Keep Asking

python
while not done():
    pass    # Is it done? Is it done? Is it done? — CPU 100%

Wastes CPU time. Busy waiting.

Interrupt — Notify Me

When hardware sends a "ready!" signal, the CPU handles it immediately:

  • Keyboard input — the moment a key is pressed, an IRQ fires → kernel converts it to an event → delivers it to the app
  • Network packet arrival — NIC fires an interrupt → kernel buffers it → wakes the app
  • Timer expiry — the foundation of setTimeout
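
The same contrast exists inside a program. As a loose analogy (not how hardware interrupts are implemented), the sketch below waits for a threading.Event in two ways: by checking it in a loop, and by blocking until notified. All function names are illustrative.

```python
import threading


def wait_by_polling(done: threading.Event) -> int:
    """Busy-wait: keep asking until the flag is set. Returns the number of checks."""
    checks = 0
    while not done.is_set():
        checks += 1
    return checks


def wait_by_blocking(done: threading.Event) -> bool:
    """Sleep until notified; the thread uses no CPU while it waits."""
    return done.wait(timeout=5.0)


def run(waiter, delay: float = 0.05):
    """Set the event after `delay` seconds and run the given waiter meanwhile."""
    done = threading.Event()
    timer = threading.Timer(delay, done.set)
    timer.start()
    result = waiter(done)
    timer.join()
    return result


if __name__ == "__main__":
    print("polling checks:", run(wait_by_polling))
    print("blocking wait returned:", run(wait_by_blocking))
```

The polling version burns through a large number of checks in 50 ms, while the blocking version does no work until the event fires.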

"Polling is inefficient; interrupts are the standard" — nearly all I/O in modern OSes is interrupt-based.

Context Switching

When the CPU switches between processes/threads — it saves the current state (registers, PC, memory mappings) and restores the state of the next task.

Cost

  • Process switching — must also swap memory mappings. Expensive (~several μs).
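  • Thread switching — threads of one process share an address space, so no memory-map swap is needed. Often several times lighter than process switching (the exact ratio depends on hardware and workload).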
  • Function call — simple stack push. Nanosecond range.

Therefore, switching between threads is generally cheaper than switching between processes, at the price of weaker isolation.

Why Node.js Is Fast Despite Being Single-Threaded

Node.js "single thread + event loop":

code
[Main Thread — JS execution]
     ↑
   Event Queue
     ↑
[libuv thread pool] ←— used for file I/O, DNS lookups, crypto
[Kernel — async readiness notification for network sockets (epoll, kqueue)]

Core idea:

1. I/O wait dominates the time of a typical request (DB, API, disk)
2. While waiting on I/O, the main thread handles other requests
3. Context switching cost is nearly zero — only one thread runs JavaScript

The "one thread per request" model (e.g., Apache): 10,000 requests = 10,000 threads = context switching explosion + memory explosion.

Node model: 10,000 requests = 1 main thread + a small libuv worker pool (4 threads by default) + event queue. Far more efficient for I/O-bound loads.
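
The same model can be sketched with Python's asyncio, used here as a stand-in for the Node.js event loop. Each simulated request waits 0.1 s on asyncio.sleep, which plays the role of a DB or API call. handle_request and serve are names invented for this example.

```python
import asyncio
import threading
import time


async def handle_request(i: int, thread_ids: set) -> int:
    await asyncio.sleep(0.1)                  # stand-in for waiting on a DB or API
    thread_ids.add(threading.get_ident())     # record which thread ran this handler
    return i


async def serve(n: int) -> tuple[int, int, float]:
    """Handle n simulated requests; return (handled, threads used, seconds)."""
    thread_ids: set = set()
    start = time.perf_counter()
    results = await asyncio.gather(
        *(handle_request(i, thread_ids) for i in range(n))
    )
    elapsed = time.perf_counter() - start
    return len(results), len(thread_ids), elapsed


if __name__ == "__main__":
    handled, threads, seconds = asyncio.run(serve(10_000))
    print(f"{handled} requests on {threads} thread(s) in {seconds:.2f}s")
```

Handled one after another, 10,000 requests of 0.1 s each would take about 1,000 s. Because the waits overlap, the event loop finishes far sooner while using a single thread.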

Summary — Connecting to Vibe Coding

  • fetch / file read = syscall → expensive → minimize (batching, caching)
  • Node.js / async/await = event-driven → excels at I/O-heavy servers
  • CPU-intensive work → Worker Thread → don't block the main event loop
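
Why blocking the event loop hurts, and how offloading helps, can be sketched with asyncio (Python 3.9+ for asyncio.to_thread). A heartbeat task counts how often the loop gets to run while some work is in progress. Here time.sleep is an assumed stand-in for any call that does not yield to the loop, and all function names are illustrative.

```python
import asyncio
import time


def blocking_work(seconds: float) -> None:
    time.sleep(seconds)      # stand-in for work that never yields to the event loop


async def count_ticks_during(run_work) -> int:
    """Count heartbeat ticks the loop manages while run_work() is in progress."""
    ticks = 0

    async def heartbeat() -> None:
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    task = asyncio.create_task(heartbeat())
    await run_work()
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return ticks


async def inline() -> None:
    blocking_work(0.2)                            # runs on the event loop thread


async def offloaded() -> None:
    await asyncio.to_thread(blocking_work, 0.2)   # runs on a worker thread


def compare() -> tuple[int, int]:
    blocked = asyncio.run(count_ticks_during(inline))
    free = asyncio.run(count_ticks_during(offloaded))
    return blocked, free


if __name__ == "__main__":
    print("ticks (inline, offloaded):", compare())
```

Run inline, the work starves the heartbeat completely. Offloaded, the loop keeps ticking. In Node.js the Worker Thread plays the role of the worker here; in CPython, truly CPU-bound code would usually go to a process pool because of the global interpreter lock.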

If you are asked in an interview "Why is Node.js fast?" — recite the contents of this page and you will pass.

🤖 Try Asking AI Like This

Knowing the concepts in this lesson lets you give AI specific instructions. Instead of a vague "fix this," you make vocabulary-driven requests — that is the starting point of saving tokens.

  • "Show me the command to trace syscalls for this operation using strace (Linux)"
  • "Diagnose whether this code incurs high cost in user space or kernel space"

Why This Reduces Tokens

When you don't know the concepts, even after receiving an AI answer you have to ask "What does that mean?" again. That follow-up question eats up tokens. Learn the concept once, and the conversation ends in a single exchange.
