Context switching and system calls

The cost of switching between threads, and how user code crosses into the kernel to access hardware.

Context switching overhead

When the OS decides to switch from one thread to another, it performs a context switch:

1. Save the current thread's register set, program counter (PC), and stack pointer (SP) to its thread control block (TCB) in kernel memory.
2. Load the next thread's registers, PC, and SP from its TCB.
3. If the threads belong to different processes, also switch the page table pointer, so that the MMU now translates addresses through the new process's page table.
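
The steps above can be sketched as a toy model. This is only an illustration, not how any real kernel is written: the `TCB` and `CPU` classes, the `page_table_owner` field, and the register values are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class TCB:
    """Thread control block: saved CPU state for one thread (toy model)."""
    tid: int
    pid: int
    registers: dict = field(default_factory=dict)
    pc: int = 0
    sp: int = 0


@dataclass
class CPU:
    """Hypothetical single core holding only the state the text mentions."""
    registers: dict = field(default_factory=dict)
    pc: int = 0
    sp: int = 0
    page_table_owner: int = -1  # pid whose page table the MMU currently uses


def context_switch(cpu: CPU, current: TCB, nxt: TCB) -> bool:
    """Switch the CPU from `current` to `nxt`; return True if the page table changed."""
    # Step 1: save the outgoing thread's state into its TCB.
    current.registers = dict(cpu.registers)
    current.pc, current.sp = cpu.pc, cpu.sp
    # Step 2: load the incoming thread's state from its TCB.
    cpu.registers = dict(nxt.registers)
    cpu.pc, cpu.sp = nxt.pc, nxt.sp
    # Step 3: only a cross-process switch changes the address space.
    if nxt.pid != current.pid:
        cpu.page_table_owner = nxt.pid
        return True
    return False


a = TCB(tid=1, pid=100)
b = TCB(tid=2, pid=100, registers={"rax": 7}, pc=0x4000, sp=0x7FFF0000)
c = TCB(tid=3, pid=200, pc=0x5000, sp=0x7FFE0000)

cpu = CPU(registers={"rax": 1}, pc=0x1000, sp=0x7FFFF000, page_table_owner=100)
same_process = context_switch(cpu, a, b)   # two threads of process 100
cross_process = context_switch(cpu, b, c)  # process 100 to process 200
```

In this model only the second switch touches the page table, which is why switching between threads of the same process tends to be cheaper than switching between processes.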

The raw register save/restore is fast (dozens of cycles). The real cost is indirect. The CPU's TLB (Translation Lookaside Buffer, a cache of recent virtual-to-physical translations) is often flushed on a process switch, so subsequent memory accesses pay page-table walk penalties until it refills; CPUs with tagged TLBs (PCID on x86-64, ASIDs on ARM) can avoid a full flush. In addition, cache lines belonging to the previous thread are evicted as the new thread warms its own working set.

Benchmarks typically put the fully loaded cost of a process context switch at 1–10 µs. With a 10 ms time slice this is an overhead of 0.01–0.1%, which is negligible. But a program that creates thousands of threads, each blocking almost immediately, can spend more time in context switches than doing real work. This is the argument for event-loop concurrency and lightweight coroutines.
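
The arithmetic behind these figures is a simple ratio, sketched below. The function name `switch_overhead` and the 2 µs / 5 µs figures in the last case are assumptions chosen for illustration, not measurements.

```python
def switch_overhead(switch_cost_us: float, run_time_us: float) -> float:
    """Context-switch time relative to the useful run time between switches."""
    return switch_cost_us / run_time_us


TIME_SLICE_US = 10_000  # a 10 ms time slice, expressed in microseconds

# The range quoted in the text: switch costs of 1 µs and 10 µs.
low = switch_overhead(1, TIME_SLICE_US)    # 0.0001, i.e. 0.01%
high = switch_overhead(10, TIME_SLICE_US)  # 0.001, i.e. 0.1%

# Hypothetical pathological case: each thread does 2 µs of work and then
# blocks, while every switch costs 5 µs. A ratio above 1 means more time
# is spent switching than working.
blocking = switch_overhead(5, 2)           # 2.5

for label, value in [("low", low), ("high", high), ("blocking", blocking)]:
    print(f"{label}: {value:.2%}")
```

The overhead depends on how long a thread actually runs before giving up the CPU, not on the nominal time slice, which is why short-lived blocking threads are so much more expensive than the headline figure suggests.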

System calls as the kernel boundary

User code cannot directly read from a disk, open a network socket, or allocate pages of memory: these operations touch hardware or kernel-managed state and are gated by the privilege boundary. To cross it, user code executes a system call (via the syscall instruction on x86-64 or svc on ARM).

The CPU switches to kernel mode, the kernel validates the request and performs the operation, then switches back to user mode and returns the result. This mode switch is not free: it flushes some CPU state and pays a round-trip through the kernel's dispatch table. A system call costs roughly 50–500 ns on modern hardware.

User space         |  Kernel space
                   |
read(fd, buf, n) --+---> sys_read()
                   |       → file-system layer
                   |       → block device driver
                   |       ← data copied to buf
                   +<--- return n (bytes read)
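
A minimal sketch of the same round trip from user code, using Python's `os` module, whose `os.read` and `os.write` functions are thin wrappers over the corresponding system calls. A pipe stands in for the file here so that the example needs no file on disk; the message text is arbitrary.

```python
import os

# Ask the kernel for a pipe: it returns two file descriptors.
read_fd, write_fd = os.pipe()

# os.write wraps write(): one crossing into the kernel.
written = os.write(write_fd, b"hello, kernel")
os.close(write_fd)

# os.read wraps read(fd, buf, n): the kernel copies up to n bytes out,
# and Python hands them back as a bytes object.
data = os.read(read_fd, 5)     # ask for at most 5 bytes
rest = os.read(read_fd, 1024)  # whatever is left in the pipe
eof = os.read(read_fd, 1024)   # writer closed and pipe empty: empty bytes
os.close(read_fd)

print(written, data, rest, eof)
```

Each of the calls above is a separate crossing of the user/kernel boundary, and the length of the returned bytes plays the role of the `n` returned in the diagram.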

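High-performance I/O interfaces reduce the number of crossings to amortise this cost. Linux io_uring lets a program queue many I/O requests and submit them with a single system call, and Windows I/O completion ports (IOCP) can hand back several completed operations in one call. The principle is the same in both cases: every crossing of the user/kernel boundary has overhead, so cross it as rarely as necessary.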
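
The amortisation idea can be illustrated without io_uring itself. The sketch below uses `os.writev`, a much simpler Unix-only call that passes a list of buffers to the kernel in one crossing, and compares it with issuing one `write` per buffer. The helper names and the 100 small records are assumptions for this example, and it does not measure timing, only the number of calls.

```python
import os
import tempfile

chunks = [f"record {i}\n".encode() for i in range(100)]


def write_one_by_one(fd: int, bufs: list) -> int:
    """Issue at least one write() system call per buffer; return the call count."""
    calls = 0
    for buf in bufs:
        view = memoryview(buf)
        while view:  # write() may accept fewer bytes than offered
            view = view[os.write(fd, view):]
            calls += 1
    return calls


def write_batched(fd: int, bufs: list) -> int:
    """Hand every buffer to the kernel in a single writev() call.

    Returns the number of bytes written. A robust version would also
    handle a short write; for a few small buffers to a regular file
    that is not expected.
    """
    return os.writev(fd, bufs)


# buffering=0 gives raw file objects, so reads and writes go straight to the fd.
with tempfile.TemporaryFile(buffering=0) as f1, tempfile.TemporaryFile(buffering=0) as f2:
    calls = write_one_by_one(f1.fileno(), chunks)
    total = write_batched(f2.fileno(), chunks)
    f1.seek(0)
    f2.seek(0)
    same_bytes = f1.read() == f2.read()

print(f"{calls} write() calls versus 1 writev() call for {total} bytes")
```

Both files end up with identical contents, but the batched version crosses the boundary once instead of once per record.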

"User space" and "kernel space" are not just an abstraction: they correspond to different CPU privilege levels enforced in hardware. A bug in user code cannot directly corrupt kernel memory. A bug in kernel code (a kernel module, a driver) can corrupt anything and cause a kernel panic (on Windows, a BSOD). This is why kernel and driver code demands especially careful review.

Where to go next

Processes and threads communicate over networks as well as shared memory. The Networking Fundamentals track covers the layered model that the OS's network stack implements, and how TCP turns raw packet delivery into reliable streams. For a deeper look at how concurrency is managed in a specific language, see the Go track (goroutines + channels) or the Rust track (thread-safety via the type system).

Knowledge check

1. The most significant hidden cost of a process context switch (beyond saving and restoring registers) is:
   a. The kernel recompiling the process's code for the new CPU state
   b. TLB flushing and cache warm-up as the new thread loads its working set
   c. The scheduler reordering all waiting threads on every switch
   d. Network packets being dropped during the switch

2. A system call is necessary whenever a user program wants to perform an operation that requires privileged hardware access, such as reading a file or opening a socket.
   a. True
   b. False
