Profiling
Find performance bottlenecks with go tool pprof — CPU and memory profiles, the net/http/pprof endpoint, and reading flame graphs.
- Generate a CPU profile with go test -cpuprofile
- Generate a heap profile with go test -memprofile
- Start the pprof HTTP endpoint in a running server
- Use go tool pprof to explore profiles interactively and as flame graphs
- Name three common CPU and memory hotspots to watch for
Go ships profiling support in the standard library and toolchain. You don't need third-party tools to find bottlenecks — go tool pprof and the net/http/pprof package are everything you need to go from "the service is slow" to "line 47 of store.go is the problem". This lesson is a practical guide to that workflow.
CPU profiling with go test
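The fastest way to profile is through your test suite:
go test -bench=BenchmarkHot -cpuprofile=cpu.prof .
go tool pprof cpu.prof
The profile flags only work when go test runs a single package, so run this from the benchmark's package directory (or name that package) rather than using ./... in a multi-package module. Inside the pprof REPL, the most useful commands are: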
(pprof) top10 # top 10 functions by CPU time
(pprof) list MyFunc # annotated source for MyFunc
(pprof) web # open the call graph as SVG in a browser (requires Graphviz)
top10 output looks like:
Showing nodes accounting for 3.2s, 89.44% of 3.58s total
      flat  flat%   sum%        cum   cum%
     1.45s 40.50% 40.50%      1.45s 40.50%  runtime.mallocgc
     0.82s 22.91% 63.41%      0.82s 22.91%  store.processRecord
- flat: time spent in this function (excluding callees).
- cum: cumulative time including callees.
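When runtime.mallocgc dominates, the program is spending much of its CPU time allocating on the heap. High flat time points to a function that does expensive work itself. High cum time with low flat time points to a function whose callees are expensive, so follow the call chain down from it.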
Memory profiling
go test -bench=BenchmarkHot -memprofile=mem.prof .
go tool pprof -alloc_space mem.prof
Inside pprof:
(pprof) top10 # top allocating functions
(pprof) list MyFunc # allocated bytes per source line
Two sample types matter:
- -alloc_space: total bytes allocated over the profile period (cumulative).
- -inuse_space: bytes currently live (snapshot).
Start with -alloc_space when looking for GC pressure. Use -inuse_space when diagnosing a memory leak.
The net/http/pprof endpoint
For running services, import the side-effect-only package:
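import _ "net/http/pprof"
This registers several endpoints on http.DefaultServeMux, so they are served only if your server uses the default mux (for example, by passing a nil handler to http.ListenAndServe):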
/debug/pprof/             index page
/debug/pprof/profile      30-second CPU profile
/debug/pprof/heap         heap snapshot
/debug/pprof/goroutine    all goroutine stacks
Collect a live CPU profile:
go tool pprof "http://localhost:8080/debug/pprof/profile?seconds=30"
Quote the URL so the shell does not try to interpret the ? character. Never expose the pprof endpoint on a public-facing port in production. It reveals detailed information about your program's internals (memory, goroutine stacks, command line), which is a security risk. Bind it to a separate internal port or protect it with authentication.
Reading flame graphs
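The web command in the REPL draws a call graph, not a flame graph. For a flame graph, start pprof's web UI with go tool pprof -http=localhost:8081 cpu.prof and pick the flame graph from the View menu (the graph view in that UI requires Graphviz). A flame graph shows: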
- Width — proportion of time spent in a function (wider = more time).
- Height — call stack depth.
- Colour — no semantic meaning (just visual contrast).
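pprof draws the root at the top with callees stacked below, so the frames where time is actually spent sit at the bottom edge of each stack. The widest of those leaf frames are your targets. A wide leaf frame that is not your code usually means you are calling into the runtime or standard library more than necessary; look at the nearest frame above it that you own.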
Common hotspots to watch for
CPU:
- runtime.mallocgc: you are allocating too much. Look for per-request slice/map creation that could be pooled.
- runtime.memeqbody / string comparisons: tight loops comparing strings.
- fmt.Sprintf in hot paths: string formatting allocates. Cache or pre-format where possible.
Memory:
- Functions that allocate in a loop without reuse: use sync.Pool for expensive objects.
- Large intermediate slices: stream or process in chunks.
- bytes.Buffer regrowing repeatedly from empty: pre-allocate with bytes.NewBuffer(make([]byte, 0, expectedSize)).
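Two of the fixes above, sync.Pool and a pre-sized bytes.Buffer, combine naturally. The following is a minimal sketch under stated assumptions: formatRecord is a hypothetical hot-path function, and the 256-byte initial capacity is a guess you would replace with a size taken from your own profile.

```go
package main

import (
	"bytes"
	"fmt"
	"strconv"
	"sync"
)

// bufPool hands out reusable buffers so each call does not allocate a new one.
var bufPool = sync.Pool{
	New: func() any {
		// Pre-size the buffer so typical records fit without regrowing.
		return bytes.NewBuffer(make([]byte, 0, 256))
	},
}

// formatRecord builds a log-style line using a pooled buffer.
func formatRecord(id int, name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	defer bufPool.Put(buf)
	buf.Reset() // a pooled buffer may still hold data from a previous use

	buf.WriteString("id=")
	buf.WriteString(strconv.Itoa(id))
	buf.WriteString(" name=")
	buf.WriteString(name)
	return buf.String() // copies the bytes, so returning buf to the pool is safe
}

func main() {
	fmt.Println(formatRecord(7, "ada")) // prints "id=7 name=ada"
}
```

The returned string is still one allocation per call. What the pool removes is the repeated allocation and growth of the buffer itself, which is the part that tends to show up under runtime.mallocgc.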
Profile before you optimise. Guessing at bottlenecks wastes time and often misses the real issue. Ten minutes with pprof is more valuable than an hour of speculative refactoring. And always re-benchmark after the change to verify the improvement.
Check your understanding
- 1. A function shows low flat time but high cumulative (cum) time in pprof output. What does this indicate?
- 2. True or false: it is safe to expose the net/http/pprof endpoint on a public internet-facing port.
- 3. Your CPU profile shows runtime.mallocgc accounting for 40% of total time. What should you investigate?
Do it yourself
Add the pprof endpoint to a small HTTP server and collect a 5-second CPU profile:
# Terminal 1
go run main.go # server with _ "net/http/pprof" imported
# Terminal 2
go tool pprof "http://localhost:8080/debug/pprof/profile?seconds=5"
Try the top5, list, and web commands.
Where to go next
You can now find bottlenecks. The final lesson covers building and deploying — compile flags, cross-compilation, static binaries, Docker multi-stage builds, and go generate.