A big part of programming is about making choices:
- What data type do I use? A map, a slice, or something fancier?
- Do I implement this using recursion or a loop?
- Should I pass a value or a pointer?
How do you determine the best choice?
One way is to measure performance using benchmarks.
Go has built-in support for benchmarking in the testing package.
In this article you’ll see how benchmarks are structured, what to keep in mind when writing them, and how to execute them.
At the end we’ll also look at an example repository inspired by a real-world scenario.
Human and machine performance
Before we dive in I want to emphasize something.
Human performance is often more important than technical performance. You can buy or rent more execution time; you can’t get more lifetime.
I’m not saying you should squander computing resources, but when in doubt, focus on aspects that enhance human performance. Things like readability, maintainability and documentation.
When you do optimize for machine performance:
- Use profiling to confirm that existing code is actually a performance bottleneck before you optimize it.
- Don’t optimize prematurely; make sure your code and its requirements are stable and well tested.
With that said, let’s move on to benchmarking.
How to benchmark
Benchmarks work similarly to regular tests in Go.
You place them in *_test.go files, and they must have a specific function signature. You can then run them by invoking the go test command with the right configuration.
Function structure
They all need to start with Benchmark and accept a *testing.B parameter:
func BenchmarkXYZ(b *testing.B) {
    // ...
}
The b parameter, of type *testing.B, functions similarly to the *testing.T parameter in regular tests. It provides bookkeeping information and allows you to control benchmark execution from inside the function.
For reliable benchmarks, your code needs to run multiple times.
The number of runs is determined by the b.N field, which is set by the testing environment when the benchmarks are run.
By running a loop based on the b.N field, we get the basic structure that should be used by all benchmarks:
func BenchmarkXYZ(b *testing.B) {
    for range b.N {
        // your code here.
    }
}
b.N is an integer. Since Go 1.22 it's possible to range over integers.
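On Go versions older than 1.22, the same loop is written with an explicit counter:

func BenchmarkXYZ(b *testing.B) {
    for i := 0; i < b.N; i++ {
        // your code here.
    }
}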
Running benchmarks
Benchmarks are run using the same command as regular tests, but with an additional -bench flag that accepts a regular expression.
For example, to run all benchmarks in a directory:
go test -bench .
The . is a regular expression that matches everything.
Output could then look something like this:
goos: linux
goarch: amd64
pkg: github.com/willemschots/benchmarkfun
cpu: 11th Gen Intel(R) Core(TM) i7-11370H @ 3.30GHz
Benchmark_example-8 36128 45894 ns/op
PASS
ok github.com/willemschots/benchmarkfun 1.997s
Each iteration of the benchmark loop is referred to as an “op”, or operation.
Here we see that the benchmark loop ran 36128 times and each iteration took 45894 nanoseconds on average.
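Because -bench takes a regular expression, you can also target a subset of benchmarks by name. For example, to run only the Benchmark_example function from the output above:

go test -bench '^Benchmark_example$'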
Customizing benchmark time
By default, each benchmark loop iterates until 1 second has passed. You might want to increase this time if your code takes longer to run; this can be done using the -benchtime flag.
For example:
go test -bench . -benchtime 3s
This runs each benchmark loop for 3 seconds.
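The -benchtime flag also accepts a fixed iteration count with an x suffix. For example, to run each benchmark loop exactly 100 times:

go test -bench . -benchtime 100x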
Dealing with compiler optimizations
If you’re seeing very low execution times (let’s say < 1 ns/op), that likely means the compiler is optimizing away your code.
In cases where “nothing” is done with function results, the compiler is smart enough to remove the entire call. The results aren’t used, after all, so why waste time executing the function?
It’s a bit unfortunate that this happens in benchmarks as well: even though we might not care about the results there, we do want to run our function.
To sidestep this issue, we need to do “something” with the results. This “something” is assigning to a global variable.
For example, if we’re benchmarking a DoSomething function that returns an int:
// global variable to prevent compiler optimization
var global int

func BenchmarkSidestepCompiler(b *testing.B) {
    for range b.N {
        // store result in the global variable
        global = DoSomething()
    }
}
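DoSomething here is just a stand-in for whatever code you want to measure. A minimal, hypothetical version (not from any real package) could look like this:

// DoSomething is a placeholder for the code under benchmark.
func DoSomething() int {
    sum := 0
    for i := 0; i < 1000; i++ {
        sum += i
    }
    return sum
}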
Expensive setup code
If your benchmarks require expensive setup code that you don’t want to include in the measured benchmark time, you can reset the timer of the benchmark using the b.ResetTimer() method.
For example:
func BenchmarkService(b *testing.B) {
    svc := ExpensiveSetup()

    // reset the timer after expensive setup.
    b.ResetTimer()

    for range b.N {
        // benchmark svc here.
    }
}
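If expensive work needs to happen inside the loop, for example preparing fresh input for every iteration, you can pause and resume measurement with b.StopTimer and b.StartTimer. The PrepareInput and svc.Process calls below are hypothetical, just to illustrate the pattern:

func BenchmarkServicePerIteration(b *testing.B) {
    svc := ExpensiveSetup()
    b.ResetTimer()

    for range b.N {
        // exclude the per-iteration setup from the measurement.
        b.StopTimer()
        input := PrepareInput()
        b.StartTimer()

        svc.Process(input)
    }
}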
Memory allocations
By default, benchmarks only print execution time. If you want to also display memory allocations, run your go test command with the -benchmem flag.
For example:
go test -bench . -benchmem
This runs all benchmarks in the current directory and prints out memory allocations.
The output of this would look something like this:
goos: linux
goarch: amd64
pkg: <packagename>
cpu: 11th Gen Intel(R) Core(TM) i7-11370H @ 3.30GHz
Benchmark_example-8 35233 32910 ns/op 368641 B/op 2 allocs/op
PASS
ok <packagename> 1.507s
Note the two new columns at the end (you might need to scroll horizontally for this):
- 368641 B/op: indicates that per benchmark loop iteration, 368641 bytes were allocated on average.
- 2 allocs/op: indicates that per benchmark loop iteration, 2 allocations were performed.
Memory allocations are relatively slow, so minimizing them is an effective way to improve performance.
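A common example is growing a slice element by element versus preallocating it. The sketch below (with a hypothetical lenGlobal variable to sidestep the compiler) will typically report fewer allocs/op for the preallocated version:

var lenGlobal int

func BenchmarkAppendGrow(b *testing.B) {
    for range b.N {
        var s []int
        for i := 0; i < 1000; i++ {
            s = append(s, i)
        }
        lenGlobal = len(s)
    }
}

func BenchmarkAppendPrealloc(b *testing.B) {
    for range b.N {
        // preallocate the capacity up front, avoiding repeated growth.
        s := make([]int, 0, 1000)
        for i := 0; i < 1000; i++ {
            s = append(s, i)
        }
        lenGlobal = len(s)
    }
}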
Full example
Now that we have seen the elements of a benchmark, let’s see how it all ties together in a real scenario.
In response to my article on sets, my friend Peter Aba suggested a great benchmarking scenario: finding the intersection of two sets.
A set is a group of unique elements. The intersection of two sets is the group of common elements between them.
During my interview at Cup ‘O Go, Jonathan Hall suggested that maps of bools were faster than maps of empty structs in some cases.
Let’s combine these two suggestions and benchmark the Intersection method on different set implementations.
For our demo scenario we’ll also include a slice-based implementation, since it has a very different performance profile.
We’ll assume all set elements are strings.
Name | Type | Description
---|---|---
SliceSet | []string | Set implemented as a slice of strings.
BoolMapSet | map[string]bool | Set implemented as a map of booleans.
StructMapSet | map[string]struct{} | Set implemented as a map of empty structs.
You can find the full code on GitHub.
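To make the comparison concrete, here is a rough sketch of what the StructMapSet type and its Intersection method might look like; the actual code in the repository may differ:

// StructMapSet stores elements as keys of a map with empty struct values.
type StructMapSet map[string]struct{}

// NewStructMapSet creates a set containing the given elements.
func NewStructMapSet(elems ...string) StructMapSet {
    s := make(StructMapSet, len(elems))
    for _, e := range elems {
        s[e] = struct{}{}
    }
    return s
}

// Intersection returns a new set with the elements present in both sets.
func (s StructMapSet) Intersection(other StructMapSet) StructMapSet {
    result := make(StructMapSet)
    for e := range s {
        if _, ok := other[e]; ok {
            result[e] = struct{}{}
        }
    }
    return result
}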
Example benchmarks
For each set implementation we run a number of benchmarks.
Below you can see the benchmark for the StructMapSet implementation, but the benchmarks for SliceSet and BoolMapSet are very similar.
The benchmark test data is provided by the benchmarks function.
The benchmark code loops over these benchmarks and runs a sub-benchmark for each item.
Inside the sub-benchmark two sets are created and the benchmark logic is executed.
var structMapSetGlobal int

func Benchmark_StructMapSet(b *testing.B) {
    for _, tc := range benchmarks() {
        b.Run(tc.name, func(b *testing.B) {
            // set up the sets
            s1 := benchmarkfun.NewStructMapSet(tc.s1...)
            s2 := benchmarkfun.NewStructMapSet(tc.s2...)

            // set up is not part of the benchmark.
            b.ResetTimer()

            // we now need to be careful the compiler
            // doesn't optimize away our benchmark.
            //
            // To do this, we need to assign our results
            // to a global variable at the end of the benchmark.
            var result benchmarkfun.StructMapSet

            // the benchmark
            for range b.N {
                result = s1.Intersection(s2)
            }

            // global variable
            structMapSetGlobal = len(result)
        })
    }
}
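The benchmarks function isn’t shown here. A hypothetical version, with names and sizes guessed from the output below and assuming the usual imports (fmt), might look like this:

type benchmarkCase struct {
    name string
    s1   []string
    s2   []string
}

// elems returns n unique strings with the given prefix.
func elems(prefix string, n int) []string {
    out := make([]string, n)
    for i := range out {
        out[i] = fmt.Sprintf("%s-%d", prefix, i)
    }
    return out
}

func benchmarks() []benchmarkCase {
    var cases []benchmarkCase
    for _, n := range []int{10, 100, 1000} {
        cases = append(cases,
            benchmarkCase{
                name: fmt.Sprintf("%d, %d no intersection", n, n),
                s1:   elems("a", n),
                s2:   elems("b", n),
            },
            benchmarkCase{
                name: fmt.Sprintf("%d, %d full intersection", n, n),
                s1:   elems("a", n),
                s2:   elems("a", n),
            },
        )
    }
    return cases
}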
Running the benchmarks
Running the benchmarks is done using:
go test -bench=. -benchmem
On my machine this gives the following output:
goos: linux
goarch: amd64
pkg: github.com/willemschots/benchmarkfun
cpu: 11th Gen Intel(R) Core(TM) i7-11370H @ 3.30GHz
Benchmark_BoolMapSet/10,_10_no_intersection-8 5561168 225.3 ns/op 48 B/op 1 allocs/op
Benchmark_BoolMapSet/10,_10_full_intersection-8 1902493 616.4 ns/op 531 B/op 3 allocs/op
Benchmark_BoolMapSet/100,_100_no_intersection-8 593454 1796 ns/op 48 B/op 1 allocs/op
Benchmark_BoolMapSet/100,_100_full_intersection-8 113373 9167 ns/op 5855 B/op 11 allocs/op
Benchmark_BoolMapSet/1000,_1000_no_intersection-8 31088 33677 ns/op 48 B/op 1 allocs/op
Benchmark_BoolMapSet/1000,_1000_full_intersection-8 10000 111003 ns/op 96537 B/op 37 allocs/op
Benchmark_SliceSet/10,_10_no_intersection-8 3604327 341.5 ns/op 0 B/op 0 allocs/op
Benchmark_SliceSet/10,_10_full_intersection-8 2098922 568.8 ns/op 496 B/op 5 allocs/op
Benchmark_SliceSet/100,_100_no_intersection-8 40159 28588 ns/op 0 B/op 0 allocs/op
Benchmark_SliceSet/100,_100_full_intersection-8 38872 31385 ns/op 4464 B/op 8 allocs/op
Benchmark_SliceSet/1000,_1000_no_intersection-8 415 2794903 ns/op 0 B/op 0 allocs/op
Benchmark_SliceSet/1000,_1000_full_intersection-8 423 2768507 ns/op 35184 B/op 11 allocs/op
Benchmark_StructMapSet/10,_10_no_intersection-8 5059527 229.0 ns/op 48 B/op 1 allocs/op
Benchmark_StructMapSet/10,_10_full_intersection-8 1988473 616.8 ns/op 483 B/op 3 allocs/op
Benchmark_StructMapSet/100,_100_no_intersection-8 549928 1930 ns/op 48 B/op 1 allocs/op
Benchmark_StructMapSet/100,_100_full_intersection-8 131402 8650 ns/op 5608 B/op 10 allocs/op
Benchmark_StructMapSet/1000,_1000_no_intersection-8 36765 32642 ns/op 48 B/op 1 allocs/op
Benchmark_StructMapSet/1000,_1000_full_intersection-8 10000 117866 ns/op 85688 B/op 35 allocs/op
PASS
ok github.com/willemschots/benchmarkfun 25.613s
Quite a bit of text. Let’s go through it.
In these benchmarks we’re only checking two extremes: No intersection, or full intersection (the two sets are the same).
For all implementations you can see that:
- Execution time increases with set sizes.
- Allocations increase with the size of the intersection.
- Allocations are the same when there is no intersection.
It’s clear that SliceSet had the slowest execution time for larger sets but allocated the least memory overall.
StructMapSet and BoolMapSet perform similarly.
But this is just a single run of our benchmarks, so it’s not statistically meaningful.
To do a proper analysis, we will look at the benchstat program in a follow-up article.
Be sure to subscribe to my newsletter to get a notification when it’s out.
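In the meantime, you can already collect multiple samples for such an analysis by repeating each benchmark with the -count flag and saving the output to a file:

go test -bench . -benchmem -count 10 > results.txt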
Summary
In this article, we’ve explored how to write, run, and interpret benchmarks in Go.
Most importantly we’ve discussed:
- The structure of benchmark tests and the benchmark loop.
- How to prevent the compiler from optimizing our code away by assigning to a global variable.
- How to run benchmarks using the go test -bench . command.
- How to print memory allocations using the -benchmem flag.
Happy coding!
Further reading
- Great article by Teiva Harsanyi, which provides advanced insights into writing accurate benchmarks and performance optimization.
- Classic article on Go benchmarks by Dave Cheney.
- List of all flags for the go test command.