Benchmarks and performance testing

Willem Schots
24 Oct, 2024 (Updated 25 Oct, 2024)

A big part of programming is about making choices:

  • What data type do I use? A map, a slice, something more fancy?
  • Do I implement this using recursion or a loop?
  • Should I pass a value or a pointer?

How do you determine the best choice?

One way is to measure performance using benchmarks.

Go has built-in support for benchmarking in the testing package.

In this article you’ll see how benchmarks are structured, what to keep in mind when writing them and how to execute them.

In the end we’ll also look at a real-world inspired example repository.

Human and machine performance

Before we dive in I want to emphasize something.

Human performance is often more important than technical performance. It’s possible to buy or rent more execution time, but you can’t get more lifetime.

I’m not saying you should squander computing resources, but when in doubt, focus on aspects that enhance human performance. Things like readability, maintainability and documentation.

When you do optimize for machine performance:

  • Use profiling to make sure that existing code is actually a performance bottleneck before you optimize it.
  • Don’t do it prematurely: make sure your code and its requirements are stable and well tested.

With that said, let’s continue on with benchmarking.

How to benchmark

Benchmarks work similarly to regular tests in Go.

You place them in *_test.go files, and they must have a specific function signature. You can then run them by invoking the go test command with the right configuration.

Function structure

Benchmark functions all need to have a name that starts with Benchmark and accept a *testing.B parameter:

func BenchmarkXYZ(b *testing.B) {
	// ...
}

The b parameter, of type *testing.B, functions similarly to the *testing.T parameter in regular tests. It provides bookkeeping information and allows you to control benchmark execution from inside the function.

For reliable benchmarks, your code needs to run multiple times.

The number of runs is determined by the b.N field, which is set by the testing environment when the benchmarks are run.

By running a loop based on the b.N field, we get the basic structure that should be used by all benchmarks:

func BenchmarkXYZ(b *testing.B) {
	for range b.N {
		// your code here.
	}
}

b.N is an integer. Since Go 1.22 it's possible to range over integers.
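
On Go versions before 1.22, the same loop is written with an explicit counter:

func BenchmarkXYZ(b *testing.B) {
	for i := 0; i < b.N; i++ {
		// your code here.
	}
}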

Running benchmarks

Benchmarks are run using the same command as regular tests, but with an additional -bench flag that accepts a regular expression.

For example, to run all benchmarks in a directory:

go test -bench .

The . is a regular expression that matches everything.
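
The regular expression is matched against benchmark names, so you can also run a subset of benchmarks. For example, to run only the benchmarks whose names match BenchmarkXYZ:

go test -bench BenchmarkXYZ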

Output could then look something like this:

goos: linux
goarch: amd64
pkg: github.com/willemschots/benchmarkfun
cpu: 11th Gen Intel(R) Core(TM) i7-11370H @ 3.30GHz
Benchmark_example-8        36128             45894 ns/op
PASS
ok      github.com/willemschots/benchmarkfun    1.997s

Each iteration of the benchmark loop is referred to as an “op”, or operation.

Here we see that the benchmark loop ran 36128 times and each iteration took 45894 nanoseconds on average.

Customizing benchmark time

By default, each benchmark loop iterates until 1 second has passed. If your code takes longer to run, you might want to increase this time; you can do so using the -benchtime flag.

For example:

go test -bench . -benchtime 3s

This will run each benchmark loop for 3 seconds.
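
Instead of a duration, the -benchtime flag also accepts an exact iteration count in the form Nx. For example:

go test -bench . -benchtime 100x

This will run each benchmark loop exactly 100 times.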

Dealing with compiler optimizations

If you’re seeing very low execution times (let’s say < 1 ns/op) that likely means that the compiler is optimizing away your code.

In cases where “nothing” is done with function results, the compiler is smart enough to remove the entire call. The results aren’t used, after all, so why waste time executing the function?

It’s a bit unfortunate that this happens in benchmarks as well: we might not care about the results, but we do want to run our function.

To sidestep this issue, we need to do “something” with the results. This “something” is assigning the result to a global variable.

For example, if we’re benchmarking a DoSomething function that returns an int:

// global variable to prevent compiler optimization
var global int

func BenchmarkSidestepCompiler(b *testing.B) {
	for range b.N {
		// store result in the global variable
		global = DoSomething()
	}
}
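
DoSomething stands in for whatever function you want to measure. To make the example self-contained, a trivial placeholder could look like this:

// DoSomething is a placeholder for the function under test.
func DoSomething() int {
	sum := 0
	for i := 0; i < 1000; i++ {
		sum += i
	}
	return sum
}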

Expensive setup code

If your benchmarks require expensive setup code that you don’t want to include in the measured benchmark time, you can reset the timer of the benchmark using the b.ResetTimer() method.

For example:

func BenchmarkService(b *testing.B) {
	svc := ExpensiveSetup()

	// reset the timer after expensive setup.
	b.ResetTimer()

	for range b.N {
		// benchmark svc here.
	}
}
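
If the setup has to happen inside the loop, you can instead pause and resume the timer with b.StopTimer() and b.StartTimer(). Doing this per iteration adds some overhead, so use it sparingly. A sketch, reusing the ExpensiveSetup function from above:

func BenchmarkServicePerIteration(b *testing.B) {
	for range b.N {
		// pause the timer while setting up for this iteration.
		b.StopTimer()
		svc := ExpensiveSetup()
		b.StartTimer()

		// benchmark svc here.
		_ = svc
	}
}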

Memory allocations

By default, benchmarks only print execution time. If you want to also display memory allocations, run your go test command with the -benchmem flag.

For example:

go test -bench . -benchmem

This will run all benchmarks in the current directory and print memory allocations.

The output of this would look something like this:

goos: linux
goarch: amd64
pkg: <packagename>
cpu: 11th Gen Intel(R) Core(TM) i7-11370H @ 3.30GHz
Benchmark_example-8        35233             32910 ns/op          368641 B/op          2 allocs/op
PASS
ok      <packagename>    1.507s

Note the two new columns at the end:

  • 368641 B/op: Indicates that per benchmark loop iteration, 368641 bytes were allocated on average.
  • 2 allocs/op: Indicates that per benchmark loop iteration, 2 allocations were performed.

Memory allocations are relatively slow, so minimizing them is an effective way to improve performance.
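
If you always want allocation statistics for a specific benchmark, regardless of the -benchmem flag, you can call b.ReportAllocs() inside the benchmark function:

func BenchmarkXYZ(b *testing.B) {
	// always report allocations for this benchmark.
	b.ReportAllocs()
	for range b.N {
		// your code here.
	}
}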

Full example

Now that we have seen the elements of a benchmark, let’s see how it all ties together in a real scenario.

In response to my article on sets, my friend Peter Aba suggested a great benchmarking scenario: Finding the intersection of two sets.

A set is a group of unique elements. The intersection of two sets is the group of elements common to both.

(Diagram: the intersection of two sets.)

During my interview at Cup ‘O Go, Jonathan Hall suggested that maps of bools were faster than maps of empty structs in some cases.

Let’s combine these two suggestions and benchmark the Intersection method on different set implementations.

For our demo scenario we’ll also include a slice-based implementation, since this has a very different performance profile.

We’ll assume all set elements are strings.

Name          Type                  Description
SliceSet      []string              Set implemented as a slice of strings.
BoolMapSet    map[string]bool       Set implemented as a map of booleans.
StructMapSet  map[string]struct{}   Set implemented as a map of empty structs.

You can find the full code on GitHub.
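
To give an idea of the shape of the map-based implementations, here is a minimal sketch of what the StructMapSet type in the benchmarkfun package could look like. The actual code in the repository may differ:

// StructMapSet is a set of strings backed by a map of empty structs.
type StructMapSet map[string]struct{}

// NewStructMapSet creates a set containing the given elements.
func NewStructMapSet(elems ...string) StructMapSet {
	s := make(StructMapSet, len(elems))
	for _, e := range elems {
		s[e] = struct{}{}
	}
	return s
}

// Intersection returns a new set containing the elements present in both sets.
func (s StructMapSet) Intersection(other StructMapSet) StructMapSet {
	result := make(StructMapSet)
	for e := range s {
		if _, ok := other[e]; ok {
			result[e] = struct{}{}
		}
	}
	return result
}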

Example benchmarks

For each set implementation we run a number of benchmarks.

Below you can see the benchmark for the StructMapSet implementation; the benchmarks for SliceSet and BoolMapSet are very similar.

The benchmark test data is provided by the benchmarks function.

The benchmark code loops over these benchmarks and runs a sub-benchmark for each item.

Inside the sub-benchmark two sets are created and the benchmark logic is executed.


var structMapSetGlobal int

func Benchmark_StructMapSet(b *testing.B) {
	for _, tc := range benchmarks() {
		b.Run(tc.name, func(b *testing.B) {
			// set up the sets
			s1 := benchmarkfun.NewStructMapSet(tc.s1...)
			s2 := benchmarkfun.NewStructMapSet(tc.s2...)

			// set up is not part of the benchmark.
			b.ResetTimer()

			// we now need to be careful the compiler
			// doesn't optimize away our benchmark.
			//
			// To do this, we need to assign our results
			// to a global variable at the end of the benchmark.
			var result benchmarkfun.StructMapSet

			// the benchmark
			for range b.N {
				result = s1.Intersection(s2)
			}

			// global variable
			structMapSetGlobal = len(result)
		})
	}
}
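
The benchmarks function itself is not shown here. Based on the sub-benchmark names in the output below, it could look roughly like this (a sketch using fmt; the test data in the repository may differ):

// benchmarkCase is a sketch of the test data used by the benchmarks;
// the actual repository code may differ.
type benchmarkCase struct {
	name   string
	s1, s2 []string
}

func benchmarks() []benchmarkCase {
	sizes := []int{10, 100, 1000}
	var cases []benchmarkCase
	for _, n := range sizes {
		s1 := make([]string, n)
		s2 := make([]string, n)
		for i := 0; i < n; i++ {
			s1[i] = fmt.Sprintf("a-%d", i)
			s2[i] = fmt.Sprintf("b-%d", i) // no elements in common with s1.
		}
		cases = append(cases,
			benchmarkCase{name: fmt.Sprintf("%d, %d no intersection", n, n), s1: s1, s2: s2},
			benchmarkCase{name: fmt.Sprintf("%d, %d full intersection", n, n), s1: s1, s2: s1},
		)
	}
	return cases
}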

Running the benchmarks

Running the benchmarks is done using:

go test -bench=. -benchmem

On my machine this gives the following output:

goos: linux
goarch: amd64
pkg: github.com/willemschots/benchmarkfun
cpu: 11th Gen Intel(R) Core(TM) i7-11370H @ 3.30GHz
Benchmark_BoolMapSet/10,_10_no_intersection-8            5561168               225.3 ns/op            48 B/op          1 allocs/op
Benchmark_BoolMapSet/10,_10_full_intersection-8          1902493               616.4 ns/op           531 B/op          3 allocs/op
Benchmark_BoolMapSet/100,_100_no_intersection-8           593454              1796 ns/op              48 B/op          1 allocs/op
Benchmark_BoolMapSet/100,_100_full_intersection-8         113373              9167 ns/op            5855 B/op         11 allocs/op
Benchmark_BoolMapSet/1000,_1000_no_intersection-8          31088             33677 ns/op              48 B/op          1 allocs/op
Benchmark_BoolMapSet/1000,_1000_full_intersection-8        10000            111003 ns/op           96537 B/op         37 allocs/op
Benchmark_SliceSet/10,_10_no_intersection-8              3604327               341.5 ns/op             0 B/op          0 allocs/op
Benchmark_SliceSet/10,_10_full_intersection-8            2098922               568.8 ns/op           496 B/op          5 allocs/op
Benchmark_SliceSet/100,_100_no_intersection-8              40159             28588 ns/op               0 B/op          0 allocs/op
Benchmark_SliceSet/100,_100_full_intersection-8            38872             31385 ns/op            4464 B/op          8 allocs/op
Benchmark_SliceSet/1000,_1000_no_intersection-8              415           2794903 ns/op               0 B/op          0 allocs/op
Benchmark_SliceSet/1000,_1000_full_intersection-8            423           2768507 ns/op           35184 B/op         11 allocs/op
Benchmark_StructMapSet/10,_10_no_intersection-8          5059527               229.0 ns/op            48 B/op          1 allocs/op
Benchmark_StructMapSet/10,_10_full_intersection-8        1988473               616.8 ns/op           483 B/op          3 allocs/op
Benchmark_StructMapSet/100,_100_no_intersection-8         549928              1930 ns/op              48 B/op          1 allocs/op
Benchmark_StructMapSet/100,_100_full_intersection-8       131402              8650 ns/op            5608 B/op         10 allocs/op
Benchmark_StructMapSet/1000,_1000_no_intersection-8        36765             32642 ns/op              48 B/op          1 allocs/op
Benchmark_StructMapSet/1000,_1000_full_intersection-8      10000            117866 ns/op           85688 B/op         35 allocs/op
PASS
ok      github.com/willemschots/benchmarkfun    25.613s

Quite a bit of text. Let’s go through it.

In these benchmarks we’re only checking two extremes: no intersection, or full intersection (the two sets are identical).

For all implementations you can see that:

  • Execution time increases with set sizes.
  • Allocations increase depending on the size of the intersection.
  • Allocations are the same when there is no intersection.

It’s clear that SliceSet has the slowest execution time for larger sets but allocates the least memory overall.

StructMapSet and BoolMapSet perform similarly.

But this is just a single run of our benchmarks, and a single run is not statistically meaningful.

To do a proper analysis we will look at the benchstat program in a follow-up article.

Be sure to subscribe to my newsletter to get a notification when it’s out.

Summary

In this article, we’ve explored how to write, run, and interpret benchmarks in Go.

Most importantly we’ve discussed:

  • The structure of benchmark tests and the benchmark loop.
  • How to prevent the compiler optimizing our code away by assigning to a global variable.
  • How to run benchmarks using go test -bench ..
  • How to print memory allocations using the -benchmem flag.

Happy coding!
