Key findings
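- Slice with preallocation is the fastest or tied-fastest option for Enqueue, Dequeue, and Push on both amd64 and arm64. The one exception is Pop, where Slice without preallocation is slightly faster (0.8 vs. 1.1 ns/op on arm64, 5 vs. 7 ns/op on amd64).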
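- Preallocating helps in almost every case. Passing a sizeHint to the constructor eliminates growth allocations in the steady state and improves Enqueue, Dequeue, and Push throughput for every implementation, with one exception in the data below: ListSP Enqueue on amd64 (533 ns/op preallocated vs. 368 ns/op raw). Pop results are mixed: Slice and ListIP are faster without a hint, ListSP is faster with one.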
- Overall rankings are consistent across the two benchmark runs (Go 1.18 on amd64 and Go 1.24 on arm64). Per-operation ranks shift slightly among the slower implementations, but the ordering by ranking total does not change, apart from List and ListIP raw tying on amd64.
- The overall ranking from fastest to slowest: Slice > ListWithInternalPool > List > ListWithSyncPool
Benchmark data
ARM64 (Apple M3 Pro, Go 1.24.1)
| Storage | Queue.Enqueue | Queue.Dequeue | Stack.Push | Stack.Pop | Ranking |
|---|---|---|---|---|---|
| Slice prealloc | 2.0 ns/op | 1.4 ns/op | 2.0 ns/op | 1.1 ns/op | 1+1+1+2 = 5 |
| Slice raw | 3.8 ns/op | 2.2 ns/op | 3.4 ns/op | 0.8 ns/op | 3+3+3+1 = 10 |
| ListIP prealloc | 2.2 ns/op | 2.5 ns/op | 2.0 ns/op | 2.6 ns/op | 2+4+1+5 = 12 |
| List | 23.6 ns/op | 1.4 ns/op | 27.9 ns/op | 1.6 ns/op | 4+1+6+4 = 15 |
| ListIP raw | 23.7 ns/op | 6.0 ns/op | 23.8 ns/op | 1.3 ns/op | 5+5+5+3 = 18 |
| ListSP prealloc | 28.1 ns/op | 7.6 ns/op | 9.4 ns/op | 7.6 ns/op | 6+6+4+6 = 22 |
| ListSP raw | 36.5 ns/op | 13.2 ns/op | 36.4 ns/op | 7.9 ns/op | 7+7+7+7 = 28 |
AMD64 (Intel Core i7-4980HQ @ 2.80GHz, Go 1.18.0)
| Storage | Queue.Enqueue | Queue.Dequeue | Stack.Push | Stack.Pop | Ranking |
|---|---|---|---|---|---|
| Slice prealloc | 9 ns/op | 10 ns/op | 8 ns/op | 7 ns/op | 1+1+1+2 = 5 |
| Slice raw | 25 ns/op | 11 ns/op | 20 ns/op | 5 ns/op | 3+3+3+1 = 10 |
| ListIP prealloc | 17 ns/op | 14 ns/op | 11 ns/op | 11 ns/op | 2+4+2+4 = 12 |
| List | 144 ns/op | 10 ns/op | 130 ns/op | 56 ns/op | 5+2+6+6 = 19 |
| ListIP raw | 139 ns/op | 84 ns/op | 121 ns/op | 9 ns/op | 4+7+5+3 = 19 |
| ListSP prealloc | 533 ns/op | 52 ns/op | 96 ns/op | 18 ns/op | 7+5+4+5 = 21 |
| ListSP raw | 368 ns/op | 56 ns/op | 385 ns/op | 61 ns/op | 6+6+7+7 = 26 |
Ranking totals sum each implementation's rank across the four operations; a lower total means better overall throughput. The per-operation rankings below follow the amd64 results, and >> marks a large performance drop.
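The ranking arithmetic can be reproduced with a few lines of Go. The sketch below is illustrative only: the function and variable names are made up for this example, and it assumes that tied values share the same rank, which is how the ARM64 table treats the two 1.4 ns/op Dequeue results.

```go
package main

import "fmt"

// names and arm64 mirror the ARM64 table above.
// Columns: Enqueue, Dequeue, Push, Pop (all in ns/op).
var names = []string{
	"Slice prealloc", "Slice raw", "ListIP prealloc", "List",
	"ListIP raw", "ListSP prealloc", "ListSP raw",
}

var arm64 = [][]float64{
	{2.0, 1.4, 2.0, 1.1},
	{3.8, 2.2, 3.4, 0.8},
	{2.2, 2.5, 2.0, 2.6},
	{23.6, 1.4, 27.9, 1.6},
	{23.7, 6.0, 23.8, 1.3},
	{28.1, 7.6, 9.4, 7.6},
	{36.5, 13.2, 36.4, 7.9},
}

// rankTotals returns, for each row, the sum of its per-column ranks.
// A rank is 1 plus the number of rows with a strictly smaller value in
// that column, so tied rows share a rank (assumption, see above).
func rankTotals(nsPerOp [][]float64) []int {
	totals := make([]int, len(nsPerOp))
	for i, row := range nsPerOp {
		for col, v := range row {
			rank := 1
			for _, other := range nsPerOp {
				if other[col] < v {
					rank++
				}
			}
			totals[i] += rank
		}
	}
	return totals
}

func main() {
	for i, total := range rankTotals(arm64) {
		fmt.Printf("%-16s %d\n", names[i], total)
	}
}
```

Feeding your own ns/op measurements into the same function gives totals that are directly comparable with the Ranking column.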
Per-operation ranking
Enqueue: Slice prealloc > ListIP prealloc > Slice raw >> ListIP raw > List > ListSP raw > ListSP prealloc
Dequeue: Slice prealloc > List > Slice raw > ListIP prealloc >> ListSP prealloc > ListSP raw > ListIP raw
Push: Slice prealloc > ListIP prealloc > Slice raw >> ListSP prealloc > ListIP raw > List > ListSP raw
Pop: Slice raw > Slice prealloc > ListIP raw > ListIP prealloc > ListSP prealloc >> List > ListSP raw
Always provide a size hint
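Passing a sizeHint avoids repeated backing-array growth and is the single highest-impact tuning step available. In the arm64 run, preallocation cuts Slice Enqueue from 3.8 to 2.0 ns/op and ListIP Enqueue from 23.7 to 2.2 ns/op. Even a rough estimate is better than 0.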
// Preferred: preallocate for the expected working-set size
q := queue.NewSliceQueue[MyJob](256)
// Acceptable if capacity is truly unknown
q := queue.NewSliceQueue[MyJob](0)
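To see why the hint matters, it can help to count how often a plain Go slice has to move to a larger backing array. The sketch below uses only the standard library and assumes that a slice-backed container passes sizeHint through as the initial slice capacity; countGrowths is a hypothetical helper, not part of the library.

```go
package main

import "fmt"

// countGrowths appends n items to a slice created with the given capacity
// hint and returns how many times append switched to a larger backing array.
func countGrowths(sizeHint, n int) int {
	items := make([]int, 0, sizeHint)
	growths := 0
	for i := 0; i < n; i++ {
		before := cap(items)
		items = append(items, i)
		if cap(items) != before {
			growths++ // append allocated a new array and copied the old items
		}
	}
	return growths
}

func main() {
	fmt.Println("growths with hint 0:  ", countGrowths(0, 256))
	fmt.Println("growths with hint 256:", countGrowths(256, 256))
}
```

With a hint that covers the working set, no growth happens at all. Without one, the exact number of reallocations depends on the Go version's growth policy, but each one costs an allocation plus a copy.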
Prefer slice-based implementations
Unless you have profiled a specific allocation bottleneck, start with the slice-backed implementation. In both benchmark runs (amd64 with Go 1.18, arm64 with Go 1.24), the preallocated slice matches or outperforms every linked-list variant on every operation.
// Queue
q := queue.NewSliceQueue[MyEvent](capacity)
// Stack
s := stack.NewSliceStack[MyItem](capacity)
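For intuition about why a slice-backed container is hard to beat, here is a minimal generic stack over a slice. It is a sketch written for this guide, not the library's implementation; sketchStack and newSketchStack are hypothetical names. Push is a single append into contiguous memory and Pop is a reslice, so there is no per-item node to allocate or chase.

```go
package main

import "fmt"

// sketchStack is a hypothetical, minimal slice-backed stack.
type sketchStack[T any] struct {
	items []T
}

// newSketchStack preallocates room for sizeHint items (sizeHint must be >= 0).
func newSketchStack[T any](sizeHint int) *sketchStack[T] {
	return &sketchStack[T]{items: make([]T, 0, sizeHint)}
}

// Push appends v; it only allocates when the backing array is full.
func (s *sketchStack[T]) Push(v T) {
	s.items = append(s.items, v)
}

// Pop removes and returns the most recently pushed item.
// The second result is false when the stack is empty.
func (s *sketchStack[T]) Pop() (T, bool) {
	var zero T
	if len(s.items) == 0 {
		return zero, false
	}
	last := len(s.items) - 1
	v := s.items[last]
	s.items[last] = zero // clear the slot so the GC can reclaim pointers
	s.items = s.items[:last]
	return v, true
}

func main() {
	s := newSketchStack[string](4)
	s.Push("a")
	s.Push("b")
	v, ok := s.Pop()
	fmt.Println(v, ok)
}
```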
Avoid sync.Pool-based implementations
ListWithSyncPool (ListSP) ranks last or close to last in nearly every benchmark. The sync.Pool GC interaction and coordination overhead outweigh any allocation savings for typical queue and stack workloads.
Do not switch to ListWithSyncPool without first profiling your application and confirming a measurable improvement. In the benchmark data above, the two ListSP variants have the worst ranking totals on both platforms.
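The sketch below shows roughly what recycling list nodes through sync.Pool involves. It is an assumption-laden illustration, not the library's code: node, nodePool, and pooledStack are hypothetical names. Every Push goes through Pool.Get and a type assertion, every Pop goes through Pool.Put, and the pool is free to drop idle nodes at any time, so reuse is not guaranteed.

```go
package main

import (
	"fmt"
	"sync"
)

// node is a singly linked list element.
type node struct {
	value int
	next  *node
}

// nodePool hands out recycled nodes and allocates a new one when empty.
var nodePool = sync.Pool{
	New: func() any { return new(node) },
}

// pooledStack is a hypothetical linked-list stack that recycles its nodes.
type pooledStack struct {
	head *node
}

// Push takes a node from the pool and links it in front of the list.
func (s *pooledStack) Push(v int) {
	n := nodePool.Get().(*node)
	n.value, n.next = v, s.head
	s.head = n
}

// Pop unlinks the head node and returns it to the pool.
// The second result is false when the stack is empty.
func (s *pooledStack) Pop() (int, bool) {
	if s.head == nil {
		return 0, false
	}
	n := s.head
	s.head = n.next
	v := n.value
	n.next = nil // do not keep the rest of the list reachable from the pool
	nodePool.Put(n)
	return v, true
}

func main() {
	var s pooledStack
	s.Push(1)
	s.Push(2)
	v, ok := s.Pop()
	fmt.Println(v, ok)
}
```

sync.Pool is built for sharing temporary objects across many goroutines. A queue or stack that mostly runs on one goroutine pays for that machinery without benefiting from it, which is one plausible reading of the ListSP results.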
Set and OrderedMap have no alternatives to tune
Only one implementation exists for each:
- Set: set.NewBasicMap (map-backed, no alternative implementations).
- OrderedMap: orderedmap.NewSlice (slice-backed, no alternative implementations).
Both are well-optimized for general use. Providing a size hint is still the primary lever available to you.
// Set with size hint
s := set.NewBasicMap[string](128)
// OrderedMap with size hint
om := orderedmap.NewSlice[string, int](128, true /* stable updates */)
Running your own benchmarks
Run benchmarks against the queue and stack packages:
go test -bench=. -benchmem ./queue/...
go test -bench=. -benchmem ./stack/...
Run benchmarks on your own hardware and Go version. While rankings are consistent across architectures and Go versions, absolute ns/op values vary by CPU and runtime. Use your own measurements to set concrete size hints and capacity targets for your specific environment.
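If you want a quick measurement outside the repository's benchmark suite, testing.Benchmark from the standard library can be called from an ordinary program. The sketch below compares appending to a plain slice with and without a capacity hint; benchmarkAppend and sink are hypothetical names, and the workload is a stand-in for your real queue or stack usage rather than a copy of the library's benchmarks.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps the result reachable so the compiler cannot drop the work.
var sink []int

// benchmarkAppend measures filling a slice with `items` ints, starting
// from the given capacity hint.
func benchmarkAppend(sizeHint, items int) testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			s := make([]int, 0, sizeHint)
			for j := 0; j < items; j++ {
				s = append(s, j)
			}
			sink = s
		}
	})
}

func main() {
	const items = 1024
	raw := benchmarkAppend(0, items)
	prealloc := benchmarkAppend(items, items)
	// Each result prints iterations, ns/op, B/op, and allocs/op.
	fmt.Println("raw:     ", raw, raw.MemString())
	fmt.Println("prealloc:", prealloc, prealloc.MemString())
}
```

Inside the repository, the usual form is a BenchmarkXxx(b *testing.B) function in a _test.go file, run with the go test commands shown above.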