Key findings

  • Slice with preallocation has the best overall ranking on both amd64 and arm64. It is fastest or tied for fastest on Enqueue, Dequeue, and Push; on Pop, Slice raw is slightly faster.
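  • Preallocating usually helps. Passing a sizeHint to the constructor eliminates growth allocations in the steady state and improves the ranking total of every implementation. The gains are concentrated in Enqueue and Push; a few individual results (Pop for Slice and ListIP, Enqueue for ListSP on amd64) are slower with preallocation.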
  • Rankings are consistent across the two benchmarked environments: Go 1.24 on arm64 and Go 1.18 on amd64. The relative ordering of implementations by ranking total is the same in both.
  • The overall ranking from fastest to slowest: Slice > ListWithInternalPool > List > ListWithSyncPool

Benchmark data

ARM64 (Apple M3 Pro, Go 1.24.1)

Storage          Queue.Enqueue  Queue.Dequeue  Stack.Push  Stack.Pop  Ranking
Slice prealloc   2.0 ns/op      1.4 ns/op      2.0 ns/op   1.1 ns/op  1+1+1+2 = 5
Slice raw        3.8 ns/op      2.2 ns/op      3.4 ns/op   0.8 ns/op  3+3+3+1 = 10
ListIP prealloc  2.2 ns/op      2.5 ns/op      2.0 ns/op   2.6 ns/op  2+4+1+5 = 12
List             23.6 ns/op     1.4 ns/op      27.9 ns/op  1.6 ns/op  4+1+6+4 = 15
ListIP raw       23.7 ns/op     6.0 ns/op      23.8 ns/op  1.3 ns/op  5+5+5+3 = 18
ListSP prealloc  28.1 ns/op     7.6 ns/op      9.4 ns/op   7.6 ns/op  6+6+4+6 = 22
ListSP raw       36.5 ns/op     13.2 ns/op     36.4 ns/op  7.9 ns/op  7+7+7+7 = 28

AMD64 (Intel Core i7-4980HQ @ 2.80GHz, Go 1.18.0)

Storage          Queue.Enqueue  Queue.Dequeue  Stack.Push  Stack.Pop  Ranking
Slice prealloc   9 ns/op        10 ns/op       8 ns/op     7 ns/op    1+1+1+2 = 5
Slice raw        25 ns/op       11 ns/op       20 ns/op    5 ns/op    3+3+3+1 = 10
ListIP prealloc  17 ns/op       14 ns/op       11 ns/op    11 ns/op   2+4+2+4 = 12
List             144 ns/op      10 ns/op       130 ns/op   56 ns/op   5+2+6+6 = 19
ListIP raw       139 ns/op      84 ns/op       121 ns/op   9 ns/op    4+7+5+3 = 19
ListSP prealloc  533 ns/op      52 ns/op       96 ns/op    18 ns/op   7+5+4+5 = 21
ListSP raw       368 ns/op      56 ns/op       385 ns/op   61 ns/op   6+6+7+7 = 26

Ranking totals sum the per-operation rank across all four operations. A lower total means better overall throughput. In the per-operation rankings below, which follow the AMD64 results, >> indicates a large performance drop.
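
The ranking totals can be reproduced from the ns/op figures. The following is a minimal sketch, not part of the library: rankTotals is a hypothetical helper, and it assumes that tied timings share the better rank, which is how the ARM64 table treats its ties.

```go
package main

import "fmt"

// rankTotals gives each implementation a per-operation rank (1 = fastest,
// tied timings share the better rank) and returns the sum of those ranks.
// times[i][j] is the ns/op of implementation i on operation j.
func rankTotals(times [][]float64) []int {
	totals := make([]int, len(times))
	if len(times) == 0 {
		return totals
	}
	for op := range times[0] {
		for i := range times {
			rank := 1
			for k := range times {
				if times[k][op] < times[i][op] {
					rank++ // one more implementation is strictly faster
				}
			}
			totals[i] += rank
		}
	}
	return totals
}

func main() {
	names := []string{
		"Slice prealloc", "Slice raw", "ListIP prealloc", "List",
		"ListIP raw", "ListSP prealloc", "ListSP raw",
	}
	// ARM64 ns/op from the table above: Enqueue, Dequeue, Push, Pop.
	times := [][]float64{
		{2.0, 1.4, 2.0, 1.1},
		{3.8, 2.2, 3.4, 0.8},
		{2.2, 2.5, 2.0, 2.6},
		{23.6, 1.4, 27.9, 1.6},
		{23.7, 6.0, 23.8, 1.3},
		{28.1, 7.6, 9.4, 7.6},
		{36.5, 13.2, 36.4, 7.9},
	}
	totals := rankTotals(times)
	for i, name := range names {
		fmt.Printf("%-16s %d\n", name, totals[i])
	}
}
```

For the ARM64 figures this yields the totals 5, 10, 12, 15, 18, 22, and 28, matching the Ranking column.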

Per-operation ranking

Enqueue: Slice prealloc > ListIP prealloc > Slice raw       >> ListIP raw > List > ListSP raw > ListSP prealloc
Dequeue: Slice prealloc > List            > Slice raw > ListIP prealloc >> ListSP prealloc > ListSP raw > ListIP raw
Push:    Slice prealloc > ListIP prealloc > Slice raw        >> ListSP prealloc > ListIP raw > List > ListSP raw
Pop:     Slice raw      > Slice prealloc  > ListIP raw > ListIP prealloc > ListSP prealloc >> List > ListSP raw

Performance tips

Always provide a size hint

Passing a sizeHint avoids repeated backing-array growth and is the single highest-impact tuning you can do. Even a rough estimate is better than 0.
// Preferred: preallocate for the expected working-set size
q := queue.NewSliceQueue[MyJob](256)

// Acceptable if capacity is truly unknown
q := queue.NewSliceQueue[MyJob](0)
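
To see why the hint matters, the sketch below counts how often a plain Go slice reallocates its backing array while it is filled. It uses only built-in slices and assumes the slice-backed queue grows in a similar way; countGrowths is a hypothetical helper, not a library function.

```go
package main

import "fmt"

// countGrowths appends n ints to a slice created with the given capacity
// hint and reports how many times the backing array was reallocated.
func countGrowths(n, sizeHint int) int {
	s := make([]int, 0, sizeHint)
	growths := 0
	for i := 0; i < n; i++ {
		before := cap(s)
		s = append(s, i)
		if cap(s) != before {
			growths++ // append had to allocate a larger backing array
		}
	}
	return growths
}

func main() {
	fmt.Println("growths with no hint:  ", countGrowths(1000, 0))
	fmt.Println("growths with hint 1000:", countGrowths(1000, 1000))
}
```

With a hint that covers the working set, no growth occurs at all. Without one, each growth allocates a new array and copies the existing elements into it.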

Prefer slice-based implementations

Unless you have profiled a specific allocation bottleneck, start with the slice-backed implementation. On both amd64 (Go 1.18) and arm64 (Go 1.24), the preallocated slice matches or outperforms every linked-list variant on every operation.
// Queue
q := queue.NewSliceQueue[MyEvent](capacity)

// Stack
s := stack.NewSliceStack[MyItem](capacity)

Avoid sync.Pool-based implementations

ListWithSyncPool (ListSP) has the worst ranking totals on both platforms, and ListSP raw ranks last or second to last on every operation. The sync.Pool GC interaction and coordination overhead outweigh any allocation savings for typical queue and stack workloads.
Do not switch to ListWithSyncPool without first profiling your application and confirming a measurable improvement. In the benchmark data it is the slowest option overall.
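
For context, the sketch below shows the general pattern of recycling list nodes through sync.Pool. It is an illustration only, not the library's ListWithSyncPool code; node, nodePool, and pooledStack are hypothetical names. Every Push and Pop pays for a pool Get or Put, and the runtime may release idle pooled nodes during garbage collection, so the pool does not guarantee reuse.

```go
package main

import (
	"fmt"
	"sync"
)

// node is a hypothetical singly linked stack node.
type node struct {
	value int
	next  *node
}

// nodePool hands out recycled nodes and allocates a new one when empty.
var nodePool = sync.Pool{New: func() any { return new(node) }}

// pooledStack is an illustrative linked-list stack backed by nodePool.
type pooledStack struct{ head *node }

// Push takes a node from the pool and links it in as the new head.
func (s *pooledStack) Push(v int) {
	n := nodePool.Get().(*node)
	n.value, n.next = v, s.head
	s.head = n
}

// Pop unlinks the head node, returns it to the pool, and yields its value.
func (s *pooledStack) Pop() (int, bool) {
	if s.head == nil {
		return 0, false
	}
	n := s.head
	s.head = n.next
	v := n.value
	n.next = nil // clear the link so the pooled node does not retain the list
	nodePool.Put(n)
	return v, true
}

func main() {
	var s pooledStack
	for i := 1; i <= 3; i++ {
		s.Push(i)
	}
	for v, ok := s.Pop(); ok; v, ok = s.Pop() {
		fmt.Println(v)
	}
}
```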

Set and OrderedMap have no alternatives to tune

Only one implementation exists for each:
  • Set: set.NewBasicMap — map-backed, no alternative implementations.
  • OrderedMap: orderedmap.NewSlice — slice-backed, no alternative implementations.
Both are well-optimized for general use. Providing a size hint is still the primary lever available to you.
// Set with size hint
s := set.NewBasicMap[string](128)

// OrderedMap with size hint
om := orderedmap.NewSlice[string, int](128, true /* stable updates */)

Running your own benchmarks

Run benchmarks against the queue and stack packages:
go test -bench=. -benchmem ./queue/...
go test -bench=. -benchmem ./stack/...
Run benchmarks on your own hardware and Go version. While rankings are consistent across architectures and Go versions, absolute ns/op values vary by CPU and runtime. Use your own measurements to set concrete size hints and capacity targets for your specific environment.
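
If you want a quick comparison outside of go test, the standard library's testing.Benchmark can be called from an ordinary program. The sketch below is an assumption-laden stand-in: it compares a preallocated built-in slice with container/list rather than this library's types, and benchSlicePush, benchListPush, and the window size are hypothetical choices.

```go
package main

import (
	"container/list"
	"fmt"
	"testing"
)

const window = 1024 // elements kept before the container is reset

var sink int // keeps results observable so the loops are not optimized away

// benchSlicePush appends to a slice preallocated for one full window.
func benchSlicePush(b *testing.B) {
	s := make([]int, 0, window)
	for i := 0; i < b.N; i++ {
		if len(s) == window {
			s = s[:0] // reuse the backing array
		}
		s = append(s, i)
	}
	sink = len(s)
}

// benchListPush pushes onto the standard library's doubly linked list.
func benchListPush(b *testing.B) {
	l := list.New()
	for i := 0; i < b.N; i++ {
		if l.Len() == window {
			l.Init() // drop all elements
		}
		l.PushBack(i)
	}
	sink = l.Len()
}

func main() {
	r := testing.Benchmark(benchSlicePush)
	fmt.Println("slice prealloc:", r.String(), r.MemString())
	r = testing.Benchmark(benchListPush)
	fmt.Println("container/list:", r.String(), r.MemString())
}
```

Swap the loop bodies for calls on the queue or stack you actually use, and treat the absolute numbers as specific to your machine and Go version.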