Performance

Key findings

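Slice with preallocation is the fastest option overall on both amd64 and arm64. It leads or ties for the lead on Enqueue, Dequeue, and Push, and is second only to the non-preallocated slice on Pop.
Preallocating almost always helps. Passing a sizeHint to the constructor eliminates growth allocations in the steady state and improves Enqueue, Dequeue, and Push throughput in nearly every case. The exceptions in the data below are ListSP Enqueue on amd64, and Pop on the Slice and ListIP variants, which is slightly faster without preallocation.
The overall ranking is the same in both benchmark environments (Go 1.18 on amd64 and Go 1.24 on arm64). Individual per-operation positions shift slightly, but the ordering by rank sum does not change, apart from a tie between List and ListIP raw on amd64.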
The overall ranking from fastest to slowest: Slice > ListWithInternalPool > List > ListWithSyncPool

Benchmark data

ARM64 (Apple M3 Pro, Go 1.24.1)

Storage	Queue.Enqueue	Queue.Dequeue	Stack.Push	Stack.Pop	Ranking
Slice prealloc	2.0 ns/op	1.4 ns/op	2.0 ns/op	1.1 ns/op	1+1+1+2 = 5
Slice raw	3.8 ns/op	2.2 ns/op	3.4 ns/op	0.8 ns/op	3+3+3+1 = 10
ListIP prealloc	2.2 ns/op	2.5 ns/op	2.0 ns/op	2.6 ns/op	2+4+1+5 = 12
List	23.6 ns/op	1.4 ns/op	27.9 ns/op	1.6 ns/op	4+1+6+4 = 15
ListIP raw	23.7 ns/op	6.0 ns/op	23.8 ns/op	1.3 ns/op	5+5+5+3 = 18
ListSP prealloc	28.1 ns/op	7.6 ns/op	9.4 ns/op	7.6 ns/op	6+6+4+6 = 22
ListSP raw	36.5 ns/op	13.2 ns/op	36.4 ns/op	7.9 ns/op	7+7+7+7 = 28

AMD64 (Intel Core i7-4980HQ @ 2.80GHz, Go 1.18.0)

Storage	Queue.Enqueue	Queue.Dequeue	Stack.Push	Stack.Pop	Ranking
Slice prealloc	9 ns/op	10 ns/op	8 ns/op	7 ns/op	1+1+1+2 = 5
Slice raw	25 ns/op	11 ns/op	20 ns/op	5 ns/op	3+3+3+1 = 10
ListIP prealloc	17 ns/op	14 ns/op	11 ns/op	11 ns/op	2+4+2+4 = 12
List	144 ns/op	10 ns/op	130 ns/op	56 ns/op	5+2+6+6 = 19
ListIP raw	139 ns/op	84 ns/op	121 ns/op	9 ns/op	4+7+5+3 = 19
ListSP prealloc	533 ns/op	52 ns/op	96 ns/op	18 ns/op	7+5+4+5 = 21
ListSP raw	368 ns/op	56 ns/op	385 ns/op	61 ns/op	6+6+7+7 = 26

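The Ranking column sums each implementation's per-operation rank across all four operations; a lower total means better overall throughput. The per-operation rankings below follow the AMD64 measurements, and the ARM64 order differs in a few positions. In those rankings, > separates adjacent ranks and >> marks a large performance drop.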
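The rank sums can be reproduced from the raw timings. The following is a minimal sketch using the ARM64 figures from the table above. It assumes that equal timings share the same rank, which matches the ARM64 column; the names rankSums and arm64NsPerOp are illustrative and not part of the library.

```go
package main

import "fmt"

// Row labels and ns/op values copied from the ARM64 table.
// Column order: Enqueue, Dequeue, Push, Pop.
var names = []string{
	"Slice prealloc", "Slice raw", "ListIP prealloc", "List",
	"ListIP raw", "ListSP prealloc", "ListSP raw",
}

var arm64NsPerOp = [][]float64{
	{2.0, 1.4, 2.0, 1.1},
	{3.8, 2.2, 3.4, 0.8},
	{2.2, 2.5, 2.0, 2.6},
	{23.6, 1.4, 27.9, 1.6},
	{23.7, 6.0, 23.8, 1.3},
	{28.1, 7.6, 9.4, 7.6},
	{36.5, 13.2, 36.4, 7.9},
}

// rankSums returns, for each row, the sum of its per-column ranks.
// A row's rank in a column is 1 plus the number of rows that are
// strictly faster in that column, so tied rows share a rank.
func rankSums(nsPerOp [][]float64) []int {
	sums := make([]int, len(nsPerOp))
	for i, row := range nsPerOp {
		for col, v := range row {
			rank := 1
			for _, other := range nsPerOp {
				if other[col] < v {
					rank++
				}
			}
			sums[i] += rank
		}
	}
	return sums
}

func main() {
	for i, sum := range rankSums(arm64NsPerOp) {
		fmt.Printf("%-16s %d\n", names[i], sum)
	}
}
```

Swapping in your own measurements gives a comparable total for your environment, though a different tie-breaking rule would shift some sums by one.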

Per-operation ranking

Enqueue: Slice prealloc > ListIP prealloc > Slice raw >> ListIP raw > List > ListSP raw > ListSP prealloc
Dequeue: Slice prealloc > List > Slice raw > ListIP prealloc >> ListSP prealloc > ListSP raw > ListIP raw
Push:    Slice prealloc > ListIP prealloc > Slice raw >> ListSP prealloc > ListIP raw > List > ListSP raw
Pop:     Slice raw > Slice prealloc > ListIP raw > ListIP prealloc > ListSP prealloc >> List > ListSP raw

Performance tips

Always provide a size hint

Passing a sizeHint avoids repeated backing-array growth and is the single highest-impact tuning you can do. Even a rough estimate is better than 0.

// Preferred: preallocate for the expected working-set size
q := queue.NewSliceQueue[MyJob](256)

// Acceptable if capacity is truly unknown
// (a separate variable, since q is already declared above)
unsized := queue.NewSliceQueue[MyJob](0)
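To see why the hint matters, it can help to count how often a backing array has to grow. The sketch below uses a plain Go slice rather than the library's queue, on the assumption that a slice-backed container grows the same way append does; countGrowths is a hypothetical helper written for this illustration.

```go
package main

import "fmt"

// countGrowths appends n ints to a slice created with capacity sizeHint
// and counts how many times append moved to a larger backing array.
func countGrowths(sizeHint, n int) int {
	s := make([]int, 0, sizeHint)
	growths := 0
	for i := 0; i < n; i++ {
		before := cap(s)
		s = append(s, i)
		if cap(s) != before {
			growths++ // capacity changed, so a new array was allocated and the data copied
		}
	}
	return growths
}

func main() {
	fmt.Println("growths with hint 0:  ", countGrowths(0, 256))
	fmt.Println("growths with hint 256:", countGrowths(256, 256))
}
```

With a hint that covers the working set, the count is zero. Without one, the exact number depends on the Go runtime's growth policy, but each growth is an allocation plus a copy on the hot path.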

Prefer slice-based implementations

Unless you have profiled a specific allocation bottleneck, start with the slice-backed implementation. In both benchmark environments (Go 1.18 on amd64 and Go 1.24 on arm64), the preallocated slice matches or outperforms every linked-list variant on every operation.

// Queue
q := queue.NewSliceQueue[MyEvent](capacity)

// Stack
s := stack.NewSliceStack[MyItem](capacity)

Avoid sync.Pool-based implementations

ListWithSyncPool (ListSP) ranks last or near last in nearly every benchmark. The sync.Pool GC interaction and coordination overhead outweigh any allocation savings for typical queue and stack workloads.

Do not switch to ListWithSyncPool without first profiling your application and confirming a measurable improvement. Benchmark data shows it is consistently the slowest option.
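The difference between the two pooling styles can be sketched as follows. This is an illustration under assumptions, not the library's code: the node type, freeList, and nodePool are hypothetical, and the real ListWithInternalPool and ListWithSyncPool may be built differently. It contrasts a private free list, which is a couple of pointer writes per operation, with sync.Pool, which is safe for concurrent use and may discard cached items during garbage collection.

```go
package main

import (
	"fmt"
	"sync"
)

// node is a hypothetical linked-list node.
type node struct {
	value int
	next  *node
}

// freeList is a private, single-goroutine cache of spare nodes.
type freeList struct {
	spare *node
}

// get returns a cached node if one is available, otherwise a new one.
func (f *freeList) get() *node {
	if n := f.spare; n != nil {
		f.spare = n.next
		n.next = nil
		return n
	}
	return &node{}
}

// put clears the node and pushes it onto the spare stack.
func (f *freeList) put(n *node) {
	n.value = 0
	n.next = f.spare
	f.spare = n
}

// nodePool is the sync.Pool equivalent. Get and Put coordinate across
// goroutines, and the runtime is free to drop pooled nodes at any time.
var nodePool = sync.Pool{
	New: func() any { return &node{} },
}

func main() {
	var fl freeList
	a := fl.get()
	fl.put(a)
	fmt.Println("free list reused the node:", fl.get() == a)

	n := nodePool.Get().(*node)
	nodePool.Put(n)
	// Reuse is likely here but not guaranteed by sync.Pool.
	fmt.Println("sync.Pool reused the node:", nodePool.Get().(*node) == n)
}
```

The free list always hands back the node it was just given, whereas sync.Pool only promises that Get returns some usable value.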

Set and OrderedMap have no alternatives to tune

Only one implementation exists for each:

Set: set.NewBasicMap — map-backed, no alternative implementations.
OrderedMap: orderedmap.NewSlice — slice-backed, no alternative implementations.

Both are well-optimized for general use. Providing a size hint is still the primary lever available to you.

// Set with size hint
s := set.NewBasicMap[string](128)

// OrderedMap with size hint
om := orderedmap.NewSlice[string, int](128, true /* stable updates */)

Running your own benchmarks

Run benchmarks against the queue and stack packages:

go test -bench=. -benchmem ./queue/...
go test -bench=. -benchmem ./stack/...

Run benchmarks on your own hardware and Go version. The overall ranking was the same in both environments tested here, but absolute ns/op values vary by CPU and runtime. Use your own measurements to set concrete size hints and capacity targets for your specific environment.
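If you want a quick comparison for your own workload shape, a small harness along these lines may be a starting point. It is a sketch that benchmarks a plain slice rather than the library's types, since import paths are not given here; pushN and the two benchmark functions are hypothetical names. It uses testing.Benchmark so it can run as an ordinary program.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps the result alive so the compiler cannot discard the work.
var sink int

// pushN appends n ints to a slice created with the given capacity hint
// and returns the final length.
func pushN(sizeHint, n int) int {
	s := make([]int, 0, sizeHint)
	for i := 0; i < n; i++ {
		s = append(s, i)
	}
	return len(s)
}

// BenchmarkPushRaw measures 1024 appends starting from zero capacity.
func BenchmarkPushRaw(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		sink = pushN(0, 1024)
	}
}

// BenchmarkPushPrealloc measures the same work with a matching size hint.
func BenchmarkPushPrealloc(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		sink = pushN(1024, 1024)
	}
}

func main() {
	cases := []struct {
		name string
		fn   func(*testing.B)
	}{
		{"raw", BenchmarkPushRaw},
		{"prealloc", BenchmarkPushPrealloc},
	}
	for _, c := range cases {
		r := testing.Benchmark(c.fn)
		fmt.Printf("%-9s %s %s\n", c.name, r, r.MemString())
	}
}
```

Moving the two Benchmark functions into a _test.go file lets go test -bench=. -benchmem pick them up instead. To measure the library itself, replace the body of pushN with calls to the constructor and operations you actually use.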
