SIMD - Single Input Multiple Data¶
SIMD describes the ability the process a single operation simultaneously on multiple data. This can give a big edge on independent data which can be processed in parallel. More information can be found here.
Compare two lists with SIMD¶
To speed-up the process of two lists, we can vectorize those lists and compare multiple chunks in parallel. To vectorize a list we can use SSE (S IMD S treaming E xtensions) in combination with some helper tools from the .NET Framework itself.
❌ Bad Use LINQ queries, which can be slow.
var isEqual = list1.SequenceEquals(list2);
✅ Good Use SSE to vectorize the list and check in parallel if the chunks are the same.
public bool SIMDContains()
{
// Lists are not equal when the count is different
if (list1.Count != list2.Count)
return false;
// Create a Span<Vector<int>> from our original list.
// We don't have to worry about the internal size, etc...
var list1AsVector = MemoryMarshal.Cast<int, Vector<int>>(CollectionsMarshal.AsSpan(list1));
var list2AsVector = MemoryMarshal.Cast<int, Vector<int>>(CollectionsMarshal.AsSpan(list2));
for (var i = 0; i < list1AsVector.Length; i++)
{
// Compare the chunks of list1 and list2
if (!Vector.EqualsAll(list1AsVector[i], list2AsVector[i]))
return false;
}
return true;
}
💡 Info: Check with
Vector.IsHardwareAccelerated
if SSE is available. If not the emulated / software version can be slower than LINQ due to the overhead.
Benchmark¶
public class Benchmark
{
private readonly List<int> list1 = Enumerable.Range(0, 1_000).ToList();
private readonly List<int> list2 = Enumerable.Range(0, 1_000).ToList();
[Benchmark(Baseline = true)]
public bool LINQSequenceEquals() => list1.SequenceEqual(list2);
[Benchmark]
public bool SIMDEquals()
{
if (list1.Count != list2.Count)
return false;
var list1AsVector = MemoryMarshal.Cast<int, Vector<int>>(CollectionsMarshal.AsSpan(list1));
var list2AsVector = MemoryMarshal.Cast<int, Vector<int>>(CollectionsMarshal.AsSpan(list2));
for (var i = 0; i < list1AsVector.Length; i++)
{
if (!Vector.EqualsAll(list1AsVector[i], list2AsVector[i]))
return false;
}
return true;
}
}
Results:
| Method | Mean | Error | StdDev | Ratio |
|------------------- |-----------:|---------:|---------:|------:|
| LINQSequenceEquals | 3,487.4 ns | 13.72 ns | 10.71 ns | 1.00 |
| SIMDEquals | 137.3 ns | 0.98 ns | 0.91 ns | 0.04 |
Get the sum of a list or array¶
SIMD can be used to divide and conquer the problem of retrieving the sum of a list. The idea is to cut the list in smaller chunks and add them up via SIMD instuctions. This approach can faster the speed significantly.
❌ Bad Using plain old LINQ to get a sum of a list or array of integers.
_list.Sum();
✅ Good Using SIMD instructions to divide and conquer and parallelize the addition of the individual vectors.
public int SumSIMD()
{
var accVector = Vector<int>.Zero;
// For any array use
// var spanOfVectors = MemoryMarshal.Cast<int, Vector<int>>(new Span<int>(myArray));
var spanOfVectors = MemoryMarshal.Cast<int, Vector<int>>(CollectionsMarshal.AsSpan(_list));
foreach (var vector in spanOfVectors)
{
accVector = Vector.Add(accVector, vector);
}
// Scalar-Product of our vector against the Unit vector is its sum
var result = Vector.Dot(accVector, Vector<int>.One);
return result;
}
Benchmark¶
public class Benchmark
{
private readonly List<int> _list = Enumerable.Range(0, 1_000).ToList();
[Benchmark(Baseline = true)]
public int ListSum() => _list.Sum();
[Benchmark]
public int SumSIMD()
{
var accVector = Vector<int>.Zero;
// For any array use
// var spanOfVectors = MemoryMarshal.Cast<int, Vector<int>>(new Span<int>(myArray));
var spanOfVectors = MemoryMarshal.Cast<int, Vector<int>>(CollectionsMarshal.AsSpan(_list));
foreach (var vector in spanOfVectors)
{
accVector = Vector.Add(accVector, vector);
}
// Scalar-Product of our vector against the Unit vector is its sum
var result = Vector.Dot(accVector, Vector<int>.One);
return result;
}
}
| Method | Mean | Error | StdDev | Ratio |
|-------- |-----------:|---------:|---------:|------:|
| ListSum | 4,493.1 ns | 88.84 ns | 83.10 ns | 1.00 |
| SumSIMD | 117.7 ns | 0.44 ns | 0.41 ns | 0.03 |
Getting Minimum and Maximum of a list¶
SIMD can be used to get the smallest and the largest number in a given list.
public (int min, int max) GetMinMaxSIMD()
{
var vectors = MemoryMarshal.Cast<int, Vector<int>>(CollectionsMarshal.AsSpan(_numbers));
var vMin = new Vector<int>(int.MaxValue);
var vMax = new Vector<int>(int.MinValue);
foreach (var vector in vectors)
{
vMin = Vector.Min(vMin, vector);
vMax = Vector.Max(vMax, vector);
}
var min = int.MaxValue;
var max = int.MinValue;
for (var i = 0; i < Vector<int>.Count; i++)
{
min = Math.Min(vMin[i], min);
max = Math.Max(vMax[i], min);
}
return (min, max);
}
The traditional versionl, could look like this:
public (int min, int max) GetMinMax()
{
var min = int.MaxValue;
var max = int.MinValue;
for (var i = 0; i < _numbers.Count; i++)
{
min = Math.Min(_numbers[i], min);
max = Math.Max(_numbers[i], min);
}
return (min, max);
}
Results:
The given Llist (_numbers
) has 10000 entries.
| Method | Mean | Error | StdDev | Ratio | Rank |
|-------------- |----------:|----------:|----------:|------:|-----:|
| GetMinMaxSIMD | 1.544 us | 0.0050 us | 0.0047 us | 0.10 | 1 |
| GetMinMax | 15.763 us | 0.0435 us | 0.0407 us | 1.00 | 2 |