Skip to content

SIMD - Single Input Multiple Data

SIMD describes the ability the process a single operation simultaneously on multiple data. This can give a big edge on independent data which can be processed in parallel. More information can be found here.

Compare two lists with SIMD

To speed-up the process of two lists, we can vectorize those lists and compare multiple chunks in parallel. To vectorize a list we can use SSE (S IMD S treaming E xtensions) in combination with some helper tools from the .NET Framework itself.

Bad Use LINQ queries, which can be slow.

var isEqual = list1.SequenceEquals(list2);

Good Use SSE to vectorize the list and check in parallel if the chunks are the same.

public bool SIMDContains()
{
    // Lists are not equal when the count is different
    if (list1.Count != list2.Count)
        return false;

    // Create a Span<Vector<int>> from our original list.
    // We don't have to worry about the internal size, etc...
    var list1AsVector = MemoryMarshal.Cast<int, Vector<int>>(CollectionsMarshal.AsSpan(list1));
    var list2AsVector = MemoryMarshal.Cast<int, Vector<int>>(CollectionsMarshal.AsSpan(list2));

    for (var i = 0; i < list1AsVector.Length; i++)
    {
        // Compare the chunks of list1 and list2
        if (!Vector.EqualsAll(list1AsVector[i], list2AsVector[i]))
            return false;
    }

    return true;
}

💡 Info: Check with Vector.IsHardwareAccelerated if SSE is available. If not the emulated / software version can be slower than LINQ due to the overhead.

Benchmark

public class Benchmark
{
    private readonly List<int> list1 = Enumerable.Range(0, 1_000).ToList();
    private readonly List<int> list2 = Enumerable.Range(0, 1_000).ToList();

    [Benchmark(Baseline = true)]
    public bool LINQSequenceEquals() => list1.SequenceEqual(list2);

    [Benchmark]
    public bool SIMDEquals()
    {
        if (list1.Count != list2.Count)
            return false;

        var list1AsVector = MemoryMarshal.Cast<int, Vector<int>>(CollectionsMarshal.AsSpan(list1));
        var list2AsVector = MemoryMarshal.Cast<int, Vector<int>>(CollectionsMarshal.AsSpan(list2));

        for (var i = 0; i < list1AsVector.Length; i++)
        {
            if (!Vector.EqualsAll(list1AsVector[i], list2AsVector[i]))
                return false;
        }

        return true;
    }
}

Results:

|             Method |       Mean |    Error |   StdDev | Ratio |
|------------------- |-----------:|---------:|---------:|------:|
| LINQSequenceEquals | 3,487.4 ns | 13.72 ns | 10.71 ns |  1.00 |
|         SIMDEquals |   137.3 ns |  0.98 ns |  0.91 ns |  0.04 |

Get the sum of a list or array

SIMD can be used to divide and conquer the problem of retrieving the sum of a list. The idea is to cut the list in smaller chunks and add them up via SIMD instuctions. This approach can faster the speed significantly.

Bad Using plain old LINQ to get a sum of a list or array of integers.

_list.Sum();

Good Using SIMD instructions to divide and conquer and parallelize the addition of the individual vectors.

public int SumSIMD()
{
    var accVector = Vector<int>.Zero;

    // For any array use
    // var spanOfVectors = MemoryMarshal.Cast<int, Vector<int>>(new Span<int>(myArray));
    var spanOfVectors = MemoryMarshal.Cast<int, Vector<int>>(CollectionsMarshal.AsSpan(_list));
    foreach (var vector in spanOfVectors)
    {
        accVector = Vector.Add(accVector, vector);
    }

    // Scalar-Product of our vector against the Unit vector is its sum
    var result = Vector.Dot(accVector, Vector<int>.One);
    return result;
}

Benchmark

public class Benchmark
{
    private readonly List<int> _list = Enumerable.Range(0, 1_000).ToList();

    [Benchmark(Baseline = true)]
    public int ListSum() => _list.Sum();

    [Benchmark]
    public int SumSIMD()
    {
        var accVector = Vector<int>.Zero;

        // For any array use
        // var spanOfVectors = MemoryMarshal.Cast<int, Vector<int>>(new Span<int>(myArray));
        var spanOfVectors = MemoryMarshal.Cast<int, Vector<int>>(CollectionsMarshal.AsSpan(_list));
        foreach (var vector in spanOfVectors)
        {
            accVector = Vector.Add(accVector, vector);
        }

        // Scalar-Product of our vector against the Unit vector is its sum
        var result = Vector.Dot(accVector, Vector<int>.One);
        return result;
    }
}
Results:
|  Method |       Mean |    Error |   StdDev | Ratio |
|-------- |-----------:|---------:|---------:|------:|
| ListSum | 4,493.1 ns | 88.84 ns | 83.10 ns |  1.00 |
| SumSIMD |   117.7 ns |  0.44 ns |  0.41 ns |  0.03 |

Getting Minimum and Maximum of a list

SIMD can be used to get the smallest and the largest number in a given list.

public (int min, int max) GetMinMaxSIMD()
{
    var vectors = MemoryMarshal.Cast<int, Vector<int>>(CollectionsMarshal.AsSpan(_numbers));
    var vMin = new Vector<int>(int.MaxValue);
    var vMax = new Vector<int>(int.MinValue);
    foreach (var vector in vectors)
    {
        vMin = Vector.Min(vMin, vector);
        vMax = Vector.Max(vMax, vector);
    }

    var min = int.MaxValue;
    var max = int.MinValue;

    for (var i = 0; i < Vector<int>.Count; i++)
    {
        min = Math.Min(vMin[i], min);
        max = Math.Max(vMax[i], min);
    }

    return (min, max);
}

The traditional versionl, could look like this:

public (int min, int max) GetMinMax()
{
    var min = int.MaxValue;
    var max = int.MinValue;

    for (var i = 0; i < _numbers.Count; i++)
    {
        min = Math.Min(_numbers[i], min);
        max = Math.Max(_numbers[i], min);
    }

    return (min, max);
}

Results:

The given Llist (_numbers) has 10000 entries.

|        Method |      Mean |     Error |    StdDev | Ratio | Rank |
|-------------- |----------:|----------:|----------:|------:|-----:|
| GetMinMaxSIMD |  1.544 us | 0.0050 us | 0.0047 us |  0.10 |    1 |
|     GetMinMax | 15.763 us | 0.0435 us | 0.0407 us |  1.00 |    2 |