
Unroller

All contents of the unroller folder are experimental and subject to change.

Unroller is a templated function that automatically applies common optimizations that compilers usually perform on their own for scalar code. Modern CPUs run much more efficiently when independent calculations are packed into the instruction pipeline. For scalar code, this often means a compiler takes a one-line loop and expands it into hundreds of lines of machine code to fully capture these efficiencies.

As of today (2023-07-06), compilers are not nearly as good at applying these optimizations to code written with SIMD intrinsics. Unroller takes an UnrollerUnit of SIMD instructions and performs unrolling, reordering, hoisting, and tail handling (the URHT optimizations) on arrays of data processed with SIMD intrinsics.
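To make the four URHT steps concrete, here is a minimal scalar sketch of what they look like when applied by hand to a "double every element" loop. It uses plain C++ rather than SIMD intrinsics, and the function name `DoubleAllUnrolled` and the unroll factor of four are assumptions chosen for illustration; this is not how Unroller itself is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Highway): writes 2 * in[i] to out[i],
// with the loop transformed by hand to show each URHT step.
void DoubleAllUnrolled(const std::vector<int>& in, std::vector<int>& out) {
  assert(in.size() == out.size());
  const std::size_t n = in.size();

  // Hoisting: the loop-invariant factor is set up once, outside the loop.
  const int factor = 2;

  std::size_t i = 0;
  // Unrolling: each iteration handles four independent elements.
  for (; i + 4 <= n; i += 4) {
    // Reordering: all loads, then all multiplies, then all stores, so the
    // independent operations can overlap in the pipeline.
    const int x0 = in[i + 0];
    const int x1 = in[i + 1];
    const int x2 = in[i + 2];
    const int x3 = in[i + 3];
    const int y0 = x0 * factor;
    const int y1 = x1 * factor;
    const int y2 = x2 * factor;
    const int y3 = x3 * factor;
    out[i + 0] = y0;
    out[i + 1] = y1;
    out[i + 2] = y2;
    out[i + 3] = y3;
  }

  // Tail handling: fewer than four elements remain, so finish one at a time.
  for (; i < n; ++i) {
    out[i] = in[i] * factor;
  }
}
```

With a seven-element input, the main loop runs once over the first four elements and the tail loop handles the remaining three.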

UnrollerUnit

UnrollerUnit and UnrollerUnit2D are base classes that declare the functions Unroller needs in order to handle URHT properly. UnrollerUnit has default implementations for all but the Func method, which defines the SIMD operation to be applied. Many examples of how to implement these functions are in the tests.
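The pattern of a base class that supplies defaults while the derived class supplies only Func can be sketched in plain C++ as follows. Everything here is hypothetical: `MiniUnit`, `MiniUnroller`, `MiniDouble`, and the `Load`/`Store` defaults are illustrative names working on scalars, and the real UnrollerUnit interface may differ.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical base class: provides default Load/Store and leaves Func to
// the derived class, which is passed in as the first template argument.
template <class Derived, typename In, typename Out>
struct MiniUnit {
  In Load(const In* from, std::size_t idx) const { return from[idx]; }
  void Store(Out value, Out* to, std::size_t idx) const { to[idx] = value; }
};

// Hypothetical driver: calls through the derived type, so any default that
// the derived class redefines is selected at compile time.
template <class Derived, typename In, typename Out>
void MiniUnroller(MiniUnit<Derived, In, Out>& unit, const In* from, Out* to,
                  std::size_t n) {
  assert(n == 0 || (from != nullptr && to != nullptr));
  Derived& self = static_cast<Derived&>(unit);
  for (std::size_t i = 0; i < n; ++i) {
    self.Store(self.Func(i, self.Load(from, i)), to, i);
  }
}

// Only Func is written by the user; Load and Store keep their defaults.
struct MiniDouble : MiniUnit<MiniDouble, int, int> {
  int Func(std::size_t /*idx*/, int x) const { return x * 2; }
};
```

Calling `MiniUnroller(unit, a, r, n)` with a `MiniDouble unit` writes twice each element of `a` into `r`. Resolving `Func` through the derived type at compile time avoids virtual-call overhead inside the loop.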

Doubling values of an array example

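struct DoubleUnit : UnrollerUnit<DoubleUnit, int, int> {
  using TT = ScalableTag<int>;
  // x holds one vector of input values; idx and y are not needed here.
  inline Vec<TT> Func(ptrdiff_t idx, Vec<TT> x, Vec<TT> y) {
    TT d;
    // Multiply every lane of x by 2.
    return Mul(x, Set(d, 2));
  }
};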

With all other methods left in their default state, the following code doubles every value in array a (assumed here to be an existing int array of length N) and stores the results in r:

DoubleUnit dblunit;
int r[N];
Unroller(dblunit, a, r, N);