Intel SIMD (AVX) transpose intrinsics
- _mm_shuffle_* / _mm256_shuffle_* (lane selection set by an immediate mask, usually built with _MM_SHUFFLE)
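- permute: _mm_permute_ps, _mm256_permute_ps, _mm256_permute2f128_ps (immediate mask; permute2f128 moves whole 128-bit halves between registers)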
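- _mm_unpacklo_* / _mm_unpackhi_* (interleave the low or high halves of two registers; the _mm256 forms do this within each 128-bit lane)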
- _MM_TRANSPOSE4_PS (macro that transposes a 4x4 float matrix held in four __m128 rows)
- Stack Overflow: "c++ - Fastest way to transpose 4x4 byte matrix"
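A small sketch of what the shuffle, permute and unpack primitives listed above do to concrete lanes. It assumes SSE only (baseline on x86-64); the _mm256 forms behave the same way inside each 128-bit lane. demo_primitives is a hypothetical helper name, and _mm_shuffle_ps(a, a, mask) stands in for the AVX-only _mm_permute_ps so the sketch runs without AVX.

```c
#include <assert.h>
#include <xmmintrin.h> /* SSE: __m128, _mm_setr_ps, _mm_unpack*_ps, _mm_shuffle_ps, _MM_SHUFFLE */

/* Hypothetical helper: applies each primitive to a = {0,1,2,3}, b = {4,5,6,7}
   and writes the resulting lanes (lane 0 first) to the output arrays. */
void demo_primitives(float lo[4], float hi[4], float shuf[4], float perm[4]) {
    __m128 a = _mm_setr_ps(0.0f, 1.0f, 2.0f, 3.0f); /* setr: arguments in lane order */
    __m128 b = _mm_setr_ps(4.0f, 5.0f, 6.0f, 7.0f);

    /* Interleave low halves: {a0, b0, a1, b1} = {0, 4, 1, 5} */
    _mm_storeu_ps(lo, _mm_unpacklo_ps(a, b));

    /* Interleave high halves: {a2, b2, a3, b3} = {2, 6, 3, 7} */
    _mm_storeu_ps(hi, _mm_unpackhi_ps(a, b));

    /* Two-source shuffle: lanes 0-1 come from a, lanes 2-3 from b.
       _MM_SHUFFLE(z, y, x, w) lists selectors from lane 3 down to lane 0,
       so _MM_SHUFFLE(1, 0, 1, 0) selects {a0, a1, b0, b1} = {0, 1, 4, 5}. */
    _mm_storeu_ps(shuf, _mm_shuffle_ps(a, b, _MM_SHUFFLE(1, 0, 1, 0)));

    /* Single-source permute: passing a twice reverses it, {3, 2, 1, 0}.
       With AVX, _mm_permute_ps(a, _MM_SHUFFLE(0, 1, 2, 3)) gives the same result. */
    _mm_storeu_ps(perm, _mm_shuffle_ps(a, a, _MM_SHUFFLE(0, 1, 2, 3)));
}
```

The mask is an immediate, so it must be a compile-time constant; passing a runtime variable will not compile.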
Example
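A minimal sketch of both transposes from the notes, assuming row-major storage (float[16] and uint8_t[16]). For floats, two rounds of unpack interleave pairs of rows and _mm_shuffle_ps then picks 64-bit halves. This is the approach _MM_TRANSPOSE4_PS typically expands to, though some headers use _mm_movelh_ps / _mm_movehl_ps in place of the two-source shuffles, so a macro-based version is included for comparison. The byte version applies the same interleaving idea with SSE2 integer unpacks. The function names are hypothetical, and only SSE/SSE2 is assumed, so no extra compiler flags are needed on x86-64.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h> /* SSE:  __m128, _mm_unpack*_ps, _mm_shuffle_ps, _MM_TRANSPOSE4_PS */
#include <emmintrin.h> /* SSE2: __m128i, _mm_unpacklo_epi8/epi16, _mm_srli_si128 */

/* Transpose a row-major 4x4 float matrix in place using unpack + shuffle. */
void transpose4x4_ps(float m[16]) {
    __m128 r0 = _mm_loadu_ps(m + 0);
    __m128 r1 = _mm_loadu_ps(m + 4);
    __m128 r2 = _mm_loadu_ps(m + 8);
    __m128 r3 = _mm_loadu_ps(m + 12);

    __m128 t0 = _mm_unpacklo_ps(r0, r1); /* {r00, r10, r01, r11} */
    __m128 t1 = _mm_unpacklo_ps(r2, r3); /* {r20, r30, r21, r31} */
    __m128 t2 = _mm_unpackhi_ps(r0, r1); /* {r02, r12, r03, r13} */
    __m128 t3 = _mm_unpackhi_ps(r2, r3); /* {r22, r32, r23, r33} */

    /* Low 64-bit halves of t0/t1 form column 0, high halves column 1;
       t2/t3 give columns 2 and 3 the same way. */
    _mm_storeu_ps(m + 0,  _mm_shuffle_ps(t0, t1, _MM_SHUFFLE(1, 0, 1, 0)));
    _mm_storeu_ps(m + 4,  _mm_shuffle_ps(t0, t1, _MM_SHUFFLE(3, 2, 3, 2)));
    _mm_storeu_ps(m + 8,  _mm_shuffle_ps(t2, t3, _MM_SHUFFLE(1, 0, 1, 0)));
    _mm_storeu_ps(m + 12, _mm_shuffle_ps(t2, t3, _MM_SHUFFLE(3, 2, 3, 2)));
}

/* Same result via the library macro, which rewrites its four arguments in place. */
void transpose4x4_ps_macro(float m[16]) {
    __m128 r0 = _mm_loadu_ps(m + 0);
    __m128 r1 = _mm_loadu_ps(m + 4);
    __m128 r2 = _mm_loadu_ps(m + 8);
    __m128 r3 = _mm_loadu_ps(m + 12);
    _MM_TRANSPOSE4_PS(r0, r1, r2, r3);
    _mm_storeu_ps(m + 0, r0);
    _mm_storeu_ps(m + 4, r1);
    _mm_storeu_ps(m + 8, r2);
    _mm_storeu_ps(m + 12, r3);
}

/* Transpose a row-major 4x4 byte matrix (rows a, b, c, d) in place, SSE2 only. */
void transpose4x4_u8(uint8_t m[16]) {
    __m128i x  = _mm_loadu_si128((const __m128i *)m); /* a0..a3 b0..b3 c0..c3 d0..d3 */
    __m128i r1 = _mm_srli_si128(x, 4);  /* row b shifted down to bytes 0..3 */
    __m128i r2 = _mm_srli_si128(x, 8);  /* row c shifted down to bytes 0..3 */
    __m128i r3 = _mm_srli_si128(x, 12); /* row d shifted down to bytes 0..3 */

    /* Only the low 8 bytes of ab and cd reach the final unpack, and those
       are built from bytes 0..3 of each row, so the upper bytes never matter. */
    __m128i ab = _mm_unpacklo_epi8(x, r1);   /* a0 b0 a1 b1 a2 b2 a3 b3 | ... */
    __m128i cd = _mm_unpacklo_epi8(r2, r3);  /* c0 d0 c1 d1 c2 d2 c3 d3 | ... */
    __m128i t  = _mm_unpacklo_epi16(ab, cd); /* a0 b0 c0 d0 a1 b1 c1 d1 ... a3 b3 c3 d3 */
    _mm_storeu_si128((__m128i *)m, t);
}
```

The 256-bit unpack and shuffle intrinsics work inside each 128-bit lane, so a wider transpose (for example 8x8 floats with AVX) also needs _mm256_permute2f128_ps to exchange data between lanes. For the byte case, the SSSE3 _mm_shuffle_epi8 with a constant index vector is the usual single-instruction alternative.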