Saturday, November 11, 2023
Tuesday, September 26, 2023
Saturday, July 15, 2023
SIMD - transpose
Intel SIMD (AVX) transpose intrinsics
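- _mm_shuffle_* / _mm256_shuffle_* (control mask)
- _mm256_permute* (e.g., _mm256_permute_ps, _mm256_permute2f128_ps; control mask)
- _mm_unpacklo_* / _mm_unpackhi_* (and their _mm256_ counterparts)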
- _MM_TRANSPOSE4_PS (macro in <xmmintrin.h>); see also "c++ - Fastest way to transpose 4x4 byte matrix" on Stack Overflow
Example
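A minimal sketch of the unpack + shuffle approach for a row-major 4x4 float matrix, using only SSE intrinsics (an x86/x86-64 target is assumed). The name transpose4x4 is hypothetical; the instruction sequence is similar to common expansions of _MM_TRANSPOSE4_PS, but it is not taken from any particular header.

```cpp
#include <xmmintrin.h>  // SSE: __m128, _mm_unpacklo_ps, _mm_shuffle_ps, _MM_SHUFFLE

// Transpose a row-major 4x4 float matrix in place (hypothetical helper).
// Step 1: unpacklo/unpackhi interleave pairs of rows.
// Step 2: shuffle with an immediate mask picks 64-bit halves to form columns.
void transpose4x4(float* m) {
    const __m128 r0 = _mm_loadu_ps(m + 0);   // a0 a1 a2 a3
    const __m128 r1 = _mm_loadu_ps(m + 4);   // b0 b1 b2 b3
    const __m128 r2 = _mm_loadu_ps(m + 8);   // c0 c1 c2 c3
    const __m128 r3 = _mm_loadu_ps(m + 12);  // d0 d1 d2 d3

    const __m128 t0 = _mm_unpacklo_ps(r0, r1);  // a0 b0 a1 b1
    const __m128 t1 = _mm_unpacklo_ps(r2, r3);  // c0 d0 c1 d1
    const __m128 t2 = _mm_unpackhi_ps(r0, r1);  // a2 b2 a3 b3
    const __m128 t3 = _mm_unpackhi_ps(r2, r3);  // c2 d2 c3 d3

    // _MM_SHUFFLE(1, 0, 1, 0): lanes 0,1 of the first operand, then lanes 0,1 of the second.
    // _MM_SHUFFLE(3, 2, 3, 2): lanes 2,3 of the first operand, then lanes 2,3 of the second.
    _mm_storeu_ps(m + 0,  _mm_shuffle_ps(t0, t1, _MM_SHUFFLE(1, 0, 1, 0)));  // a0 b0 c0 d0
    _mm_storeu_ps(m + 4,  _mm_shuffle_ps(t0, t1, _MM_SHUFFLE(3, 2, 3, 2)));  // a1 b1 c1 d1
    _mm_storeu_ps(m + 8,  _mm_shuffle_ps(t2, t3, _MM_SHUFFLE(1, 0, 1, 0)));  // a2 b2 c2 d2
    _mm_storeu_ps(m + 12, _mm_shuffle_ps(t2, t3, _MM_SHUFFLE(3, 2, 3, 2)));  // a3 b3 c3 d3
}
```

With AVX, an 8x8 transpose typically applies the same unpack/shuffle pattern within each 128-bit lane and then uses _mm256_permute2f128_ps to exchange lanes between registers.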
Saturday, June 24, 2023
Intel vs. AMD architecture ...
Agner
Software optimization resources. C++ and assembly. Windows, Linux, BSD, Mac OS X (agner.org)
microarchitecture.pdf (agner.org)
Chips and Cheese
AMD’s Zen 4 Part 1: Frontend and Execution Engine – Chips and Cheese
CCD / CCX
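CCX: core complex, a group of cores sharing one L3 cache; 4 cores on Zen/Zen+/Zen 2, 8 cores from Zen 3 onward
CCD: core complex die; on Zen 2 a CCD holds two 4-core CCXs (CCX + CCX), i.e., max 8 cores; from Zen 3 onward a CCD holds a single 8-core CCX
Refer to: AMD CCD and CCX in Ryzen Processors Explained | Hardware Times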
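Since each CCX has its own L3, one way to see the CCX grouping on a Linux machine is to check which logical CPUs share an L3 cache. A minimal sketch, assuming Linux sysfs and that cache index3 is the L3 (the level file in the same directory can confirm this); parse_cpulist and l3_siblings are hypothetical helper names. With SMT enabled, the list contains both hardware threads of each core.

```cpp
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Parse a Linux cpulist string such as "0-3,8-11" into individual CPU ids.
std::vector<int> parse_cpulist(const std::string& list) {
    std::vector<int> cpus;
    std::stringstream ss(list);
    std::string range;
    while (std::getline(ss, range, ',')) {
        if (range.empty()) continue;
        const auto dash = range.find('-');
        const int first = std::stoi(range.substr(0, dash));
        const int last = (dash == std::string::npos) ? first : std::stoi(range.substr(dash + 1));
        for (int c = first; c <= last; ++c) cpus.push_back(c);
    }
    return cpus;
}

// Logical CPUs that share the L3 of `cpu` (on Zen: the same CCX).
// Assumption: Linux sysfs layout with index3 == L3. Returns empty on failure.
std::vector<int> l3_siblings(int cpu) {
    std::ifstream f("/sys/devices/system/cpu/cpu" + std::to_string(cpu) +
                    "/cache/index3/shared_cpu_list");
    std::string line;
    if (!f || !std::getline(f, line)) return {};
    return parse_cpulist(line);
}
```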
SIMD
lec17_x86SIMD.pdf (ntu.edu.tw)
Friday, February 3, 2023
What is the difference between rint() and round() in C or C++?
Based on an answer from ChatGPT:
In C++, std::rint (declared in <cmath>) rounds a floating-point number to an integral value using the current floating-point rounding mode. In the default mode (round to nearest), ties are rounded to the nearest even number, so rint(2.5) returns 2.0 and rint(3.5) returns 4.0.
std::round, also declared in <cmath>, rounds to the nearest integer value but always rounds halfway cases away from zero, regardless of the current rounding mode, so round(2.5) returns 3.0 and round(-2.5) returns -3.0.
Both functions come from C99 and are part of <cmath> only from C++11 onward.
So round is not a drop-in replacement for rint: the two agree except on halfway cases, and they may also differ when the rounding mode has been changed from the default.
Please refer to banker's rounding (round half to even).
Sample Code: https://onlinegdb.com/q36smlQMz
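A small sketch comparing the two functions on halfway and non-halfway inputs, assuming the default rounding mode. It is not the code behind the onlinegdb link above, and compare_rounding is a hypothetical name.

```cpp
#include <cmath>   // std::rint, std::round (C++11)
#include <cstdio>  // std::printf

// Print rint() vs round() side by side.
// Assumes the default rounding mode (round to nearest, ties to even), so:
//   2.5 -> rint 2.0, round 3.0
//  -2.5 -> rint -2.0, round -3.0
//   1.5 -> rint 2.0, round 2.0 (2 is already even)
void compare_rounding() {
    const double values[] = {0.5, 1.5, 2.5, -2.5, 2.4, 2.6};
    for (double v : values) {
        std::printf("%5.1f  rint=%5.1f  round=%5.1f\n",
                    v, std::rint(v), std::round(v));
    }
}
```

Non-halfway values such as 2.4 and 2.6 give the same result from both functions; only the ties differ.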
Sunday, March 20, 2022
Beamforming - Zero Forcing
1. Zero Forcing Model
https://www.sharetechnote.com/html/Communication_ChannelModel_ZF.html
2. Zero Forcing precoding
https://en.wikipedia.org/wiki/Zero-forcing_precoding
https://www.koreascience.or.kr/article/JAKO201334559957395.pdf
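A minimal numeric sketch of zero-forcing precoding for the simplest case, assuming a square 2x2 complex channel H (2 transmit antennas, 2 single-antenna users, rows of H = users) known at the transmitter. In general the ZF precoder is the pseudo-inverse W = H^H (H H^H)^{-1}, so that H W = I and each user receives only its own symbol; for a square invertible H this reduces to W = H^{-1}. Power normalization of W is omitted, and zf_precoder and multiply are hypothetical names.

```cpp
#include <array>
#include <complex>
#include <stdexcept>

using cd = std::complex<double>;
using Mat2 = std::array<std::array<cd, 2>, 2>;

// Zero-forcing precoder for a square 2x2 channel: W = H^{-1}
// (equal to H^H (H H^H)^{-1} when H is square and invertible).
// Power normalization is intentionally left out of this sketch.
Mat2 zf_precoder(const Mat2& H) {
    const cd det = H[0][0] * H[1][1] - H[0][1] * H[1][0];
    if (std::abs(det) < 1e-12) throw std::runtime_error("channel matrix is singular");
    Mat2 W{};
    W[0][0] =  H[1][1] / det;
    W[0][1] = -H[0][1] / det;
    W[1][0] = -H[1][0] / det;
    W[1][1] =  H[0][0] / det;
    return W;
}

// Plain 2x2 complex matrix product, used to check that H * W == I.
Mat2 multiply(const Mat2& A, const Mat2& B) {
    Mat2 C{};
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            for (int k = 0; k < 2; ++k)
                C[i][j] += A[i][k] * B[k][j];
    return C;
}
```

With H W = I, the received vector y = H W s equals s (plus noise), i.e., the inter-user interference is forced to zero; the usual price is noise enhancement when H is ill-conditioned.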
Sunday, February 20, 2022
Intel SIMD tutorials
By Intel
1. Intel® Intrinsics Guide
https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html
2. Mirror of Intel® Intrinsics Guide
https://www.laruence.com/sse/
3. Guide to vectorization with Intel C++ compilers
https://www.intel.com/content/dam/www/public/us/en/documents/guides/compiler-auto-vectorization-guide.pdf
By others
1. Faster Set Intersection with SIMD instructions by Reducing ...
http://www.vldb.org/pvldb/vol8/p293-inoue.pdf
2. CS3330: A quick guide to SSE/SIMD
https://www.cs.virginia.edu/~cr4bd/3330/F2018/simdref.html
3. Improving performance with SIMD intrinsics in three use cases
https://stackoverflow.blog/2020/07/08/improving-performance-with-simd-intrinsics-in-three-use-cases/
* 4. Matrix Multiplication Revisited | Richard Startin's Blog
https://richardstartin.github.io/posts/mmm-revisited
5. A Fast Matrix-Matrix Multiply Using Intel's SSE Instructions
https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.23.6754&rep=rep1&type=pdf
6. https://shura.shu.ac.uk/18355/1/Kelefouras-Matrix-MatrixMultiplicationMethodologyforSIngleMulti-Core%28AM%29.pdf
7. https://www.uio.no/studier/emner/matnat/ifi/IN3200/v19/teaching-material/avx512.pdf
8. https://compilers.cs.uni-saarland.de/papers/leissa_vecimp_tr.pdf
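As a small taste of the intrinsics these references cover, here is a minimal sketch of the broadcast-multiply-add scheme for a 4x4 single-precision matrix product, in the spirit of the SSE matrix-multiply articles listed above (not code taken from them). It uses only SSE intrinsics and assumes row-major storage; matmul4x4 is a hypothetical name.

```cpp
#include <xmmintrin.h>  // SSE: _mm_loadu_ps, _mm_set1_ps, _mm_mul_ps, _mm_add_ps

// C = A * B for row-major 4x4 float matrices (hypothetical helper).
// Row i of C is a linear combination of the rows of B:
//   C[i][:] = A[i][0]*B[0][:] + A[i][1]*B[1][:] + A[i][2]*B[2][:] + A[i][3]*B[3][:]
void matmul4x4(const float* A, const float* B, float* C) {
    const __m128 b0 = _mm_loadu_ps(B + 0);
    const __m128 b1 = _mm_loadu_ps(B + 4);
    const __m128 b2 = _mm_loadu_ps(B + 8);
    const __m128 b3 = _mm_loadu_ps(B + 12);
    for (int i = 0; i < 4; ++i) {
        // Broadcast each A[i][k] to all four lanes and accumulate k-th row of B.
        __m128 acc = _mm_mul_ps(_mm_set1_ps(A[4 * i + 0]), b0);
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_set1_ps(A[4 * i + 1]), b1));
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_set1_ps(A[4 * i + 2]), b2));
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_set1_ps(A[4 * i + 3]), b3));
        _mm_storeu_ps(C + 4 * i, acc);
    }
}
```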