Saturday, October 26, 2019

How can I find which instruction set extensions my Intel processor supports?

https://www.intel.com/content/www/us/en/support/articles/000005779/processors.html

1. Go to https://ark.intel.com/content/www/us/en/ark.html
2. Click "Processors"
3. Select the processor family (Intel xxx Processor)
4. Select the product name, e.g. Intel® Core™ i7-5500U Processor
5. Find the "Advanced Technologies" section
The "Instruction Set Extensions" entry lists the supported SIMD instructions.
 The Intel i7-5500U supports: Intel® SSE4.1, Intel® SSE4.2, Intel® AVX2

See also:
How to Identify My Intel® Processor
gcc x86 options:
https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html
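
The ARK lookup can be cross-checked on the machine itself. Below is a minimal sketch, assuming GCC or Clang on an x86 target: it uses the compiler builtins `__builtin_cpu_init` and `__builtin_cpu_supports` for a run-time check, and the predefined macro `__AVX2__` for a compile-time check (set by options such as `-mavx2`). The names `simd_feature`, `cpu_supports_feature`, `built_with_avx2` and `print_simd_report` are made up for this example.

```c
#include <stdio.h>

/* Hypothetical names for this sketch; not part of any Intel or GCC API. */
enum simd_feature { SIMD_SSE4_1, SIMD_SSE4_2, SIMD_AVX2 };

/* Returns 1 if the running CPU supports the feature, 0 if it does not,
 * and -1 if the check is unavailable (not GCC/Clang, or not x86). */
int cpu_supports_feature(enum simd_feature f)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    switch (f) {
    case SIMD_SSE4_1: return __builtin_cpu_supports("sse4.1") != 0;
    case SIMD_SSE4_2: return __builtin_cpu_supports("sse4.2") != 0;
    case SIMD_AVX2:   return __builtin_cpu_supports("avx2") != 0;
    }
    return -1;
#else
    (void)f;
    return -1;
#endif
}

/* Compile-time view: 1 if this file was built with AVX2 code generation
 * enabled (for example gcc -mavx2), otherwise 0. */
int built_with_avx2(void)
{
#ifdef __AVX2__
    return 1;
#else
    return 0;
#endif
}

/* Prints one line per feature: 1 = supported, 0 = not, -1 = unknown. */
void print_simd_report(void)
{
    printf("SSE4.1: %d\n", cpu_supports_feature(SIMD_SSE4_1));
    printf("SSE4.2: %d\n", cpu_supports_feature(SIMD_SSE4_2));
    printf("AVX2:   %d\n", cpu_supports_feature(SIMD_AVX2));
    printf("built with AVX2: %d\n", built_with_avx2());
}
```

Calling `print_simd_report()` from `main` gives a quick comparison against the ARK page. The run-time and compile-time answers can differ: a binary built without `-mavx2` can still run on a CPU that supports AVX2.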

Thursday, April 18, 2019

How to add a comment in a "macro function" (#define)?

int small_func(int a)
{
    // input plus two
    return (a + 2);
}

#define SMALL_FUNC(a) ({ \
    int result; \
    do { \
        /* input plus two */ \
        result = (a) + 2; \
    } while(0); \
    result; \
})

How to Add Comments to Macros
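
The `({ ... })` form is a GNU statement expression, so it only compiles with GCC-compatible compilers. As a sketch of a standard-C alternative (the names `SMALL_FUNC_STD` and `small_func_inline` are made up here), a plain expression macro can carry the same `/* */` comment. A `//` comment would not work inside the macro, because the backslash splices the next line into the comment.

```c
/* Standard-C variant: a single expression, no GNU ({ ... }) block. */
#define SMALL_FUNC_STD(a) ( \
    /* input plus two; (a) is parenthesized so 1 << 2 stays grouped */ \
    (a) + 2 \
)

/* Often simpler: an inline function, where // comments are fine. */
static inline int small_func_inline(int a)
{
    // input plus two
    return a + 2;
}
```

With the parentheses around `a`, `SMALL_FUNC_STD(1 << 2)` evaluates `(1 << 2) + 2`. Without them it would expand to `1 << 2 + 2`, which C parses as `1 << 4`.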

for loop multiple conditions

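Which one is right?
1. for (i = 0, j = 0; i < loop, j < loop2; i++, j++)
2. for (i = 0, j = 0; (i < loop) && (j < loop2); i++, j++)

The answer is 2. In 1, the comma operator evaluates i < loop, discards the result, and uses only j < loop2 as the loop condition. See:
Are multiple conditions allowed in a for loop?
GeeksforGeeks: Output of C Program | Set 22, question 2

For example, because the comma operator yields its right operand,
i = (1, 2); is the same as i = 2;
Without the parentheses, i = 1, 2; parses as (i = 1), 2; and sets i to 1.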
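
A small sketch to check the comma operator's behavior in a loop condition and in an assignment. The function names are made up for this example, and compilers may warn about the unused left operand, which is exactly the point.

```c
/* Loop condition written with the comma operator:
 * i < loop is evaluated and discarded, only j < loop2 decides. */
int count_with_comma(int loop, int loop2)
{
    int i, j, n = 0;
    for (i = 0, j = 0; i < loop, j < loop2; i++, j++)
        n++;
    return n;
}

/* Loop condition written with &&: both tests must hold. */
int count_with_and(int loop, int loop2)
{
    int i, j, n = 0;
    for (i = 0, j = 0; (i < loop) && (j < loop2); i++, j++)
        n++;
    return n;
}

/* Assignment binds tighter than the comma: (i = 1), 2; */
int assign_without_parens(void)
{
    int i;
    i = 1, 2;
    return i;
}

/* The parenthesized comma expression yields its right operand. */
int assign_with_parens(void)
{
    int i;
    i = (1, 2);
    return i;
}
```

With `loop = 3` and `loop2 = 5`, the comma version runs 5 times and the `&&` version runs 3 times.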

Wednesday, April 17, 2019

TI DSP memory & memory map

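L2SRAM: fastest of the three (on-chip, local to each core)
MSMCSRAM (Multicore Shared Memory Controller SRAM): on-chip, shared between cores, slower than L2SRAM
DDR3: external memory, slowest of the three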

TI Linker Command File Primer
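
One way to use the memory ordering above from C is to keep the small working buffer in the fastest memory and the large buffer in DDR3. The following is only a sketch: `#pragma DATA_SECTION` is the TI compiler's pragma for placing a symbol in a named section, but the section names `.l2_data` and `.ddr_data`, the buffer names, and the sizes are assumptions. The linker command file still has to map those sections to L2SRAM and DDR3. Other compilers skip the pragmas because of the `__TI_COMPILER_VERSION__` guard.

```c
#include <stddef.h>

#define FAST_BUF_LEN 256
#define BULK_BLOCKS  16

/* ".l2_data" and ".ddr_data" are hypothetical section names; the linker
 * command file must assign them to L2SRAM and DDR3 respectively. */
#ifdef __TI_COMPILER_VERSION__
#pragma DATA_SECTION(fast_buf, ".l2_data")
#pragma DATA_SECTION(bulk_buf, ".ddr_data")
#endif
short fast_buf[FAST_BUF_LEN];
short bulk_buf[BULK_BLOCKS * FAST_BUF_LEN];

/* Copies one block from the large buffer into the small working buffer.
 * The caller must keep block_index below BULK_BLOCKS. */
void load_block(size_t block_index)
{
    const short *src = &bulk_buf[block_index * FAST_BUF_LEN];
    size_t i;
    for (i = 0; i < FAST_BUF_LEN; i++)
        fast_buf[i] = src[i];
}
```

The idea is that the inner processing loops then touch only `fast_buf`, and DDR3 is accessed once per block.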

Thursday, March 14, 2019

TI DSP code optimization

Texas Instruments TMS320C6x DSP code optimization

1. Hand-Tuning Loops and Control Code on the TMS320C6000
1.1 loop optimization
 - restrict
 - #pragma MUST_ITERATE(lower_bound, upper_bound, factor)
 - _nassert()
1.2 if statement optimization
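
The three loop hints from 1.1 can be combined in one function. This is a sketch under stated assumptions: the function name `vec_add`, the trip-count limits (8 to 1024, a multiple of 8), and the 8-byte alignment are illustrative choices, not values from the TI documents. The TI-specific lines are guarded by `__TI_COMPILER_VERSION__` so the code also builds with other compilers.

```c
/* Adds two arrays element by element.
 * restrict: promises the compiler that a, b and out do not overlap. */
void vec_add(const short *restrict a, const short *restrict b,
             short *restrict out, int n)
{
    int i;
#ifdef __TI_COMPILER_VERSION__
    /* Assumed for this sketch: all pointers are 8-byte aligned. */
    _nassert((int)a % 8 == 0);
    _nassert((int)b % 8 == 0);
    _nassert((int)out % 8 == 0);
    /* Assumed for this sketch: n is in 8..1024 and a multiple of 8. */
    #pragma MUST_ITERATE(8, 1024, 8)
#endif
    for (i = 0; i < n; i++)
        out[i] = (short)(a[i] + b[i]);
}
```

The hints are promises made by the programmer. If a caller passes overlapping arrays, or a trip count outside the stated range, the optimized loop can produce wrong results.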

2. Introduction to TMS320C6000 DSP Optimization
2.1 A loop cannot become a "Pipelined Loop" when
 - it exceeds 14 execute packets (one execute packet holds up to 8 instructions)
 - it contains nested loops
 - it contains conditional branches
 - it contains function calls
2.2 "Pipelined Loop" consists of
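 - Prolog: the code above the kernel, where the pipeline fills
 - Kernel: the steady state, where the pipeline is fully utilized
 - Epilog: the code below the kernel, where the pipeline drains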
2.3 ii
 - ii = iteration interval
 - The cycle count of a software-pipelined loop can be approximated as ii * number_of_iterations.
 - ii is bounded below by two factors: the loop-carried dependency bound and the partitioned resource bound.
 - The loop-carried dependency bound: the length of the longest loop-carried dependency path.
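
The two statements about ii in 2.3 amount to simple arithmetic, sketched below. The function names are made up for this example, and the estimate ignores prolog and epilog overhead, as the approximation above does.

```c
/* ii cannot be smaller than either bound, so the smallest possible ii
 * is the larger of the two. */
int min_iteration_interval(int dependency_bound, int resource_bound)
{
    return dependency_bound > resource_bound ? dependency_bound
                                             : resource_bound;
}

/* Approximate cycle count of a software-pipelined loop:
 * ii * number_of_iterations. */
long estimate_loop_cycles(int ii, long number_of_iterations)
{
    return (long)ii * number_of_iterations;
}
```

For example, with a loop-carried dependency bound of 2 and a partitioned resource bound of 3, ii is at least 3, and 1000 iterations take roughly 3000 cycles.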


* References
Hand-Tuning Loops and Control Code on the TMS320C6000
Introduction to TMS320C6000 DSP Optimization
TMS320C6000 DSP Optimization Workshop - Texas Instruments Wiki
TMS320C6000 Programmer's Guide (Rev. K) - Texas Instruments
http://processors.wiki.ti.com/index.php/Software_libraries
http://processors.wiki.ti.com/index.php/Profiler
DSP/BIOS timers and benchmarking Tips SPRA829: Profile



Tuesday, February 19, 2019

Performance / Optimization

1. Practical Performance
 - gprof
 - PAPI
 - Callgrind
 - Compiler Flags

1.1
 - IA-32 (32-bit Intel architecture, i386): the 32-bit version of the x86 instruction set
 - AMD64 (x64, x86_64): the 64-bit version of the x86 instruction set
1.2 SIMD
 SISD (single instruction, single data) vs. SIMD (single instruction, multiple data)
 - SIMD parallel programming
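
To make the SISD vs. SIMD difference concrete, here is a minimal sketch that adds two float arrays both ways. It assumes an x86 compiler with SSE enabled (the `__SSE__` macro) and uses the `_mm_loadu_ps`, `_mm_add_ps` and `_mm_storeu_ps` intrinsics from `<xmmintrin.h>`. On other targets the SIMD function falls back to the scalar loop. The function names are made up for this example.

```c
#include <stddef.h>
#ifdef __SSE__
#include <xmmintrin.h>
#endif

/* SISD: one addition per instruction. */
void add_sisd(const float *a, const float *b, float *out, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* SIMD: four additions per instruction where SSE is available. */
void add_simd(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#ifdef __SSE__
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);  /* load 4 floats, unaligned */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar tail for the elements left over (or everything without SSE). */
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```

Both functions produce the same results. The SIMD version only reduces the number of instructions issued for the same data.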