https://www.intel.com/content/www/us/en/support/articles/000005779/processors.html
1. Go to https://ark.intel.com/content/www/us/en/ark.html
2. Select "Processors"
3. Select the Intel xxx processor type
4. Select the product name, e.g. Intel® Core™ i7-5500U Processor
5. Find the "Advanced Technologies" section
The "Instruction Set Extensions" entry there lists the supported SIMD instructions.
The Intel i7-5500U supports Intel® SSE4.1, Intel® SSE4.2, and Intel® AVX2.
See also:
How to Identify My Intel® Processor
GCC x86 options:
https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html
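The same information can also be queried at run time. The following is a minimal sketch, assuming GCC or Clang on an x86 target, that uses the compiler built-in __builtin_cpu_supports. The helper names (HAS_FEATURE, cpu_has_sse41, cpu_has_sse42, cpu_has_avx2, print_simd_support) are made up for this example. On other compilers or targets the helpers return -1, meaning "unknown".

```c
#include <stdio.h>

/* HAS_FEATURE is a hypothetical helper macro for this sketch.
 * __builtin_cpu_supports is a GCC/Clang built-in for x86 targets and
 * requires a string literal argument. */
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#define HAS_FEATURE(name) (__builtin_cpu_supports(name) ? 1 : 0)
#else
#define HAS_FEATURE(name) (-1) /* check not available here */
#endif

/* Each helper returns 1 (supported), 0 (not supported) or -1 (unknown). */
int cpu_has_sse41(void) { return HAS_FEATURE("sse4.1"); }
int cpu_has_sse42(void) { return HAS_FEATURE("sse4.2"); }
int cpu_has_avx2(void)  { return HAS_FEATURE("avx2"); }

/* Prints one line per instruction set extension. */
void print_simd_support(void)
{
    printf("SSE4.1: %d\n", cpu_has_sse41());
    printf("SSE4.2: %d\n", cpu_has_sse42());
    printf("AVX2:   %d\n", cpu_has_avx2());
}
```

Once the supported extensions are known, the matching GCC x86 options (for example -msse4.1, -msse4.2, -mavx2) can be chosen from the page linked above.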
Saturday, October 26, 2019
Thursday, April 18, 2019
How to add comments in a "macro function" (#define)?
int small_func(int a)
{
// input plus two
return (a + 2);
}
#define SMALL_FUNC(a) ({ \
int result; \
do { \
/* input plus two */ \
result = (a) + 2; \
} while(0); \
result; \
})
How to Add Comments to Macros
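The reason for using /* */ here: line continuations are spliced before comments are removed, so a // comment placed before a trailing backslash would absorb the continued lines that follow it. Below is a small sketch of both variants side by side. The ({ ... }) form is a GNU C statement expression (a GCC/Clang extension), and ADD_TWO is a hypothetical portable alternative added only for comparison.

```c
/* Function version: either comment style works. */
int small_func(int a)
{
    // input plus two
    return (a + 2);
}

/* Macro version: only block comments are safe, and each one must sit
 * before the trailing backslash of its line. */
#define SMALL_FUNC(a) ({        \
    int result_;                \
    /* input plus two */        \
    result_ = (a) + 2;          \
    result_;                    \
})

/* Hypothetical portable variant without the GNU extension. */
#define ADD_TWO(a) ((a) + 2) /* input plus two */
```

The argument is parenthesized as (a) so that an expression such as SMALL_FUNC(x << 1) expands as intended.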
for loop multiple conditions
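Which of the following is right?
1. for (i = 0, j = 0; i < loop, j < loop2; i++, j++)
2. for (i = 0, j = 0; (i < loop) && (j < loop2); i++, j++)
Both compile, but they behave differently. In 1, the comma operator evaluates i < loop, discards the result, and uses only j < loop2 as the loop condition. Use 2 when both conditions must hold.
The answers are in:
Are multiple conditions allowed in a for loop?
GeeksforGeeks: Output of C Program | Set 22, question 2
For example,
i = (1, 2); is the same as i = 2;
Without the parentheses, i = 1, 2; parses as (i = 1), 2; and assigns 1 to i.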
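The difference is easy to check by counting iterations. This is a small sketch; the function names are made up for the example, and compilers typically warn that the left operand of the comma has no effect.

```c
/* Form 1: comma operator. Only j < loop2 decides when the loop stops. */
int count_comma(int loop, int loop2)
{
    int i, j, n = 0;
    for (i = 0, j = 0; i < loop, j < loop2; i++, j++)
        n++;
    return n;
}

/* Form 2: logical AND. The loop stops as soon as either bound is reached. */
int count_and(int loop, int loop2)
{
    int i, j, n = 0;
    for (i = 0, j = 0; (i < loop) && (j < loop2); i++, j++)
        n++;
    return n;
}

/* Assignment binds tighter than the comma operator. */
int assign_without_parens(void)
{
    int i;
    i = 1, 2;   /* parsed as (i = 1), 2; */
    return i;
}

int assign_with_parens(void)
{
    int i;
    i = (1, 2); /* the comma expression yields its right operand */
    return i;
}
```

With loop = 3 and loop2 = 5, form 1 runs 5 times while form 2 runs 3 times.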
Wednesday, April 17, 2019
TI DSP memory & memory map
L2SRAM: fastest (on-chip, local to each core)
MSMCSRAM (Multi-core Shared Memory Controller SRAM): very fast (on-chip, shared by all cores)
DDR3: slowest of the three (external memory)
TI Linker Command File Primer
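One way to use this ranking is to place frequently accessed buffers in the faster memories. The sketch below assumes the TI compiler's DATA_SECTION pragma. The section names (.hot_data, .shared_data, .bulk_data), the buffer names, and the sizes are all hypothetical, and the memory range names in the comment must match whatever the MEMORY block of the actual linker command file defines.

```c
#include <stddef.h>

#define FRAME_LEN 1024 /* hypothetical buffer length */

/* The pragma is only understood by the TI compiler, so it is guarded.
 * It must appear before the definition of the symbol it names. */
#ifdef __TI_COMPILER_VERSION__
#pragma DATA_SECTION(hot_buf, ".hot_data")
#pragma DATA_SECTION(shared_buf, ".shared_data")
#pragma DATA_SECTION(bulk_buf, ".bulk_data")
#endif
short hot_buf[FRAME_LEN];       /* inner-loop working set */
short shared_buf[FRAME_LEN];    /* data exchanged between cores */
short bulk_buf[64 * FRAME_LEN]; /* large, rarely touched data */

/* Matching SECTIONS entries in the linker command file (assumed names):
 *
 *   SECTIONS
 *   {
 *       .hot_data    > L2SRAM
 *       .shared_data > MSMCSRAM
 *       .bulk_data   > DDR3
 *   }
 */

/* Small helper so the placement can be sanity-checked on any host. */
size_t hot_buf_bytes(void)
{
    return sizeof hot_buf;
}
```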
Thursday, March 14, 2019
TI DSP code optimization
Texas Instruments TMS320C6x DSP code optimization
1. Hand-Tuning Loops and Control Code on the TMS320C6000
1.1 loop optimization
- restrict
- #pragma MUST_ITERATE(lower_bound, upper_bound, factor)
- _nassert()
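The three hints above can be combined on one loop. This is a minimal sketch, not taken from the TI documents: the function name vec_add, the 8-byte alignment, and the trip-count promise (at least 8, multiple of 8) are assumptions that the caller would have to guarantee. On compilers other than TI's, _nassert is stubbed out and the pragma is skipped so the code still builds.

```c
#ifndef __TI_COMPILER_VERSION__
#define _nassert(cond) ((void)0) /* host stub; the argument is not evaluated */
#endif

/* restrict tells the compiler that out, a and b never overlap, which
 * removes assumed memory dependencies between loads and stores. */
void vec_add(short *restrict out, const short *restrict a,
             const short *restrict b, int n)
{
    int i;

    /* Assumption: the caller passes 8-byte aligned buffers. */
    _nassert((int)out % 8 == 0);
    _nassert((int)a % 8 == 0);
    _nassert((int)b % 8 == 0);

    /* Assumption: n >= 8 and n is a multiple of 8 (upper bound left blank). */
#ifdef __TI_COMPILER_VERSION__
#pragma MUST_ITERATE(8, , 8)
#endif
    for (i = 0; i < n; i++)
        out[i] = (short)(a[i] + b[i]);
}
```

If any of these promises is broken at run time, the optimized loop may produce wrong results, so they should only be stated when they are actually guaranteed.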
1.2 if statement optimization
2. Introduction to TMS320C6000 DSP Optimization
2.1 Conditions that prevent a "Pipelined Loop"
- the loop exceeds 14 execute packets (1 execute packet holds up to 8 instructions)
- nested loops
- conditional branches inside loops
- function calls inside loops
2.2 "Pipelined Loop" consists of
- Prolog: runs before the Kernel and fills the pipeline
- Kernel: pipeline is fully utilized
- Epilog: runs after the Kernel and drains the pipeline
2.3 ii
- ii = iteration interval
- the cycle count of a software-pipelined loop can be approximated as ii * number_of_iterations.
- ii is bounded below by two factors: the loop-carried dependency bound and the partitioned resource bound.
- the loop-carried dependency bound: the distance (in cycles) of the longest loop-carried dependency path
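As a rough illustration of the two rules in 2.3, the sketch below takes the larger of the two bounds as the smallest possible ii and multiplies ii by the iteration count. The function names are made up for this example, and the estimate ignores prolog and epilog overhead.

```c
/* ii cannot be smaller than either bound, so the lower limit is the
 * larger of the two. */
int ii_lower_bound(int dependency_bound, int resource_bound)
{
    return (dependency_bound > resource_bound) ? dependency_bound
                                               : resource_bound;
}

/* Approximate cycle count of a software-pipelined loop:
 * ii * number_of_iterations (prolog and epilog not counted). */
long estimate_loop_cycles(int ii, long number_of_iterations)
{
    return (long)ii * number_of_iterations;
}
```

For example, with a loop-carried dependency bound of 3 and a partitioned resource bound of 2, ii is at least 3, and 1000 iterations take roughly 3000 cycles.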
* Reference
Hand-Tuning Loops and Control Code on the TMS320C6000
Introduction to TMS320C6000 DSP Optimization
TMS320C6000 DSP Optimization Workshop - Texas Instruments Wiki
TMS320C6000 Programmer's Guide (Rev. K) - Texas Instruments
http://processors.wiki.ti.com/index.php/Software_libraries
http://processors.wiki.ti.com/index.php/Profiler
DSP/BIOS timers and benchmarking Tips SPRA829: Profile
Tuesday, February 19, 2019
Performance / Optimization
1. Practical Performance
- gprof
- PAPI
- Callgrind
- Compiler Flags
1.1
- IA-32 (Intel Architecture, 32-bit; also called i386): the 32-bit version of the x86 instruction set
- AMD64 (also called x64 or x86_64): the 64-bit version of the x86 instruction set
1.2 SIMD
SISD (single instruction, single data) vs. SIMD (single instruction, multiple data)
- SIMD parallel programming
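The SISD vs. SIMD distinction can be shown with a four-element integer add. This is a minimal sketch, assuming an x86 target with SSE2 (the baseline on x86_64). The function names are made up for the example, and the SIMD version falls back to the scalar loop when SSE2 is not available.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* SISD: one add instruction per element. */
void add4_sisd(const int32_t *a, const int32_t *b, int32_t *out)
{
    int i;
    for (i = 0; i < 4; i++)
        out[i] = a[i] + b[i];
}

/* SIMD: one instruction adds four 32-bit integers at once. */
void add4_simd(const int32_t *a, const int32_t *b, int32_t *out)
{
#if defined(__SSE2__)
    __m128i va = _mm_loadu_si128((const __m128i *)a); /* unaligned load */
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi32(va, vb));
#else
    add4_sisd(a, b, out); /* fallback when SSE2 is unavailable */
#endif
}
```

Both functions produce the same result; the SIMD version simply does the four additions in a single instruction.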
Monday, February 18, 2019