Introduction to application optimization using Intel® performance tools
The objectives of this course
Get a basic understanding of:
- the main factors that affect processor performance,
- basic performance improvement techniques,
- Intel® tools for performance analysis,
- main options and components of the Intel compiler,
- theoretical foundations of some performance optimizations.
You will be able to:
- describe the main processor performance problems;
- analyze an application using the VTune™ Performance Analyzer and find its problem areas;
- identify the main performance problems of the analyzed application;
- develop a strategy to improve application performance;
- describe the main components of the compiler and its functions;
- control the level of optimization with command line options.
Course plan
- Intel microprocessor architecture and main factors affecting processor performance;
- VTune Performance Analyzer usage;
- The role of the compiler in improving application performance;
- Some theoretical concepts: control flow graph, data-flow analysis;
- Permutation optimizations and their applicability; dependencies;
- Vectorization;
- Parallelization using OpenMP directives and auto-parallelization;
- The main components of the compiler, their tasks and interconnection.
Intel microprocessor architecture and the main factors affecting processor performance
Simplified processor model
- Control Unit, CU
- Arithmetic and Logic Unit, ALU
- System registers
- Front Side Bus, FSB
- Memory
- Peripheral devices
The control unit:
- decodes instructions received from memory;
- controls the ALU;
- transfers data between the CPU registers, memory, and peripheral devices.
The ALU consists of several units that perform arithmetic and logical operations on values held in the system registers.
System registers are a small amount of memory inside the CPU used for temporary storage of the data the processor is working on.
A system bus is used for data transfer between the CPU and memory, as well as between the CPU and peripherals.
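The simplified model above can be sketched as a toy interpreter. This is only an illustration under stated assumptions: the instruction names (LOAD, STORE, ADD, SUB, AND, HALT), the tuple encoding, and the register names r0..r2 are invented here and do not correspond to any real Intel instruction set.

```python
# Toy model of the simplified processor: a control unit loop that fetches
# and decodes instructions and drives a tiny ALU.
# Instruction names and encoding are hypothetical, for illustration only.

def alu(op, a, b):
    """Arithmetic and logic unit: operates on register values."""
    if op == "ADD":
        return a + b
    if op == "SUB":
        return a - b
    if op == "AND":
        return a & b
    raise ValueError(f"unknown ALU operation: {op}")


def run(program, memory):
    """Control unit: fetch, decode, and execute instructions until HALT."""
    registers = {"r0": 0, "r1": 0, "r2": 0}
    pc = 0  # program counter: index of the next instruction
    while True:
        instr = program[pc]  # fetch
        opcode = instr[0]    # decode
        pc += 1
        if opcode == "LOAD":     # memory -> register transfer
            _, reg, addr = instr
            registers[reg] = memory[addr]
        elif opcode == "STORE":  # register -> memory transfer
            _, reg, addr = instr
            memory[addr] = registers[reg]
        elif opcode == "HALT":
            return registers
        else:                    # ALU operation: dst = src1 op src2
            _, dst, src1, src2 = instr
            registers[dst] = alu(opcode, registers[src1], registers[src2])


memory = [7, 5, 0]
program = [
    ("LOAD", "r0", 0),
    ("LOAD", "r1", 1),
    ("ADD", "r2", "r0", "r1"),
    ("STORE", "r2", 2),
    ("HALT",),
]
final_registers = run(program, memory)
```

In this sketch the control unit owns the data transfers (LOAD and STORE), while the ALU only ever sees register values, which mirrors the division of work described above.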
High performance is one of the key factors in competition among computer system manufacturers.
Processor performance is directly related to the amount of computational work that can be done per unit of time.
Performance = Number of instructions / Time
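As a small worked example of this formula, the sketch below compares two runs of the same program. The instruction count and run times are made-up numbers chosen for illustration, not measurements.

```python
def performance(instruction_count, seconds):
    """Performance as defined above: instructions executed per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return instruction_count / seconds


# Hypothetical figures: the same 2 billion instructions before and after tuning.
before = performance(2_000_000_000, 4.0)
after = performance(2_000_000_000, 2.5)
speedup = after / before

print(f"before: {before:.3g} instr/s, after: {after:.3g} instr/s")
print(f"speedup: {speedup:.2f}x")
```

When the instruction count stays the same, the speedup is simply the ratio of the run times.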
We will discuss performance in terms of the IA-32 and IA-32e architectures (IA-32e is IA-32 with EM64T).
Factors affecting the processor performance:
- CPU clock frequency;
- The amount and speed of accessible memory;
- The performance of the instructions and the completeness of the instruction set;
- The use of internal registers;
- The quality of pipelining;
- The quality of branch prediction;
- The quality of prefetching;
- Superscalarity;
- The quality of vectorization;
- Parallelization and multicore.
Clock rate
Because the processor is made of different components that work at different speeds, it has a clock generator that keeps them synchronized by sending periodic synchronization pulses. The frequency of these pulses is called the clock rate of the processor.
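The relation between clock rate and the length of one clock cycle can be illustrated with a short calculation. The 2.5 GHz figure is an arbitrary example value, not a property of any particular processor discussed here.

```python
def clock_period_ns(frequency_hz):
    """Duration of one clock cycle in nanoseconds (period = 1 / frequency)."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1e9 / frequency_hz


def cycles_elapsed(frequency_hz, seconds):
    """Number of clock cycles that occur in the given time."""
    return frequency_hz * seconds


example_frequency_hz = 2.5e9  # hypothetical 2.5 GHz clock
period = clock_period_ns(example_frequency_hz)
cycles_per_ms = cycles_elapsed(example_frequency_hz, 1e-3)

print(f"one cycle lasts {period:.2f} ns")
print(f"{cycles_per_ms:.0f} cycles in one millisecond")
```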
Memory speed and amount
Maximum addressable memory by processor generation:
- 8086 - 1 MB of memory.
- 80286 - new system registers and a new memory mode - 16 MB of memory.
- 80386 - the first 32-bit processor - 4 GB of memory.
- EM64T (Extended Memory 64 Technology) - up to 2^64 bytes.
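These limits follow from the number of address bits: n address bits can select 2^n bytes. The sketch below recomputes the figures, assuming the commonly cited address widths of 20, 24, and 32 bits for the 8086, 80286, and 80386. The 64-bit entry is the theoretical size of the address space; actual processors implement fewer address bits. Units are powers of 1024, matching the MB and GB figures in the list.

```python
def addressable_bytes(address_bits):
    """Number of bytes that can be addressed with the given address width."""
    return 2 ** address_bits


def human_readable(n_bytes):
    """Format a byte count using power-of-1024 units."""
    units = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]
    index = 0
    while n_bytes >= 1024 and index < len(units) - 1:
        n_bytes //= 1024
        index += 1
    return f"{n_bytes} {units[index]}"


# Address widths are assumptions stated in the lead-in, not taken from the course text.
ADDRESS_BITS = {
    "8086": 20,
    "80286": 24,
    "80386": 32,
    "EM64T (theoretical)": 64,
}

for name, bits in ADDRESS_BITS.items():
    print(f"{name}: {bits} address bits -> {human_readable(addressable_bytes(bits))}")
```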
The performance of the instructions and completeness of the instruction set
Performance depends on how efficiently the instructions are implemented and on how well the basic instruction set covers the range of possible tasks.
CISC and RISC (complex and reduced instruction set computing)
Modern Intel processors are a hybrid of CISC and RISC: before execution, the processor converts CISC instructions into simpler RISC-like operations.
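The idea of splitting one complex instruction into simpler steps can be sketched as follows. This is a conceptual toy only: the operation names (LOAD, STORE), the temporary names tmp0 and tmp1, and the bracket notation for memory operands are assumptions made for the example, and the internal operations of real processors are implementation-specific.

```python
def is_memory_operand(operand):
    """In this toy notation, memory operands are written in brackets, e.g. [0x10]."""
    return operand.startswith("[") and operand.endswith("]")


def decode_to_simple_ops(instruction):
    """Split (op, dst, src) into load / compute / store steps on registers."""
    op, dst, src = instruction
    simple_ops = []
    if is_memory_operand(src):
        # Bring the source value into a temporary register first.
        simple_ops.append(("LOAD", "tmp0", src))
        src = "tmp0"
    if is_memory_operand(dst):
        # Read-modify-write on memory becomes three register-level steps.
        simple_ops.append(("LOAD", "tmp1", dst))
        simple_ops.append((op, "tmp1", src))
        simple_ops.append(("STORE", dst, "tmp1"))
    else:
        simple_ops.append((op, dst, src))
    return simple_ops


memory_form = decode_to_simple_ops(("ADD", "[0x10]", "eax"))
register_form = decode_to_simple_ops(("ADD", "eax", "ebx"))

for step in memory_form:
    print(step)
```

An instruction that works only on registers passes through unchanged, while one that updates memory expands into a load, a compute step, and a store.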