Optimizing compiler. Interprocedural optimizations
Compiler options
- /Ob<n> control inline expansion:
- /Qinline-min-size:<n> set size limit for inlining small routines
- /Qinline-min-size- no size limit for inlining small routines
- /Qinline-max-size:<n> set size limit for inlining large routines
- /Qinline-max-size- no size limit for inlining large routines
- /Qinline-max-total-size:<n> maximum increase in size for inline function expansion
- /Qinline-max-total-size- no size limit for inline function expansion
- /Qinline-max-per-routine:<n> maximum number of inline instances in any function
- /Qinline-max-per-routine- no maximum number of inline instances in any function
- /Qinline-max-per-compile:<n> maximum number of inline instances in the current compilation
- /Qinline-max-per-compile- no maximum number of inline instances in the current compilation
- /Qinline-factor:<n> scale the inlining upper limits by n percent
- /Qinline-factor- do not set inlining upper limits
- /Qinline-forceinline treat inline routines as forceinline
- /Qinline-dllimport allow(DEFAULT)/disallow functions declared __declspec(dllimport) to be inlined
- /Qinline-calloc directs the compiler to inline calloc() calls as malloc()/memset()
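The effect described for /Qinline-calloc can be pictured at the source level. The sketch below is only an illustration, not the compiler's actual expansion: the helper name calloc_as_malloc_memset is hypothetical, and the overflow guard is an assumption added so that the helper behaves like calloc() for oversized requests.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: a hand-written equivalent of calloc(count, size),
// expressed as malloc() followed by memset().
void* calloc_as_malloc_memset(std::size_t count, std::size_t size) {
    // Reject requests whose byte count would overflow size_t.
    if (size != 0 && count > static_cast<std::size_t>(-1) / size) return nullptr;
    std::size_t bytes = count * size;
    void* p = std::malloc(bytes);
    if (p != nullptr) std::memset(p, 0, bytes);  // zero-fill, as calloc() does
    return p;
}
```

Once the allocation and the zero-fill are separate steps, each of them becomes visible to later optimizations on its own.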
Procedure cloning
Cloning is the specialization of a function for a specific class of call sites.
Sometimes specific characteristics of the dummy arguments make special optimizations of a procedure possible. In this case a specialized procedure can be created, and the call to the initial procedure can be replaced with a call to the new one wherever the actual arguments have these characteristics.
The trivial case is a procedure call with a constant argument. For example, if there are several calls of some procedure f of the form f(x,y,TRUE) and several calls f(x,y,FALSE), then it is sometimes profitable to create procedures f_TRUE(x,y) and f_FALSE(x,y) and replace the initial calls with calls of the new procedures.
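The constant-argument case can be sketched by hand as follows. The body of f is an assumption chosen only for illustration (the slides do not say what f computes); the point is that each clone no longer tests the flag.

```cpp
// Original procedure: the flag selects between two behaviours on every call.
// (Hypothetical body, for illustration only.)
int f(int x, int y, bool flag) {
    if (flag) return x + y;
    return x - y;
}

// Clones specialized for the constant third argument; the branch is gone.
int f_TRUE(int x, int y)  { return x + y; }  // replaces calls f(x, y, true)
int f_FALSE(int x, int y) { return x - y; }  // replaces calls f(x, y, false)
```

A call site written as f(a, b, true) would then be rewritten to f_TRUE(a, b), while calls with a non-constant flag keep using f.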
Partial inlining
Partial inlining is a form of inlining that copies only part of the callee function into the caller and leaves the rest of the callee as an out-of-line call.
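A hand-written approximation of the idea is shown below. All function names are hypothetical, and the split point (cheap early exits inlined, search loop kept out of line) is an assumption about what a compiler might choose, not a description of a specific compiler's heuristic.

```cpp
#include <cstddef>

// Original callee: cheap early exits followed by a more expensive search loop.
int find_index(const int* table, std::size_t n, int key) {
    if (n == 0) return -1;
    if (table[0] == key) return 0;
    for (std::size_t i = 1; i < n; ++i)
        if (table[i] == key) return static_cast<int>(i);
    return -1;
}

// The part of the callee that stays out of line after partial inlining.
int find_index_rest(const int* table, std::size_t n, int key) {
    for (std::size_t i = 1; i < n; ++i)
        if (table[i] == key) return static_cast<int>(i);
    return -1;
}

// Caller before the transformation: an ordinary call.
int caller_before(const int* table, std::size_t n, int key) {
    return find_index(table, n, key);
}

// Caller after the transformation: the early exits are inlined,
// only the loop is still reached through a call.
int caller_after(const int* table, std::size_t n, int key) {
    if (n == 0) return -1;
    if (table[0] == key) return 0;
    return find_index_rest(table, n, key);
}
```

When the early exits are taken most of the time, the caller avoids the call overhead without paying the code-size cost of inlining the whole loop.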
Data transformations
Data transformation is an interprocedural optimization which changes the structure of user data to provide better cache locality during execution.
The following types of data transformation are widely known:
Permutation of structure fields can improve cache locality if the fields which are used together during a calculation are located close to each other. In this case fewer cache lines have to be read from memory.
Structure splitting leaves hot (frequently used) fields in the main structure and moves the other fields to a separate cold section. After this optimization the hot data needs less memory and fits the cache better.
The compiler needs to prove the correctness of such a transformation. In many cases whole-program analysis is needed.
Structure splitting and field reordering example
#ifndef PERF
typedef struct {
    double x;
    char title[40];
    double y;
    char title2[22];
    double z;
} VecR;
#else
typedef struct {
    char title[40];
    char title2[22];
} ColdFields;
typedef struct {
    double x;
    double y;
    double z;
    ColdFields *cold;
} VecR;
#endif
#include <stdio.h>
#include <stdlib.h>
#include "struct.h"

int main() {
    int i, k;
    VecR *array = malloc(10000*sizeof(VecR));
#ifdef PERF
    /* cold fields live in separately allocated blocks */
    for (i=0;i<10000;i++)
        array[i].cold = (ColdFields*)malloc(sizeof(ColdFields));
#endif
    for (i=0;i<10000;i++) {
        array[i].x = 1.0;
        array[i].y = 2.0;
        array[i].z = 0.0;
    }
    /* the hot loop touches only x, y and z */
    for (k=1;k<10000;k++) {
        for (i=k;i<9999;i++) {
            array[i].x = array[i-1].y+1.0;
            array[i].y = array[i+1].x+array[i+1].y;
            array[i].z = (array[i-1].y - array[i-1].x)/array[i-1].y;
        }
    }
    printf("%f \n",array[100].z);
#ifdef PERF
    for (i=0;i<10000;i++)
        free(array[i].cold);
#endif
    free(array);
    return 0;
}
icc struct.c -fast -o a.out
icc struct.c -fast -DPERF -o b.out
time ./a.out
real 0m0.808s
time ./b.out
real 0m0.566s
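One way to see where the difference may come from is to compare how far apart the hot fields are in the two layouts of the example above. The sketch below restates both layouts under the hypothetical names VecROriginal and VecRSplit so that they can coexist in one file; the helper functions are also illustrative additions.

```cpp
#include <cstddef>

// Layout without PERF: hot doubles interleaved with cold character arrays.
struct VecROriginal {
    double x;
    char title[40];
    double y;
    char title2[22];
    double z;
};

// Layout with PERF: hot doubles packed together, cold data behind a pointer.
struct ColdFields {
    char title[40];
    char title2[22];
};
struct VecRSplit {
    double x;
    double y;
    double z;
    ColdFields* cold;
};

// Bytes spanned by the hot fields x..z inside one element.
std::size_t hot_span_original() {
    return offsetof(VecROriginal, z) + sizeof(double) - offsetof(VecROriginal, x);
}
std::size_t hot_span_split() {
    return offsetof(VecRSplit, z) + sizeof(double) - offsetof(VecRSplit, x);
}
```

On a typical 64-bit target one would expect sizeof(VecROriginal) to be 88 bytes and sizeof(VecRSplit) to be 32 bytes, so the split layout packs roughly two elements into a 64-byte cache line instead of spreading one element over more than one line. The exact numbers depend on the platform's alignment rules.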
Pointer chasing
Data access through several pointers is one of the most common problems in C++ code. If the program data doesn't fit in the cache subsystem, then every pointer dereference can cause a significant stall in the calculation.
This problem can also be caused by an incorrectly chosen data transformation.
All_members+= employer->p->f->members;
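The statement above can be placed in a small self-contained sketch. The type names Employer, Person and Family and both function names are assumptions made for illustration; only the access chain employer->p->f->members comes from the slide. The second function shows one possible remedy, keeping the needed values in a contiguous array.

```cpp
#include <vector>

// Hypothetical types matching the access chain employer->p->f->members.
struct Family   { int members; };
struct Person   { Family* f; };
struct Employer { Person* p; };

// Pointer-chasing version: three dependent loads per element,
// each of which may miss the cache.
long count_members_chasing(const std::vector<Employer*>& employers) {
    long all_members = 0;
    for (const Employer* employer : employers)
        all_members += employer->p->f->members;
    return all_members;
}

// Flattened alternative: the same values stored contiguously,
// so consecutive iterations read neighbouring memory.
long count_members_flat(const std::vector<int>& members) {
    long all_members = 0;
    for (int m : members)
        all_members += m;
    return all_members;
}
```

Whether the flattened layout is practical depends on how the rest of the program uses the objects; it is shown here only to contrast the two access patterns.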
Devirtualization for C++ virtual method
C++ is an object-oriented language with a high level of abstraction, in which the class method that is executed can depend on the type of the object at run time. In this case pointers to the different class methods are stored in a special table (the virtual method table), and a virtual function call is costly for performance because it goes through this table. Sometimes a call through the virtual method table can be replaced with a direct call of a specific method.
A => B => C
All derived classes override virtual int foo ()
int process(class A *a) { return a->foo(); }
Class A isn’t used in this source, so it is possible to perform devirtualization.
#include <stdio.h>

class A {
    virtual int foo() { return 1; };
    friend int process(class A *a);
};

class B: public A {
    virtual int foo() { return 2; };
    friend int process(class A *a);
};

// the call a->foo() is dispatched through the virtual method table
int process(class A *a) {
    return(a->foo());
};

int main() {
    B* pB = new B;
    int result2 = process(pB);
    return 0;
}
icl test.cpp -S
    mov eax, DWORD PTR [ebx]
    mov ecx, ebx
    call DWORD PTR [eax]    (call through table)
icl test.cpp -Qipo_S -Ob0 -Qipo
    call ?process.@@YAHPAVA@@@Z
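At the source level, the result of devirtualization can be approximated by a qualified call, which bypasses the virtual method table. The sketch below is a hand-written illustration and not compiler output: the function process_devirtualized is hypothetical, the classes are simplified (public methods, a virtual destructor added), and the static_cast is valid only under the assumption that every object reaching the function really is a B.

```cpp
class A {
public:
    virtual ~A() = default;
    virtual int foo() { return 1; }
};

class B : public A {
public:
    int foo() override { return 2; }
};

// Ordinary virtual dispatch: the target is looked up in the table at run time.
int process(A* a) {
    return a->foo();
}

// Manual equivalent of the devirtualized call. The qualified name B::foo
// selects the function statically, so no table lookup is needed.
// Assumption: a always points to a B; otherwise the cast is undefined behaviour.
int process_devirtualized(A* a) {
    return static_cast<B*>(a)->B::foo();
}
```

A direct call of this kind can in turn be inlined, which is not possible for a call through the table.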