Published: 12.07.2012 | Access: free | Students: 355 / 24 | Rating: 4.00 / 4.20 | Duration: 11:07:00
Specialties: Programmer
Lecture 8:

Optimizing compiler. Interprocedural optimizations


Compiler options

  • /Ob<n> control inline expansion:
    • n=0 disable inlining
    • n=1 inline functions declared with __inline, and perform C++ inlining
    • n=2 inline any function, at the compiler's discretion
  • /Qinline-min-size:<n> set size limit for inlining small routines
  • /Qinline-min-size- no size limit for inlining small routines
  • /Qinline-max-size:<n> set size limit for inlining large routines
  • /Qinline-max-size- no size limit for inlining large routines
  • /Qinline-max-total-size:<n> maximum increase in size for inline function expansion
  • /Qinline-max-total-size- no size limit for inline function expansion
  • /Qinline-max-per-routine:<n> maximum number of inline instances in any function
  • /Qinline-max-per-routine- no maximum number of inline instances in any function
  • /Qinline-max-per-compile:<n> maximum number of inline instances in the current compilation
  • /Qinline-max-per-compile- no maximum number of inline instances in the current compilation
  • /Qinline-factor:<n> scale all inlining upper limits by n percent
  • /Qinline-factor- do not scale inlining upper limits
  • /Qinline-forceinline treat inline routines as forceinline
  • /Qinline-dllimport[-] allow (default) or disallow inlining of functions declared __declspec(dllimport)
  • /Qinline-calloc directs the compiler to inline calloc() calls as malloc()/memset()
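To make concrete what these options control, here is a minimal source-level sketch of inline expansion in C++. The functions `square`, `sum_of_squares` and `sum_of_squares_inlined` are hypothetical; the second loop shows by hand what the compiler may produce if it decides to expand `square` (for example under /Ob1, since `square` is declared `inline`, or under /Ob2 at its own discretion), subject to limits such as /Qinline-min-size.

```cpp
#include <cassert>

// Hypothetical small routine: a typical candidate for inline expansion.
inline int square(int v) { return v * v; }

// Call site as written: one call to square() per iteration.
int sum_of_squares(const int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; ++i)
        s += square(a[i]);
    return s;
}

// The same loop after inline expansion, written out by hand:
// the body of square() replaces the call, so no call overhead remains.
int sum_of_squares_inlined(const int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; ++i)
        s += a[i] * a[i];  // inlined body of square(a[i])
    return s;
}
```

Both versions compute the same value, e.g. `assert(sum_of_squares(a, 3) == 14)` for `int a[] = {1, 2, 3}`; the expanded loop avoids the call and exposes the loop body to further optimizations such as vectorization.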

Procedure cloning

Cloning specializes a function for a specific class of call sites.

Sometimes specific properties of the formal (dummy) arguments allow special optimizations of a procedure. In this case the compiler can create a specialized copy of the procedure and replace the original call with a call to the new copy at every call site where the actual arguments have these properties.

The trivial case is a call of a procedure with a constant argument. For example, if there are several calls of some procedure f in the form f(x,y,TRUE) and several calls f(x,y,FALSE), then it is sometimes profitable to create procedures f_TRUE(x,y) and f_FALSE(x,y) and replace the original calls with calls of the new procedures.
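The cloning case above can be sketched in C++ as follows. The body of `f` is hypothetical (any code that branches on the flag would do), `bool` stands in for `TRUE`/`FALSE`, and `caller`/`caller_cloned` are illustrative names; the clones `f_TRUE` and `f_FALSE` are what a compiler might generate, written out by hand.

```cpp
#include <cassert>

// Hypothetical original procedure: the third argument is a flag
// that is a constant at many call sites.
int f(int x, int y, bool flag) {
    if (flag)
        return x + y;
    return x - y;
}

// Clones specialized for flag == true and flag == false:
// the test on the flag has been folded away.
int f_TRUE(int x, int y) { return x + y; }
int f_FALSE(int x, int y) { return x - y; }

// A caller before cloning...
int caller(int x, int y) { return f(x, y, true) * f(x, y, false); }

// ...and the same caller after its calls are redirected to the clones.
int caller_cloned(int x, int y) { return f_TRUE(x, y) * f_FALSE(x, y); }
```

Each clone is smaller and branch-free compared with `f`, while the results do not change, e.g. `assert(caller(5, 3) == caller_cloned(5, 3));`.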

Partial inlining

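Partial inlining inlines only part of the callee function into the caller, typically its frequently executed (hot) part; the remaining part is kept in a separate function that is called only when it is needed. This can give much of the benefit of inlining at a much smaller cost in code size.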


Figure 8.2.
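The following C++ sketch illustrates the idea under simple assumptions. The hypothetical callee `lookup` has a cheap hot path (a cache hit) and an expensive cold path; the cold path is shown already outlined as `compute_slow`, standing in for the separate function a compiler would create. After partial inlining, only the hot test is copied into the caller.

```cpp
#include <cassert>

// Cold part of the hypothetical callee: expensive and rarely executed,
// so it stays out of line and is reached through a call.
unsigned compute_slow(unsigned key) {
    unsigned h = key;
    for (unsigned i = 0; i < 1000; ++i)  // placeholder for expensive work
        h = h * 31u + i;                 // unsigned arithmetic wraps, no UB
    return h;
}

// Original callee: a small hot path plus a large cold path.
unsigned lookup(unsigned key, unsigned cached_key, unsigned cached_value) {
    if (key == cached_key)      // hot path: small, worth inlining
        return cached_value;
    return compute_slow(key);   // cold path: not worth inlining
}

// Caller before the optimization: every call pays the call overhead.
unsigned caller(unsigned key) { return lookup(key, 7u, 42u); }

// Caller after partial inlining: only the hot test is copied in,
// the cold part is still a call to the outlined function.
unsigned caller_partially_inlined(unsigned key) {
    if (key == 7u)
        return 42u;
    return compute_slow(key);
}
```

Calls that hit the hot path no longer pay for a call at all, while the caller grows only by one comparison rather than by the whole loop of the cold path.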

Data transformations

Data transformation is an interprocedural optimization that changes the layout of user data structures to improve cache locality during execution.

The following types of data transformation are widely known:

  • permutation of structure fields
  • structure splitting

Permuting structure fields improves cache locality when the fields that are used together in a computation are placed next to each other: the data needed by the computation then occupies fewer cache lines, so fewer cache lines have to be read from memory over the system bus.
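The effect of field permutation on locality can be measured with `offsetof`. This C++ sketch reuses the fields of the `VecR` structure from the example later in this lecture; the type and function names are hypothetical, and the numbers in the note below assume a common x86 target where `double` is 8 bytes and a cache line is 64 bytes.

```cpp
#include <cassert>
#include <cstddef>

// Original field order: hot doubles interleaved with cold strings.
struct VecROriginal {
    double x;
    char title[40];
    double y;
    char title2[22];
    double z;
};

// Same fields, permuted so that the hot fields x, y, z are adjacent.
struct VecRPermuted {
    double x;
    double y;
    double z;
    char title[40];
    char title2[22];
};

// Bytes from the start of x to the end of z: the region that a loop using
// only x, y and z has to bring into the cache for every element.
std::size_t hot_span_original() { return offsetof(VecROriginal, z) + sizeof(double); }
std::size_t hot_span_permuted() { return offsetof(VecRPermuted, z) + sizeof(double); }
```

On such a target the hot fields span 88 bytes in the original order, more than one 64-byte cache line, but only 24 bytes after permutation, while `sizeof` is 88 bytes for both structures: the permutation changes which bytes are fetched together, not how much memory the structure takes.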

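Structure splitting keeps the hot (frequently used) fields in the main structure and moves the other (cold) fields into a separate structure. After this optimization the hot data occupies less memory and fits into the cache better.

The compiler must prove that such a transformation is correct, for example that the program does not depend on the original field layout through pointer arithmetic, casts, or code outside the current compilation. In many cases this requires whole-program analysis.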

Structure splitting and field reordering example

/* struct.h */
#ifndef PERF
/* Original layout: hot fields x, y, z are interleaved with cold strings */
typedef struct {
  double x;
  char title[40];
  double y;
  char title2[22];
  double z;
} VecR;
#else
/* Split layout: cold fields moved to a separate structure, hot fields adjacent */
typedef struct {
  char title[40];
  char title2[22];
} ColdFields;
typedef struct {
  double x;
  double y;
  double z;
  ColdFields *cold;   /* pointer to the cold part of the record */
} VecR;
#endif

/* struct.c */
#include <stdio.h>
#include <stdlib.h>
#include "struct.h"
int main() {
   int i, k;
   VecR *array = malloc(10000*sizeof(VecR));
   if (array == NULL) return 1;
#ifdef PERF
   for(i=0;i<10000;i++)
       array[i].cold=(ColdFields*)malloc(sizeof(ColdFields));
#endif
   /* initialize the hot fields */
   for (i=0;i<10000;i++){
      array[i].x = 1.0;  array[i].y = 2.0;  array[i].z = 0.0; }
   /* hot loop: touches only x, y and z */
   for(k=1;k<10000;k++) {
     for (i=k;i<9999;i++){
        array[i].x = array[i-1].y+1.0;
        array[i].y = array[i+1].x+array[i+1].y;
        array[i].z = (array[i-1].y - array[i-1].x)/array[i-1].y; }  }
   printf("%f \n",array[100].z);
#ifdef PERF
   for(i=0;i<10000;i++)
      free(array[i].cold);
#endif
   free(array);
   return 0;
}

Result of test execution

icc struct.c -fast -o a.out 
icc struct.c -fast -DPERF -o b.out 
time ./a.out 
real    0m0.808s
time ./b.out 
real    0m0.566s
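For reference, the two timings above give the following speedup of the split layout (b.out) over the original layout (a.out):

```latex
\text{speedup} = \frac{t_{\text{a.out}}}{t_{\text{b.out}}}
               = \frac{0.808\ \text{s}}{0.566\ \text{s}} \approx 1.43,
\qquad
1 - \frac{0.566\ \text{s}}{0.808\ \text{s}} \approx 0.30
```

That is, the version with split structures takes about 30% less time.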

Pointer chasing

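Accessing data through a chain of pointers is one of the most common performance problems in C++ code. If the program data does not fit in the cache subsystem, each pointer dereference can miss the cache; because the address of every load depends on the result of the previous one, these misses cannot easily be overlapped, and the computation stalls on each of them.

This problem can also be introduced by an inappropriate data transformation: for example, after structure splitting every access to a cold field goes through an extra pointer (such as cold in the example above).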


Figure 8.3.
All_members += employer->p->f->members;
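The access pattern of Figure 8.3 can be sketched in C++ as below. `Employer`, `P` and `F` are hypothetical types whose names follow the expression in the figure. The first loop chases three pointers per element; the second, shown only for contrast, reads the same values from a contiguous array.

```cpp
#include <cassert>
#include <vector>

// Hypothetical types mirroring employer->p->f->members.
struct F { int members; };
struct P { F *f; };
struct Employer { P *p; };

// Pointer chasing: three dependent loads per element. The address of
// each load is known only after the previous load has completed.
long total_by_chasing(const std::vector<Employer *> &employers) {
    long all_members = 0;
    for (const Employer *employer : employers)
        all_members += employer->p->f->members;
    return all_members;
}

// Contrast: the same values stored contiguously are read sequentially,
// with no dependent loads, which suits caches and hardware prefetching.
long total_flat(const std::vector<int> &members) {
    long all_members = 0;
    for (int m : members)
        all_members += m;
    return all_members;
}
```

Both functions return the same sum; the difference is that in `total_by_chasing` a cache miss at any step of the chain delays all the following steps.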

Devirtualization of C++ virtual methods

C++ is an object-oriented language with a high level of abstraction that can select which class method to execute depending on the type of the object at run time. Pointers to the virtual methods of a class are stored in a special table (the virtual method table), so a virtual call has to load the target address from this table and make an indirect call. Such a call is expensive for performance, not least because the compiler cannot inline it. Sometimes the call through the virtual method table can be replaced with a direct call of a specific method.

A => B => C (B is derived from A, C is derived from B)

All derived classes override virtual int foo()

int process(class A *a) {
  return a->foo();
}
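One way a compiler can devirtualize `process` without seeing the whole program is to test the dynamic type and call the expected method directly. The C++ sketch below writes this guarded (speculative) devirtualization out by hand for the A => B => C hierarchy; it is a separate, named technique, not necessarily what the Intel compiler does, and the bodies of `foo` are hypothetical.

```cpp
#include <cassert>
#include <typeinfo>

// Hierarchy A => B => C; every class overrides foo() (hypothetical bodies).
struct A { virtual ~A() = default; virtual int foo() { return 1; } };
struct B : A { int foo() override { return 2; } };
struct C : B { int foo() override { return 3; } };

// Original: an indirect call through the virtual method table.
int process(A *a) { return a->foo(); }

// Guarded devirtualization: if the dynamic type is exactly B, the qualified
// call B::foo() bypasses the virtual table and can be inlined;
// any other dynamic type falls back to the ordinary virtual call.
int process_devirtualized(A *a) {
    if (typeid(*a) == typeid(B))
        return static_cast<B *>(a)->B::foo();
    return a->foo();
}
```

A qualified call such as `a->B::foo()` is resolved at compile time, so the fast path costs one type comparison instead of an indirect call, while `A` and `C` objects still reach their own `foo` through the fallback.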

Devirtualization example

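In this program no object of class A is ever created: the only object passed to process() is of class B. With interprocedural optimization over the whole program the compiler can prove this and perform devirtualization.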

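#include <stdio.h>
class A {
   virtual int foo() { return 1; }
   friend int process(class A *a);
};
class B: public A {
   virtual int foo() { return 2; }
   friend int process(class A *a);
};
int process(class A *a) {
   return a->foo();   // virtual call: A::foo or B::foo depending on the object
}
int main() {
   B* pB = new B;
   int result2 = process(pB);
   printf("%d\n", result2);   // use the result so the call is not removed as dead code
   delete pB;
   return 0;
}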
icl test.cpp -S
mov       eax, DWORD PTR [ebx]
mov       ecx, ebx
call      DWORD PTR [eax]          ; call through the virtual table
icl test.cpp -Qipo_S -Ob0 -Qipo
call      ?process.@@YAHPAVA@@@Z