Published: 12.07.2012 | Access: free | Students: 355 / 24 | Rating: 4.00 / 4.20 | Duration: 11:07:00
Specialties: Programmer
Lecture 9:

Optimizing compiler. Static and dynamic profiler. Memory manager. Code generator

Example: the dynamic profiler and auto-vectorization

#include <stdio.h>
 float ttt(float* vec,int n1, int n2) {
 int i;
 float sum=0;
 for(i=n1;i<n2;i++) 
     sum+= vec[i]; 
 return sum;
}
int main(void) {
 float zzz[1000];
 int i;
 float sum=0;
 for(i=0;i<1000;i++) 
   zzz[i]=i;
 
 /* i+5<=1000 and i+6<=1000 keep every read in ttt() inside zzz[0..999] */
 for(i=1;i+5<=1000;i=i+5)
    sum = sum+ttt(zzz,i,i+5);
 for(i=1;i+6<=1000;i=i+6)
    sum = sum+ttt(zzz,i,i+6);
 printf("sum=%f\n",sum); 
 return 0;
}

Let’s check whether the compiler can estimate vectorization profitability with the help of the dynamic profiler.

icl -Ob0 test_vecpgo.c -Qipo -Qvec_report3
…

test_vecpgo.c(6): (col. 2) remark: LOOP WAS VECTORIZED.

icl -Ob0 test_vecpgo.c -Qipo -Qprof_gen 
test_vecpgo.exe
icl -Ob0 test_vecpgo.c -Qipo -Qprof_use -Qvec_report3 
…

test_vecpgo.c(6): (col. 2) remark: loop was not vectorized: vectorization possible but seems inefficient.

cat multip.c 
void matrix_mul_matrix(int n, double *C, float *A, float *B) {
int i,j,k;
  for (i=0; i<n; i++) 
    for (j=0; j<n; j++) 
      for(k=0;k<n;k++)
         C[i*n+j]+=(double)A[i*n+k] * (double)B[k*n+j];
}
cat main.c 
#include <stdio.h>
#define N 2000
extern void matrix_mul_matrix(int,double *,float *,float *);
int main() {
float *A,*B;
double *C;
…
matrix_mul_matrix(N,C,A,B);
printf("%f\n",C[2*N+2]);
}

Let’s check whether the compiler can estimate auto-parallelization profitability with the help of the dynamic profiler.

icl /Ob0 multip.c main.c /O3 /Qipo /Qparallel  /Qpar_report3 
… 
procedure: matrix_mul_matrix 

multip.c(4): (col. 3) remark: loop was not parallelized: insufficient computational work.

time multip.exe – 3.6s
icl  /Qprof_gen /Ob0 multip.c main.c  /O3 /Qipo /Qparallel  
multip.exe 
icl /Qprof_use /Ob0 multip.c main.c /O3 /Qipo /Qparallel  /Qpar_report3  
  procedure: matrix_mul_matrix 

multip.c(4): (col. 3) remark: LOOP WAS AUTO-PARALLELIZED.

time multip.exe – 0.67s 

Dynamic memory allocation and memory manager

Objects and arrays can be allocated dynamically at run time with the C++ operators new and delete or with the C functions malloc() and free(). The memory manager is the part of the application that processes requests to allocate and free memory.

Typical situations where dynamic memory allocation is necessary:

  • A large array must be created whose size is unknown at compile time.
  • An array is too large to be placed on the stack.
  • Objects must be created at run time when the number of required objects is not known in advance.

Disadvantages of dynamic memory allocation:

  • Allocating and freeing memory has a run-time overhead.
  • Memory becomes fragmented when objects of different types (and therefore sizes) are allocated and released in an unpredictable order.
  • If the size of an allocated object must change and the memory block cannot be extended in place, the data has to be copied from the old block to a new one.
  • Because of fragmentation, a free block of the required size may not be found even though enough memory is free in total; in that case garbage collection may be necessary.

An important performance factor in C++ is placing the objects that belong to the same linked list close to each other in memory. A linked list is less efficient than a linear array for the following reasons:

  • Each object is allocated separately, and every allocation and release has a cost.
  • Objects are not placed in memory sequentially, so the probability of a cache hit during traversal is lower than for an array.
  • Extra memory is needed to store the links and the bookkeeping information for each allocated memory block.

For the same reasons, a contiguous array of objects is more efficient than an array of pointers to objects.

The cache hit probability can differ between memory managers because they allocate memory in different ways. For example, a manager can group allocated objects by size. Alternative memory managers such as SmartHeap or dlmalloc can provide better performance in some cases.