Weird performance difference between for() and while() in Livermore Loop loop24.sac
Extended Description
Created an attachment (id=955)
source code to reproduce fault
sac2c -V
sac2c v1.00-beta (Haggis And Apple)
product rev 18089 linux-gnu_x86_64
(Fri Apr 5 09:54:36 EDT 2013 by sac)
This says it all:
sac@rattler:~/sac/demos/benchmarks/livermore_loops/for_comparison/loop24$ sac2c-d bugfor.sac -v1 -doawlf -nowlf -O3
WARNING: AWLF is enabled: -ecc enabled.
WARNING: AWLF is enabled: -extrema enabled.
WARNING: AWLF is enabled: -maxoptcyc=20
sac@rattler:~/sac/demos/benchmarks/livermore_loops/for_comparison/loop24$ time a.out max24 = 0;
24,35c24,28
< // while (k < n){
< // m = X[k] < X[m] ? k : m;
< // k = k + 1;
< // }
<
< for( k=0; k while (k < n){
> max24 = X[k] < X[max24] ? k : max24;
> k = k + 1;
> }
> return(max24);
The elapsed time for the for() loop is significantly greater than
that of the while() loop: about 22.8 s versus 17.4 s of real time
(roughly 31% longer). Here are the papiex results:
diff bugfor.inp loop24.inp
sac@rattler:~/sac/demos/benchmarks/livermore_loops/for_comparison/loop24$ rm bugfor*txt
sac@rattler:~/sac/demos/benchmarks/livermore_loops/for_comparison/loop24$ papioneLivermore bugfor.sac
Compiling livermore loop bugfor.sac -O3 -nowlf -doawlf
WARNING: AWLF is enabled: -ecc enabled.
WARNING: AWLF is enabled: -extrema enabled.
WARNING: AWLF is enabled: -maxoptcyc=20
Executing bugfor.sac
sac@rattler:~/sac/demos/benchmarks/livermore_loops/for_comparison/loop24$ cp loop24.inp crud.inp
sac@rattler:~/sac/demos/benchmarks/livermore_loops/for_comparison/loop24$ cp loop24.sac crud.sac
sac@rattler:~/sac/demos/benchmarks/livermore_loops/for_comparison/loop24$ rm crud*txt
sac@rattler:~/sac/demos/benchmarks/livermore_loops/for_comparison/loop24$ papioneLivermore crud.sac
Compiling livermore loop crud.sac -O3 -nowlf -doawlf
WARNING: AWLF is enabled: -ecc enabled.
WARNING: AWLF is enabled: -extrema enabled.
WARNING: AWLF is enabled: -maxoptcyc=20
Executing crud.sac
sac@rattler:~/sac/demos/benchmarks/livermore_loops/for_comparison/loop24$ cat bugfor.*txt
papiex version : 0.99
Executable : /home/sac/sac/demos/benchmarks/livermore_loops/for_comparison/loop24/bugfor.sac.exe.awlf.18089
Arguments :
Processor : AMD Phenom(tm) II X6 1075T Processor
Clockrate : 3000.000000
Hostname : rattler
Options : PAPI_TOT_INS,PAPI_L1_DCM,PAPI_L2_DCM,PAPI_VEC_INS,NO_MPI_GATHER,NO_SCIENTIFIC
Domain : User
Parent process id : 5978
Process id : 5979
Start : Fri Apr 5 10:38:20 2013
Finish : Fri Apr 5 10:38:43 2013
Instructions Completed ....................... 90016357233
Vector Instructions .......................... 60006020056
L1 Data Cache Misses ......................... 15473862
L2 Data Cache Misses ......................... 3234
Real usecs ................................... 22825991
Real cycles .................................. 68785872665
Virtual usecs ................................ 22822293
Virtual cycles ............................... 68466870000
PAPI_TOT_INS ................................. 90016357233
PAPI_L1_DCM .................................. 15473862
PAPI_L2_DCM .................................. 3234
PAPI_VEC_INS ................................. 60006020056
Event descriptions:
PAPI_TOT_INS : Instructions completed
PAPI_L1_DCM : Level 1 data cache misses
PAPI_L2_DCM : Level 2 data cache misses
PAPI_VEC_INS : Vector/SIMD instructions (could include integer)
sac@rattler:~/sac/demos/benchmarks/livermore_loops/for_comparison/loop24$ cat crud*txt
papiex version : 0.99
Executable : /home/sac/sac/demos/benchmarks/livermore_loops/for_comparison/loop24/crud.sac.exe.awlf.18089
Arguments :
Processor : AMD Phenom(tm) II X6 1075T Processor
Clockrate : 3000.000000
Hostname : rattler
Options : PAPI_TOT_INS,PAPI_L1_DCM,PAPI_L2_DCM,PAPI_VEC_INS,NO_MPI_GATHER,NO_SCIENTIFIC
Domain : User
Parent process id : 6027
Process id : 6028
Start : Fri Apr 5 10:39:49 2013
Finish : Fri Apr 5 10:40:07 2013
Instructions Completed ....................... 85008352882
Vector Instructions .......................... 50000520056
L1 Data Cache Misses ......................... 15087997
L2 Data Cache Misses ......................... 3285
Real usecs ................................... 17361987
Real cycles .................................. 52320240093
Virtual usecs ................................ 17358672
Virtual cycles ............................... 52076004000
PAPI_TOT_INS ................................. 85008352882
PAPI_L1_DCM .................................. 15087997
PAPI_L2_DCM .................................. 3285
PAPI_VEC_INS ................................. 50000520056
Event descriptions:
PAPI_TOT_INS : Instructions completed
PAPI_L1_DCM : Level 1 data cache misses
PAPI_L2_DCM : Level 2 data cache misses
PAPI_VEC_INS : Vector/SIMD instructions (could include integer)
This is with:
gcc --version
gcc (Ubuntu/Linaro 4.4.4-14ubuntu5) 4.4.5
Note the differences in both the total instruction counts
(about 90.0e9 vs. 85.0e9) and the vector instruction counts
(about 60.0e9 vs. 50.0e9). This suggests that we are generating
different code for the two versions.
Compiling without -doawlf and -nowlf does not affect the results,
which is to be expected: this is a scalar-loop benchmark, so
with-loop folding has nothing to work on.