Adding a WL speeds up loopfsAKD.sac
Bugzilla Link: 495
Created on: May 17, 2009 19:52
Version: 1.00beta
OS: Linux
Architecture: PC
Attachments: crud.sac
Extended Description
Created an attachment (id=522)
Source code to reproduce the fault.
The attached code has the interesting property that it runs faster if you
introduce an extra with-loop (WL) into the mix, via #define CRUD.
The resulting code is NOT folded by with-loop folding (WLF), so there is an extra WL at the end of phase 11.
However, the resulting code executes about 5% FASTER than the version without
the extra WL. Very puzzling.
I'm guessing some strangeness in the back end, because eyeballing the code
did not turn up any other differences that I could see.
Perhaps some backendian type can look at this?
PAPI output (PAPI_TOT_INS, total instructions completed):
Without the extra WL:
  crud.sac.exe.O3.papiex.rattler.6186:PAPI_TOT_INS: 105104480
With the extra WL:
  crud.sac.exe.O3.papiex.rattler.6353:PAPI_TOT_INS: 100104533
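For reference, the two PAPI_TOT_INS counts above can be compared directly. Note that this measures instructions completed, not wall-clock time, so it is only a rough cross-check of the ~5% speedup, not a timing result:

$$
\frac{105104480 - 100104533}{105104480} = \frac{4999947}{105104480} \approx 0.0476
$$

That is, the version with the extra WL completes roughly 5 million (about 4.8%) fewer instructions.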