| | |
| --- | --- |
| Bugzilla Link | 1066 |
| Created on | Apr 19, 2013 21:28 |
| Version | svn |
| OS | Linux |
| Architecture | PC |
| Attachments | loop15.sac |
Extended Description
Created an attachment (id=968)
source code to reproduce fault
I have been looking at the performance, or lack thereof,
of Livermore Loop loop15. The SAC version currently runs for about
2 minutes, vs. 6 seconds for the C code.
It contains code like this:
ret2 = with {
  ([0,0] <= iv < [5,99]) {
    if( VF[iv+1] >= VF[iv+[1,0]]) {
      if( VH[iv+[2,1]] > VH[iv+1]) {
        val = sqrt( VGs[iv+1] + sq( max( VH[iv+1], VH[iv+[2,1]])))
              * 0.053d / VF[iv+1];
      } else {
        val = sqrt( VGs[iv+1] + sq( max( VH[iv+1], VH[iv+[2,1]])))
              * 0.073d / VF[iv+1];
      }
    } else {
      if( VH[iv+[2,1]] > VH[iv+1]) {
        val = sqrt( VGs[iv+1] + sq( max( VH[iv+[1,0]], VH[iv+[2,0]])))
              * 0.053d / VF[iv+1];
      } else {
        val = sqrt( VGs[iv+1] + sq( max( VH[iv+[1,0]], VH[iv+[2,0]])))
              * 0.073d / VF[iv+1];
      }
    }
  } : val;
...
You get the idea...
I think what happens is that NONE of the code in the CONDFUNs is WL-folded.
Furthermore, there is no chance to use WLIDX in the LACFUNs.
The immediate fix for the SAC code here is as follows. Consider
the last IF() code block. It can be rewritten so that the LACFUN
contains no indexing, and the indexing stays in the WL's basic
block:
numer = ( VH[iv+[2,1]] > VH[iv+1]) ? 0.053d : 0.073d;
val = sqrt( VGs[iv+1] + sq( max( VH[iv+[1,0]], VH[iv+[2,0]])))
      * numer / VF[iv+1];
This is not, however, a panacea, because other applications are
not so amenable to this sort of refactoring. E.g., consider
binary search, heapsort, and the like.
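To illustrate why such code resists the refactoring, here is a minimal binary search in C (a generic textbook sketch with a hypothetical function name, not taken from any SAC application). The index probed in each iteration is computed from the outcome of the previous comparison, so there is no branch-independent indexing that could be moved into straight-line code ahead of the conditional.

```c
#include <stddef.h>

/* Returns the index of key in the sorted array a[0..n-1], or -1 if absent. */
static long binary_search(const int *a, size_t n, int key)
{
    size_t lo = 0, hi = n;              /* search the half-open range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] < key) {
            lo = mid + 1;               /* next probe depends on this branch */
        } else if (a[mid] > key) {
            hi = mid;                   /* ... and on this one */
        } else {
            return (long)mid;
        }
    }
    return -1;
}
```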
Some redesigns we might consider, aside from scrapping the whole
LACFUN idea, include:
- pushing wlidx into LACFUNs. (Perhaps this is already done, but
I did not see evidence of it.)
- making LIR fancier for CONDFUNs. E.g., in the last IF() above,
the val= blocks are nearly identical in both legs, so the identical
parts could be moved out of the LACFUN.
I think the latter offers the biggest immediate advantages.
This bug also explains a lot about why many real-world SAC applications
don't perform nearly as well as we expect. That is, our (my) naive
expectation is that scalar-oriented SAC code should perform as well as
the equivalent C code.