Dismal performance of indexed reference in LACFUNs, e.g. Livermore Loop loop15
| Bugzilla Link | 1066 |
|---|---|
| Created on | Apr 19, 2013 21:28 |
| Version | svn |
| OS | Linux |
| Architecture | PC |
| Attachments | loop15.sac |
Extended Description
Created an attachment (id=968): source code to reproduce the fault.

I have been looking at the performance, or lack thereof, of Livermore Loop loop15. It currently runs for about 2 minutes, vs. 6 seconds for the C code. It contains code like this:

```
ret2 = with {
         ([0,0] <= iv < [5,99]) {
           if( VF[iv+1] >= VF[iv+[1,0]]) {
             if( VH[iv+[2,1]] > VH[iv+1]) {
               val = sqrt( VGs[iv+1] + sq( max( VH[iv+1], VH[iv+[2,1]]))) * 0.053d / VF[iv+1];
             } else {
               val = sqrt( VGs[iv+1] + sq( max( VH[iv+1], VH[iv+[2,1]]))) * 0.073d / VF[iv+1];
             }
           } else {
             if( VH[iv+[2,1]] > VH[iv+1]) {
               val = sqrt( VGs[iv+1] + sq( max( VH[iv+[1,0]], VH[iv+[2,0]]))) * 0.053d / VF[iv+1];
             } else {
               val = sqrt( VGs[iv+1] + sq( max( VH[iv+[1,0]], VH[iv+[2,0]]))) * 0.073d / VF[iv+1];
             }
           }
         } : val;
       ...
```

You get the idea.

I think what happens is that NONE of the code in the CONDFUNs is WL-folded. Furthermore, there is no chance to use WLIDX in the LACFUNs.

The immediate fix for the SAC code here is this. Consider the last IF() code block. It can be rewritten so that the LACFUN contains no indexing, and the indexing stays in the WL's basic block:

```
numer = ( VH[iv+[2,1]] > VH[iv+1]) ? 0.053d : 0.073d;
val = sqrt( VGs[iv+1] + sq( max( VH[iv+[1,0]], VH[iv+[2,0]]))) * numer / VF[iv+1];
```

This is not a panacea, however, because other applications are not so amenable to this sort of refactoring; consider binary search, heapsort, and the like.

Some redesigns we might consider, aside from scrapping the whole LACFUN idea, include:

- pushing WLIDX into LACFUNs. (Perhaps this is already done, but I did not see evidence of it.)
- making LIR fancier for CONDFUNs. E.g., in the last IF() above, the val= blocks are nearly identical in both legs, so the identical parts could be moved out of the LACFUN.

I think the latter offers the biggest immediate advantages.
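To make the two redesign ideas concrete, here is a rough C analogue of the `ret2` fragment. It is a sketch under stated assumptions, not what sac2c actually generates: the array sizes, the input data, and the names `at`, `max2`, `naive_kernel`, `hoisted_kernel`, `fill_inputs` and `kernels_agree` are all hypothetical, and `sqrt` is dropped so the example links without libm (it does not affect the transformation). `naive_kernel` mirrors the current shape, where every leg of the conditional does its own indexing and recomputes each flat offset from a (row, column) index. `hoisted_kernel` computes the offset of `iv+1` once (roughly what WLIDX would supply), loads all operands up front, and leaves only value selection inside the conditionals, which is the kind of code motion a fancier LIR for CONDFUNs could perform.

```c
#include <assert.h>

/* Hypothetical sizes: for 0 <= i < 5 and 0 <= j < 99 the fragment reads
   rows up to i+2 and columns up to j+1, so 7 x 100 inputs suffice. */
#define ROWS 7
#define COLS 100
#define OUT_R 5
#define OUT_C 99

/* Row-major input arrays (names follow the SAC fragment). */
static double VF[ROWS * COLS], VH[ROWS * COLS], VGs[ROWS * COLS];

/* Element access by (row, col): recomputes the flat offset on every call. */
static double at(const double *a, int r, int c) {
    assert(r >= 0 && r < ROWS && c >= 0 && c < COLS);
    return a[r * COLS + c];
}

static double max2(double a, double b) { return a > b ? a : b; }

/* Current shape: each leg of the conditional does its own indexing.
   sqrt() is omitted in both kernels so the sketch needs no libm. */
static void naive_kernel(double out[OUT_R][OUT_C]) {
    for (int i = 0; i < OUT_R; i++) {
        for (int j = 0; j < OUT_C; j++) {
            double val, m;
            if (at(VF, i + 1, j + 1) >= at(VF, i + 1, j)) {
                if (at(VH, i + 2, j + 1) > at(VH, i + 1, j + 1)) {
                    m = max2(at(VH, i + 1, j + 1), at(VH, i + 2, j + 1));
                    val = (at(VGs, i + 1, j + 1) + m * m) * 0.053 / at(VF, i + 1, j + 1);
                } else {
                    m = max2(at(VH, i + 1, j + 1), at(VH, i + 2, j + 1));
                    val = (at(VGs, i + 1, j + 1) + m * m) * 0.073 / at(VF, i + 1, j + 1);
                }
            } else {
                if (at(VH, i + 2, j + 1) > at(VH, i + 1, j + 1)) {
                    m = max2(at(VH, i + 1, j), at(VH, i + 2, j));
                    val = (at(VGs, i + 1, j + 1) + m * m) * 0.053 / at(VF, i + 1, j + 1);
                } else {
                    m = max2(at(VH, i + 1, j), at(VH, i + 2, j));
                    val = (at(VGs, i + 1, j + 1) + m * m) * 0.073 / at(VF, i + 1, j + 1);
                }
            }
            out[i][j] = val;
        }
    }
}

/* Hoisted shape: one offset computation per element (the WLIDX-like step),
   all loads outside the conditionals, which now only select values. */
static void hoisted_kernel(double out[OUT_R][OUT_C]) {
    for (int i = 0; i < OUT_R; i++) {
        for (int j = 0; j < OUT_C; j++) {
            int o = (i + 1) * COLS + (j + 1);                  /* iv+1 */
            double f11 = VF[o], f10 = VF[o - 1];               /* iv+1, iv+[1,0] */
            double h11 = VH[o], h10 = VH[o - 1];
            double h21 = VH[o + COLS], h20 = VH[o + COLS - 1]; /* iv+[2,1], iv+[2,0] */
            double m = (f11 >= f10) ? max2(h11, h21) : max2(h10, h20);
            double numer = (h21 > h11) ? 0.053 : 0.073;
            out[i][j] = (VGs[o] + m * m) * numer / f11;
        }
    }
}

/* Deterministic test data; VF is never zero. */
static void fill_inputs(void) {
    for (int r = 0; r < ROWS; r++) {
        for (int c = 0; c < COLS; c++) {
            VF[r * COLS + c] = 1.0 + (r * 37 + c * 11) % 13;
            VH[r * COLS + c] = (double)((r * 7 + c * 29) % 17) - 8.0;
            VGs[r * COLS + c] = 0.5 + (r + c) % 5;
        }
    }
}

/* Fills the inputs, runs both kernels into a and b, and returns 1 if
   every element agrees to within 1e-12. */
static int kernels_agree(double a[OUT_R][OUT_C], double b[OUT_R][OUT_C]) {
    fill_inputs();
    naive_kernel(a);
    hoisted_kernel(b);
    for (int i = 0; i < OUT_R; i++) {
        for (int j = 0; j < OUT_C; j++) {
            double d = a[i][j] - b[i][j];
            if (d > 1e-12 || d < -1e-12)
                return 0;
        }
    }
    return 1;
}
```

One design caveat: the hoisted form reads all six operands unconditionally. That is safe here because every access is in bounds for every iv, but in binary search or heapsort the next index is itself computed from the comparison result, so the loads cannot be moved out of the branch this way.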
This bug also explains a lot about why many real-world SAC applications don't perform nearly as well as we expect: our (my) naive expectation is that scalar-oriented SAC code should perform as well as the equivalent C code.