Livermore Loop loop09 sparse matrix multiply performance problem w/ -dolacsi
Created on: Mar 14, 2013 15:27
Created an attachment (id=952): source code to reproduce fault

sac2c -V
sac2c v1.00-beta (Haggis And Apple)
product rev 18069 linux-gnu_x86_64
(Wed Mar 13 17:40:45 EDT 2013 by sac)

I have made the SAC and C versions of loop09 match again, in the sense that they return the same results. I have also reverted loop09 to the earlier, imperative version, and created loop09A, a version that uses the sparse vector-matrix multiply that I use in APEX.

Results are good, Almost... Here are the CPU times and cache miss rates for same:

    sac2c loop09A.sac -O3 -dolacsi -doawlf -nowlf -O3

                          CPU (usec)      L1miss   L2miss
          loop09.c           2210390    20547020     1304
    wlf   loop09.sac         5228058    20041898     5567
    awlf  loop09.sac         2388230    19047599     2421
    wlf   loop09A.sac       10575784    64461221     6619
    awlf  loop09A.sac        4779216   105809057     3539

It is the last line that is puzzling. IMO, its performance should be extremely close to that of the third line. But, it ain't.

What is going on is, essentially, this:

- Originally, Cond_0() in the inner loop is passed a 1-element vector V. It uses V0=V as GENERATOR_BOUND2( [V0]) and GENARRAY_SHAPE( [ V0]).
- -dolacsi allows elimination of the sel V0 = V.
- Someone (SAA?) generates V' = [ V0] as the result shape of the Cond_0 result.
- Eventually, we end up with a funcond at the end of Cond_0(), roughly of this form:

      V' = [ V0];
      shp = flat_1 ? V' : V;

Both legs of the funcond match, so this is really just:

      shp = V.

That should get shp lifted out of Cond_0(), but nothing has enough smarts to do that. We end up, therefore, with all this baggage in the inner loop, of which the building of V' is causing most of the harm.

Possible actions:

1. I do not think we can do much in LACSI about these things. The shp-related code goes through a lot of optimization before we get to the point above.

2. We do happen to have AVIS_SCALARS( V) = [ V0]. It should be possible to make CF or one of its buddies that looks at funconds check to see if one funcond argument is an N_array that matches the AVIS_SCALARS of the other and, if so, replace things appropriately:

      shp = flat_1 ? V : V;

   [Premature replacement by:

      shp = V;

   is bad, because of the delicate nature of the funcond structure. That will get done elsewhere.]

The latter is fairly straightforward, so I'm going to try that approach.
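To make the proposed check in action 2 concrete, here is a toy model in Python. It is only a sketch of the rewrite rule, not sac2c code: `Funcond`, `simplify_funcond`, `array_defs` and `avis_scalars` are hypothetical names invented for illustration, with `array_defs` standing in for "this variable is defined by an N_array" and `avis_scalars` standing in for AVIS_SCALARS. Note that it only rewrites the funcond to `flat_1 ? V : V`; it deliberately does not collapse it to `shp = V`, per the caveat above.

```python
from typing import Dict, NamedTuple, Tuple


class Funcond(NamedTuple):
    """Toy stand-in for a funcond: result = pred ? then_leg : else_leg."""
    pred: str
    then_leg: str
    else_leg: str


def simplify_funcond(
    fc: Funcond,
    array_defs: Dict[str, Tuple[str, ...]],    # hypothetical: var -> elements of its N_array definition
    avis_scalars: Dict[str, Tuple[str, ...]],  # hypothetical: var -> its known scalar elements
) -> Funcond:
    """If one leg is an array literal equal to the known scalars of the
    other leg, replace that leg by the other leg. The funcond itself is kept."""

    def matches(array_var: str, other_var: str) -> bool:
        elems = array_defs.get(array_var)
        return elems is not None and elems == avis_scalars.get(other_var)

    if matches(fc.then_leg, fc.else_leg):
        return Funcond(fc.pred, fc.else_leg, fc.else_leg)
    if matches(fc.else_leg, fc.then_leg):
        return Funcond(fc.pred, fc.then_leg, fc.then_leg)
    return fc  # nothing known to match: leave the funcond alone


# The situation from this report: V' = [ V0]; AVIS_SCALARS( V) = [ V0].
before = Funcond("flat_1", "V'", "V")
after = simplify_funcond(before, {"V'": ("V0",)}, {"V": ("V0",)})
print(after)
```

Once both legs name the same variable, V' has no remaining use inside Cond_0(), so ordinary dead-code removal could presumably drop its construction from the inner loop; that follow-on step is an assumption here and is not modelled.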