  • #1245
Closed
Open
Created Mar 14, 2013 by Robert Bernecky (@rbe, Developer)

Livermore Loop loop09 sparse matrix multiply performance problem w/ -dolacsi

Bugzilla Link 1052
Created on Mar 14, 2013 15:27
Version svn
OS Linux
Architecture PC
Attachments loop09A.sac

Extended Description

Created an attachment (id=952)
source code to reproduce fault
sac2c -V
sac2c v1.00-beta (Haggis And Apple)
 product rev 18069 linux-gnu_x86_64
 (Wed Mar 13 17:40:45 EDT 2013 by sac)
I have made the SAC and C versions of loop09 match again, in the sense
that they return the same results. I have also reverted loop09 to the
earlier, imperative version, and created loop09A, a version that uses
the sparse vector-matrix multiply that I use in APEX. 
Results are good, almost... Here are the CPU times and cache miss counts
for same:
sac2c loop09A.sac -O3 -dolacsi -doawlf -nowlf -O3
   
                   CPU (usec)     L1miss  L2miss
      loop09.c        2210390   20547020    1304
wlf   loop09.sac      5228058   20041898    5567
awlf  loop09.sac      2388230   19047599    2421
wlf   loop09A.sac    10575784   64461221    6619
awlf  loop09A.sac     4779216  105809057    3539
It is the last line (awlf loop09A.sac) that is puzzling. IMO, its
performance should be extremely close to that of the third line
(awlf loop09.sac). But it ain't.
What is going on is, essentially, this:
 - Originally, Cond_0() in the inner loop is passed a 1-element vector V. 
   It uses V0=V[0] as GENERATOR_BOUND2( [V0]) and GENARRAY_SHAPE( [ V0]).
 - -dolacsi allows elimination of the sel V0 = V[0].
 - Someone (SAA?) generates V' = [ V0] as the shape of the
   Cond_0 result.
 - Eventually, we end up with a funcond at the end of Cond_0(), 
   roughly of this form:
    V' = [ V0];
    shp = flat_1 ? V' : V;
   Both legs of the funcond match, so this is really just: shp = V.
   That should get shp lifted out of Cond_0(), but nothing has
   enough smarts to do that.
   We end up, therefore, with all this baggage in the inner loop,
   of which the building of V' is causing most of the harm.
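The funcond situation above can be modelled in a few lines. This is a toy Python sketch, not sac2c code: funcond_shape is a hypothetical name, and it assumes V is a 1-element vector as described. It only illustrates that both legs carry the same value, so building V' on every call is wasted work.

```python
def funcond_shape(flat_1, V):
    """Toy model of the tail of Cond_0(); V is assumed to have one element."""
    V0 = V[0]           # the sel that -dolacsi allows to be eliminated
    V_prime = [V0]      # V' = [ V0], rebuilt on every inner-loop call
    # shp = flat_1 ? V' : V
    return V_prime if flat_1 else V


V = [7]
# Whichever leg is taken, the resulting shape has the same value as V.
assert funcond_shape(True, V) == V
assert funcond_shape(False, V) == V
```

Since the result does not depend on flat_1, shp is loop-invariant with respect to Cond_0() and could in principle be computed outside it.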
Possible actions:
1. I do not think we can do much in LACSI about these things. 
   The shp-related code goes through a lot of optimization before
   we get to the point above.
2. We do happen to have AVIS_SCALARS( V) = [ V0]. It should be possible
   to make CF or one of its buddies that looks at funconds check to see
   if one funcond argument is an N_array that matches the AVIS_SCALARS
   of the other and, if so, replace things appropriately:
    shp = flat_1 ? V : V;
   [Premature replacement by: shp = V; is bad, because of the delicate
    nature of the funcond structure. That will get done elsewhere.]
The latter is fairly straightforward, so I'm going to try that approach.
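A rough sketch of the check proposed in item 2, written in Python over a toy IR rather than the real sac2c node types. Avis, Array, Funcond and simplify_funcond are hypothetical names invented for illustration; the scalars field stands in for AVIS_SCALARS, and funcond legs are modelled directly as either an array literal or an avis, which simplifies the real structure.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Avis:
    """Toy stand-in for an avis: a name plus its known scalars, if any."""
    name: str
    scalars: Optional[Tuple[str, ...]] = None  # ("V0",) models AVIS_SCALARS( V) = [ V0]


@dataclass(frozen=True)
class Array:
    """Toy stand-in for an N_array of scalar identifiers."""
    elems: Tuple[str, ...]


@dataclass(frozen=True)
class Funcond:
    pred: str
    then: object
    else_: object


def simplify_funcond(fc):
    """If one leg is an array matching the other leg's scalars, use the
    other leg on both sides. The funcond node itself is kept."""
    t, e = fc.then, fc.else_
    if isinstance(t, Array) and isinstance(e, Avis) and e.scalars == t.elems:
        return Funcond(fc.pred, e, e)
    if isinstance(e, Array) and isinstance(t, Avis) and t.scalars == e.elems:
        return Funcond(fc.pred, t, t)
    return fc


V = Avis("V", scalars=("V0",))
before = Funcond("flat_1", Array(("V0",)), V)   # shp = flat_1 ? V' : V
after = simplify_funcond(before)                # shp = flat_1 ? V : V
assert after == Funcond("flat_1", V, V)
```

The sketch deliberately stops at shp = flat_1 ? V : V and leaves the funcond in place, in line with the note above that collapsing it to shp = V is left to a later pass.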