Closed
Open
Created Sep 30, 2010 by Robert Bernecky (@rbe, Developer)

First axis reduction of tensor runs 10X slower than reduction of entire tensor

Bugzilla Link 754
Created on Sep 30, 2010 19:46
Version svn
OS Linux
Architecture PC
Attachments crud.sac

Extended Description

Created an attachment (id=758)
source code to reproduce fault
When compiled with:
  sac2c -O3 crud.sac -DSLOW
the attached code performs a sum reduction over the first axis of
a rank-3 array. The resulting code executes in roughly 7 seconds
on a 3GHz Opteron.
If I have the Ubuntu system monitor running, I see that memory usage
creeps up DURING the execution of the reduction. This is surprising,
as I would naively expect that all allocations would be done before
we enter the loop.
If compiled with:
  sac2c -O3 crud.sac
the resulting code performs a sum() over the entire tensor, and executes in
about 0.85 seconds.
The offending function is likely this one:
inline int[+] plussl1XBIFOLD(bool[+] y)
{ /* first-axis reduce rank-3 or greater matrix */
  yt = transpose(y);
  zrho = drop([-1], shape(yt));
  z = with {
        (. <= iv <= .) : sum(toi((yt[iv])));
      } : genarray(zrho, 0);
  return(z);
}
Perhaps there is a better way to express such a reduction?
The idea here is that an argument of shape [10,20,30] will
give a result of shape [20,30].
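For readers unfamiliar with the intended semantics, here is a minimal
Python sketch of a first-axis sum reduction on a rank-3 boolean array.
It is only an illustration of the expected input and output shapes: the
names first_axis_sum and shape are hypothetical helpers, nested lists
stand in for SAC arrays, and nothing here models SAC's transpose,
with-loops, or memory behaviour.

```python
def first_axis_sum(y):
    """Sum a rank-3 nested list of booleans over its first axis.

    An input of shape [n0, n1, n2] yields a result of shape [n1, n2].
    """
    n0, n1, n2 = len(y), len(y[0]), len(y[0][0])
    # Allocate the whole result once, before entering the reduction loops.
    z = [[0] * n2 for _ in range(n1)]
    for i in range(n0):
        for j in range(n1):
            for k in range(n2):
                z[j][k] += int(y[i][j][k])  # bool -> int, like toi()
    return z


def shape(a):
    """Return the shape of a regular (non-ragged) nested list."""
    dims = []
    while isinstance(a, list):
        dims.append(len(a))
        a = a[0]
    return dims


# Illustrative input of shape [10, 20, 30], as in the description above.
y = [[[(i + j + k) % 2 == 0 for k in range(30)]
      for j in range(20)]
     for i in range(10)]
z = first_axis_sum(y)
print(shape(y), shape(z))
```

The result is allocated once up front and each element accumulates
n0 values, which is the behaviour the report expects from the compiled
reduction as well.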
Part of the problem is that the reduction array is AKD (known
dimension, unknown shape), which is causing some WLF (with-loop
folding) opportunities to be missed.
However, declaring the reduce argument with a fully known shape:
 bool[3000,15000,3] A_23;
still leaves the -DSLOW code running about 6X slower than
the other code.
This is on: product rev 17069:MODIFIED linux-gnu_x86_64