First-axis reduction of a tensor runs 10X slower than reduction of the entire tensor
Bugzilla Link: 754
Created on: Sep 30, 2010 19:46
Version: svn
OS: Linux
Architecture: PC
Attachments: crud.sac
Extended Description
Created an attachment (id=758): source code to reproduce the fault
The attached code performs a sum reduction over the first axis of
a rank-3 array when compiled with:
sac2c -O3 crud.sac -DSLOW
The resulting code executes in roughly 7 seconds on a 3GHz Opteron.
With the Ubuntu System Monitor running, I can see memory usage creep
up DURING the reduction. This is surprising: I would naively expect
all allocations to happen before the loop is entered.
When compiled with:
sac2c -O3 crud.sac
the resulting code instead performs a sum() over the entire tensor and
executes in about 0.85 seconds.
The offending function is likely this one:
inline int[+] plussl1XBIFOLD(bool[+] y)
{ /* first-axis reduce a rank-3 or greater array */
  yt = transpose(y);
  zrho = drop([-1], shape(yt));
  z = with {
        (. <= iv <= .) : sum(toi(yt[iv]));
      } : genarray(zrho, 0);
  return(z);
}
Is there perhaps a better way to express such a reduction?
The intent is that an argument of shape [10,20,30] yields
a result of shape [20,30].
Part of the problem is that the array being reduced is only AKD
(known dimensionality, unknown shape), which causes some with-loop
folding (WLF) opportunities to be missed.
However, even declaring the reduced argument with a fully known shape:
bool[3000,15000,3] A_23;
still leaves the -DSLOW code running about 6X slower than
the full-tensor sum() version.
Observed on: product rev 17069:MODIFIED linux-gnu_x86_64