Performance bug: typechecker does not infer shape of fold array when the fold operation lives in a module
Consider the following blocked matrix multiplication code. It contains only one non-scalar sum, namely the one corresponding to K0B. The compiler can figure out the shape of the elements we sum, but it infers the shape of the result (which is wrapped in a genarray) only when we use our own arr_plus, not when we use Array::plus. To see this, run:
sac2c_p -bopt mm.sac > mm.opt
sac2c_p -DSTDLIB -bopt mm.sac > mm_stdlib.opt
The outer shape should be double[64,20,3,125,15,1,6,8], but in the stdlib version that variable becomes double[*] (it is called _pinl_27647__flat_294 / _pinl_24586__mose_9 in my build).
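Since the generated variable names differ between builds, it may be easier to search the two dumps for the type rather than the name. A minimal sketch, assuming the dumps print the type literally as double[*] (as quoted above); count_unknown_shape is a hypothetical helper name, not part of sac2c:

```shell
# Hypothetical helper: count the lines of a compiler dump that mention double[*].
# Assumes the dump spells the type exactly as "double[*]".
count_unknown_shape() {
    grep -c 'double\[\*\]' "$1"
}

# Compare the two dumps produced by the commands above, if they exist.
for f in mm.opt mm_stdlib.opt; do
    if [ -f "$f" ]; then
        printf '%s: %s\n' "$f" "$(count_unknown_shape "$f")"
    fi
done
```

If the counts differ, grep -n with the same pattern lists the affected lines in mm_stdlib.opt.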
mm.sac:
#ifndef M0B
#define M0B 64
#endif
#ifndef K0B
#define K0B 40
#endif
#ifndef N0B
#define N0B 20
#endif
#define M1B 3
#define K1B 1
#define N1B 125
#define M2B 15
#define K2B 1
#define N2B 1
#define M3B 6
#define K3B 500
#define N3B 8
#define N (N0B * N1B * N2B * N3B)
#define M (M0B * M1B * M2B * M3B)
#define K (K0B * K1B * K2B * K3B)
#define ASZS M0B, K0B, M1B, K1B, M2B, K2B, M3B, K3B
#define BSZS K0B, N0B, K1B, N1B, K2B, N2B, K3B, N3B
use Array: all;
use StdIO: all;
use Benchmarking: all;
use Numerical: all;
inline
double[d:shp] arr_plus(double[d:shp] a, double[d:shp] b)
{
    return {iv -> _add_SxS_(a[iv], b[iv])};
}
specialize double[.,.,.,.,.,.,.,.]
mmy(double[ASZS] a, int[0] ia,
    double[BSZS] b, int[0] ib);
specialize double[.,.,.,.,.,.]
mmy(double[ASZS] a, int[2] ia,
    double[BSZS] b, int[2] ib);
specialize double[.,.,.,.]
mmy(double[ASZS] a, int[4] ia,
    double[BSZS] b, int[4] ib);
inline double[.,.]
mmy (double[ASZS] a, int[6] ia,
     double[BSZS] b, int[6] ib)
{
    return {[i, j] -> with {
                ([0] <= [p] < [K3B]): a[ia++[i, p]] * b[ib++[p, j]];
            }: fold(+, 0d)
            | [i, j] < [M3B, N3B]};
}
inline double[*]
mmy(double[ASZS] a, int[d] ia,
    double[BSZS] b, int[d] ib)
{
    sha = drop([d], shape(a));
    shb = drop([d], shape(b));
    shc = {[l] -> sha[l] | [0] <= [l] < (shape([ASZS]) - [d]) step [2];
           [l] -> shb[l] | [1] <= [l] < (shape([BSZS]) - [d]) step [2]};
    shco = take([2], shc);
    shci = drop([2], shc);
    return {[i,j] -> with {
                ([0] <= [l] < [sha[1]]): mmy (a, (ia ++ [i,l]),
                                              b, (ib ++ [l,j]));
#if STDLIB
            }: fold(+, with {}: genarray(shci, 0d))
#else
            }: fold(arr_plus, with {}: genarray(shci, 0d))
#endif
            | [i,j] < shco};
}
int main()
{
    a2 = {iv -> tod(1) | iv < [M0B, K0B, M1B, K1B, M2B, K2B, M3B, K3B]};
    b2 = {iv -> tod(1) | iv < [K0B, N0B, K1B, N1B, K2B, N2B, K3B, N3B]};
    c2 = mmy(a2, [], b2, []);
    return toi(sum(c2));
}
SaC version is
sac2c 2.1.0-PuurGeluk-205-gbed27-dirty
build-type: RELEASE
built-by: "thomas" at 2026-02-06T10:58:16
The build is marked dirty because my ext/SHRAYonUCX has new commits. I triple-checked that there are no other changes and did a clean rebuild.