sac-group / sac2c / Issues / #2536 (Closed)
Created Feb 05, 2026 by Thomas Koopman (@thomas, Developer)

Performance bug: typechecker does not infer shape of fold array when the fold operation lives in a module

Consider the following blocked matrix multiplication code. Note that it has only one non-scalar sum, namely the one corresponding to K0B. The compiler can infer the shape of the elements we sum, but it infers the shape of the result (which is wrapped in a genarray) only when we use our own arr_plus, not when we use Array::plus. To see this, try

sac2c_p -bopt mm.sac > mm.opt
sac2c_p -DSTDLIB -bopt mm.sac > mm_stdlib.opt

The outer shape should be double[64,20,3,125,15,1,6,8], but in the stdlib version the corresponding variable becomes double[*] (it is called _pinl_27647__flat_294 / _pinl_24586__mose_9 in my build).
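
A quick way to compare the two dumps is to count how often each type string occurs. This is only a sketch: `count_type` is a hypothetical helper, it assumes mm.opt and mm_stdlib.opt were produced by the two commands above, and it assumes the dump prints types exactly as quoted in this report. The generated variable names differ between builds, so the search is on the type rather than the name.

```shell
# Hypothetical helper: count lines in a dump that mention a given type string.
# -F makes grep treat the pattern literally, so '[' , ']' and '*' are not regex syntax.
# '|| true' keeps the exit status at 0 when grep finds no match (it still prints 0).
count_type() {
    grep -cF -- "$1" "$2" || true
}

expected='double[64,20,3,125,15,1,6,8]'

# Only report on dumps that actually exist in the current directory.
for f in mm.opt mm_stdlib.opt; do
    if [ -f "$f" ]; then
        printf '%s: %s line(s) with %s, %s line(s) with double[*]\n' \
            "$f" \
            "$(count_type "$expected" "$f")" "$expected" \
            "$(count_type 'double[*]' "$f")"
    fi
done
```

If the report above holds, the expected shape string should show up in mm.opt, while mm_stdlib.opt should show double[*] for the fold result instead.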

mm.sac:

#ifndef M0B
    #define M0B  64
#endif
#ifndef K0B
    #define K0B  40
#endif
#ifndef N0B
    #define N0B  20
#endif

#define M1B  3
#define K1B  1
#define N1B  125

#define M2B  15
#define K2B  1
#define N2B  1

#define M3B  6
#define K3B  500
#define N3B  8

#define N   (N0B * N1B * N2B * N3B)
#define M   (M0B * M1B * M2B * M3B)
#define K   (K0B * K1B * K2B * K3B)

#define ASZS  M0B, K0B, M1B, K1B,  M2B, K2B,  M3B, K3B
#define BSZS  K0B, N0B, K1B, N1B,  K2B, N2B,  K3B, N3B

use Array: all;
use StdIO: all;
use Benchmarking: all;
use Numerical: all;

inline
double[d:shp] arr_plus(double[d:shp] a, double[d:shp] b)
{
    return {iv -> _add_SxS_(a[iv], b[iv])};
}

specialize double[.,.,.,.,.,.,.,.]
mmy(double[ASZS] a, int[0] ia,
    double[BSZS] b, int[0] ib);

specialize double[.,.,.,.,.,.]
mmy(double[ASZS] a, int[2] ia,
    double[BSZS] b, int[2] ib);
specialize double[.,.,.,.]
mmy(double[ASZS] a, int[4] ia,
    double[BSZS] b, int[4] ib);

inline double[.,.]
mmy (double[ASZS] a, int[6] ia,
     double[BSZS] b, int[6] ib)
{
    return {[i, j] -> with {
                        ([0] <= [p] < [K3B]): a[ia++[i, p]] * b[ib++[p, j]];
                      }: fold(+, 0d)
                   | [i, j] < [M3B, N3B]};
}

inline double[*]
mmy(double[ASZS] a, int[d] ia,
    double[BSZS] b, int[d] ib)
{
  sha = drop([d], shape(a));
  shb = drop([d], shape(b));

  shc = {[l] -> sha[l] | [0] <= [l] < (shape([ASZS]) - [d]) step [2];
         [l] -> shb[l] | [1] <= [l] < (shape([BSZS]) - [d]) step [2]};
  shco = take([2], shc);
  shci = drop([2], shc);

  return {[i,j] -> with {
                    ([0] <= [l] < [sha[1]]): mmy (a, (ia ++ [i,l]),
                                                  b, (ib ++ [l,j]));
#if STDLIB
                   }: fold(+, with {}: genarray(shci, 0d))
#else
                   }: fold(arr_plus, with {}: genarray(shci, 0d))
#endif
                 | [i,j] < shco};
}

int main()
{
    a2 = {iv -> tod(1) | iv < [M0B, K0B, M1B, K1B, M2B, K2B, M3B, K3B]};
    b2 = {iv -> tod(1) | iv < [K0B, N0B, K1B, N1B, K2B, N2B, K3B, N3B]};

    c2 = mmy(a2, [], b2, []);

    return toi(sum(c2));
}
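
The expected result shape quoted earlier can be cross-checked by hand against the macro definitions. At d = 0, shc takes the even positions from ASZS (the M dimensions) and the odd positions from BSZS (the N dimensions). The sketch below repeats that interleaving outside the compiler. `result_shape` is a hypothetical helper, and the numbers are the default macro values from mm.sac.

```shell
# Hypothetical helper: interleave even positions of the first list with odd
# positions of the second, mirroring how shc is built from sha and shb in mmy.
result_shape() {
    a=$1; b=$2; out=""; i=0
    for x in $a; do
        # Look up the i-th element of b.
        j=0
        for y in $b; do
            if [ "$j" -eq "$i" ]; then by=$y; fi
            j=$((j + 1))
        done
        # Even index: take from a (M dimension); odd index: take from b (N dimension).
        if [ $((i % 2)) -eq 0 ]; then out="$out,$x"; else out="$out,$by"; fi
        i=$((i + 1))
    done
    printf 'double[%s]\n' "${out#,}"
}

# ASZS = M0B K0B M1B K1B M2B K2B M3B K3B, BSZS = K0B N0B K1B N1B K2B N2B K3B N3B
result_shape "64 40 3 1 15 1 6 500" "40 20 1 125 1 1 500 8"
# prints double[64,20,3,125,15,1,6,8]
```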

SaC version is

sac2c 2.1.0-PuurGeluk-205-gbed27-dirty
build-type: RELEASE
built-by: "thomas" at 2026-02-06T10:58:16

The build is marked dirty because my ext/SHRAYonUCX has new commits. I triple-checked that there are no other changes and did a clean rebuild.

Edited Feb 06, 2026 by Thomas Koopman