Distmem backend does not schedule correctly

Suppose we have two nodes and three threads per node. Then we want the following schedule, where N(i) T(j) denotes thread j on node i:

------------------------------
N(0) T(0)
------------------------------
N(0) T(1)
------------------------------
N(0) T(2)
------------------------------
N(1) T(0)
------------------------------
N(1) T(1)
------------------------------
N(1) T(2)
------------------------------

But what currently happens is

    indout ("SAC_WL_SCHEDULE_START( %d) = SAC_CLIP_LB0 (%s);\n",
            0, lower_bound[0]);
    indout ("SAC_WL_SCHEDULE_STOP( %d) = SAC_CLIP_UB0 (%s);\n",
            0, upper_bound[0]);

where lower_bound and upper_bound are computed by the multithreaded backend, and the CLIP macros intersect [lower_bound, upper_bound) with [ShrayStart, ShrayEnd). This gives the actual schedule

------------------------------
T(0)
T(0)
------------------------------
T(1)
T(1)
------------------------------
T(2)
T(2)
------------------------------

                                / \
                              /    \
                             /      \
                      Node 0 v       v Node 1
------------------------------     
T(0)
T(0)
------------------------------
T(1)
                                T(1)
                                ----------------------
                                T(2)
                                T(2)
                                ----------------------

which we do not want because it does not use all threads on each node: node 0 never runs T(2) and node 1 never runs T(0). Instead, we want the order reversed: first clip on the Shray level, and then pass the clipped range to the multithreaded scheduler.
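
To make the difference between the two orders concrete, here is a minimal sketch in plain C. It is not the SAC code generator: `range_t`, `clip`, `block`, `schedule_current` and `schedule_wanted` are hypothetical names, and it assumes a simple block distribution over threads, which may differ from what the multithreaded scheduler actually does.

```c
#include <assert.h>

/* Hypothetical half-open iteration range [start, stop). */
typedef struct {
    int start;
    int stop;
} range_t;

static int min_int(int a, int b) { return a < b ? a : b; }
static int max_int(int a, int b) { return a > b ? a : b; }

/* Intersect two half-open ranges; an empty result has start == stop. */
static range_t clip(range_t a, range_t b)
{
    range_t r = { max_int(a.start, b.start), min_int(a.stop, b.stop) };
    if (r.stop < r.start) {
        r.stop = r.start;
    }
    return r;
}

/* Split r into `parts` contiguous blocks and return block number `idx`.
 * Assumed block distribution, only for illustration. */
static range_t block(range_t r, int parts, int idx)
{
    assert(parts > 0 && idx >= 0 && idx < parts);
    int len = r.stop - r.start;
    range_t out = { r.start + (len * idx) / parts,
                    r.start + (len * (idx + 1)) / parts };
    return out;
}

/* Current order: split the global range over threads, then clip to the node. */
static range_t schedule_current(range_t global, range_t node, int threads, int t)
{
    return clip(block(global, threads, t), node);
}

/* Wanted order: clip to the node first, then split over that node's threads. */
static range_t schedule_wanted(range_t global, range_t node, int threads, int t)
{
    return block(clip(global, node), threads, t);
}
```

With a global range [0, 6), node ranges [0, 3) and [3, 6), and three threads, `schedule_current` hands T(2) on node 0 and T(0) on node 1 an empty range, matching the second diagram. `schedule_wanted` gives every thread on every node exactly one iteration.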

Edited Jun 04, 2025 by Thomas Koopman
