Distmem backend does not schedule correctly
Suppose we have two nodes and three threads per node. Then we want the following scheduling:
------------------------------
N(0) T(0)
------------------------------
N(0) T(1)
------------------------------
N(0) T(2)
------------------------------
N(1) T(0)
------------------------------
N(1) T(1)
------------------------------
N(1) T(2)
------------------------------
But what currently happens is:
    indout ("SAC_WL_SCHEDULE_START( %d) = SAC_CLIP_LB0 (%s);\n",
            0, lower_bound[0]);
    indout ("SAC_WL_SCHEDULE_STOP( %d) = SAC_CLIP_UB0 (%s);\n",
            0, upper_bound[0]);
where lower_bound and upper_bound are computed by the multithreaded backend, and the CLIP macros intersect [lower_bound, upper_bound) with [ShrayStart, ShrayEnd). So this gives the actual scheduling:
------------------------------
T(0)
T(0)
------------------------------
T(1)
T(1)
------------------------------
T(2)
T(2)
------------------------------
/ \
/ \
/ \
Node 0 v v Node 1
------------------------------
T(0)
T(0)
------------------------------
T(1)
T(1)
----------------------
T(2)
T(2)
----------------------
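which we do not want because it does not use all threads: each thread's range is fixed before the clip, so on a given node a thread whose range lies outside [ShrayStart, ShrayEnd) is left with nothing to do. Instead, we want it the other way around: first clip on the Shray level, and then pass the clipped range to the multithreaded scheduler.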
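
To make the difference between the two orders concrete, here is a minimal sketch in C. It is not the SaC code generator or the real macros: range_t, block_part, clip, schedule_then_clip, clip_then_schedule and print_comparison are all hypothetical names. It assumes that the multithreaded backend uses plain block scheduling and that the CLIP macros amount to a max on the lower bound and a min on the upper bound.

```c
#include <stdio.h>

/* Half-open index range [lb, ub). Hypothetical type, for illustration only. */
typedef struct { int lb; int ub; } range_t;

/* Assumed block scheduler: the p-th of n near-equal parts of r. */
static range_t block_part(range_t r, int p, int n)
{
    int len = r.ub - r.lb;
    range_t out = { r.lb + (len * p) / n, r.lb + (len * (p + 1)) / n };
    return out;
}

/* Intersection of two half-open ranges (assumed CLIP behaviour).
   An empty result is normalised so that lb == ub. */
static range_t clip(range_t a, range_t b)
{
    range_t out = { a.lb > b.lb ? a.lb : b.lb, a.ub < b.ub ? a.ub : b.ub };
    if (out.ub < out.lb) {
        out.ub = out.lb;
    }
    return out;
}

/* Current order: schedule threads over the full range, then clip to the node. */
static range_t schedule_then_clip(range_t full, range_t node, int t, int nthreads)
{
    return clip(block_part(full, t, nthreads), node);
}

/* Proposed order: clip to the node first, then schedule threads inside it. */
static range_t clip_then_schedule(range_t full, range_t node, int t, int nthreads)
{
    return block_part(clip(full, node), t, nthreads);
}

/* Print both schedules for every (node, thread) pair. */
static void print_comparison(range_t full, int nodes, int threads)
{
    for (int n = 0; n < nodes; n++) {
        /* Stands in for [ShrayStart, ShrayEnd) on node n. */
        range_t node = block_part(full, n, nodes);
        for (int t = 0; t < threads; t++) {
            range_t cur = schedule_then_clip(full, node, t, threads);
            range_t want = clip_then_schedule(full, node, t, threads);
            printf("N(%d) T(%d): current [%d,%d)  proposed [%d,%d)\n",
                   n, t, cur.lb, cur.ub, want.lb, want.ub);
        }
    }
}
```

Under these assumptions, calling print_comparison with the range [0, 12), two nodes and three threads gives T(2) an empty range on node 0 and T(0) an empty range on node 1 in the current order, while in the proposed order every thread on every node gets two indices.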