Merged Thomas Koopman requested to merge thomas/sac2c:schedule_2d into develop Jul 10, 2025

Motivation

The first axis may be too small to use all threads for high-ranked arrays. This merge request supports a 2-dimensional thread space. For example, instead of thread layout

T(0)
--
T(1)
--
T(2)
--
T(3)

we may want a layout

T(0, 0) | T(0, 1)
-----------------
T(1, 0) | T(1, 1)

Problems

The downside of this, is that each thread may have more than one cacheline that is shared with a different thread, leading to false sharing. So this should only be done if necessary.

Implementation

Let X be an T[m, n, d:shp] array and suppose we have p processors. We want a minimal number of rows to minimize false sharing. For this we use the function

/**
 * Returns the largest q | p such that p1 <= m
 **/
int largest_div(int m, int p)
{
    int q         = 1;
    int candidate = 1;

    while (candidate <= m && candidate <= p) {
        if (p % candidate == 0) {
            q = candidate;
        }
        candidate++;
    }

    return q;
}

to factorize p. We then do a block distribution on the first axis with q and the second axis with p / q.

This has as effect that for p <= m, the grid will be p x 1, or the same as before this merge request. The only overhead is calculating the processor grid.

Future work

We (meaning not me) could extend this to arbitrarily many dimensions.

Edited Jul 10, 2025 by Jordy Aaldering

2-dimensional scheduling

Motivation

Problems

Implementation

Future work