Skip to content
GitLab
  • Menu
Projects Groups Snippets
  • /
  • Help
    • Help
    • Support
    • Community forum
    • Submit feedback
    • Contribute to GitLab
  • Sign in / Register
  • sac2c sac2c
  • Project information
    • Project information
    • Activity
    • Labels
    • Members
  • Repository
    • Repository
    • Files
    • Commits
    • Branches
    • Tags
    • Contributors
    • Graph
    • Compare
  • Issues 403
    • Issues 403
    • List
    • Boards
    • Service Desk
    • Milestones
  • Merge requests 12
    • Merge requests 12
  • Deployments
    • Deployments
    • Releases
  • Wiki
    • Wiki
  • External wiki
    • External wiki
  • Activity
  • Graph
  • Create a new issue
  • Commits
  • Issue Boards
Collapse sidebar
  • sac-group
  • sac2csac2c
  • Merge requests
  • !427

2-dimensional scheduling

  • Review changes

  • Download
  • Email patches
  • Plain diff
Merged Thomas Koopman requested to merge thomas/sac2c:schedule_2d into develop Jul 10, 2025
  • Overview 1
  • Commits 7
  • Changes 8

Motivation

The first axis may be too small to use all threads for high-ranked arrays. This merge request supports a 2-dimensional thread space. For example, instead of thread layout

T(0)
--
T(1)
--
T(2)
--
T(3)

we may want a layout

T(0, 0) | T(0, 1)
-----------------
T(1, 0) | T(1, 1)

Problems

The downside of this, is that each thread may have more than one cacheline that is shared with a different thread, leading to false sharing. So this should only be done if necessary.

Implementation

Let X be an T[m, n, d:shp] array and suppose we have p processors. We want a minimal number of rows to minimize false sharing. For this we use the function

/**
 * Returns the largest q | p such that p1 <= m
 **/
int largest_div(int m, int p)
{
    int q         = 1;
    int candidate = 1;

    while (candidate <= m && candidate <= p) {
        if (p % candidate == 0) {
            q = candidate;
        }
        candidate++;
    }

    return q;
}

to factorize p. We then do a block distribution on the first axis with q and the second axis with p / q.

This has as effect that for p <= m, the grid will be p x 1, or the same as before this merge request. The only overhead is calculating the processor grid.

Future work

We (meaning not me) could extend this to arbitrarily many dimensions.

Edited Jul 10, 2025 by Jordy Aaldering
Assignee
Assign to
Reviewer
Request review from
Time tracking
Source branch: schedule_2d