Skip to content
GitLab
  • Menu
Projects Groups Snippets
  • /
  • Help
    • Help
    • Support
    • Community forum
    • Submit feedback
    • Contribute to GitLab
  • Sign in / Register
  • sac2c sac2c
  • Project information
    • Project information
    • Activity
    • Labels
    • Members
  • Repository
    • Repository
    • Files
    • Commits
    • Branches
    • Tags
    • Contributors
    • Graph
    • Compare
  • Issues 393
    • Issues 393
    • List
    • Boards
    • Service Desk
    • Milestones
  • Merge requests 24
    • Merge requests 24
  • Deployments
    • Deployments
    • Releases
  • Wiki
    • Wiki
  • External wiki
    • External wiki
  • Activity
  • Graph
  • Create a new issue
  • Commits
  • Issue Boards
Collapse sidebar
  • sac-group
  • sac2csac2c
  • Merge requests
  • !322

Draft: Insert machinery for resizing the hive

  • Review changes

  • Download
  • Email patches
  • Plain diff
Closed Jordy Aaldering requested to merge mt-beehive-sleeping-bees into develop Nov 13, 2024
  • Overview 0
  • Commits 13
  • Changes 8

Inserted machinery for resizing the hive (number of worked threads) by putting bees to sleep using a semaphore.

 * Whenever worker bees are created they enter a busy wait, waiting for a signal
 * from the queen to start execution. When they need to work, the queen sends a
 * signal to all bees by flipping a global flag. Bees then do their work and
 * synchronise with the queen, after which they loop back and enter the busy
 * wait again.
 *
 * Worker bees can be put to sleep, which happens through a semaphore, freeing
 * up those threads for other processes on the system. The hive may be resized
 * from the `SAC_MT_PTH_do_spmd_execute` function, in `mt_pth.h`. Resizing the
 * hive is done right before signaling the start-barrier.
 *
 * We have three cases:
 *  1) The hive shrinks; i.e. awake bees are put to sleep.
 *  2) The hive grows; i.e. currently sleeping bees are woken.
 *  3) The hive stays the same; nothing happens.
 *
 * Whenever the hive shrinks, the value of `num_awake_bees` is updated.
 * When bees are signalled and exit their busy wait, they do a check of their
 * thread-index and the value of num_awake_bees. If their index is less than
 * num_awake_bees, they continue as normal and do their allotted work.
 * Otherwise, it means that they should be asleep. In this case they instead
 * wait on a semaphore.
 *
 * When the hive grows, the value of `num_awake_bees` is updated. The semaphores
 * of all bees that were previously asleep, and should now be awake again, are
 * posted. When this happens, those bees become awake again and they loop back
 * into the busy wait, just like the other awake bees. Then, when the queen
 * signals the bees to go back to work, all of those awake bees will start
 * processing again.
 *
 * The beautiful ascii art below shows the happy-flow of this process. There is
 * a single queen, and in this case we have two worker bees. On the y-axis is
 * time. First, the hive is resized and bee 2 is marked as asleep. Then the
 * queen sends a signal and both bees exit their busy wait. Bee 1 starts
 * processing, bee 2 enters the semaphore. When bee 1 completes its work it
 * synchronises with the queen, and loops back to its busy wait. Then the queen
 * resizes the hive again. It posts the semaphore of bee 2. This wakes bee 2 up,
 * and it then also loops back to its busy wait. The queen then sends a signal
 * again, and both bees exit their busy wait and start processing.
 *
 *         Queen  Bee  Bee
 *           |     |    |
 *           |     ⟳   ⟳    Bees enter busy wait.
 *           |
 *           |     W    S
 * Resize(1) |-----∧----∧     Mark bee 2 as asleep.
 *           |
 * Signal    |-----∧----∧     Signal all bees to start execution.
 *           |     |    |
 *           |     |    ☰    Bee 2 reaches sleep semaphore.
 *           |     |
 *           |     |    ∧     Semaphore is posted, bee 2 enters busy wait.
 *           |     |    |
 * Sync      |<----|    |     Bee 1 is done and synchronises.
 *           |     |    |
 *           |     ⟳   |     Bee 1 enters busy wait.
 *           |          |
 *           |     W    W
 * Resize(2) |-----∧----∧     Wake up bee 2 by posting its semaphore.
 *           |          v
 *           |          |
 *           |          ⟳    Bee 2 enters busy wait after semaphore is posted.
 *           |
 * Signal    |-----∧----∧     Signal all bees to start execution.
 *           |     |    |
 *          ...   ...  ...
 *
 * However, this can potentially go wrong! We can have a case where, after bee
 * 2 is woken up, it takes too long for it to loop back to its busy waiting
 * cycle. In then case, if the queen then sends the signal to all bees to start
 * processing, bee 2 will miss the signal, and will get stuck in its busy wait,
 * causing the program to stall at that point.
 *
 *         Queen  Bee  Bee
 *           |     |    |
 *           |     ⟳   ⟳    Bees enter busy wait.
 *           |
 *           |     W    S
 * Resize(1) |-----∧----∧     Mark bee 2 as asleep.
 *           |
 * Signal    |-----∧----∧     Signal all bees to start execution.
 *           |     |    |
 *           |     |   ...
 * Sync      |<----|          Bee 1 is done and synchronises.
 *           |     |
 *           |     ⟳         Bee 1 enters busy wait.
 *           |
 *           |     W    W
 * Resize(2) |-----∧----∧     Wake up bee 2 by posting its semaphore.
 *           |                The semaphore is not locked, so it is set to 1.
 *           |
 * Signal    |-----∧----∧     Signal all bees to start execution.
 *           |     |    X     Bee 2 does not observe the signal!
 *           |     |
 *           |     |   ...
 *           |     |    |
 *           |     |    ☰    Bee 2 reaches sleep semaphore. The value of the
 *                            semaphore is 1, so the value is decremented and
 *                            this bee immediately wakes up again. However we
 *                            are too late, as the queen has already given
 *                            the signal.
 *
 * Luckily the solution is relatively simple. Alongside `num_awake_bees`, we
 * keep track of an atomic value `bees_not_yet_tucked_in`. This value is the
 * number of bees that should be sleeping (i.e. the hive was shrunk), but that
 * have not reached their semaphore yet. When the hive is shrunk, this value is
 * set. Each bee then decrements this value right before entering its semaphore.
 * When the hive grows, we first busy wait for this atomic value to reach zero,
 * ensuring that all bees that should be asleep are actually asleep.
Edited Dec 10, 2024 by Jordy Aaldering
Assignee
Assign to
Reviewer
Request review from
Time tracking
Source branch: mt-beehive-sleeping-bees