sac-group / sac2c / Merge requests / !387

Support SHRAYonUCX for better performance on systems supporting RDMA

Open Thomas Koopman requested to merge thomas/sac2c:SHRAYonUCX into develop May 01, 2025

GASNet is too high-level to implement Shray efficiently. The algorithm for pinning memory on the HCA does not work properly in the SEGMENT_EVERYTHING configuration, which severely limits performance on systems supporting RDMA.

Both MPICH and OpenMPI are built on a library called UCX, a very good low-level library for distributed-memory communication. SHRAYonUCX is a re-implementation of Shray built on top of UCX. Its shortcoming is that we can only use one rank per physical machine, which makes it hard to debug. For this reason, this merge request supports both implementations through the targets distmem_shray and distmem_ucx.

The main difference with distmem_ucx is that the program is embedded in an MPI application (MPI provides the out-of-band (OOB) connection setup and routines like MPI_Bcast). So we generate:

MPI_Init_thread
Shray_Init
...
Shray_Finalize
MPI_Finalize
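
The nesting above can be sketched as a small emitter that picks the wrapper calls based on the target. This is only an illustration, not code from sac2c: the helper name `emit_runtime_calls` is hypothetical, the arguments of the runtime calls are left out because the description does not give them, and it assumes that the distmem_shray target emits the Shray calls without the MPI wrapper.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of sac2c): writes the sequence of runtime
 * calls that surrounds the generated program body for a given target.
 * Only the call names from the description are emitted; the "..." line
 * stands for the program body.
 * Returns 0 on success, -1 for an unknown target or a too-small buffer.
 */
int emit_runtime_calls(const char *target, char *buf, size_t cap)
{
    assert(buf != NULL && cap > 0);

    int uses_mpi;
    if (strcmp(target, "distmem_ucx") == 0) {
        uses_mpi = 1;   /* Shray runs embedded in an MPI application */
    } else if (strcmp(target, "distmem_shray") == 0) {
        uses_mpi = 0;   /* assumption: no MPI wrapper for this target */
    } else {
        return -1;
    }

    /* MPI calls form the outer layer, Shray calls the inner layer. */
    int n = snprintf(buf, cap, "%sShray_Init\n...\nShray_Finalize\n%s",
                     uses_mpi ? "MPI_Init_thread\n" : "",
                     uses_mpi ? "MPI_Finalize\n" : "");

    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}
```

Calling `emit_runtime_calls("distmem_ucx", buf, sizeof buf)` fills `buf` with the five-line sequence shown above. The point of the ordering is that MPI is initialized before Shray and finalized after it, so the MPI routines remain available for the whole lifetime of the Shray runtime.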
Edited May 09, 2025 by Thomas Koopman
Source branch: SHRAYonUCX