Shray-based distributed memory backend (sac2c !293)

Status: merged. Thomas Koopman requested to merge thomas/sac2c:distmem-minimal into develop on Apr 04, 2024.
Overview 76 · Commits 120 · Changes 153

Features

  • Parallel genarray, modarray, foldarray

  • Parallel homogeneous multi-operator with-loops

  • Correct (though inefficient) handling of side-effects

  • In-place reshape

Testing

The CFAL benchmarks compile, run, and compute the correct result, with one exception: the quickselect in the initialisation of MG exhausts kernel resources because it performs tens of thousands of ShrayMalloc calls. Quickselect is a horrible algorithm for distributed memory anyway, so I do not think this is a problem.

Gaussian blur and nbody show reasonable speedups on the cluster.

The fancy 2D stencil code (blocked with overlapping tiles) computes the correct result.

The blocked matmul verifies, but it needs to run with SHRAY_CACHELINE=10 to avoid exhausting kernel resources due to the sheer number of segfaults. This suggests we may want to make SHRAY_CACHELINE allocation-dependent and set the default in Shray higher than 1, e.g. by turning the local part of an allocation into a fixed number of chunks. On my laptop I get the expected speedups for Shray: none from 1 -> 2, and 2x from 2 -> 4.

We also need to turn off phm, as intercepting malloc interferes with GASNet's initialisation in some way.

Edited Jun 27, 2024 by Thomas Koopman
Source branch: distmem-minimal