Reimplement the distributed-memory backend on top of Shray to ensure memory scalability and, potentially, better performance.
Multithreaded + distributed status:
The groundwork is there: all the parallelism is created by restricting the bounds, which is in principle orthogonal to how these restricted with-loops are computed. However, the build system needs to be adjusted, and some work probably needs to go into proper initialization.
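To illustrate what restricting the bounds means, here is a minimal sketch that assumes a plain block distribution over one dimension. The function name `restrict_bounds` and its parameters are hypothetical and not taken from this branch; the actual partitioning may differ.

```python
def restrict_bounds(lower, upper, rank, num_nodes):
    """Return the half-open sub-range [lo, hi) of [lower, upper) for `rank`.

    Assumption: block distribution, every node gets ceil(extent / num_nodes)
    indices, and trailing nodes may receive an empty range.
    """
    extent = upper - lower
    block = -(-extent // num_nodes)  # ceiling division
    lo = min(lower + rank * block, upper)
    hi = min(lo + block, upper)
    return lo, hi


def partition(lower, upper, num_nodes):
    """Collect the restricted bounds of all nodes (for inspection only)."""
    return [restrict_bounds(lower, upper, r, num_nodes) for r in range(num_nodes)]
```

Each node then evaluates the with-loop over its own `[lo, hi)` only. How that restricted loop is executed (sequentially or multithreaded) does not affect the partitioning, which is the orthogonality mentioned above.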
Side-effects
A function is assumed to have side-effects if it is an external function that consumes a unique object, or a user-defined function that type-casts a unique object in its body. In that case, the function is executed only on the source node, and its return values are broadcast to the other nodes. For now, we also consider functions that take or return hidden objects as having side-effects, though this can be relaxed in the future.
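The rule above can be written as a small predicate. This is only a sketch of the decision logic as described here; `FunInfo` and its fields are hypothetical names and do not correspond to the compiler's actual data structures.

```python
from dataclasses import dataclass


@dataclass
class FunInfo:
    # All fields are illustrative assumptions, not compiler internals.
    is_external: bool              # external (non-SaC) function
    consumes_unique: bool          # takes a unique object as argument
    casts_unique_in_body: bool     # body type-casts a unique object
    touches_hidden_objects: bool   # takes or returns a hidden object


def has_side_effects(f: FunInfo) -> bool:
    """Conservative side-effect check following the rule described above."""
    if f.is_external and f.consumes_unique:
        return True
    if not f.is_external and f.casts_unique_in_body:
        return True
    # Current over-approximation; could be relaxed in the future.
    return f.touches_hidden_objects
```

A function for which this returns `True` would run on the source node only, with its results broadcast afterwards; all other functions can run on every node.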
Performance and correctness
This branch should behave correctly for all inputs, as we fall back to replicated execution for constructs we cannot parallelize. Performance is outside the scope of this merge request.
Dependencies
This branch will only build if gasnet is installed and the environment variable GASNet_ROOT has been set to the installation. Shray needs the MPI or UDP backend of GASNet, so one of these dependencies also needs to be installed.
TODO
- implement reshape
- implement multi-operator with-loops
- fix hard-coded nametags in stdlib
- nbody-naive
- FlashAttention alg 1
- FlashAttention modified algorithm
- VolCalib
- MG
- Adjust CI to deal with GASNet