This merge request makes two conceptual changes.
- The `distmem` target now depends on the `mt_pth` target instead of `seq`.
- Distributedness of memory is now a runtime attribute and is no longer reflected in the type (previously the PDI/NDI attributes).
As a result of the first change, we can use the ST / MT / SPMD functions and remove the runtime execution mode `SAC_DISTMEM_exec_mode_t SAC_DISTMEM_exec_mode`.
We already stored the distributedness of an array in its descriptor (`SAC_ND_A_DESC_IS_DIST`); we now overload the necessary functions with a version that checks this flag at runtime. The PDI/NDI attributes in the type are no longer needed.
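As a rough sketch of the runtime dispatch (the `desc_t` struct, `nd_read`, and the `remote_fetches` counter are hypothetical stand-ins, not the actual SAC descriptor or macros):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the SAC descriptor: the real one carries
 * more fields; SAC_ND_A_DESC_IS_DIST reads the flag from it. */
typedef struct {
    bool is_dist;    /* runtime distributedness flag */
    size_t size;
    int *data;
} desc_t;

/* Overloaded read: checks the distributedness flag at runtime and
 * dispatches; the distributed branch stands in for a remote fetch. */
int nd_read(const desc_t *d, size_t idx, int *remote_fetches) {
    if (d->is_dist && remote_fetches != NULL)
        (*remote_fetches)++;   /* a real version would fetch from the owner */
    return d->data[idx];
}
```

The point is that both memory kinds share one entry point, so no type-level PDI/NDI distinction is needed.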
If we need a certain memory type, we enforce this by generating primitive functions (see the documentation of `distmemify.c`). The `dist_alloc.c` phase ensures that this function almost never has to do anything. The only exception is memory returned from an external function, as we do not distribute the result (e.g. if we read in an array from `stdin`, only node 0 does so and then broadcasts the result).
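The node-0 pattern for external results can be sketched as follows; the nodes are simulated as array slots here, and `broadcast_from_root` is a hypothetical analogue of the communication layer's broadcast:

```c
/* Only node 0 performs the external read (e.g. from stdin); the
 * broadcast then copies the value so every node holds it. */
void broadcast_from_root(int *per_node, int nnodes, int root_value) {
    per_node[0] = root_value;         /* node 0 "reads" the input */
    for (int n = 1; n < nnodes; n++)
        per_node[n] = per_node[0];    /* broadcast to all other nodes */
}
```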
The parallelisation of with-loops for the mt backend has been refactored slightly, as outlined in `schedule_design.txt`: it intersects the first schedule with `SAC_wl_global_start0` and `SAC_wl_global_stop0`. The only change the distmem backend has to make is to set these values to `ShrayStart` and `ShrayStop` (or a facsimile for fold-loops).
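The intersection step amounts to clamping the schedule to the node's global range. A minimal sketch, assuming one-dimensional bounds (the function name and signature are illustrative, not the generated macro code):

```c
/* Clamp a with-loop schedule [lo, hi) to the node's range
 * [global_start, global_stop), i.e. the intersection the refactored
 * scheduler performs with SAC_wl_global_start0 / SAC_wl_global_stop0.
 * For distmem these globals come from ShrayStart / ShrayStop instead. */
void intersect_schedule(long lo, long hi,
                        long global_start, long global_stop,
                        long *out_lo, long *out_hi) {
    *out_lo = lo > global_start ? lo : global_start;
    *out_hi = hi < global_stop  ? hi : global_stop;
    if (*out_lo > *out_hi)
        *out_lo = *out_hi;   /* empty intersection: node does nothing */
}
```

Because only the two bound values change, the distmem backend reuses the mt scheduling code unchanged.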
Fold-loops fold their local index range, and then a primitive function `_fold_nodes_(scalar)` generates the code for our version of an `MPI_Allreduce`.
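The two-stage fold can be sketched as below; the "nodes" are simulated sequentially, and `fold_local` / `fold_nodes` are illustrative names standing in for the generated local fold and the `_fold_nodes_(scalar)` allreduce-style combine:

```c
/* Stage 1: each node folds only its own local index range. */
long fold_local(const int *a, long start, long stop) {
    long acc = 0;
    for (long i = start; i < stop; i++)
        acc += a[i];
    return acc;
}

/* Stage 2: combine the per-node partial results, as the generated
 * _fold_nodes_(scalar) code does in the manner of an MPI_Allreduce. */
long fold_nodes(const long *partials, int nnodes) {
    long acc = 0;
    for (int n = 0; n < nnodes; n++)
        acc += partials[n];
    return acc;
}
```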