Currently, in the distributed memory backend, a with-loop is either parallelised over all nodes in the cluster or not parallelised at all. This merge request makes the choice more fine-grained for genarray and modarray: a with-loop can be left unparallelised, parallelised only within a node, or parallelised both within nodes and across nodes. We do this by adding an option -mindistsize that can be used in conjunction with -minmtsize.
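The three-way choice can be pictured with a small sketch. This is not the code from this merge request: the names `wl_mode` and `choose_wl_mode` are hypothetical, and it assumes that both options are thresholds compared against the number of elements the with-loop produces, with -mindistsize set no lower than -minmtsize.

```c
#include <stddef.h>

/* Hypothetical names for illustration; not taken from the compiler sources. */
typedef enum {
    WL_SEQUENTIAL,  /* not parallelised */
    WL_NODE_LOCAL,  /* parallelised within a node only */
    WL_DISTRIBUTED  /* parallelised within nodes and across nodes */
} wl_mode;

/*
 * Assumption: min_mt_size and min_dist_size stand for the values passed via
 * -minmtsize and -mindistsize, and both are compared against the element
 * count of the with-loop result. The larger threshold is checked first.
 */
wl_mode choose_wl_mode(size_t elems, size_t min_mt_size, size_t min_dist_size)
{
    if (elems >= min_dist_size) {
        return WL_DISTRIBUTED;
    }
    if (elems >= min_mt_size) {
        return WL_NODE_LOCAL;
    }
    return WL_SEQUENTIAL;
}
```

Under these assumptions, with thresholds of 250 and 10000, a with-loop over 10 elements stays sequential, one over 250 elements is parallelised within a node, and one over 10000 elements is also distributed across nodes.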
We do not do this for fold, because distribution over nodes is treated as a property of the memory, not of parallel execution, which makes it awkward to implement for fold as well. As this is not performance-critical in any application I can think of, I am not going to implement it.