This MR implements with-loop modulo partitioning. The technique is explained in detail in wl_modulo_partitioning.c, which is part of this MR.
The short version: it removes modulo calls from with-loops at the cost of duplicating partitions. This makes modulo a first-class citizen in SaC: its expressiveness can be used without slowing down the generated code.
Example transformation:
grid = iota(10);
with {
([0] <= iv < [10]): grid[_aplmod_SxS_ (_add_SxS_ (iv, -2), 10)];
}: genarray ([10], 0);
becomes:
grid = iota(10);
with {
([0] <= iv < [ 2]): grid[_add_SxS_ (iv, 8)];
([2] <= iv < [10]): grid[_add_SxS_ (iv, -2)];
}: genarray([10], 0);
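The equivalence of the two with-loops above can be checked with a small model of the index arithmetic. This is only a sketch in Python, not SaC: the function names are hypothetical, and it assumes that _aplmod_SxS_ with a positive divisor behaves like floored modulo (which is what Python's % does).

```python
def original(grid):
    # Models the single partition [0] <= iv < [10] with a modulo per element.
    # Assumption: Python's % matches _aplmod_SxS_ for a positive divisor.
    return [grid[(iv - 2) % 10] for iv in range(10)]


def partitioned(grid):
    # Models the two partitions after the transformation: no modulo left.
    out = [0] * 10
    for iv in range(0, 2):    # ([0] <= iv < [ 2]): the index wraps around
        out[iv] = grid[iv + 8]
    for iv in range(2, 10):   # ([2] <= iv < [10]): the index does not wrap
        out[iv] = grid[iv - 2]
    return out


grid = list(range(10))  # stands in for iota(10)
result_original = original(grid)
result_partitioned = partitioned(grid)
```

Both versions produce the same array; the partitioned one only uses additions inside the loop bodies.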
The optimization doesn't require complete compile-time information.
In particular, modulo calls can be eliminated even when the function below isn't inlined.
int[*] rotate(int[+] rotations, int[*] grid)
{
return {iv -> grid[mod(iv - rotations, shape(grid))] | iv < shape(grid)};
}
Together with this optimization, this one-liner fully replaces the three functions that are currently dedicated to rotate in the stdlib, without sacrificing performance.
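To illustrate why the split does not need a compile-time offset, here is a one-dimensional sketch in Python. It is not the algorithm from wl_modulo_partitioning.c, which may work differently; the function names are hypothetical, and the sketch only shows that a single modulo outside the loop is enough to find the partition boundary.

```python
def rotate_reference(rotation, grid):
    # One modulo per element, as in the rotate one-liner (1-D case only).
    n = len(grid)
    return [grid[(iv - rotation) % n] for iv in range(n)]


def rotate_partitioned(rotation, grid):
    # Hypothetical sketch: compute the partition boundary once at run time,
    # then fill two partitions whose bodies contain no modulo.
    n = len(grid)
    if n == 0:
        return []
    split = rotation % n          # the only modulo, outside both loops
    out = [None] * n
    for iv in range(0, split):    # indices that wrap around
        out[iv] = grid[iv - split + n]
    for iv in range(split, n):    # indices that do not wrap
        out[iv] = grid[iv - split]
    return out


grid = list(range(7))
checks = [rotate_partitioned(r, grid) == rotate_reference(r, grid)
          for r in (-13, -1, 0, 3, 7, 27)]
```

The boundary depends only on the rotation and the shape, so it can be computed before the loop even when neither is known at compile time.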