Reallocation within outer loop (situation with Livermore Loop 8)
The following example, reallocation-example.sac, performs a scalar array modification within a doubly nested loop. The outer loop repeats the array modification that happens in the inner loop. When we compile with the flags -DBODY -bmem, we see the following for the outermost loop:
int[101] _MAIN::_dup_587_main__Loop_4( int{1} y { ,NN } , int[101] u1 { ,NN } , int i { ,NN } )
{
...
_emlr_6056_y = _alloc_( 1, _dim_A_( y), _shape_A_( y));
_emlr_6057_y = _fill_( _copy_( y), _emlr_6056_y);
_emlr_6054_u1 = _alloc_( 1, _dim_A_( u1), _shape_A_( u1));
_emlr_6055_u1 = _fill_( _copy_( u1), _emlr_6054_u1);
_pinl_588_u1n__SSA0_1 = _MAIN::_dup_589_main__Loop_2( u1, _emlr_6055_u1, _emlr_6057_y) ;
...
}
On each iteration we copy u1 to _emlr_6055_u1, meaning that each iteration of the outer loop creates an additional allocation. We see the same situation in the C variant of Livermore Loop 8, which has a triple loop nest. Because the inner loop is in essence doing a = op (a), we should be able to avoid the extra allocation by performing a buffer swap.
For the current example, this can be done manually by defining the inner loop operation as b = op (a); a = b;, where b is defined outside the outer loop. If we compile with -DBODY -DLIFT -bmem, we get:
int[101] _MAIN::_dup_587_main__Loop_4( int{1} y { ,NN } , int[101] u1 { ,NN } , int[101] u1n { ,NN } , int i { ,NN } )
{
...
_emlr_6056_y = _alloc_( 1, _dim_A_( y), _shape_A_( y));
_emlr_6057_y = _fill_( _copy_( y), _emlr_6056_y);
_emlr_6054_u1n = u1n;
_emlr_6055_u1n = _fill_( _noop_( u1n), _emlr_6054_u1n);
_pinl_588_u1n__SSA0_1 = _MAIN::_dup_589_main__Loop_2( u1, _emlr_6055_u1n, _emlr_6057_y) ;
...
}
Now we no longer perform an allocation/copy; instead we pass in the extra buffer and reuse it within the outer loop.
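To make the difference between the two variants concrete, here is a rough C analogy. It is not SaC code and not what the compiler emits. The names alloc_buf, op, run_realloc and run_swap are hypothetical, and op simply adds a scalar to every element as a stand-in for the real inner loop. The first function allocates a fresh result buffer on every outer iteration; the second allocates the extra buffer once and swaps pointers.

```c
#include <assert.h>
#include <stdlib.h>

#define N 101  /* array length, matching int[101] in the compiler output */

static int alloc_count = 0;  /* number of buffers allocated so far */

/* Allocate one zero-initialised buffer of N ints and count the allocation. */
static int *alloc_buf(void) {
    int *p = calloc(N, sizeof *p);
    if (p == NULL) abort();
    alloc_count++;
    return p;
}

/* Hypothetical stand-in for the inner loop: dst = op(src). */
static void op(int *dst, const int *src, int y) {
    for (int i = 0; i < N; i++) dst[i] = src[i] + y;
}

/* a = op(a): a fresh result buffer is allocated on every outer iteration.
   Returns the number of allocations; stores the final a[0] in *result. */
int run_realloc(int outer, int y, int *result) {
    assert(outer >= 0 && result != NULL);
    alloc_count = 0;
    int *a = alloc_buf();
    for (int k = 0; k < outer; k++) {
        int *fresh = alloc_buf();   /* one extra allocation per iteration */
        op(fresh, a, y);
        free(a);
        a = fresh;
    }
    *result = a[0];
    free(a);
    return alloc_count;
}

/* b = op(a); a = b: b is allocated once outside the loop, and the two
   buffers trade roles by swapping pointers instead of copying. */
int run_swap(int outer, int y, int *result) {
    assert(outer >= 0 && result != NULL);
    alloc_count = 0;
    int *a = alloc_buf();
    int *b = alloc_buf();           /* the only extra allocation */
    for (int k = 0; k < outer; k++) {
        op(b, a, y);
        int *t = a; a = b; b = t;   /* buffer swap */
    }
    *result = a[0];
    free(a);
    free(b);
    return alloc_count;
}
```

With 10 outer iterations, both functions compute the same result, but run_realloc performs 11 allocations (the initial buffer plus one per iteration) while run_swap performs 2.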
It would be nice to do this automatically within the compiler. The attached example also shows a similar case in which the inner loop occurs within a separate function. Here, though, the manual trick from above does not work.