Once more non-scalar fold on MT: weird behaviour on the test for issue 1165
The test test-mt-issue-1165.sac breaks on Arch Linux but works fine on other machines!
Here is the code:
noinline
int[.] ++( int[.] arr_a, int[.] arr_b)
{
  return( arr_a);
}

int main () {
  e2 = with {
         ([0] <= [iv] < [10]) : [42,42,42];
       } : fold (++, [1,2,3]);
  return _sel_VxA_( [2], e2);
}
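For reference, here is a minimal Python sketch of the result we expect from this program. It is not SaC and not what the compiler generates. It assumes that the accumulator is passed as the first argument of `++` (which matches the `_accu_` line in the -bmem output further down), and the names `concat_stub`, `neutral` and `parts` are made up for illustration.

```python
from functools import reduce

def concat_stub(arr_a, arr_b):
    # Mirrors the user-defined ++ of the test: ignore arr_b, return arr_a.
    return arr_a

neutral = [1, 2, 3]                           # neutral element of the fold
parts = [[42, 42, 42] for _ in range(10)]     # one value per index in [0, 10)

# Left fold starting from the neutral element (assumption: the
# accumulator is the first argument of the fold function).
e2 = reduce(concat_stub, parts, neutral)

print(e2[2])  # → 3
```

Since `++` always returns its first argument, the fold result is the neutral element itself, so the selection at [2] should yield 3 no matter how the iterations are split over threads.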
Symptom: on both systems, we allocate two arrays, SACp_emal_2688__flat_13 and SACp_emal_2689__flat_4.
When compiling for seq execution, we get the following memory trace:
TR-> ND_ALLOC__DESC( SACp_emal_2689__flat_4, 1) at addr: 0x6000003481c0
TR-> ND_ALLOC__DATA( SACp_emal_2689__flat_4) at addr: 0x60000144c030
TR-> 3 array elements allocated, total now: 3.
TR-> ND_ALLOC__DESC( SACp_emal_2688__flat_13, 1) at addr: 0x600000348240
TR-> ND_ALLOC__DATA( SACp_emal_2688__flat_13) at addr: 0x60000144c040
TR-> 3 array elements allocated, total now: 6.
TR-> ND_FREE__DATA( SACp_emal_2688__flat_13, ) at addr: 0x60000144c040
TR-> 3 array elements deallocated, total now: 3.
TR-> ND_FREE__DESC( SACp_emal_2688__flat_13) at addr: 0x600000348240
TR-> ND_FREE__DATA( SACl_e2, ) at addr: 0x60000144c030
TR-> 3 array elements deallocated, total now: 0.
TR-> ND_FREE__DESC( SACl_e2) at addr: 0x6000003481c0
This makes perfect sense: we allocate both, pass on the neutral element (flat_4) in every iteration, and finally throw away the [42,42,42] (flat_13).
As we alias the accu to the result, we wait for the selection into e2, which eventually frees flat_4 as well (under the name SACl_e2, at the addresses of flat_4).
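The sequential behaviour can be replayed with a toy reference-counting model. This is only a sketch under assumptions: `Buffer`, `fold_op` and `run_seq_model` are hypothetical names, not part of the SaC runtime, and the model treats the `_free_` of flat_13 as a final decrement. It only shows why the buffer of flat_4 ends up being released under the name e2.

```python
class Buffer:
    """Hypothetical stand-in for a reference-counted array."""

    def __init__(self, name, data, log):
        self.name, self.data, self.rc, self.log = name, data, 1, log
        log.append(("alloc", name))

    def inc_rc(self, n=1):
        self.rc += n

    def dec_rc(self, via, n=1):
        # 'via' is the variable name through which the reference is dropped.
        self.rc -= n
        if self.rc == 0:
            self.log.append(("free", via))


def fold_op(acc, arr_b):
    # Models ++: consume arr_b, hand the accumulator back unchanged.
    arr_b.dec_rc(via=arr_b.name)
    return acc


def run_seq_model():
    log = []
    flat_4 = Buffer("flat_4", [1, 2, 3], log)        # neutral element
    flat_13 = Buffer("flat_13", [42, 42, 42], log)   # per-iteration value
    acc = flat_4
    for _ in range(10):
        flat_13.inc_rc()              # _inc_rc_( _flat_13, 1) in op_0
        acc = fold_op(acc, flat_13)
    flat_13.dec_rc(via="flat_13")     # modelled _free_( _flat_13) after the WL
    e2 = acc                          # the result aliases the neutral element
    result = e2.data[2]               # the selection
    e2.dec_rc(via="e2")               # _dec_rc_( e2, 1) drops the last reference
    return result, log, e2 is flat_4


result, log, aliased = run_seq_model()
print(result, aliased)
print(log)
```

In this model the log has the same order as the seq trace: both allocations, then the free of flat_13, then the free through e2.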
If, however, we run the trace with mt_pth, we see a different behaviour:
TR: 0:-> ND_ALLOC__DESC( SACp_emal_2689__flat_4, 1) at addr: 0x600003b2c240
TR: 0:-> ND_ALLOC__DATA( SACp_emal_2689__flat_4) at addr: 0x600002c28090
TR: 0:-> 3 array elements allocated, total now: 3.
TR: 0:-> ND_ALLOC__DESC( SACp_emal_2688__flat_13, 1) at addr: 0x600003b2c280
TR: 0:-> ND_ALLOC__DATA( SACp_emal_2688__flat_13) at addr: 0x600002c280a0
TR: 0:-> 3 array elements allocated, total now: 6.
TR: 0:-> shadowing desc of SACp_flat_13 at addr: 0x600003b2c280 in MT_RECEIVE_PARAM_in
TR: 0:-> allocating stack desc for SACp_flat_13 at addr: 0x7ff7bde1f590 in MT_RECEIVE_PARAM_in
TR: 0:-> shadowing desc of SACp_flat_4 at addr: 0x600003b2c240 in MT_RECEIVE_PARAM_in
TR: 0:-> allocating stack desc for SACp_flat_4 at addr: 0x7ff7bde1f550 in MT_RECEIVE_PARAM_in
TR: 1:-> shadowing desc of SACp_flat_13 at addr: 0x600003b2c280 in MT_RECEIVE_PARAM_in
TR: 1:-> allocating stack desc for SACp_flat_13 at addr: 0x7000055c1e90 in MT_RECEIVE_PARAM_in
TR: 1:-> shadowing desc of SACp_flat_4 at addr: 0x600003b2c240 in MT_RECEIVE_PARAM_in
TR: 1:-> allocating stack desc for SACp_flat_4 at addr: 0x7000055c1e50 in MT_RECEIVE_PARAM_in
TR: 2:-> shadowing desc of SACp_flat_13 at addr: 0x600003b2c280 in MT_RECEIVE_PARAM_in
TR: 2:-> allocating stack desc for SACp_flat_13 at addr: 0x700005644e90 in MT_RECEIVE_PARAM_in
TR: 2:-> shadowing desc of SACp_flat_4 at addr: 0x600003b2c240 in MT_RECEIVE_PARAM_in
TR: 2:-> allocating stack desc for SACp_flat_4 at addr: 0x700005644e50 in MT_RECEIVE_PARAM_in
TR: 3:-> shadowing desc of SACp_flat_13 at addr: 0x600003b2c280 in MT_RECEIVE_PARAM_in
TR: 3:-> allocating stack desc for SACp_flat_13 at addr: 0x7000056c7e90 in MT_RECEIVE_PARAM_in
TR: 3:-> shadowing desc of SACp_flat_4 at addr: 0x600003b2c240 in MT_RECEIVE_PARAM_in
TR: 3:-> allocating stack desc for SACp_flat_4 at addr: 0x7000056c7e50 in MT_RECEIVE_PARAM_in
TR: 0:-> ND_FREE__DATA( SACp_emal_2689__flat_4, ) at addr: 0x600002c28090
TR: 0:-> 3 array elements deallocated, total now: 3.
TR: 0:-> ND_FREE__DESC( SACp_emal_2689__flat_4) at addr: 0x600003b2c240
TR: 0:-> ND_FREE__DATA( SACp_emal_2688__flat_13, ) at addr: 0x600002c280a0
TR: 0:-> 3 array elements deallocated, total now: 0.
TR: 0:-> ND_FREE__DESC( SACp_emal_2688__flat_13) at addr: 0x600003b2c280
Same allocation, then shadowing of flat_4 and flat_13 in each of the four threads when entering the WL; so far so good.
After the WL, we now free flat_4 and flat_13 under their own names, but we never free e2. THAT is weird!
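To make the difference between the two traces easier to spot, a small helper can pair every data allocation with its free by address. This is a sketch only: `pair_data_events` and the regular expression are hypothetical helpers, not part of sac2c, and the two sample traces are just the ND_ALLOC__DATA / ND_FREE__DATA lines copied from above.

```python
import re

# Matches the data events in both trace flavours seen above:
# "TR-> ..." (seq) and "TR: <thread>:-> ..." (mt_pth).
DATA_EVENT = re.compile(
    r"ND_(ALLOC|FREE)__DATA\(\s*(\w+)\s*,?\s*\)\s+at addr:\s+(0x[0-9a-fA-F]+)"
)


def pair_data_events(trace):
    """Map each data address to (name at allocation, name at free or None)."""
    pairs = {}
    for line in trace.splitlines():
        m = DATA_EVENT.search(line)
        if not m:
            continue
        kind, name, addr = m.groups()
        if kind == "ALLOC":
            pairs[addr] = (name, None)
        elif addr in pairs:
            pairs[addr] = (pairs[addr][0], name)
    return pairs


SEQ_TRACE = """\
TR-> ND_ALLOC__DATA( SACp_emal_2689__flat_4) at addr: 0x60000144c030
TR-> ND_ALLOC__DATA( SACp_emal_2688__flat_13) at addr: 0x60000144c040
TR-> ND_FREE__DATA( SACp_emal_2688__flat_13, ) at addr: 0x60000144c040
TR-> ND_FREE__DATA( SACl_e2, ) at addr: 0x60000144c030
"""

MT_TRACE = """\
TR: 0:-> ND_ALLOC__DATA( SACp_emal_2689__flat_4) at addr: 0x600002c28090
TR: 0:-> ND_ALLOC__DATA( SACp_emal_2688__flat_13) at addr: 0x600002c280a0
TR: 0:-> ND_FREE__DATA( SACp_emal_2689__flat_4, ) at addr: 0x600002c28090
TR: 0:-> ND_FREE__DATA( SACp_emal_2688__flat_13, ) at addr: 0x600002c280a0
"""

for label, trace in (("seq", SEQ_TRACE), ("mt_pth", MT_TRACE)):
    for addr, (alloc_name, free_name) in pair_data_events(trace).items():
        print(f"{label}: {addr} allocated as {alloc_name}, freed as {free_name}")
```

For seq, the buffer allocated as flat_4 is reported as freed through SACl_e2; for mt_pth, the same buffer is freed through flat_4 itself and the name e2 never appears.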
Looking at the output of -bmem, for both targets we see:
_emal_2690__flat_6 = _alloc_( 1, 0, [:int]);
_flat_6 = _fill_( 2, _emal_2690__flat_6);
_emal_2689__flat_4 = _alloc_( 1, 1, [ 3 ]);
_flat_4 = _fill_( [ 1, 2, 3 ], _emal_2689__flat_4);
_emal_2688__flat_13 = _alloc_( 1, 1, [ 3 ]);
_flat_13 = _fill_( [ 42, 42, 42 ], _emal_2688__flat_13);
iv = _alloc_( 1, 0, [:int]);
e2 = with2 (_flat_12=[iv])
/********** operators: **********/
op_0 =
{
_inc_rc_( _flat_13, 1);
_ea_68_e2 = _accu_( _flat_12, _flat_4);
_ea_69__flat_13 = ( _ea_68_e2 _MAIN::++_flat_13) ;
_ufiv_2664__ea_69__flat_13 = _unshare_( _ea_69__flat_13, _flat_12);
} : _ufiv_2664__ea_69__flat_13 ;
/********** segment 0: **********
* index domain: [ 0 ] -> [ 10 ]
* bv: [ 1 ], [ 1 ], [ 1 ]
* ubv: [ 1 ]
* sv: [ 1 ]
* homsv: [ 1 ]
*/
(0 -> 10), step0[0] 1
(0 --> 1): op_0
/********** conexpr: **********/
fold( _MAIN::++(), _flat_4);
_free_( iv);
_free_( _flat_13);
_emal_2687__flat_17 = _alloc_( 1, 0, [:int]);
_flat_17 = _fill_( _idx_sel_( _flat_6, e2), _emal_2687__flat_17);
_dec_rc_( e2, 1);
_free_( _flat_6);
return( _flat_17);
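Note the _dec_rc_( e2, 1); near the end: in the seq trace this is what frees e2, but in the mt_pth trace no free of e2 ever shows up!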