When the EMR loop optimisation (EMRL) lifts allocations out of loop functions, the CUDA backend previously caused
cudaFree calls to be made for these lifted allocations, in effect negating the optimisation.
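
As a rough illustration of the cost, the sketch below contrasts a loop that sets up and frees its device buffer on every iteration with one where that work happens once outside the loop. It is not the backend's generated code: `DeviceCounters`, `host2device`, `deviceFree`, `runPerIteration` and `runHoisted` are hypothetical names, the counters stand in for the real cudaMalloc / cudaMemcpy / cudaFree calls so the example runs without a GPU, and it assumes the per-iteration pattern was allocate, copy, free.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the CUDA runtime: it only counts calls.
// Real code would call cudaMalloc / cudaMemcpy / cudaFree at these points.
struct DeviceCounters {
    int allocs = 0;     // stands in for cudaMalloc
    int transfers = 0;  // stands in for cudaMemcpy(..., cudaMemcpyHostToDevice)
    int frees = 0;      // stands in for cudaFree
};

// Assumed meaning of host2device here: allocate a device buffer and copy into it.
void host2device(DeviceCounters& dev, const std::vector<float>& host) {
    (void)host;  // the data itself is irrelevant to the counting
    ++dev.allocs;
    ++dev.transfers;
}

void deviceFree(DeviceCounters& dev) { ++dev.frees; }

// The host buffer is already lifted out of the loop, but the device side
// is still set up and torn down on every iteration.
DeviceCounters runPerIteration(int iterations) {
    DeviceCounters dev;
    std::vector<float> scratch(1024);  // lifted host allocation
    for (int i = 0; i < iterations; ++i) {
        host2device(dev, scratch);
        /* kernel launch would go here */
        deviceFree(dev);
    }
    return dev;
}

// The device-side setup is hoisted as well: one allocation and one transfer
// before the loop, one free after it, nothing but kernel launches inside.
DeviceCounters runHoisted(int iterations) {
    DeviceCounters dev;
    std::vector<float> scratch(1024);  // lifted host allocation
    host2device(dev, scratch);
    for (int i = 0; i < iterations; ++i) {
        /* kernel launch would go here */
    }
    deviceFree(dev);
    return dev;
}
```

With N iterations the first function counts N allocations, N transfers and N frees, while the second counts one of each regardless of N.
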
This MR adds a new traversal for the CUDA backend, called the Minimize EMR Transfers (MEMRT) optimisation. It finds functions that have had allocations lifted out (via EMRL) and lifts out
the host2device primitives which reference EMRL-lifted variables. The effect is that we perform only one device allocation per lifted allocation, and no memory transfers within the loop. The MEMRT traversal is run after all other CUDA transfer minimization (see MTRAN -