EMRL was recently refactored to 'correctly' maintain the SSA representation, but I forgot to update the mem:alloc code in the CUDA backend that specifically handles EMRL. As a result, we were introducing an additional alloc() for the same LHS, leading to all sorts of runtime errors.