This mainly required 3 upgrades:
- allow mo-WLs to be marked CUDARIZABLE
- make sure all results from cudarised WLs are transferred out (device2host)
- adjust the CUDA kernel generation to deal properly with wlidxes when these are shared