There were two distinct problems:
- The return values of mows were lifted in the wrong order in cuknl.
- The initialisation of fold-wls inside CUDA kernels went wrong.
Both issues are fixed here, and a test has been added for each.