This is to get my extra CUDA transfer mechanism work merged in. Additionally, we now also support generating CUDA modules. There are also some additions to the HWLOC support, e.g. for pinning to the cores nearest the IO bus.
Some work is still needed regarding headers, as most of the work was done before the sac.h split. There are several commits marked for this purpose (not anymore).
As far as I can tell, everything works. I did notice though that `nvcc` seems to pull in `libstdc++` stuff when building object files, meaning that we either:
- get `nvcc` to not pull in anything from the C++ libs, or
- use `c++` for the linking stage iff we are compiling CUDA code (this is what we do now)
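
To make the second option concrete, here is a minimal sketch of the decision in Python. It is illustrative only and not the code in this branch: the helper name `pick_link_driver`, the `.cu` suffix check, and the default driver names `cc` and `c++` are all assumptions made for the example.

```python
from pathlib import Path


def pick_link_driver(sources, c_driver="cc", cxx_driver="c++"):
    """Return the driver to use for the link stage (hypothetical helper).

    Objects built by nvcc may reference libstdc++ symbols, so the C++
    driver is chosen if and only if at least one CUDA source is involved.
    """
    # Assumption: CUDA sources are identified by a ".cu" suffix.
    has_cuda = any(Path(src).suffix == ".cu" for src in sources)
    return cxx_driver if has_cuda else c_driver


if __name__ == "__main__":
    print(pick_link_driver(["main.c", "kernels.cu"]))
    print(pick_link_driver(["main.c"]))
```

Linking through the C++ driver lets it add the C++ runtime on its own, so non-CUDA builds keep linking with the plain C driver and carry no extra dependency.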
TODO:
- fix header messiness
- make sure hwloc extensions still work
- extend cmake to build cuda-based runtime libs
- fix/improve documentation
- house-keeping, things need tidying...
- tests!!!!