- 22 Nov, 2020 8 commits
-
-
Sven-Bodo Scholz authored
-
Sven-Bodo Scholz authored
-
Sven-Bodo Scholz authored
-
Sven-Bodo Scholz authored
-
Sven-Bodo Scholz authored
-
Sven-Bodo Scholz authored
-
Sven-Bodo Scholz authored
-
Sven-Bodo Scholz authored
-
- 21 Nov, 2020 3 commits
-
-
Sven-Bodo Scholz authored
-
Sven-Bodo Scholz authored
-
Sven-Bodo Scholz authored
-
- 20 Nov, 2020 1 commit
-
-
Sven-Bodo Scholz authored
-
- 19 Nov, 2020 2 commits
-
-
Sven-Bodo Scholz authored
Functions done: MakeArgNode, MakeBasetypeArg
-
Sven-Bodo Scholz authored
This is important since the additions of Rouland do not allow externals to be nested without breaking the code generation. We can tackle this later!
-
- 18 Nov, 2020 1 commit
-
-
Sven-Bodo Scholz authored
Actually, not much was missing here. However, the treatment of T_hidden received a massive conceptual overhaul. This was triggered by the observation that SACarg needs to be treated like a nested data structure. When ironing that out in compile.c, we can equally well make sure we add proper support for nesting throughout.
-
- 17 Nov, 2020 1 commit
-
-
Sven-Bodo Scholz authored
-
- 08 Nov, 2020 1 commit
-
-
Hans-Nikolai Viessmann authored
Fix mt fold sbs. See merge request !129
-
- 07 Nov, 2020 2 commits
-
-
Sven-Bodo Scholz authored
-
Sven-Bodo Scholz authored
streamlined the explanation; added a section on the implementation and extracted the two Handle-functions as helpers.
-
- 06 Nov, 2020 3 commits
-
-
Sven-Bodo Scholz authored
and put considerably more detail into the main comment of MTSPMDF. This is now feature complete.
-
Sven-Bodo Scholz authored
started re-writing SPMDF lifting to include a dec_rc on the neutral element after the lifted function
-
Sven-Bodo Scholz authored
By setting the rc in the stack copy of the descriptor to 2 we are safe now. I also injected tracing info so that it is easier to see what is going on just from the trace. As a consequence of this, the MT version now leaks one copy of the neutral element! This needs another fix. Finally, I added some comments in MTRMI to explain what exactly it does and to make that traversal quicker to understand :-)
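One possible reading of the rc change, as a minimal sketch: if a stack copy of a descriptor starts with a reference count of 1, a single decrement by a consumer reaches zero and would trigger a free of data that is still shared; starting at 2 keeps the count above zero. The names `desc_t`, `dec_rc` and `make_stack_copy` are hypothetical and model only the counter, not the actual runtime descriptor.

```c
#include <stdbool.h>

/* Hypothetical descriptor: only the reference count is modelled. */
typedef struct {
    int rc;
} desc_t;

/* Decrement the count; returns true when it reaches zero,
 * i.e. when the caller would free the associated data. */
bool dec_rc(desc_t *d)
{
    d->rc--;
    return d->rc == 0;
}

/* Make a stack-local copy of a shared descriptor with a chosen
 * initial reference count. The shared descriptor is not modified. */
desc_t make_stack_copy(const desc_t *shared, int initial_rc)
{
    desc_t copy = *shared;
    copy.rc = initial_rc;
    return copy;
}
```

Under this reading, a copy created with `initial_rc` of 2 survives one `dec_rc` without signalling a free, which would also match the leak of one copy of the neutral element mentioned in the message.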
-
- 05 Nov, 2020 4 commits
-
-
Hans-Nikolai Viessmann authored
[hwloc] fix error on no hwloc.h header. See merge request !127
-
Sven-Bodo Scholz authored
-
Hans-Nikolai Viessmann authored
The declarations in cpubind.h still need the hwloc.h header file, regardless of whether we are compiling with HWLOC support or not. This commit fixes this.
-
Sven-Bodo Scholz authored
-
- 04 Nov, 2020 1 commit
-
-
Sven-Bodo Scholz authored
Hotfix for CUDA profiling. See merge request !126
-
- 03 Nov, 2020 6 commits
-
-
Hans-Nikolai Viessmann authored
-
Sven-Bodo Scholz authored
Fix cuda mech (for cudaManaged) memcpy ICMs. See merge request !109
-
Hans-Nikolai Viessmann authored
-
Hans-Nikolai Viessmann authored
This traversal was replaced by EMR-related traversals.
-
Hans-Nikolai Viessmann authored
-
Hans-Nikolai Viessmann authored
-
- 02 Nov, 2020 7 commits
-
-
Hans-Nikolai Viessmann authored
-
Hans-Nikolai Viessmann authored
and add some better documentation
-
Hans-Nikolai Viessmann authored
We now also add timers (using GPU timer) to measure the time for certain events on the GPU (kernel launches, memcpys, allocs, etc.).
-
Hans-Nikolai Viessmann authored
When moving to an ad-hoc macro cyclical traversal mechanism, the latest counter value(s) were never stored, meaning that we never cycled to a fix-point. This caused several transfers to be left in place which could otherwise have been elided. This bug also affected the algebraic wlfi traversal.
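The failure mode described above can be illustrated with a small sketch of a cyclic driver that runs a pass until no counter changes. The important step is storing the latest counter values before each pass so that the comparison afterwards is meaningful. All names here (`opt_counters_t`, `run_pass`, `run_to_fixpoint`) are hypothetical and do not reflect the actual macro-based mechanism.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical optimisation counters, one per kind of change. */
typedef struct {
    size_t transfers_removed;
    size_t exprs_simplified;
} opt_counters_t;

static bool counters_equal(const opt_counters_t *a, const opt_counters_t *b)
{
    return a->transfers_removed == b->transfers_removed
        && a->exprs_simplified == b->exprs_simplified;
}

/* Toy pass: removes at most one pending transfer per invocation. */
static void run_pass(int *pending_transfers, opt_counters_t *counters)
{
    if (*pending_transfers > 0) {
        (*pending_transfers)--;
        counters->transfers_removed++;
    }
}

/* Repeat the pass until it changes no counter or max_cycles is hit.
 * Returns the number of passes that were run. */
int run_to_fixpoint(int *pending_transfers, int max_cycles)
{
    opt_counters_t current = { 0, 0 };
    opt_counters_t previous;
    int cycles = 0;

    do {
        previous = current; /* store the latest counter values */
        run_pass(pending_transfers, &current);
        cycles++;
    } while (!counters_equal(&previous, &current) && cycles < max_cycles);

    return cycles;
}
```

If the `previous = current` assignment is dropped, the comparison no longer reflects what the last pass did, and the loop can stop while removable transfers remain.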
-
Hans-Nikolai Viessmann authored
On some systems (a change in the Linux kernel, maybe?), filtering over HWLOC_OBJ_OS_DEVICE objects leads to a segfault. For HWLOC we do not rely on any _system_ devices, so leaving the filtering at NONE should be fine. Also updated .gitignore
-
Hans-Nikolai Viessmann authored
-
Hans-Nikolai Viessmann authored
-