1. 22 Nov, 2020 2 commits
  2. 21 Nov, 2020 3 commits
  3. 20 Nov, 2020 1 commit
  4. 19 Nov, 2020 2 commits
  5. 18 Nov, 2020 1 commit
    • added full support for nested types in the typechecker · 9eeedd67
      Sven-Bodo Scholz authored
      Actually, not much was missing here. However, the treatment of T_hidden received a massive conceptual overhaul.
      This was triggered by the observation that SACarg needs to be treated like a nested data structure.
      While ironing that out in compile.c, we may as well add proper support for nesting throughout.
      9eeedd67
  6. 17 Nov, 2020 1 commit
  7. 08 Nov, 2020 1 commit
  8. 07 Nov, 2020 2 commits
  9. 06 Nov, 2020 3 commits
  10. 05 Nov, 2020 4 commits
  11. 04 Nov, 2020 1 commit
  12. 03 Nov, 2020 6 commits
  13. 02 Nov, 2020 9 commits
    • [debug] add more debug prints · 4cbc58e9
      Hans-Nikolai Viessmann authored
      4cbc58e9
    • [support] extended GDB functions · f89cd596
      Hans-Nikolai Viessmann authored
      and add some better documentation
      f89cd596
    • [profiler] extend cuda profiling · 9b004e92
      Hans-Nikolai Viessmann authored
      We now also add timers (using the GPU timer) to measure the time for
      certain events on the GPU (kernel launches, memcpys, allocs, etc.).
      9b004e92
    • [cuda] fix minimise transfer bug · 83555401
      Hans-Nikolai Viessmann authored
      When moving to an ad-hoc macro cyclical traversal mechanism, the latest
      counter value(s) were never stored, meaning that we never cycled to a
      fix-point. This caused several transfers to be left in place which could
      otherwise have been elided.
      
      This bug also affected the algebraic wlfi traversal.
      83555401
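The commit message above describes a traversal that should repeat until nothing changes any more, and a bug where the latest counter value was never stored. The following is a minimal sketch of that idea in plain C, not the actual sac2c macro mechanism: `run_pass` and `cycle_to_fixpoint` are hypothetical names, and the "pass" merely removes up to two pretend transfers per call.

```c
#include <stdbool.h>

/* Hypothetical stand-in for one optimisation pass: it removes up to two
   redundant transfers per call and bumps a change counter for each one. */
static void run_pass(int *transfers_left, int *changes)
{
    for (int i = 0; i < 2 && *transfers_left > 0; i++) {
        (*transfers_left)--;
        (*changes)++;
    }
}

/* Repeat the pass until the change counter stops moving (a fix-point)
   or max_cycles is reached. Returns the number of passes executed. */
static int cycle_to_fixpoint(int *transfers_left, int max_cycles)
{
    int changes = 0;
    int prev_changes = 0;
    int cycles = 0;

    do {
        /* Store the latest counter value before each pass. Per the commit
           message, omitting this kind of bookkeeping meant the traversal
           never cycled to a fix-point. */
        prev_changes = changes;
        run_pass(transfers_left, &changes);
        cycles++;
    } while (changes != prev_changes && cycles < max_cycles);

    return cycles;
}
```

Starting from five transfers, this sketch runs four passes: three that each remove something and a final one that confirms nothing changed.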
    • fix hwloc filter issue · d6dbb28f
      Hans-Nikolai Viessmann authored
      On some systems (a change in the Linux kernel, maybe?), filtering over
      HWLOC_OBJ_OS_DEVICE objects leads to a segfault. For HWLOC we do not rely
      on any _system_ devices, so leaving the filtering at NONE should be fine.
      
      Also updated .gitignore
      d6dbb28f
    • Hans-Nikolai Viessmann's avatar
      d455f2e9
    • Hans-Nikolai Viessmann's avatar
      d133c3aa
    • [profile] improve elapsed time computing · a00cfec2
      Hans-Nikolai Viessmann authored
      We also add the count of kernel calls (might be useful).
      a00cfec2
    • [profile] add cuda timer · dffc68e2
      Hans-Nikolai Viessmann authored
      We can now measure the runtime (wall-clock time) of CUDA kernels using
      sac2c's inbuilt profiling system. We use the CUDA event system (GPU/device
      counters) to make the measurements.
      
      Performing the measurement itself is fairly cheap, and has little effect
      on the runtime of the main() function. However, we do perform some costly
      summing up within the libsac/runtime libraries after we've reached the end
      of the program, which can take several seconds.
      
      At the moment, this CUDA timer feature only provides a total time for
      the entire program run, not on a per-function basis.
      
      NOTE: we store the start/stop values within a linked list. We do this as
            we don't statically know how many times the kernel function will be
            called (in a conditional loop, for example). As such, if we have a
            program launching thousands of kernels, this could consume a
            lot of memory. We could instead take the measurement at the point
            of the kernel launch and store only the elapsed time. This would
            require a sync on the device, which could destroy any performance
            gains from using the asynchronous or managed backend.
      dffc68e2
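The NOTE above describes the trade-off: a cheap per-launch record in a linked list, then a costly summation at program exit. The following is a minimal sketch of that data structure in plain C, not the libsac implementation. All names (`timing_node`, `timing_list`, `timing_record`, `timing_sum_and_free`) are hypothetical, and plain millisecond stamps stand in for the CUDA event handles so that the sketch runs without a GPU.

```c
#include <stdlib.h>

/* Hypothetical record for one kernel launch. In a CUDA build the two
   fields would be event handles rather than millisecond stamps. */
typedef struct timing_node {
    float start_ms;
    float stop_ms;
    struct timing_node *next;
} timing_node;

typedef struct {
    timing_node *head;
    size_t count;   /* number of recorded kernel launches */
} timing_list;

/* Cheap per-launch step: prepend one record in O(1).
   Returns 0 on success, -1 if the allocation fails. */
static int timing_record(timing_list *list, float start_ms, float stop_ms)
{
    timing_node *node = malloc(sizeof *node);
    if (node == NULL) {
        return -1;
    }
    node->start_ms = start_ms;
    node->stop_ms = stop_ms;
    node->next = list->head;
    list->head = node;
    list->count++;
    return 0;
}

/* End-of-program step: walk the whole list, sum the elapsed times,
   and free every node. Its cost grows with the number of launches. */
static float timing_sum_and_free(timing_list *list)
{
    float total = 0.0f;
    timing_node *node = list->head;
    while (node != NULL) {
        timing_node *next = node->next;
        total += node->stop_ms - node->start_ms;
        free(node);
        node = next;
    }
    list->head = NULL;
    list->count = 0;
    return total;
}
```

Memory use grows by one node per launch, which is the cost the NOTE warns about. Storing only a running total would avoid it, but in the real setting that means synchronising with the device at every launch.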
  14. 23 Oct, 2020 4 commits