1. 16 Feb, 2020 2 commits
    • [cuda-hwloc] improve init function · 60419008
      Hans-Nikolai Viessmann authored
      We no longer error from within the init function; instead we return a
      bool. This achieves two things: first, we fix a space leak in the cases
      where we errored, and second, we now have an effective unit test in
      place for this function.
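The pattern described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the actual sac2c code: `binding_state`, `binding_init`, `binding_cleanup`, and the device-to-CPU table are hypothetical names chosen for the example. The point is only that the init function releases what it allocated and returns `false`, rather than aborting from inside.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical state; the data the real init function sets up is not
 * shown in the commit log. */
typedef struct {
    int *device_to_cpu; /* illustrative table: device index -> CPU index */
    size_t num_devices;
} binding_state;

/* Hypothetical init: returns true on success and false on failure.
 * On failure nothing stays allocated, so the caller can report the
 * error without leaking memory. */
bool binding_init(binding_state *state, size_t num_devices, size_t num_cpus)
{
    assert(state != NULL);
    state->device_to_cpu = NULL;
    state->num_devices = 0;

    if (num_devices == 0 || num_cpus == 0) {
        return false; /* nothing to bind */
    }

    int *table = malloc(num_devices * sizeof *table);
    if (table == NULL) {
        return false;
    }

    for (size_t i = 0; i < num_devices; i++) {
        if (i >= num_cpus) {
            free(table); /* release before reporting failure */
            return false;
        }
        table[i] = (int)i;
    }

    state->device_to_cpu = table;
    state->num_devices = num_devices;
    return true;
}

/* Releases the table and resets the state. */
void binding_cleanup(binding_state *state)
{
    free(state->device_to_cpu);
    state->device_to_cpu = NULL;
    state->num_devices = 0;
}
```

Because failure is reported through the return value, a unit test can call the function with bad input and assert on `false` instead of having the process terminate.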
    • [hwloc] add support for hwloc v2 · 13453e50
      Hans-Nikolai Viessmann authored
      There actually isn't much we need to change, as most of our HWLOC code
      deals with binding to PUs/CPUs. The main changes in hwloc v2 affect how
      to bind based on cache/memory locality, which we don't use here.
      Further changes concern how the topology is organised; for instance,
      NUMA nodes are no longer treated as logical/real things, but as
      _memory children_ instead. The immediate effect is that traversing the
      topology is different. Further organisational changes to the topology
      exist, but they are handled by the API such that we need only perform
      minor changes to the existing
  2. 15 Feb, 2020 1 commit
  3. 14 Feb, 2020 7 commits
  4. 13 Feb, 2020 5 commits
  5. 12 Dec, 2019 1 commit
  6. 06 Sep, 2019 2 commits
  7. 05 Sep, 2019 5 commits
    • Fix up usage text · d6eb0397
      Hans-Nikolai Viessmann authored
    • add -generic flag to sac2c · 8497c5ef
      Hans-Nikolai Viessmann authored
      We have had issues with the build system using mtune=native to compile
      the libsac2c and the runtime libraries, making it impossible to create
      distributable packages without it breaking due to ISA incompatibilities.
      Changing the build system to not use mtune=native is doable, but
      involves having to filter out the mtune=native flag at the right points
      in the build. Additionally, when using different C compilers (such as
      NVCC) we need to account for this, which may require a different
      solution for each compiler. This is non-trivial to achieve, as in some
      instances a compiler is not used to build sac2c itself, but only a
      runtime lib or two.
      This commit uses a different method, which encodes the mtune=native flag
      into sac2crc and adds a sac2c flag (`-generic`) to toggle the effect.
      The advantage here is that we can define different flags for different
      SBIs, without affecting the build system. Furthermore, the changes to
      the build system are minimal: they involve propagating the flags to
      sac2crc and changing whether libsac2c is built with mtune=native or not.
    • Properly propagate BUILDGENERIC into runtime libraries. · cf03b0da
      Artem Shinkarov authored and Hans-Nikolai Viessmann committed
          Previously we didn't compile runtime libraries like libsac,
          sacprelude, etc. with -mtune=generic.  As a consequence, all our
          packages are (very likely) incompatible with CPUs other than
          that of the host system.
          This commit fixes it.
      Signed-off-by: Hans-Nikolai Viessmann <hv15@hw.ac.uk>
    • Merge branch 'hans-develop' into 'develop' · 96b2c3c6
      Hans-Nikolai Viessmann authored
      remove -pedantic flag from cmake
      See merge request !110
    • disable pedantic warnings for the argcount header · 9dceb0e8
      Hans-Nikolai Viessmann authored
      We make use of a GCC system header marker to indicate that the
      header file is not C standard compliant, and thus disable the warnings
      relating to zero-argument variadic macros.
      This commit also fixes a small mistake in config.cmake where we added
      the no warning flag to the wrong var (oops).
  8. 23 Apr, 2019 1 commit
  9. 20 Apr, 2019 1 commit
  10. 19 Apr, 2019 2 commits
    • fix spurious CMake build problem · fd1eb007
      Hans-Nikolai Viessmann authored
      This is a known issue with running a CMake-built Makefile system
      that eventually calls an External_Project. The various _steps_ in
      the External_Project are **not** considered individual jobs with
      interdependencies. This can therefore cause multiple instances of one
      _step_ to be run, leading to undefined behavior. The issue is
      documented in [1] and affects all versions of CMake > 3.10.
      The workaround for this is to explicitly set dependencies on the _steps_
      and remove DEPENDS from the External_Project.
      [1]: https://gitlab.kitware.com/cmake/cmake/issues/18663
    • Fix missing includes · b6931c58
      Hans-Nikolai Viessmann authored
  11. 18 Apr, 2019 3 commits
  12. 16 Apr, 2019 4 commits
  13. 09 Apr, 2019 1 commit
    • Fix incorrect traversal doflag check · 54490710
      Hans-Nikolai Viessmann authored
      Completely forgot that EMRL is on by default; therefore, whenever we
      entered IWLMEM, we entered EMRTU, which in some very rare cases did
      cause problems. We now check that the EMRCI and EMRTU flags are on as well!
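The kind of guard this commit message describes might look roughly like the sketch below. It is an assumption-laden illustration, not the sac2c source: the `opt_flags` record and both function names are hypothetical, and the exact combination of flags tested in the real code is inferred only from the wording "as well".

```c
#include <stdbool.h>

/* Hypothetical option record; the field names mirror the flags named in
 * the commit message, but the real sac2c data structure is not shown. */
typedef struct {
    bool emrl;  /* described as on by default */
    bool emrci;
    bool emrtu;
} opt_flags;

/* Behaviour as described before the fix: EMRL alone decided whether the
 * EMRTU traversal ran, so it ran in practically every configuration. */
bool run_emrtu_before_fix(const opt_flags *opts)
{
    return opts->emrl;
}

/* Behaviour as described after the fix (assumed to be a conjunction):
 * EMRCI and EMRTU must be enabled too. */
bool run_emrtu_after_fix(const opt_flags *opts)
{
    return opts->emrl && opts->emrci && opts->emrtu;
}
```

With only the default flag set (`emrl` on, the others off), the first predicate is true and the second is false, which matches the accidental-entry problem the message reports.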
  14. 08 Apr, 2019 1 commit
    • refactoring of EMRTU and IWLMEM · e0ecc1fc
      Hans-Nikolai Viessmann authored
      Moved a function used in both traversals out to cuda_utils.
      Also, I've removed the ISEMRLIFTED flag from EMRL/EMRTU, as it was
      preventing certain cases from being identified (such as cond functions).
  15. 04 Apr, 2019 3 commits
    • add EMRTU optimisation · 0db3f254
      Hans-Nikolai Viessmann authored
      This optimisation works together with the CUDA IWLMEM optimisation to introduce
      CUDA memory transfers into the AST. We need this traversal when using the EMR
      optimisation, as IWLMEM would otherwise lead to suboptimal code generation.
      IWLMEM, when encountering a lifted ERC, would create a host2device within the
      loop function and cause a memory allocation and free on the CUDA device. The
      EMRTU (EMR Type Update) traversal identifies such cases, and correctly
      transforms the lifted ERCs into CUDA device_types *and* fixes the arguments of
      the loop function applications (initial and recursi
    • EMRL mark fundef has allocation lifts · c27c2728
      Hans-Nikolai Viessmann authored
      We will use this later for EMRTU (relating to CUDA IWLMEM
  16. 03 Apr, 2019 1 commit