version 1.3.3 at commit 523 revision 1
About
This release brings a few innovations and improvements, in particular for CUDA-based systems.
This is a weekly release, which is based on the latest upstream changes. It may be unstable.
Changelog
- add latest changes to EMR optimisation
- add support for CUDA related primitives; reuse memory across memcpys
- major improvements to CUDA backend code generation
- improve generation of managed memory
- add synchronisation; optimal placement of syncs; multiple sync variants
- add `-feedback` flag; get details on effect of optimisations on AST
- add `-profile o` flag; get static counting of IOPs and FLOPs
- add `-profile c` flag; count CUDA related operations using GPU counters
- fix for HWLOC (header related)
- fix for fix-point ad-hoc rewrite cycle (never reached fix-point)
Package links
Note: you can also view these packages within the repository and access them via Git-LFS.