version 1.3.3 at commit 523 revision 1
This release brings a few innovations and improvements, in particular for CUDA-based systems.
This is a weekly release, which is based on the latest upstream changes. It may be unstable.
- add latest changes to EMR optimisation
- add support for CUDA related primitives; reuse memory across memcpys
- major improvements to CUDA backend code generation
- improve generation of managed memory
- add synchronisation; optimal placement of syncs; multiple sync variants
- -feedback flag; get details on the effect of optimisations on the AST
- -profile o flag; get static counting of IOPs and FLOPs
- -profile c flag; count CUDA related operations using GPU counters
- fix for HWLOC (header related)
- fix for fix-point ad-hoc rewrite cycle (never reached fix-point)
You can also view these packages within the repository and access them via Git LFS.