Teleconference 2014-01-21
Attendees: Ken Moreland (SNL), Rob Maynard (Kitware), Chris Sewell (LANL), Dave Pugmire (ORNL), Jeremy Meredith (ORNL), James Kress (OU)
CUDA Device Adapter
This week Rob merged a branch that contains a CUDA device adapter, so VTK-m now officially supports many-core parallelism. Ken also contributed some low-level changes to fix some host-device function issues. The implementation is similar to that of Dax, with a few interesting additions.
The new VTK-m device adapter uses pinned memory for error handling. The scheduling mechanism uses a simple string buffer to allow algorithms running on the device to report errors. In Dax, that buffer was transferred back to the host after every kernel call to check for errors. The pinned memory allows VTK-m to check for an error without necessarily transferring data or synchronizing the call. In the future this can be used for asynchronous schedule calls (as opposed to the current blocking calls).
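The string-buffer idea can be sketched on the host alone. The following is a minimal illustration, not the VTK-m or Dax code: the names ErrorMessageBuffer, RaiseError, IsErrorRaised, and ScheduleAndCheck are hypothetical, and the loop stands in for a kernel launch. On a real CUDA device the buffer would presumably live in pinned, mapped host memory (for example, memory obtained from cudaHostAlloc with the cudaHostAllocMapped flag), so the host could read it without an explicit copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical size for the message buffer (an assumption for this sketch).
constexpr std::size_t kErrorBufferSize = 128;

// Hypothetical fixed-size string buffer shared by the scheduler and the
// algorithm it runs. An empty string means "no error".
struct ErrorMessageBuffer {
  char message[kErrorBufferSize];

  ErrorMessageBuffer() { message[0] = '\0'; }

  // Called from algorithm code. Only the first reported error is kept.
  void RaiseError(const char* text) {
    assert(text != nullptr);
    if (message[0] != '\0') {
      return;
    }
    std::strncpy(message, text, kErrorBufferSize - 1);
    message[kErrorBufferSize - 1] = '\0';
  }

  // The host only needs to look at the first character to detect an error.
  bool IsErrorRaised() const { return message[0] != '\0'; }
};

// Host-only stand-in for a schedule call: runs the functor once per index,
// then returns the error message (empty if none was raised).
template <typename Functor>
std::string ScheduleAndCheck(Functor functor, std::size_t numInstances) {
  ErrorMessageBuffer errors;
  for (std::size_t index = 0; index < numInstances; ++index) {
    functor(index, errors);
  }
  return errors.IsErrorRaised() ? std::string(errors.message) : std::string();
}
```

A caller would pass a functor taking an index and an ErrorMessageBuffer reference, and treat a non-empty return value as a failed schedule. Because the check reads only the buffer itself, the same test could in principle be made while a kernel is still in flight, which is what makes asynchronous scheduling plausible.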
There is also some support for texture memory. A compiler flag makes VTK-m use texture memory for all input arrays of supported types. According to NVIDIA engineers (as reported by Rob), all linear input arrays should benefit from using texture memory, since the only difference is in the caching mechanism. Supposedly the CUDA 7 compiler will automatically use texture memory for any arrays it deems input-only. However, so far this is not giving the performance boost that Rob expected, so although the code is in the repository, it is disabled.
Data Models
Jeremy, Ken, and Chris talked in person last week about the first steps in adding a data model to VTK-m. The current plan is to start with a simple connectivity data set (an array of point indices defining cells). From there we can build a simple map execution that follows cell connectivity to enable some introductory visualization algorithms and prove the concept.
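As a rough illustration of what such a connectivity data set and cell map might look like, here is a minimal serial sketch. It is not a proposed VTK-m design: the names SimpleCellSet and AveragePointFieldToCells are hypothetical, and it assumes every cell has the same number of points.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration type: a flat array of point indices in which
// each consecutive group of pointsPerCell entries defines one cell.
struct SimpleCellSet {
  std::size_t pointsPerCell;              // e.g. 3 for triangles
  std::vector<std::size_t> connectivity;  // point indices, grouped by cell

  std::size_t NumberOfCells() const {
    return connectivity.size() / pointsPerCell;
  }
};

// A map over cells: for each cell, follow the connectivity to gather the
// point values and write their average as the cell value.
std::vector<double> AveragePointFieldToCells(
    const SimpleCellSet& cells, const std::vector<double>& pointField) {
  assert(cells.pointsPerCell > 0);
  assert(cells.connectivity.size() % cells.pointsPerCell == 0);

  std::vector<double> cellField(cells.NumberOfCells());
  for (std::size_t cell = 0; cell < cellField.size(); ++cell) {
    double sum = 0.0;
    for (std::size_t i = 0; i < cells.pointsPerCell; ++i) {
      const std::size_t pointId =
          cells.connectivity[cell * cells.pointsPerCell + i];
      assert(pointId < pointField.size());
      sum += pointField[pointId];
    }
    cellField[cell] = sum / static_cast<double>(cells.pointsPerCell);
  }
  return cellField;
}
```

For two triangles with connectivity {0, 1, 2, 1, 3, 2} and point values {0, 3, 6, 9}, the map yields one value per cell. Each iteration of the outer loop is independent of the others, which is the property a parallel map execution on a device adapter would rely on.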