Polymorphic ArrayHandle

From VTKM
Jump to navigation Jump to search

A major issue with the current version of VTK-m is that template programming madness is causing major inconveniences with the compilation. First, compiling VTK-m code takes a looooooong time. Second, even the simplest programs generate remarkably large executables.

The most direct cause of both these issues is the DynamicArrayHandle (along with its sibling DynamicCellSet) and the dynamic cast and call that happens within the dispatcher's Invoke.

The basic idea of the DynamicArrayHandle is that you can hold an ArrayHandle without knowing its type and storage at compile time. This allows you to, for example, store fields in a DataSet without carrying around a list of template arguments specifying each field array type.

In the current design of VTK-m all types have to be known when you get to the execution environment. That means when using one of these dynamic storage classes when executing a worklet, the code must figure out what type of arrays are being used by trying every combination of every possible type. These combinations grow very quickly.

Take for example the execution of everyone's favorite, Marching Cubes. At a minimum, this algorithm requires 3 arguments: a cell set, a field to contour, and the point coordinates. If just considering the "default" types that VTK-m expects, there are 4 common types of cell sets (2D structured, 3D structured, explicit, and explicit single type). The field is probably one of 2 types (Float32 or Float64). The point coordinates also have 2 possible types (Vec 3 of Float32 or Float64) but additionally 5 (!) storage mechanisms (standard array, uniform coordinates, 2 types of composite vectors, and cartesian product). This is a total of 4 x 2 x 2 x 5 = 80 possible combinations.

So, yes, for a simple worklet operating on a cell set with one field and point coordinates, the template mechanism creates 80 versions of the worklet functions that must be compiled. Furthermore, the templating to create all these versions is quite slow to resolve. Add to that that several compilers we use for VTK-m require multiple passes for different architectures (e.g. CUDA) or optimizations.

How polymorphic array handles would work

The proposed solution is to introduce a polymorphic array handle where values can be retrieved without knowing how they are stored. The implementation starts with an array portal with virtual methods.

ArrayHandlePolymorphic is a mouthful. Perhaps it should be called ArrayHandleVirtual. --Kmorel (talk) 11:02, 14 June 2016 (EDT)
template<typename T>
class ArrayPortalPolymorphic
{
public:
  virtual vtkm::Id GetNumberOfValues() const = 0;
  virtual T Get(vtkm::Id index) const = 0;
  virtual void Set(vtkm::Id index, const T &value) = 0;

  virtual ~ArrayPortalPolymorphic() {  }
};

A templated subclass pulls values out of the source array portal where the entire type is known.

template<typename T, typename SrcArrayPortalType>
class ArrayPortalPolymorphicImpl : public ArrayPortalPolymorphic<T>
{
public:
  ArrayPortalPolymorphicImpl(const SrcArrayPortalType &portal)
    : SourceArrayPortal(portal) {  }

  vtkm::Id GetNumberOfValues() const
  {
    return this->SourceArrayPortal.GetNumberOfValues();
  }

  T Get(vtkm::Id index) const
  {
    return T(this->SourceArrayPortal.Get(index));
  }

  void Set(vtkm::Id index, const T &value)
  {
    this->SourceArrayPortal.Set(index, value);
  }

private:
  SrcArrayPortalType SourceArrayPortal;
};
The actual implementation should actually have a wrapper class that holds a pointer to the actual polymorphic class. This is to make sure that the virtual methods point to the right class. --Kmorel (talk) 10:12, 9 June 2016 (EDT)

Once this array portal is designed, an array handle can be built around it that takes an array of any type and wraps it in this one unified type. This should be a straightforward implementation of a derived storage, which is document in the User's Guide draft.

The difference between dynamic array handles and polymorphic array handles

Although they are both designed to hide the type at compile time, the existing dynamic array handles and the new polymorphic array handles resolve the type very differently. To access data in the dynamic array handle, you have to cast it back to an array handle of the same type. That means, to access the data you have to try every storage possibility and generate separate code paths to each. If you are accessing data from many dynamic array handles, the code paths explode combinatorially.

In contrast, the polymorphic array handles defer everything to run time. So there is exactly one code path generated and a function pointer jumps the execution to the right code. This is basically how the zero copy VTK data arrays work.

Changes to dynamic array handle

Once the polymorphic array handle is available, the dynamic array handle should be changed to take advantage of it. It needs one or more methods that will return a polymorphic array handle regardless of the underlying type. This could then be used in place of the template mechanisms to get array handles for the worklet.

Pros and Cons

Pros:

  • Much faster compilation
  • Much smaller executables and libraries
  • More potential for precompiling into libraries
  • Do not need to guess types in type lists

Cons:

  • Memory fetches require virtual method calls
    • Cannot be inlined
    • Might interfere with vectorization

Open Issues

What should the exports be?

VTK-m is currently a header-only template library that requires everything to be declared. As such, the "export" macros attempt to inline everything. What is the right "export" for these virtual methods?

Read only array portals

Array portals that are not meant to be read might not implement a Set method. When used in templates this is OK because the Set method simply should not be used in read-only contexts. But ArrayPortalPolymorphic does not know what context it is in, so always uses the source's Set method whether it is needed or not.

Either we will have to be more vigilant about implementing Set in all array portals or have a special polymorphic version for read only.

Implementing automatic casting

The polymorphic array handle (and portal) can effectively hide the array storage, but it cannot hide the type of value in the array because it is part of the Get/Set interface.

In our experience with writing worklets, supporting different types (such as Float32 vs Float64 for fields is more of a hassle than a benefit. Filters will probably want to choose one type and cast everything to that base type. But it can be tricky to set up the polymorphic array handle to support any type to cast to. All possible casts have to be implemented in advanced.

Managing virtual objects in CUDA

As I understand, CUDA hardware now supports calling virtual methods, so that should not be an issue. But to get them to work, you need to create or allocate the object somewhere and then get a pointer or reference to it. That could complicate the array transfer because you have to build the virtual array portal on the host so that it works on the device. I expect there must be a way to do this, but it might require a special implementation for CUDA devices.

What about cell sets?

This implementation does not help with dynamic cell set. In the end, we might just have the filter iterate over all possibilities. It might be the case that each filter has special paths for the different cell sets.

Optimization for voxels

Accessing data through virtual methods could slow down execution, but likely not by much. However, one place where it will matter is in the specialization of some functions for "voxel" type elements (hexahedral cells with axis aligned point coordinates). Things like interpolation is significantly faster and world-to-parametric coordinate computation. We cannot do this specialization if all arrays are hidden behind polymeric classes. We need a mechanism to reintroduce this specialization.

Alternate Designes

More flexible basic storage

One alternative to completely hiding array handles behind virtual method calls would be to make the basic storage be more flexible in its data layout. For example, EAVL managed arrays by giving each one a stride (for interleaving arrays). It also had some form of division and modulus to support cartesian products and the idea of a "logical array" to support uniform point coordinates.

Although this could potentially cover all the major use cases for field arrays in filters, it is a bit more limiting in its solution. Perhaps a bigger issue is that it introduces extra index arithmetic to each array lookup. There is a good chance that would slow things down more than calling virtual methods. Also, it is unclear how casting would work. What happens when one field is floats and another is double? Do we need to iterate every possible combination? That would be not much better than what we have now.

More Implementation Details

This section is just me running through some of the implementation details. I've been thinking about how this fits together, but we are not ready to implement it yet, so I am capturing my thoughts here. You probably don't have to read this unless you are planning on implementing something. --Kmorel (talk) 11:02, 14 June 2016 (EDT)

The ArrayPortalPolymorphic actually will have to be composed of two different classes, the interface class and an internal implementation class. The implementation class is the truly polymorphic class. This separation can achieve two things. First, it allows the management of the pointer/reference of the polymorphic class to be automatically managed. Second, we can make the implementation of the internal class not rely on the vector length by using arrays instead. This is important so that classes like DynamicArrayHandle can create a finite number of implementations when the array type is not known.

So the internal array portal class would look something like this.

template<typename ComponentType>
class ArrayPortalPolymorphicImplBase
{
public:
  virtual vtkm::IdComponent GetNumberOfComponents() const = 0;

  virtual void Get(vtkm::Id index, ComponentType *data) const = 0;
  virtual void Set(vtkm::Id index, const ComponentType *data) = 0;

  virtual ~ArrayPortalPolymorphicImplBase() {  }
};

template<typename PortalType>
class ArrayPortalPolymorphicImplScalar
  : public ArrayPortalPolymorphicImplBase<typename PortalType::ValueType>
{
public:
  ArrayPortalPolymorphicImplScalar(const PortalType &portal)
    : Portal(portal) {  }

  vtkm::IdComponent GetNumberOfComponents() const { return 1; }

  virtual void Get(vtkm::Id index, ComponentType *data) const
  {
    *data = this->Portal.Get(index);
  }

  virtual void Set(vtkm::Id index, const ComponentType *data)
  {
    this->Portal.Set(index, *data);
  }

private:
  PortalType Portal;
};

template<typename PortalType>
class ArrayPortalPolymorphicImplVec
  : public ArrayPortalPolymorphicImplBase<typename PortalType::ValueType::ComponentType>
{
  typedef typename PortalType::ValueType ValueType;
  typedef typename ValueType::ComponentType ComponentType;

public:
  ArrayPortalPolymorphicImplVec(const PortalType &portal)
    : Portal(portal) {  }

  vtkm::IdComponent GetNumberOfComponents() const 
  {
    return ValueType::NUM_COMPONENTS; 
  }

  virtual void Get(vtkm::Id index, ComponentType *data) const
  {
    ValueType value = this->Portal.Get(index);
    for (vtkm::IdComponent componentIndex = 0;
         componentIndex < ValueType_NUM_COMPONENTS;
         componentIndex++)
    {
      data[componentIndex] = value[componentIndex];
    }
  }

  virtual void Set(vtkm::Id index, const ComponentType *data)
  {
    this->Portal.Set(index, ValueType(data));
  }

private:
  PortalType Portal;
};

SAND 2016-5559 O