Providing Non 1-to-1 Mapping in Worklets
There are currently two types of worklets in VTK-m: a field map that runs on operation on each element of an array and a topology map that can follow connections of cells. One thing that is not directly supported is the ability to create an output array that does not match 1-to-1 the values of the input domain. This comes up many times. For example, a tetrahedralization creates at least 5 tetrahedra for a hexahedron. Marching Cubes generates between 0 and 6 triangles per hexahedron.
One straightforward implementation is to make a new worklet type that allows scatters from the input domain to the output domain. However, this can get combinatorially complex. We would either have to pick a single worklet type or re-implement the feature for all potential worklet types.
Instead, this design proposal adds a feature to WorkletBase that allows for any type of scatter operation as it is scheduling its output.
Contents
The Scatter Management Class
First, we introduce a new set of classes to VTK-m to manage the relationship from the input domain to the output domain. These classes are called Scatter* and provide a common interface to the elements required to manage how values from the input domain are mapped to the output domain and vice versa.
- Note: I'm not enamored with the name Scatter, but it is the best I could think of. I originally called it Gather, but then I realized that each output can only access one input whereas an input can go to any number of outputs. That is a scatter. Other alternatives are Map (too easy to confuse with the map name of the worklets), Remap (marginally better, but not much), or InputToOutput (not bad but perhaps confusing with the from/to element of the topology map worklet). --Kmorel (talk) 19:04, 8 November 2015 (EST)
A Scatter class essentially provides two arrays. The first array is an output to input map that for every element in the output retrieves the corresponding element in the input. (For topology maps, both output and input elements are the 'to' topology elements.) The second array is a visit index that for two output elements that point to the same input element, provides a unique index for that pairing. Here is an example of what a Scatter class' interface would look like.
class ScatterIdentity
{
public:
typedef vtkm::cont::ArrayHandleCounting<vtkm::Id> OutputToInputMapType;
OutputToInputMapType GetOutputToInputMap(vtkm::Id numInputElements) const;
typedef vtkm::cont::ArrayHandleConstant<vtkm::IdComponent> VisitArrayType;
VisitArrayType GetVisitArray(vtkm::Id numInputElements) const;
vtkm::Id GetNumberOfOutputValue(vtkm::Id numInputElements) const;
};
The Scatter classes abstract away everything required to create indices into the two arrays the dispatcher needs to execute the worklet. Initially there will be three classes created: ScatterIdentity, ScatterUniform, and ScatterCount. We can create other types of scatter classes as the need arises.
Identity Scatter
ScatterIdentity produces a 1-to-1 mapping from the input domain to the output domain. This replicates the current behavior of the worklets.
The implementation of ScatterIdentity is simple. The output to input map simply returns the index. This can be done with an ArrayHandleCounting, but we may want to make a second array handle that removes the addition of the "staring value" (which is always zero). The visit array is simply an ArrayHandleConstant with a value of 1.
Uniform Scatter
ScatterUniform produces a 1-to-N mapping from the input domain to the output domain where N is the same for all inputs. The N is specified as a constructor argument. If N is set to 1, then ScatterUniform behaves identically to ScatterIdentity but slightly less efficiently.
The implementation of ScatterUniform is straightforward. The output to input map returns the index times N. This can be done with a class like ArrayHandleCounting that supports a step. (Perhaps we will change ArrayHandleCounting to support that.) The visit array is an ArrayHandleConstant with a value of N.
Counting Scatter
ScatterCount produces a 1-to-N mapping from the input domain to the output domain where N can vary for each input. The constructor takes an ArrayHandle containing for each input the count of outputs being created.
The implementation of ScatterCount can follow the implementation we have basically followed for isosurface algorithms. The basic implementation is as follows. A prefix sum converts the count array to a map from input to output. A parallel find operation can then flip this from output to input. Finally, a segmented prefix sum can generate the visit index.
Using Scatters in Dispatcher Invoke
The DispatcherBase class needs to have the Scatter class added as one of its template arguments. Likewise, an instance of the scatter class is passed as a constructor argument. Classes like DispatcherMapField and DispatcherMapTopology also will have a scatter template argument and constructor argument. The template argument defaults to ScatterIdentity so the behavior will be the same as currently if the scatter class is not specified.
The basic operation of all dispatchers is that DispatcherBase::Invoke calls the derived dispatcher's DoInvoke method. That method in turn calls DispatcherBase::BasicInvoke to launch the job in the execution environment. Before it does so, BasicInvoke will add the output to input map and visit index arrays to the arguments.
A class called WorkletInvokeFunctor is responsible for managing the execution. It calls a Fetch for each one of the execution signature arguments and invokes the function. To fully implement the scattering from input to output, we need to change the interface to Fetch slightly.
When the WorkletInvokeFunctor is invoked, it is given an invocation index that is synonymous with the output index. Most data will be fetched from the input, so we need the input index. We can get the input index from the output to input map. Both the input and output indices should be passed to the Fetch. Most fetch operations will only need the input map, but some special ones like WorkIndex and VisitIndex will need the output index.
Open Questions
Multidimensional Scaling on Structured Cells?
What about 3D scheduling? It is unclear how to handle 2D and 3D indexing. If the scheduling is not 1-to-1, then the scheduling is no longer on a regular grid. Perhaps there should be a third overloaded method that gets the size of the output domain from the size of the input domain. This method can be overloaded for 2D and 3D domains. The identity scatter can return the 2D or 3D domain. The other domains probably do not make much sense as 2D or 3D, so they can convert the size to 1D.
Over-decomposition of the Data?
This approach of having a separate instance on each output value has been shown to work well, particularly for GPUs, as it minimizes the amount of branching that occurs. However, it can also lead to replication of instructions as instances with the same input but different visit indices will likely do much of the same computation.