Simplifying the Use of CellSets in DataSets 2

From VTKM
Jump to navigation Jump to search

The current DataSet class in VTK-m comprises a group of cell sets, a group of fields, and a group of coordinate systems. The management of fields and coordinate systems is pretty straightforward because they behave pretty similarly across different types of cell models. However, the cell set object is handled problematically because the type of this class depends on the type of data structure (for example structured vs. explicit cell connections). Currently, DataSet stores a pointer to an abstract CellSet superclass with some unknown specialization for the particular data structure.

This is currently causing two problems. The first problem is that memory management is not well established. The abstract CellSet object must be stored as a pointer. It is typically allocated outside of DataSet and then passed to that class. Who is responsible for deleting it and when? Currently this is being solved by wrapping the pointer in a boost smart_ptr, but we really want to keep boost out of the VTK-m API.

The second problem is more serious. Once the CellSet is stored in the DataSet, its type is essentially lost. So whenever you use a DataSet, you have to somehow magically know what the CellSet actually is and do a dynamic_cast to the appropriate concrete type. It's a silly system since if you have to know the type anyway, you may as well store it in a static type. If you don't know the type, you have to write a lot of code to try all types it might be.

Ultimately, data sets (and in particular, the cell set part) should be designed in one of two ways. Either the data set should be strongly typed such that any code using it knows specifically what the type should be, or the data set has a polymorphic data set with a means to dynamically schedule it (much like a dynamic array handle). The current implementation is sort of in the middle of these two approaches and does a good job at neither.

Our first design for Simplifying the Use of CellSets in DataSets proposed statically typing the DataSet class to the cell set type. However, this implementation is problematic. Instead, this design keeps the implementation essentially how it is but adds a DynamicCellSet class to manage the data.

Managing Dynamic Cell Sets

The dynamic cell sets will be managed by a new class named DynamicCellSet. This class will manage CellSet objects of unknown types in the same way that DynamicArrayHandle manages ArrayHandle objects of unknown types. This dynamic class serves two purposes: resource management and automatic casting.

Resource Management

DynamicCellSet provides resource management by behaving like a smart pointer. The object copies in a CellSet of a concrete type at creation and stores a reference to the newly created cell set object as the abstract superclass. DynamicCellSet maintains a reference count to the cell set pointer so that copies of the dynamic cell set can be created. All copies point to the same CellSet object and the CellSet is automatically deleted when all copies of DynamicCellSets leave scope. This is easily implemented with a boost::smart_ptr. Since smart_ptr is hidden within private members, it does not break VTK-m API requirements to not rely on boost. (That is, the implementation could be changed without breaking any other code.)

Sometimes you want a new instance of the cell set of the same type rather than a reference to the same object. Thus, DynamicCellSet will also have a NewInstance method that allocates a new CellSet object and returns a new DynamicCellSet containing it.

Automatic Casting

The point of using an abstract version of CellSet is to manage when you don't know (or are not sure about) the type for CellSet. But eventually the abstract object must be cast to a concrete type to invoke an operation on that. Ultimately this means trying some list of possible types.

The DynamicCellSet will manage this casting on itself so that this feature is encapsulated and does not have to be replicated throughout the code. This casting is implemented with a CastAndCall method (similar to the functionality of the method of the same name in DynamicArrayHandle). This method takes a functor object and calls that object with a valid cast to a concrete CellSet type.

The CastAndCall method tries a default set of cell types (which is structured cells of dimensionality of 2 or 3 or explicit cells with all storages of the default type). This list to try can be modified in two ways. The first way is to define VTKM_DEFAULT_CELL_SET_LIST_TAG before including any VTK-m headers. This will globally redefine the default cell sets. The second way is to call ResetTypeList on the DynamicCellSet. This will return a DynamicCellSet-like object that shares the CellSet reference and behaves the same except that CastAndCall tries the new list of CellSet types provided.

The header for dynamic cell sets will also define a specialization for vtkm::cont::internal::DynamicTransformTraits so that whenever a DynamicCellSet is passed to a dispatcher Invoke method it will automatically use this cast and call feature to statically cast it.

Note that if DynamicCellSet is passed to the Invoke method, then it will not be possible to call methods like GetNodeToCellConnectivity on it. At the very least the return type would depend on the cell set type. Instead, this method will be called internally in the dispatcher.

Cleaning up the CellSet Classes

This is a somewhat different topic, but the class structure underneath the cell set classes is a bit inconsistent, which makes it a bit confusing. Here is a breakdown of the classes and what I think I understand about them.

  • CellSet A CellSet holds all the information about the topology. It contains all the information about how topological elements are connected (nodes to edges to faces to cells and all possible combinations). Some of these connections are directly accessible. Others might need some processing. For example, you might need to build the links from cells to nodes in an explicit cell set.
  • Connectivity A Connectivity class describes the connections from one topological element to another. For examples, the connectivity from nodes to cells describes how cells are connected and fields are interpolated. The connectivity from cells to nodes describes all the cells incident on a node and allows cell to point operations.
  • Structure There is really only a Structure class for structured grids and as far as I can tell, it is a convenience class to define methods for different dimensionality and different connectivity directions.

For sake of clarity and convention I suggest the following:

  • The name of the connectivity classes be ConnectivityExplicit and ConnectivityRegular rather than ExplicitConnectivity and RegularConnectivity. This follows the VTK-m convention.
  • The connectivity classes should all be in the vtkm::cont package. Functionality that is identical in both control and execution environment (such as what you find in the current RegularConnectivity and RegularStructure classes) should be in the vtkm::internal package (unless there is a reason to use the functionality outside of the cell set support objects).
    vtkm::internal sounds good. It was put in vtkm mostly because of an unfamiliarity with the directory structure and their uses, etc. --Dpugmire (talk) 09:01, 24 July 2015 (EDT)
  • For each connectivity type in the control environment, there is a matching one in the execution environment (vtkm::exec). This will be used internally by the worklet to get connection information.
    Should the control and execution classes be the same name or different? Which way is more clear? --Kmorel (talk) 10:56, 20 July 2015 (EDT)
    I think the same name should be ok. There is the distinction of vtkm::cont::CLASS vs vtkm::exec::CLASS. I think this should make it clear, and make it easier to remember and navigate around. Or, did you have some other issue in mind? --Dpugmire (talk) 09:08, 24 July 2015 (EDT)
    After diving into this a bit, I am considering just rolling the functionality on the control side into the CellSet class. That means on the control side you manage all topological links from the same class whereas on the execution side you get a class specific to topological links in a single direction (e.g. point to cell or cell to point). To do this, the PrepareForInput method can be overloaded based on the from/to elements. It is a simple matter to specify which one to call in the transport layer. --Kmorel (talk) 17:58, 31 July 2015 (EDT)
    Yes, this is how I ended up implementing it. Making CellSet manage everything on the control side and Connectivity* manage everything on the execution side has cleaned up the code quite a bit. --Kmorel (talk) 10:17, 3 August 2015 (EDT)
  • The TopologyType enum should be changed to a collection of tag structures (e.g. struct TopologyTypeCellTag {};). Tags are a bit less awkward for templates and overloading, and as far as I can tell these enumerations are not used for anything other than template resolution.
  • The user code should not have to call the get connectivity method on the cell set. This should be handled within the worklet (more specifically in the transport). Which type of connectivity to get (node to cell vs. cell to node or any other combination) should be defined by the workout type. This means there will be a different worklet type for each from-to combination. That might lead to worklets templated on the from-to pair.
  • There is inconsistency in the class names for structured grids. The cell set class was named structured but the connectivity class is named regular. I think the appropriate name is structured since if you apply an explicit coordinate system it becomes irregular. So the names should be CellSetStructured and ConnectivityStructured. However, the Make*RegularDataSet method names in MakeTestDataSet.h are correct since that is making it with a uniform coordinate system to make it regular.
  • There is also inconsistency in using the names "point" and "node", both of which are used to refer to the vertices in the cell structure. Generally, VTK-m should use point, which follows the VTK naming convention. However, I propose to use the name "node" to refer to the local index of vertices in the cell. For example, say you have a 10x10 2D structured grid. All quadrilateral cells of this mesh have nodes 0, 1, 2, and 3. The first quadrilateral has points 0, 1, 11, and 10. The second quadrilateral has points 1, 2, 12, and 11.
Should we add the concept of extents to the structured cell sets? VTK uses a 6 id min/max extent rather than a 3 id dimension for structured 3D grids. Dax adopted this simply because that is what VTK did. But everyone finds extents confusing and I don't know if they are important? Do we need them in VTK-m? --Kmorel (talk) 10:56, 20 July 2015 (EDT)
I talked to both Jeremy and Berk about this. Both felt that there is probably little to be gained by expressing the extent over the dimensions, so it is probably not worth the added complexity. --Kmorel (talk) 08:48, 23 July 2015 (EDT)

SAND 2015-6210 O