Difference between revisions of "Comparing Isosurface With and Without Scatter Classes"
(One intermediate revision by the same user not shown) | |||
Line 448: | Line 448: | ||
! New implementation | ! New implementation | ||
|} | |} | ||
+ | |||
+ | == Performance == | ||
+ | |||
+ | This change shifts the structure of the code quite a bit, so we should be careful not to tank performance. To test this, I have measured the performance of the Marching Cubes algorithm on a couple of testbeds at Sandia. The first uses a Haswell Intel CPU containing 32 cores with 2x hyperthreading (64 threads all total). The second uses an NVIDIA Tesla K80 GPU. All experiments were run on the isosurface of the supernova data set with 10 isovalues ranging from 0 to 0.045. Each isosurface was executed 10 times. | ||
+ | |||
+ | These tests were run using a version of VTK-m taken just before the addition of Scatter objects (SHA 4a167dec0a01aaff40c54f15c179707c1cefb044) and one after the update of the Marching Cubes algorithm to the Scatter classes (SHA 1898ab473ce2a352de9341c322e69557105f400a). | ||
+ | |||
+ | The following table gives the summary of the average times. The average is taken for each code version and hardware architecture over all trials and all isosurface values. | ||
+ | |||
+ | {| | ||
+ | ! Device || Code Version || Average Time | ||
+ | |- | ||
+ | | Haswell || Original || 0.151 s | ||
+ | |- | ||
+ | | || Scatter || 0.140 s | ||
+ | |- | ||
+ | | Tesla K80 || Original || 0.058 s | ||
+ | |- | ||
+ | | || Scatter || 0.054 s | ||
+ | |} | ||
+ | |||
+ | As you can see, using the scatter classes is just as efficient as the original code. If anything, the changes have improved the performance a little bit. | ||
== Next Steps == | == Next Steps == | ||
Line 453: | Line 475: | ||
The new code structure will make it easier to implement isosurfaces for more general data sets. That is surely the next step. | The new code structure will make it easier to implement isosurfaces for more general data sets. That is surely the next step. | ||
− | An improvement that can be made to the VTK-m framework is to better support geometry generation. There are still some by-hand things being done (such as creating <tt>ArrayHandleGroupVec</tt> arrays for the triangle outputs. It would be better if this was managed by the dispatcher. This implementation also generates a triangle soup. There should be a way to weld the vertices to make a manifold surface. We could add something specific to isosurface, but this is a common enough operation in visualization that there should be support at the framework level. | + | An improvement that can be made to the VTK-m framework is to better support geometry generation. There are still some by-hand things being done (such as creating <tt>ArrayHandleGroupVec</tt> arrays for the triangle outputs). It would be better if this was managed by the dispatcher. This implementation also generates a triangle soup. There should be a way to weld the vertices to make a manifold surface. We could add something specific to isosurface, but this is a common enough operation in visualization that there should be support at the framework level. |
Latest revision as of 12:03, 10 November 2015
We have recently added scatter classes to VTK-m, which allows building worklets that have a variable amount of output rather than a 1:1 relationship. This is particularly important for algorithms like isosurface that require specifying some amount of output depending on values of the input. Before these scatter classes were introduced, algorithms like isosurface had to build their own indices and then perform their own cross lookup. In addition to being inconvenient and difficult to read, it also meant that the algorithm could not take advantage of other VTK-m features available in the dispatcher/transport/fetch location.
This document compares the implementation of isosurface just before moving to scatter and just after.
Contents
The Code
The classify cell part of the isosurface algorithm has remained essentially the same, so we'll ignore that part. The implementation of the generate part of the algorithm has changed significantly in this transformation. We will look at the code in parts and compare the implementations directly.
Worklet State
Because the original implementation had to build its own indices and perform its own data lookups, it had to store array portals in the state of the worklet as well as auxiliary metadata so that the data could be reconstructed in the body of the worklet. In contrast, the new implementation can leverage VTK-m's built in ability to perform these data lookups. All that needs to be specified in the state is the Scatter class being used.
const FieldType Isovalue;
vtkm::Id xdim, ydim, zdim;
const float xmin, ymin, zmin, xmax, ymax, zmax;
typedef typename vtkm::cont::ArrayHandle<FieldType>::
template ExecutionTypes<DeviceAdapter>::PortalConst FieldPortalType;
FieldPortalType Field, Source;
typedef typename vtkm::cont::ArrayHandle<vtkm::Vec<FieldType, 3> >::
template ExecutionTypes<DeviceAdapter>::Portal VectorPortalType;
VectorPortalType Vertices;
VectorPortalType Normals;
typedef typename vtkm::cont::ArrayHandle<FieldType>::
template ExecutionTypes<DeviceAdapter>::Portal OutputPortalType;
OutputPortalType Scalars;
typedef typename vtkm::cont::ArrayHandle<vtkm::Id> IdArrayHandle;
typedef typename IdArrayHandle::ExecutionTypes<DeviceAdapter>::PortalConst
IdPortalType;
IdPortalType TriTable;
const vtkm::Id cellsPerLayer, pointsPerLayer;
|
const FieldType Isovalue;
ScatterType Scatter;
|
Original Implementation | New implementation |
---|
As the amount of state has dramatically reduced, so has the the complexity of the constructor.
template<typename U, typename W, typename X>
VTKM_CONT_EXPORT
IsoSurfaceGenerate(FieldType ivalue, const vtkm::Id3 cdims,
IdPortalType triTablePortal,
const U & field, const U & source,
const W & vertices, const W & normals,
const X & scalars) :
Isovalue(ivalue),
xdim(cdims[0]), ydim(cdims[1]), zdim(cdims[2]),
xmin(-1), ymin(-1), zmin(-1), xmax(1), ymax(1), zmax(1),
Field( field.PrepareForInput( DeviceAdapter() ) ),
Source( source.PrepareForInput( DeviceAdapter() ) ),
Vertices(vertices),
Normals(normals),
Scalars(scalars),
TriTable(triTablePortal),
cellsPerLayer(xdim * ydim),
pointsPerLayer ((xdim+1)*(ydim+1))
{
}
|
template<typename CountArrayType, typename Device>
VTKM_CONT_EXPORT
IsoSurfaceGenerate(FieldType isovalue,
const CountArrayType &countArray,
Device)
: Isovalue(isovalue), Scatter(countArray, Device()) { }
|
Original Implementation | New implementation |
---|
Signatures
Because the new implementation passes data through the dispatcher's invoke interface rather than as array state. Consequently, the interface of the ControlSignature and ExecutionSignature are much longer in the new interface, but they also specify more features of the transport/fetch.
typedef void ControlSignature(FieldIn<IdType> inputCellId,
FieldIn<IdType> inputIteration);
typedef void ExecutionSignature(WorkIndex, _1, _2);
|
typedef void ControlSignature(
TopologyIn topology, // Cell set
FieldInPoint<> fieldIn, // Input point field defining the contour
FieldInPoint<Vec3> pcoordIn, // Input point coordinates
FieldOutCell<> vertexOut, // Vertices for output triangles
// TODO: Have a better way to iterate over and interpolate fields
FieldInPoint<Scalar> scalarsIn, // Scalars to interpolate
FieldOutCell<> scalarsOut, // Interpolated scalars (one per tri vertex)
FieldOutCell<> normalsOut, // Estimated normals (one per tri vertex)
ExecObject TriTable // An array portal with the triangle table
);
typedef void ExecutionSignature(
CellShape, _2, _3, _4, _5, _6, _7, _8, VisitIndex);
|
Original Implementation | New implementation |
---|
Likewise, the interface to the parentheses operator has grown commensurately.
VTKM_EXEC_EXPORT
void operator()(vtkm::Id outputCellId,
vtkm::Id inputCellId,
vtkm::Id inputLowerBounds) const
|
template<typename CellShapeTag,
typename FieldInType, // Vec-like, one per input point
typename CoordType, // Vec-like (one per input point) of Vec-3
typename VertexOutType, // Vec-3 of Vec-3 coordinates (for triangle)
typename ScalarInType, // Vec-like, one per input point
typename ScalarOutType, // Vec-3 (one value per tri vertex)
typename NormalOutType, // Vec-3 of Vec-3
typename TriTablePortalType> // Array portal
VTKM_EXEC_EXPORT
void operator()(
CellShapeTag shape,
const FieldInType &fieldIn, // Input point field defining the contour
const CoordType &coords, // Input point coordinates
VertexOutType &vertexOut, // Vertices for output triangles
// TODO: Have a better way to iterate over and interpolate fields
const ScalarInType &scalarsIn, // Scalars to interpolate
ScalarOutType &scalarsOut, // Interpolated scalars (one per tri vertex)
NormalOutType &normalsOut, // Estimated normals (one per tri vertex)
const TriTablePortalType &triTable, // An array portal with the triangle table
vtkm::IdComponent visitIndex
) const
|
Original Implementation | New implementation |
---|
The Meat
Of course, the meat of the worklet algorithm is the code inside the parentheses operator. The new implementation of this code is quite simplified because the operations to compute indices and load data are completely removed. Likewise the point coordinates do not have to be directly computed either. The only data not directly provided is the Marching Cubes case.
// Get data for this cell
const int verticesForEdge[] = { 0, 1, 1, 2, 3, 2, 0, 3,
4, 5, 5, 6, 7, 6, 4, 7,
0, 4, 1, 5, 2, 6, 3, 7 };
const vtkm::Id x = inputCellId % xdim;
const vtkm::Id y = (inputCellId / xdim) % ydim;
const vtkm::Id z = inputCellId / cellsPerLayer;
// Compute indices for the eight vertices of this cell
const vtkm::Id i0 = x + y*(xdim+1) + z * pointsPerLayer;
const vtkm::Id i1 = i0 + 1;
const vtkm::Id i2 = i0 + 1 + (xdim + 1); //xdim is cell dim
const vtkm::Id i3 = i0 + (xdim + 1); //xdim is cell dim
const vtkm::Id i4 = i0 + pointsPerLayer;
const vtkm::Id i5 = i1 + pointsPerLayer;
const vtkm::Id i6 = i2 + pointsPerLayer;
const vtkm::Id i7 = i3 + pointsPerLayer;
// Get the field values at these eight vertices
FieldType f[8];
f[0] = this->Field.Get(i0);
f[1] = this->Field.Get(i1);
f[2] = this->Field.Get(i2);
f[3] = this->Field.Get(i3);
f[4] = this->Field.Get(i4);
f[5] = this->Field.Get(i5);
f[6] = this->Field.Get(i6);
f[7] = this->Field.Get(i7);
// Compute the Marching Cubes case number for this cell
unsigned int cubeindex = 0;
cubeindex += (f[0] > this->Isovalue);
cubeindex += (f[1] > this->Isovalue)*2;
cubeindex += (f[2] > this->Isovalue)*4;
cubeindex += (f[3] > this->Isovalue)*8;
cubeindex += (f[4] > this->Isovalue)*16;
cubeindex += (f[5] > this->Isovalue)*32;
cubeindex += (f[6] > this->Isovalue)*64;
cubeindex += (f[7] > this->Isovalue)*128;
// Compute the coordinates of the uniform regular grid at each of the cell's eight vertices
vtkm::Vec<FieldType, 3> p[8];
// If we have offset and spacing, can we simplify this computation
{
vtkm::Vec<FieldType, 3> offset = vtkm::make_Vec(xmin+(xmax-xmin),
ymin+(ymax-ymin),
zmin+(zmax-zmin) );
vtkm::Vec<FieldType, 3> spacing = vtkm::make_Vec( 1.0f /((float)(xdim-1)),
1.0f /((float)(ydim-1)),
1.0f /((float)(zdim-1)));
vtkm::Vec<FieldType, 3> firstPoint = offset * spacing * vtkm::make_Vec( x, y, z );
vtkm::Vec<FieldType, 3> secondPoint = offset * spacing * vtkm::make_Vec( x+1, y+1, z+1 );
p[0] = vtkm::make_Vec( firstPoint[0], firstPoint[1], firstPoint[2]);
p[1] = vtkm::make_Vec( secondPoint[0], firstPoint[1], firstPoint[2]);
p[2] = vtkm::make_Vec( secondPoint[0], secondPoint[1], firstPoint[2]);
p[3] = vtkm::make_Vec( firstPoint[0], secondPoint[1], firstPoint[2]);
p[4] = vtkm::make_Vec( firstPoint[0], firstPoint[1], secondPoint[2]);
p[5] = vtkm::make_Vec( secondPoint[0], firstPoint[1], secondPoint[2]);
p[6] = vtkm::make_Vec( secondPoint[0], secondPoint[1], secondPoint[2]);
p[7] = vtkm::make_Vec( firstPoint[0], secondPoint[1], secondPoint[2]);
}
// Get the scalar source values at the eight vertices
FieldType s[8];
s[0] = this->Source.Get(i0);
s[1] = this->Source.Get(i1);
s[2] = this->Source.Get(i2);
s[3] = this->Source.Get(i3);
s[4] = this->Source.Get(i4);
s[5] = this->Source.Get(i5);
s[6] = this->Source.Get(i6);
s[7] = this->Source.Get(i7);
|
// Get data for this cell
const vtkm::IdComponent verticesForEdge[] = { 0, 1, 1, 2, 3, 2, 0, 3,
4, 5, 5, 6, 7, 6, 4, 7,
0, 4, 1, 5, 2, 6, 3, 7 };
// Compute the Marching Cubes case number for this cell
vtkm::IdComponent cubeindex = 0;
cubeindex += (fieldIn[0] > this->Isovalue);
cubeindex += (fieldIn[1] > this->Isovalue)*2;
cubeindex += (fieldIn[2] > this->Isovalue)*4;
cubeindex += (fieldIn[3] > this->Isovalue)*8;
cubeindex += (fieldIn[4] > this->Isovalue)*16;
cubeindex += (fieldIn[5] > this->Isovalue)*32;
cubeindex += (fieldIn[6] > this->Isovalue)*64;
cubeindex += (fieldIn[7] > this->Isovalue)*128;
|
Original Implementation | New implementation |
---|
Note that although the new code is significantly smaller, it corrects a major bug in the original implementation where the point coordinates were not being computed correctly. This is because we are using the shared code in the VTK-m library that we know is consistent and correct.
Another advantage of this approach is that we have removed all the structured-grid-specific code. This code is ripe for converting to being used for other types of grids (like explicit cell sets). It will be fairly trivial to add cases for other types of cells as well.
The remainder of the code is interpolating values on the edges. This code is largely the same. Note, however, that I have changed most of the variable names to be more expressive (for code readability), which makes the code extend a few more lines. Also note that I changed the normal computation. The original code used the normal for the triangle generated whereas the new code uses to gradient of the 3D field. The gradient is a bit more accurate.
// Interpolate for vertex positions and associated scalar values
const vtkm::Id inputIteration = (outputCellId - inputLowerBounds);
const vtkm::Id outputVertId = outputCellId * 3;
const vtkm::Id cellOffset = static_cast<vtkm::Id>(cubeindex*16) + (inputIteration * 3);
for (int v = 0; v < 3; v++)
{
const vtkm::Id edge = this->TriTable.Get(cellOffset + v);
const int v0 = verticesForEdge[2*edge];
const int v1 = verticesForEdge[2*edge + 1];
const FieldType t = (this->Isovalue - f[v0]) / (f[v1] - f[v0]);
this->Vertices.Set(outputVertId + v, vtkm::Lerp(p[v0], p[v1], t));
this->Scalars.Set(outputVertId + v, vtkm::Lerp(s[v0], s[v1], t));
}
vtkm::Vec<FieldType, 3> vertex0 = this->Vertices.Get(outputVertId + 0);
vtkm::Vec<FieldType, 3> vertex1 = this->Vertices.Get(outputVertId + 1);
vtkm::Vec<FieldType, 3> vertex2 = this->Vertices.Get(outputVertId + 2);
vtkm::Vec<FieldType, 3> curNorm = vtkm::Cross(vertex1-vertex0, vertex2-vertex0);
vtkm::Normalize(curNorm);
this->Normals.Set(outputVertId + 0, curNorm);
this->Normals.Set(outputVertId + 1, curNorm);
this->Normals.Set(outputVertId + 2, curNorm);
}
|
// Interpolate for vertex positions and associated scalar values
const vtkm::Id triTableOffset =
static_cast<vtkm::Id>(cubeindex*16 + visitIndex*3);
for (vtkm::IdComponent triVertex = 0;
triVertex < 3;
triVertex++)
{
const vtkm::IdComponent edgeIndex =
triTable.Get(triTableOffset + triVertex);
const vtkm::IdComponent edgeVertex0 = verticesForEdge[2*edgeIndex + 0];
const vtkm::IdComponent edgeVertex1 = verticesForEdge[2*edgeIndex + 1];
const FieldType fieldValue0 = fieldIn[edgeVertex0];
const FieldType fieldValue1 = fieldIn[edgeVertex1];
const FieldType interpolant =
(this->Isovalue - fieldValue0) / (fieldValue1 - fieldValue0);
vertexOut[triVertex] = vtkm::Lerp(coords[edgeVertex0],
coords[edgeVertex1],
interpolant);
scalarsOut[triVertex] = vtkm::Lerp(scalarsIn[edgeVertex0],
scalarsIn[edgeVertex1],
interpolant);
const vtkm::Vec<vtkm::FloatDefault,3> edgePCoord0 =
vtkm::exec::ParametricCoordinatesPoint(
fieldIn.GetNumberOfComponents(), edgeVertex0, shape, *this);
const vtkm::Vec<vtkm::FloatDefault,3> edgePCoord1 =
vtkm::exec::ParametricCoordinatesPoint(
fieldIn.GetNumberOfComponents(), edgeVertex1, shape, *this);
const vtkm::Vec<vtkm::FloatDefault,3> interpPCoord =
vtkm::Lerp(edgePCoord0, edgePCoord1, interpolant);
normalsOut[triVertex] =
vtkm::Normal(vtkm::exec::CellDerivative(
fieldIn, coords, interpPCoord, shape, *this));
}
|
Original Implementation | New implementation |
---|
The Control-Side
The code on the control side is also greatly simplified because the index generation is removed because it is now all performed internally by the scatter class.
typedef typename vtkm::cont::DeviceAdapterAlgorithm<DeviceAdapter> DeviceAlgorithms;
// Set up the Marching Cubes case tables
vtkm::cont::ArrayHandle<vtkm::Id> vertexTableArray =
vtkm::cont::make_ArrayHandle(vtkm::worklet::internal::numVerticesTable,
256);
vtkm::cont::ArrayHandle<vtkm::Id> triangleTableArray =
vtkm::cont::make_ArrayHandle(vtkm::worklet::internal::triTable,
256*16);
// Call the ClassifyCell functor to compute the Marching Cubes case numbers
// for each cell, and the number of vertices to be generated
ClassifyCell classifyCell(vertexTableArray.PrepareForInput(DeviceAdapter()),
isovalue);
typedef typename vtkm::worklet::DispatcherMapTopology<
ClassifyCell,
DeviceAdapter> ClassifyCellDispatcher;
ClassifyCellDispatcher classifyCellDispatcher(classifyCell);
vtkm::cont::ArrayHandle<vtkm::Id> numOutputTrisPerCell;
classifyCellDispatcher.Invoke( field,
this->DataSet.GetCellSet(0),
numOutputTrisPerCell);
// Compute the number of valid input cells and those ids
const vtkm::Id numOutputCells = DeviceAlgorithms::ScanInclusive(numOutputTrisPerCell,
numOutputTrisPerCell);
// Terminate if no cells has triangles left
if (numOutputCells == 0) return;
vtkm::cont::ArrayHandle<vtkm::Id> validCellIndicesArray, inputCellIterationNumber;
vtkm::cont::ArrayHandleIndex validCellCountImplicitArray(numOutputCells);
DeviceAlgorithms::UpperBounds(numOutputTrisPerCell,
validCellCountImplicitArray,
validCellIndicesArray);
numOutputTrisPerCell.ReleaseResources();
// Compute for each output triangle what iteration of the input cell generates it
DeviceAlgorithms::LowerBounds(validCellIndicesArray,
validCellIndicesArray,
inputCellIterationNumber);
// Generate a single triangle per cell
const vtkm::Id numTotalVertices = numOutputCells * 3;
IsoSurfaceGenerate isosurface(isovalue,
this->CDims,
triangleTableArray.PrepareForInput(DeviceAdapter()),
field,
field,
verticesArray.PrepareForOutput(numTotalVertices, DeviceAdapter()),
normalsArray.PrepareForOutput(numTotalVertices, DeviceAdapter()),
scalarsArray.PrepareForOutput(numTotalVertices, DeviceAdapter())
);
typedef typename vtkm::worklet::DispatcherMapField< IsoSurfaceGenerate,
DeviceAdapter> IsoSurfaceDispatcher;
IsoSurfaceDispatcher isosurfaceDispatcher(isosurface);
isosurfaceDispatcher.Invoke(validCellIndicesArray,
inputCellIterationNumber);
|
// Set up the Marching Cubes case tables
vtkm::cont::ArrayHandle<vtkm::IdComponent> vertexTableArray =
vtkm::cont::make_ArrayHandle(vtkm::worklet::internal::numVerticesTable,
256);
vtkm::cont::ArrayHandle<vtkm::IdComponent> triangleTableArray =
vtkm::cont::make_ArrayHandle(vtkm::worklet::internal::triTable,
256*16);
typedef vtkm::exec::ExecutionWholeArrayConst<vtkm::IdComponent, VTKM_DEFAULT_STORAGE_TAG, DeviceAdapter>
TableArrayExecObjectType;
// Call the ClassifyCell functor to compute the Marching Cubes case numbers
// for each cell, and the number of vertices to be generated
ClassifyCell classifyCell(isovalue);
typedef typename vtkm::worklet::DispatcherMapTopology<
ClassifyCell,
DeviceAdapter> ClassifyCellDispatcher;
ClassifyCellDispatcher classifyCellDispatcher(classifyCell);
vtkm::cont::ArrayHandle<vtkm::IdComponent> numOutputTrisPerCell;
classifyCellDispatcher.Invoke( field,
this->DataSet.GetCellSet(0),
numOutputTrisPerCell,
TableArrayExecObjectType(vertexTableArray));
IsoSurfaceGenerate isosurface(isovalue, numOutputTrisPerCell, DeviceAdapter());
vtkm::worklet::DispatcherMapTopology<IsoSurfaceGenerate, DeviceAdapter>
isosurfaceDispatcher(isosurface);
isosurfaceDispatcher.Invoke(
// Currently forcing cell set to be structured. Eventually we should
// relax this as we support other grid types.
this->DataSet.GetCellSet(0).ResetCellSetList(
vtkm::ListTagBase<vtkm::cont::CellSetStructured<3> >()),
field,
this->DataSet.GetCoordinateSystem(0).GetData(),
vtkm::cont::make_ArrayHandleGroupVec<3>(verticesArray),
field, // This is silly. The field will interp to isovalue
vtkm::cont::make_ArrayHandleGroupVec<3>(scalarsArray),
vtkm::cont::make_ArrayHandleGroupVec<3>(normalsArray),
TableArrayExecObjectType(triangleTableArray)
);
|
Original Implementation | New implementation |
---|
Performance
This change shifts the structure of the code quite a bit, so we should be careful not to tank performance. To test this, I have measured the performance of the Marching Cubes algorithm on a couple of testbeds at Sandia. The first uses a Haswell Intel CPU containing 32 cores with 2x hyperthreading (64 threads all total). The second uses an NVIDIA Tesla K80 GPU. All experiments were run on the isosurface of the supernova data set with 10 isovalues ranging from 0 to 0.045. Each isosurface was executed 10 times.
These tests were run using a version of VTK-m taken just before the addition of Scatter objects (SHA 4a167dec0a01aaff40c54f15c179707c1cefb044) and one after the update of the Marching Cubes algorithm to the Scatter classes (SHA 1898ab473ce2a352de9341c322e69557105f400a).
The following table gives the summary of the average times. The average is taken for each code version and hardware architecture over all trials and all isosurface values.
Device | Code Version | Average Time |
---|---|---|
Haswell | Original | 0.151 s |
Scatter | 0.140 s | |
Tesla K80 | Original | 0.058 s |
Scatter | 0.054 s |
As you can see, using the scatter classes is just as efficient as the original code. If anything, the changes have improved the performance a little bit.
Next Steps
The new code structure will make it easier to implement isosurfaces for more general data sets. That is surely the next step.
An improvement that can be made to the VTK-m framework is to better support geometry generation. There are still some by-hand things being done (such as creating ArrayHandleGroupVec arrays for the triangle outputs). It would be better if this was managed by the dispatcher. This implementation also generates a triangle soup. There should be a way to weld the vertices to make a manifold surface. We could add something specific to isosurface, but this is a common enough operation in visualization that there should be support at the framework level.