Resample And Composite Engine II (RACE II)
A high-performance, low-bandwidth, general-purpose Volume Graphics solution.

"How do you reach a destination in the least possible time? 
Drive faster or take a short cut? ..........

RACE does both!!"

Click here for the RACE II IEEE Viz99 Late Breaking Hot Topic Paper

 


The RACE I and RACE II architectures are high-performance engines designed to be extremely memory efficient: they are the only architectures that access each voxel only once per projection, regardless of the degree of over-sampling. Furthermore, they can potentially exhibit frame-to-frame coherence for multiple view positions. Previous DVR solutions rely primarily on one of two forms of acceleration: 1) sample reduction or 2) large sample throughput (sample-memory efficiency, defined in [3]). We define sample-memory efficiency as the ability to render the dataset as fast as it can be read from the voxel-memory subsystem when sampling is one-to-one (one sample per voxel) and there is no sample reduction. This is the theoretical maximum performance for rendering volumes with one-to-one sampling and no sample reduction. To date, DVR architectures have only been able to make sufficient use of one form of acceleration. By combining both forms, the RACE II architecture is expected to achieve a two-fold speedup over other solutions. The RACE I architecture will achieve higher average perspective performance than other solutions that have more than double the voxel bandwidth of the RACE engine.
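The definition above can be illustrated with a few lines of arithmetic. The sketch below is not taken from the RACE papers; the function names are hypothetical, and it assumes the simplest possible model in which every voxel is read exactly once per frame. The 400 M voxel-per-second bandwidth and the 20 Hz worst-case frame rate are the figures quoted for RACE elsewhere on this page.

```python
def max_frame_rate_hz(voxels_per_second, dims):
    """Upper bound on frame rate when each voxel is read once per frame
    and one sample is produced per voxel (no sample reduction)."""
    nx, ny, nz = dims
    return voxels_per_second / (nx * ny * nz)


def sample_memory_efficiency(achieved_hz, voxels_per_second, dims):
    """Fraction of the memory-limited frame rate actually achieved
    (hypothetical helper; 1.0 means rendering as fast as memory can be read)."""
    return achieved_hz / max_frame_rate_hz(voxels_per_second, dims)


if __name__ == "__main__":
    dims = (256, 256, 256)
    bandwidth = 400e6  # voxels per second, as quoted for RACE
    print(f"memory-limited rate: {max_frame_rate_hz(bandwidth, dims):.1f} Hz")
    print(f"efficiency at 20 Hz: {sample_memory_efficiency(20.0, bandwidth, dims):.2f}")
```

Under these assumptions the memory-limited rate is about 23.8 Hz, and a 20 Hz worst case corresponds to an efficiency of about 0.84, close to the 0.85 listed for RACE in the comparison table.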

To achieve a new level of performance, RACE implements a truly hybrid algorithm. In a nutshell, it uses object-order control over image-order rendering units. This allows the RACE architectures to trivially handle perspective projections and sample reduction techniques such as Early-Ray Termination (ERT) and space-leaping. RACE as presented in [4] only utilizes early-ray termination; RACE II also supports space-leaping. Based on simulation results, the RACE architecture will achieve higher average perspective rendering performance than currently available solutions while using anywhere from 33-75% less bandwidth than the other solutions (see table below). RACE II renders 256x256x128 volumes at an average of 80-90 Hz and 256x256x256 volumes at an average of 44 Hz, with worst-case rates of 40 Hz and 20 Hz respectively.
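For readers unfamiliar with the two sample reduction techniques named above, the following is a minimal software sketch of front-to-back compositing along a single ray with early-ray termination and space-leaping. It illustrates the generic techniques only, not the RACE hardware pipeline; the function names, the scalar "color", the skip table, and the 0.95 opacity threshold are all assumptions made for the example.

```python
def empty_run_lengths(alphas):
    """For each sample index, the number of consecutive fully transparent
    samples starting there (0 if the sample itself is not empty)."""
    skip = [0] * len(alphas)
    run = 0
    for i in range(len(alphas) - 1, -1, -1):
        run = run + 1 if alphas[i] == 0.0 else 0
        skip[i] = run
    return skip


def composite_ray(colors, alphas, skip, opacity_threshold=0.95):
    """Front-to-back compositing of one ray.
    Returns (accumulated color, accumulated opacity, samples composited)."""
    acc_color, acc_alpha, processed = 0.0, 0.0, 0
    i = 0
    while i < len(alphas):
        if skip[i] > 0:
            i += skip[i]  # space-leaping: jump over a run of empty samples
            continue
        weight = (1.0 - acc_alpha) * alphas[i]
        acc_color += weight * colors[i]
        acc_alpha += weight
        processed += 1
        i += 1
        if acc_alpha >= opacity_threshold:
            break  # early-ray termination: later samples are hidden
    return acc_color, acc_alpha, processed


if __name__ == "__main__":
    alphas = [0.0, 0.0, 0.0, 0.5, 0.0, 0.95, 0.3, 0.3]
    colors = [0.0, 0.0, 0.0, 1.0, 0.0, 0.5, 1.0, 1.0]
    print(composite_ray(colors, alphas, empty_run_lengths(alphas)))
```

In this toy ray only 2 of the 8 samples are composited: four are skipped as empty space and the last two are cut off once the accumulated opacity passes the threshold.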

Hardware Description (RACE I/RACE II):

  • Four 100 MHz rendering units
  • Four 8 MB SDRAMs for volume storage (4-way interleaved)
  • Four 2 MB SDRAMs for pixel-ray storage and slice buffers (4-way interleaved)
  • One projection and memory control unit
  • 96-bit (12-byte) wide pixel-ray interface to each rendering unit
  • 2-byte wide voxel interface to each rendering unit
  • Operational frequency is currently 100 MHz
  • Worst-case frame rate for one-to-one resampling is 20 Hz
  • Average frame rate is 22 Hz for RACE I and 44 Hz for RACE II
  • RACE II requires a 4 KByte buffer in the controller to store empty-space information
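The page does not say how the 4 KByte empty-space buffer is organized. One layout that happens to fit exactly is a bitmap with one bit per 8x8x8 block of a 256x256x256 volume: that gives 32x32x32 = 32,768 blocks, which is 32,768 bits or 4,096 bytes. The sketch below illustrates that guess only; the class name, the block size, and the bit ordering are assumptions, not details of the RACE II controller.

```python
class EmptySpaceBitmap:
    """One bit per block of voxels; a set bit marks the block as empty.
    Hypothetical layout, assumed for illustration only."""

    def __init__(self, dims=(256, 256, 256), block=8):
        self.block = block
        # Number of blocks along each axis (rounded up).
        self.blocks = tuple(-(-d // block) for d in dims)
        n_bits = self.blocks[0] * self.blocks[1] * self.blocks[2]
        self.bits = bytearray((n_bits + 7) // 8)

    def _bit_index(self, x, y, z):
        b = self.block
        nx, ny, _ = self.blocks
        return (x // b) + nx * ((y // b) + ny * (z // b))

    def mark_empty(self, x, y, z):
        """Mark the block containing voxel (x, y, z) as empty."""
        i = self._bit_index(x, y, z)
        self.bits[i >> 3] |= 1 << (i & 7)

    def is_empty(self, x, y, z):
        """True if the block containing voxel (x, y, z) was marked empty."""
        i = self._bit_index(x, y, z)
        return bool((self.bits[i >> 3] >> (i & 7)) & 1)


if __name__ == "__main__":
    bitmap = EmptySpaceBitmap()
    bitmap.mark_empty(8, 16, 24)
    print(len(bitmap.bits), bitmap.is_empty(15, 23, 31), bitmap.is_empty(16, 16, 24))
```

A renderer could consult such a bitmap before fetching samples and skip whole blocks that are marked empty.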

Current limitations of OTHER approaches

Image-order architectures:
  • Sample throughput bottleneck because of the eight-to-one voxel-to-rendering-pipeline topology.
  • Lack of scalability (each rendering pipeline needs a copy of the ENTIRE volume memory!)
  • Low performance when 20% or more of the dataset contributes to the final rendered image.
  • Large sensitivities to dataset type, view position, and classification mapping.
  • Requires non-interactive pre-processing overhead
  • Performance is inversely related to the size of the interpolation filter

Object-order ray-casters:
  • Lack of support for sample reduction techniques
  • Large number of rendering engines required for next generation datasets

The RACE architecture addresses the limitations of both approaches. The simulated RACE I and RACE II results are compared to published results from other DVR solutions below. The table shows that the RACE I architecture achieves higher effective acceleration than all of the architectures that support perspective projections and is a maximum of 0.10% behind the architecture that supports the largest parallel performance. The RACE II architecture, on the other hand, provides almost a 2x speedup over the next closest architecture for either perspective or parallel projections under nominal conditions. As a result, we expect the RACE II architecture to be one of the first architectures to support 20+ Hz average performance for rendering 512x512x512 and larger datasets with both perspective and parallel projections. For example, RACE II is expected to render a 512x512x512 dataset at 11 Hz with only 800 Mvoxel-per-second memory bandwidth! That's four RACE rendering units operating at 200 MHz. This is possible because the RACE II architecture can compute almost two samples on average (1 processed sample and nearly 1 skipped sample) per voxel accessed from memory without over-sampling. The next closest architecture can only compute approximately 1 sample per voxel accessed, and several other architectures compute only 1 sample per 6-8 voxels accessed from memory. Thus, the RACE II architecture has extremely efficient voxel-bandwidth utilization, allowing near-optimal acceleration.
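The 512x512x512 projection can be sanity-checked with a back-of-the-envelope model. The sketch below assumes, as a simplification not stated in the source, that one sample is needed per voxel per frame, so that frame rate is roughly voxel bandwidth times samples computed per voxel accessed, divided by the voxel count. The function names are hypothetical.

```python
def projected_frame_rate_hz(voxels_per_second, samples_per_voxel, dims):
    """Frame rate if each voxel read yields `samples_per_voxel` samples
    and one sample per voxel is needed per frame (simplified model)."""
    nx, ny, nz = dims
    return voxels_per_second * samples_per_voxel / (nx * ny * nz)


def required_samples_per_voxel(target_hz, voxels_per_second, dims):
    """Samples per voxel accessed needed to reach `target_hz`."""
    nx, ny, nz = dims
    return target_hz * (nx * ny * nz) / voxels_per_second


if __name__ == "__main__":
    dims = (512, 512, 512)
    bandwidth = 4 * 200e6  # four rendering units at 200 MHz, one voxel per cycle each
    print(f"{projected_frame_rate_hz(bandwidth, 1.7, dims):.1f} Hz at 1.7 samples/voxel")
    print(f"{required_samples_per_voxel(11.0, bandwidth, dims):.2f} samples/voxel for 11 Hz")
```

With the 1.7 samples per voxel listed in the comparison table, this model gives about 10.1 Hz; reaching 11 Hz needs about 1.85 samples per voxel, which is in line with the "almost two samples" quoted above.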
 
 

The figure above shows that the RACE II architecture achieves the largest amount of acceleration when 5-90% of the dataset contributes to the final image. RACE I achieves better aggregate performance (parallel and perspective projections) than the remaining architectures when 12-100% of the dataset contributes to the final image. Volume Pro excels when 90-100% of the dataset contributes to the final image for parallel projections. The two image-order architectures (whose plots overlap) will achieve the most acceleration when less than 5% of the dataset contributes to the final image. Since VIZARD II currently only uses early-ray termination, it is unlikely that less than 70% of the dataset will contribute to the final image using reasonable opacity thresholds. However, the architecture that utilizes space-leaping (called the VG Engine) may reach as low as 10% under certain conditions (i.e., binary opacity classifications of sparse datasets). The RACE engines provide robust acceleration for general-purpose Volume Graphics and Volume Visualization since they provide the nearest-to-optimal acceleration for over 85% of the range. As a result of efficient voxel-sample processing, we can achieve comparable or better performance than other solutions using anywhere from 33-75% less voxel bandwidth!

DVR Architecture Comparison
(Estimated average frame rate is based on a 256x256x256 semi-transparent, non-segmented MRI dataset. Worst-case is based on no sample reduction.)

| | SGI High-end Graphics Workstation (w/ 1 Raster Manager) | Volume Pro [2] | VIZARD II [1] | VG Engine [5] | RACE I [4] / RACE II |
|---|---|---|---|---|---|
| Voxel bandwidth (voxels per second) | 1,600 M | 533 M | 800 M | 800 M | 400 M |
| Peak sample throughput (samples per second) | 200 M | 533 M | 100 M | 100 M | 400 M |
| Pipeline frequency | 200 MHz | 133 MHz | 100 MHz | 100 MHz | 100 MHz |
| Worst-case frame rate (16M samples per frame) | 10 Hz | 30 Hz (parallel projection) | 6 Hz | 6 Hz | 20 Hz |
| Average frame rate | 10 Hz | 30 Hz (parallel projection) | 10 Hz | 43 Hz (opaque object); 28 Hz (single-layer transparency); 17 Hz (multiple-layer semi-transparent) | 22 Hz (RACE I); 44 Hz (RACE II) |
| Sample efficiency (hardware acceleration in absence of sample reduction) | 0.125 | 0.95 | 0.125 | 0.125 | 0.85 |
| Estimated average algorithmic speedup using sample reduction | None | None | 20% w/ ERT | 150% w/ ERT and space-leaping | 5% w/ ERT (RACE I); 100% w/ space-leaping (RACE II) |
| Effective # of samples computed per voxel accessed | < 0.125 | 0.95 | 0.15 | 0.300 | 0.89 (RACE I); 1.7 (RACE II) |
| Perspective support | Yes | No | Yes | Yes | Yes |
| Algorithm | 3D Texture Map | Hybrid Project-warp | Ray Casting | Ray Casting | Projection Assisted Ray Casting |
| Relative scalability | Very poor | Good | Poor | Poor | Moderate |
| Estimated hardware requirements to render a 512x512x512 dataset at 30 Hz with 200 MHz pipelines | >20 pipelines; >32 GVoxel/second; >5120 MB; >160 memory units | 20 pipelines; 4 GVoxel/second; 256 MB; 4 memory units | 16 pipelines; 25 GVoxel/second; 4096 MB; 128 memory units | 8 pipelines; 13 GVoxel/second; 2048 MB; 64 memory units | RACE I: 20 pipelines; 4 GVoxel/second; 256 MB; 4 memory units. RACE II: 12 pipelines; 2.4 GVoxel/second; 256 MB; 4 memory units |
| Algorithm class | object-order | object-order | image-order | image-order | hybrid-order (object-order control over image-order rendering) |
| Polygon mixing | Yes | No | No | No | No (future support for opaque polygons or single-layer transparency using z-buffer blending) |
| Pre-processing | No | No | Yes | Yes | No |
| Relative cost | Very expensive | Moderate | Low | Low | Moderate |
| Status | Available | Available | Available end of 1999 | Available end of 1999 | Simulated |
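The table does not state how the "effective # of samples computed per voxel accessed" row is derived. The listed values appear consistent with multiplying the sample efficiency by one plus the estimated algorithmic speedup; that relation is an inference from the numbers, not a formula given in the source. The short check below compares it against the table entries (the SGI column is omitted because its entry is only an upper bound).

```python
# (sample efficiency, algorithmic speedup as a fraction, value listed in the table)
TABLE_ROWS = {
    "Volume Pro": (0.95, 0.00, 0.95),
    "VIZARD II": (0.125, 0.20, 0.15),
    "VG Engine": (0.125, 1.50, 0.300),
    "RACE I": (0.85, 0.05, 0.89),
    "RACE II": (0.85, 1.00, 1.7),
}


def effective_samples_per_voxel(sample_efficiency, speedup):
    """Inferred relation: hardware efficiency scaled by the algorithmic speedup."""
    return sample_efficiency * (1.0 + speedup)


def max_deviation(rows):
    """Largest absolute gap between the inferred and the listed values."""
    return max(abs(effective_samples_per_voxel(e, s) - listed)
               for e, s, listed in rows.values())


if __name__ == "__main__":
    for name, (eff, speedup, listed) in TABLE_ROWS.items():
        print(f"{name:10s} inferred {effective_samples_per_voxel(eff, speedup):.4f}  listed {listed}")
```

The inferred values match the listed ones to within rounding; the largest gap is for the VG Engine (0.3125 inferred versus 0.300 listed).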

RACE II Images:

  • 256x256x256 semi-transparent MRI dataset, parallel rendered (40 Hz)
  • 256x256x256 semi-transparent MRI dataset, perspective rendered (48 Hz)
  • 256x256x128 semi-transparent CT dataset, parallel rendered (80 Hz)
  • 256x256x128 semi-transparent CT dataset, perspective rendered (85 Hz)
  • 256x256x128 semi-transparent synthetic dataset, parallel rendered (84 Hz)
  • 256x256x128 semi-transparent synthetic dataset, perspective rendered (89 Hz)

References

1) M. Meissner, U. Kanus, and W. Strasser, VIZARD II: A PCI-Card for Real-Time Volume Rendering. In Proceedings of the Siggraph/Eurographics Workshop on Graphics Hardware, pages 61-67, Lisbon, Portugal, 1998.

2) R. Osborne, H. Pfister, H. Lauer, N. McKenzie, S. Gibson, W. Hiatt, and T. Ohkami, EM-CUBE: An Architecture for Low-Cost Real-Time Volume Rendering. In Proceedings of the Siggraph/Eurographics Workshop on Graphics Hardware, pages 131-138, Los Angeles, August 1997.

3) H. Ray, H. Pfister, D. Silver, and T. Cook, "Ray Casting Architectures for Volume Visualization", accepted to IEEE Transactions on Visualization and Computer Graphics, September 1999, Vol. 5, No. 3

4) H. Ray and D. Silver, "A Memory Efficient Architecture For Real-Time Parallel and Perspective Direct Volume Rendering", Rutgers State University Technical Report CAIP-TR-237, July 1999

5) B. Vettermann, J. Hesser, and R. Manner, Solving the Hazard Problem for Algorithmically Optimized Real-Time Volume Rendering. In International Workshop on Volume Graphics, March 1999.