
Feature extraction: turbulence field example

The first step in visiometrics, identification, involves feature extraction, a technique that has also been considered and carried out in contexts other than the process of obtaining understanding. Feature extraction can be used as a selective way to display data, avoiding the confusing visual clutter that arises when too much information is shown [10]. Besides supporting the understanding process and helping to avoid visual clutter, feature extraction may be a very effective and natural way to deal with large datasets. Many existing visualization systems make performance trade-offs that assume relatively small quantities of data (solution and grid fit in RAM) [11]. Unfortunately, no current computer can hold some of the time-dependent CFD datasets (from 5 to 162 gigabytes) that are currently being produced. The key points in dealing with these datasets are the extraction of "scientific data" and the use of a "persistent object database" [11]. The extracted objects usually correspond to "coherent structures" (localized objects that persist over "characteristic times" [3]). Feature extraction is accompanied by large reductions in storage requirements (0.3-6.7% of solution size) [11]. The main disadvantage of the feature extraction approach is that the solution domain outside the extracted regions cannot be examined [11]. Reduced representations of those sub-domains in terms of statistical quantities are still under investigation.

The main function of feature extraction is to start the abstraction process, i.e., the selection of the portions of information that are fundamental to the physical phenomena observed. This can be accomplished in different ways. Initially it can be just a thresholding operation and extrema tracking [12]; however, it can also be viewed as the process of obtaining reduced representations of the data (an ellipsoidal or skeletal representation plus vector lines released from selected starting points) [13]. These representations can be used to point out causal connections between different variables [12, 9, 13] and also for model juxtaposition, i.e., detailed and quantitative comparison of experimental and/or computational images of similar or different functions at the same or different times [2]. The reduced representations correspond to "identified objects" in the data, and can also be used as tools for perusal, interpretation, quantification and feature tracking [3]. Other feature extraction procedures include the use of streamlines connected through nodes or critical points, which characterize global flow topology [14]. Some of these methods have been implemented as interactive tools to extract meaning from datasets in visualization environments.

The feature extraction process suggests a natural distribution of tasks between supercomputer and workstation: solvers, solutions and extractors should reside on the supercomputer, where the large size of the dataset is dealt with more efficiently. Reduced object manipulation, feature tracking and time correlation are more properly performed on the workstation. The desired interactivity requires communication between supercomputer and workstation. Communicating at the level of the reduced object representation helps to avoid the network bottleneck.

In the particular case of isotropic turbulence simulations, different researchers have used a "probe" or "window" (fixed or Lagrangian) to search for vortex tubes and obtain their full time history [15, 16]. Their object identification algorithm consists of taking a local maximum in the field and tracing the skeleton of the vortex tube. Diagnostics include the tube's length, curvature, length-to-diameter ratio and circulation [17]. Algorithms for extracting "events" have indicated that intermittent regions may be major, if not dominant, contributors to global statistics in turbulence [18]. In our studies of vortex collapse and reconnection, our "diagnostics box" surrounds regions of significant physical behavior, detected through the search for maxima events [5].

Feature extraction/object identification is performed in three steps: thresholding, object identification and ellipsoid fitting. The objects in the dataset, or field, are defined by a scalar function f (e.g. vorticity magnitude) and a threshold value. The first stage, thresholding, consists of finding and extracting the points in the dataset where f is above the threshold specified by the user, together with their positions in space. Object identification is performed on the thresholded grid points using a recursive search and a connectivity criterion. A direct method is not efficient because the number of operations required is tex2html_wrap_inline507. Our object-segment algorithm reduces the computational complexity of the problem to tex2html_wrap_inline509 by using an octree data structure. Once the objects have been found, an object quantification process starts, which we call ellipsoid fitting. For each object, this consists of finding the "mass" m, the centroid, the average orientation of the vector field, the tensor of second moments, the maximum value, and the position of that maximum inside the object [12]. These reduced quantities are used to produce low-order representations, or ellipsoids, which are located at the centroids of the objects. The axes are the square roots of the eigenvalues of the tensor of second moments, normalized so that each ellipsoid has the same volume as its object. The ellipsoids are oriented according to the eigenvectors of the same tensor. The reduced representation obtained in this manner not only fits the shape of the object, but also averages over the values of the scalar field in its interior, making it possible to differentiate between objects of similar shape and volume.
Using one of the reduced quantities (usually the maximum), the objects are sorted and listed for further use by the user or by another post-processing program (such as feature tracking).
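The three steps above can be illustrated with a minimal Python/NumPy sketch. It is not the authors' object-segment code: a plain flood fill over the grid stands in for the octree-based search, and the function names (segment_objects, fit_ellipsoid), the 6-neighbor connectivity, the non-periodic boundaries and the use of f itself as the weight in the "mass", centroid and second moments are all assumptions made for illustration. The average orientation of the vector field is omitted because only a scalar field is used here.

```python
from collections import deque
import numpy as np

NEIGHBORS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))

def segment_objects(f, threshold):
    """Threshold f, then label 6-connected regions by iterative flood fill."""
    mask = f > threshold                      # thresholding step
    labels = np.zeros(f.shape, dtype=int)     # 0 marks points below threshold
    n_objects = 0
    for seed in zip(*np.nonzero(mask)):
        if labels[seed]:
            continue                          # point already belongs to an object
        n_objects += 1
        labels[seed] = n_objects
        queue = deque([seed])
        while queue:
            i, j, k = queue.popleft()
            for di, dj, dk in NEIGHBORS:
                p = (i + di, j + dj, k + dk)
                inside = all(0 <= p[a] < f.shape[a] for a in range(3))
                if inside and mask[p] and not labels[p]:
                    labels[p] = n_objects
                    queue.append(p)
    return labels, n_objects

def fit_ellipsoid(f, labels, obj):
    """Reduced quantities of one object, weighting each grid point by f (assumed)."""
    sel = labels == obj
    pts = np.argwhere(sel).astype(float)      # grid positions, shape (n, 3)
    w = f[sel].astype(float)                  # scalar values, same ordering as pts
    mass = w.sum()
    centroid = (w[:, None] * pts).sum(axis=0) / mass
    d = pts - centroid
    moments = np.einsum("n,ni,nj->ij", w, d, d) / mass   # tensor of second moments
    evals, evecs = np.linalg.eigh(moments)    # symmetric tensor: real eigenpairs
    axes = np.sqrt(np.maximum(evals, 0.0))
    volume = float(len(pts))                  # object volume in grid cells
    if np.all(axes > 0):
        # Rescale so that (4/3) * pi * a * b * c equals the object volume.
        axes = axes * (volume / (4.0 / 3.0 * np.pi * axes.prod())) ** (1.0 / 3.0)
    i_max = int(np.argmax(w))
    return {"mass": mass, "centroid": centroid, "axes": axes,
            "directions": evecs, "f_max": w[i_max], "x_max": pts[i_max],
            "volume": volume}

def sorted_objects(f, threshold):
    """Objects listed in decreasing order of their interior maximum."""
    labels, n = segment_objects(f, threshold)
    objs = [fit_ellipsoid(f, labels, o) for o in range(1, n + 1)]
    return sorted(objs, key=lambda o: o["f_max"], reverse=True)
```

Calling sorted_objects(f, 0.2 * f.max()) would correspond to thresholding at 20% of the maximum. Degenerate objects (a single point or a straight line of points) have one or more zero axes, and the sketch leaves those axes unscaled.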

As a framework for implementing these tools, we selected the commercial package AVS, which is based on a data-flow model for visualization and control [19, 20] and is built on the concepts of modularity and networking. Application units, called modules, are organized and made available to the user through a "network editor". The user selects modules and connects them into networks in the network editor working area. The networks are therefore flexible enough to meet the particular needs of the users. The modular design has the advantage of allowing users to produce their own modules and insert them in networks of standard modules. This gives users all of the power of the commercial product in their very specific applications. Another advantage is the availability of mechanisms to share tasks among different machines via "remote modules". The package therefore provides a basis for interaction between the supercomputer and the workstation. We have feature extraction algorithms in different implementations operating on both the supercomputer and the workstation. In the first approach, we perform object segmentation on the supercomputer. The extracted objects are then displayed on the workstation. In a second approach, we produce an interactive window into the large dataset by using a remote module running on the CM5. This module sends interactively selected sub-domains of data from the supercomputer to the workstation. In a third implementation, the large dataset is subjected to a thresholding post-processing operation (selective data reduction) on the supercomputer in batch mode. The resulting reduced dataset still covers (selectively) the complete domain and can then be post-processed interactively on the workstation.

In order to deal with the large number of vortex structures present in the turbulence dataset, we classify them according to their relationship to maxima events, not only of vorticity magnitude but of strain-rate as well. This is the objective of the "object-segment" program, which is an enhancement of the standard iso-surfacing technique. On the CM5, a number of parallel functions are used to represent data points and to perform connectivity and membership operations. We demonstrate the use of this feature isolation code in figure gif. In this figure, a threshold of 20% of the maximum was used to detect the objects shown; the regions, however, are extracted based on their connectivity. The dataset is a 256^3 scalar field (vorticity magnitude). The output of the program consists of a list of objects sorted and colored according to the local maxima inside them, which allows the user to select "coherent" regions for further quantification. After the objects in the field have been identified, the predominant object (colored in red) is extracted for closer examination (figure gif).

The large dataset produced on the supercomputer can be accessed more easily while it still resides in that environment. In particular, for the CM5, parallel I/O can be used via the SDA (Scalable Disk Array), which provides a capacity of 25-200 gigabytes that can be accessed at 33-254 megabytes/second. It is possible to use the CMAVS/AVS interface to access the CM5 resources interactively from the workstation. Using this procedure, we easily read the data ( tex2html_wrap_inline483 tex2html_wrap_inline537 Megabyte) and hold it in memory. Once this is done, a data reduction process is necessary before the data can be transferred through the network. Options tried by different researchers include the computation of geometries, which are passed to the workstation for display [20]. In some other cases the post-processing is extended to producing the rendering (the 2D pixel map) on the supercomputer, which may amount to a smaller volume of data than the actual geometric objects forming, for example, an isosurface. In our case, we extract an interactively selected cubic sub-domain, which is transferred through the network to the workstation. From the workstation, the user can change the size and the position of the extracted sub-domain in order to browse through the data. It is possible to work in this mode interactively using the whole CM5 (1024 nodes at ACL-LANL), as has been done by some ACL researchers in special circumstances. Nevertheless, in practical situations, it may be difficult to obtain more than 128 nodes for interactive work. Another important factor for interactivity is the amount of time spent transferring data between the CM5 and the workstation. For a sub-domain of tex2html_wrap_inline539 , we find that the transfer time on a local Ethernet network between the CM5 and the Onyx machine (at ACL-LANL) is very acceptable.
The same case running on the CM5 at NCSA (Illinois) and the Vizlab Onyx machine (New Jersey) takes a few more seconds, but is still acceptable. Researchers at ACL are able to send considerably larger amounts of data by using the HIPPI network.
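The sub-domain browsing described above can be sketched in a few lines of Python/NumPy. This is only an illustration of the idea, not the CMAVS remote module: the function name extract_subdomain, its parameters, and the choice to shift the cube so that it always fits inside the grid are assumptions.

```python
import numpy as np

def extract_subdomain(field, center, size):
    """Return a cubic block of edge `size` near `center`, kept inside the grid.

    The block is a contiguous copy, so block.nbytes is the number of bytes
    that would have to travel over the network.
    """
    if any(size > n for n in field.shape):
        raise ValueError("sub-domain is larger than the dataset")
    # Lower corner of the cube, shifted where necessary to stay within bounds.
    lo = tuple(min(max(c - size // 2, 0), n - size)
               for c, n in zip(center, field.shape))
    block = field[lo[0]:lo[0] + size,
                  lo[1]:lo[1] + size,
                  lo[2]:lo[2] + size]
    return np.ascontiguousarray(block), lo
```

Changing center or size and calling the function again mimics the user moving or resizing the window from the workstation; only the selected block, rather than the full field, needs to be sent.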

In our last approach the thresholding operation is performed on the CM5 in batch mode. The thresholded points are marked, enumerated and then transferred to the output arrays by scatter-gather operations. The output file of this program contains the position and the scalar value of each thresholded point. The reduced datasets obtained in this way may also store other information, such as strain-rate and vortex stretching magnitude. The thresholded points have the appearance of a set of scattered points. The modules developed to process this type of information include object segmentation algorithms and the diagnostics box. Non-standard data types for data flow between the modules are introduced to handle the new data formats (scatter points and lists of objects) produced by the data reduction and object identification processes. Rendering is achieved using standard modules. The first output of the modules is a list of "interesting" objects to be examined. The selection criterion is prescribed interactively by the user, but the search is performed automatically by the computer. The second output is a set of "ellipsoids" representing the objects in the list, colored according to the local maximum inside them. It is also possible to display the "filtered" 256^3 domain by using spheres with sizes, colors and transparencies proportional to the magnitude of the scalar field at each of the thresholded points (figure gif). The user is able to browse visually through the list of objects via the diagnostics box (figure gif). Different variables can be examined simultaneously. In this way a field (e.g. vorticity magnitude) can be visualized according to the important objects found in a related field (e.g. strain-rate magnitude), which is useful for establishing correlations. The reduced object representations, or ellipsoids, are also used as pointers to local maxima.
The ellipsoids turn out to be very effective release regions for vector line tracers, which can be selected interactively by the user.
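The mark, enumerate and scatter sequence used for the batch thresholding can be sketched as follows. This is a serial Python stand-in for what would be data-parallel operations on the CM5; the function name compact_thresholded and the list-based data layout are assumptions. The exclusive prefix sum assigns each marked point its slot in the output arrays, which is what allows the scatter to proceed independently for every point.

```python
from itertools import accumulate

def compact_thresholded(values, positions, threshold):
    """Pack the points with value > threshold into dense output arrays."""
    # Mark: 1 where the point passes the threshold, 0 otherwise.
    marks = [1 if v > threshold else 0 for v in values]
    # Enumerate: exclusive prefix sum gives each marked point its output slot.
    inclusive = list(accumulate(marks))
    slots = [0] + inclusive[:-1]
    count = inclusive[-1] if inclusive else 0
    # Scatter: each marked point writes to its own slot, so on a parallel
    # machine these writes could all happen at once.
    out_positions = [None] * count
    out_values = [None] * count
    for i, marked in enumerate(marks):
        if marked:
            out_positions[slots[i]] = positions[i]
            out_values[slots[i]] = values[i]
    return out_positions, out_values
```

The two returned lists correspond to the position and scalar value stored for each thresholded point; further per-point quantities (strain-rate, for instance) could be packed the same way using the same slots.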

The region containing the maximum vorticity magnitude in the turbulence dataset is examined in figure gif. The vorticity magnitude isosurface, at a threshold of 30% of the maximum, shows the topology of this region. The ellipsoids, fitted at a threshold of 45% of the maximum, mark the local maxima regions inside the objects. The vector lines trace the vorticity field associated with the objects. The color of the lines indicates the direction of the vortex field (vorticity "flows" from blue to red). Sets of bundles are released from the three ellipsoids. The lines show that the object at the center of the picture appears to be formed by two parallel tubes winding around each other, which bifurcate in the upper right corner region. The isosurface in the lower pictures corresponds to the magnitude of the strain-rate. It can be observed that the strain-rate maxima are not located in the same position as the vorticity maxima, which has been observed in other turbulence simulations [21] and identified as a fundamental feature in our vortex filament models. In figure gif, we present vorticity and strain-rate magnitude fields for the object ranked 12th by vorticity magnitude in the 256^3 dataset. The vorticities in the two parallel vortex tubes are of opposite sign; nevertheless, tracking this object in time is necessary to determine whether this is a case of vortex collapse [7].



David &
Thu Feb 29 14:23:56 EST 1996