|
This paper appeared in the IEEE International Conference on Shape Modeling and Applications
(SMI05), 15-17 June 2005, Boston, USA.
Download [PDF] here. |
Nicu D. Cornea
M. Fatih Demirci
Deborah Silver
Ali Shokoufandeh
Sven J. Dickinson
Paul B. Kantor
Rutgers University
Drexel University
University of Toronto
Matching 3D objects is a difficult problem, with a complex relation to the 2D shape-matching problem. While the 3D nature of the representation helps remove some of the viewpoint, lighting, and occlusion problems in computer vision, other issues arise. Of course, the added dimension and the inherent increase in data size make the matching process more computationally expensive. Furthermore, many of the models are degenerate, containing holes, intersecting polygons, overly thin regions, etc. And there are many different types of matching that may be desirable. Given a query object, one may want to search an entire database for a matching exemplar, if one exists. On the other hand, if the database contains categorical models, one may want to find the category to which the query exemplar belongs.
In this paper, we use the curve-skeleton of a 3D shape to drive the
matching process. The skeleton is a useful shape abstraction that
captures the essential topology of an object in both two and three
dimensions. It provides the following characteristics, not present
in global shape descriptors:
Part/Component Matching: curve-skeletons incorporate the notion
of parts or components and so they can accommodate part
matching, where the object to
be matched is part of a larger object, or vice versa. This feature
can give the users more control over the matching
algorithm, allowing them to specify what part of the object they
would like to match or whether the matching algorithm should weight
one part of the object more than another.
Registration and visualization: The skeleton can be used to
register the two matched objects and visualize the result in a
common space. This is very important in scientific applications
where one is interested in both finding a similar object and
understanding the extent of the similarity [21].
Intuitiveness: The skeleton is an intuitive representation of
shape and can be easily understood by the user, providing more
control in the matching process.
Articulated transformation invariance: The method presented here
can be used for articulated object matching, because the skeleton
topology does not change as a result of articulated motion. An
example was shown in [21].
This work enhances the framework presented in [21] by using a new skeletonization algorithm and an extension of the many-to-many matching algorithm introduced in [12]. In [21], the idea of using skeletons for matching was described; however, only a small database was used for experimentation, and in general, the methodology was not scalable. In this paper, the entire methodology has been revised by using a more robust skeletonization algorithm and a more robust matching algorithm. We demonstrate the efficacy of this new matching framework on the objects from the Princeton Shape Database [19]. Our matching algorithm is based on establishing correspondences among two skeletal representations via distribution-based matching in metric spaces. While the performance of our algorithm is comparable to that of other existing 3D matching methods (eg. [14,19]), the locality of our skeletal representation and matching algorithm has some other benefits, such as allowing part matching, articulated matching, etc.
A number of different approaches have been proposed for the matching problem. Using a simplified description of a 3D model, usually in lower dimensions (also known as a shape signature), reduces the 3D matching problem to comparing these different signatures. The dimensional reduction and the simple nature of these shape descriptors make them ideal for applications involving searching in large databases of 3D models. Osada et al. in [14] propose the use of a distribution, sampled from one of many shape functions, as the shape signature. Among the shape functions, the distance between two random points on the surface proved to be the most effective at retrieving similar shapes. In [19], a shape descriptor based on 2D views (images rendered from uniformly sampled positions on the viewing sphere), called the Light Field Descriptor, performed better than descriptors that use the 3D properties of the object. In [11], Kazhdan et al. propose a shape description based on a spherical harmonic representation.
Related to the work described here is the 1D skeleton determination (also known as the curve-skeleton). This is related to the medial surface. A curve-skeleton is used both for its intuitiveness and for its simplicity even though it is not unique. There are many skeleton extraction techniques: thinning-based methods attempt to iteratively peel layers off a voxelized representation of the object while attempting to maintain the topology [7]; geometric methods use Voronoi diagrams to determine a medial axis [3]; distance field methods use a distance function to classify and retain centered voxels [22]; while a continuous quench function computed from the object, identifies the extremes as medial axis points [4]. For a full description, please see [6].
Most of the previous work on point and skeleton matching has focused on solving one-to-one correspondence problems. Kim and Kak [13] used a combination of discrete relaxation and bipartite matching in model-based 3-D object recognition in computer vision. Pellilo et al. [15] devised a quadratic programming framework for the matching of 2D skeletons using a maximal clique formulation. Siddiqi et al. combined a bipartite matching framework with a spectral decomposition of graph structure to match shock graphs [20].
The problem of many-to-many matching has also been studied, most often in the context of edit-distance (see, e.g. [18]). In such a setting, one seeks a minimal set of relabelings, additions, deletions, merges, and splits of nodes and edges that transform one graph into another. In [12], we introduced a many-to-many matching algorithm and studied its utility for 2D skeletal representations. We used a specific measure, the Earth Mover's Distance, to compute distances between sets of weighted vectors. The matching algorithm in this paper is an extension of the one presented in [12].
Our curve-skeleton extraction algorithm works on a volumetric representation of the 3D object. It is based on the method presented by Chuang et. al. [4] which uses a generalized potential field generated by charges placed on the surface of the object. The generalized potential at a point due to a nearby point charge is defined as a repulsive force, pushing the point away from the charge with a strength that is inversely proportional to some power of the distance between the point and the charge. This step produces a vector field.
Given a 3D vector field, we use concepts from vector field visualization to identify two types of seed points that we will used to construct a curve-skeleton: critical points and high divergence points. At critical points, the magnitude of the vector vanishes, which is why they are also called zeros of the vector field. A full discussion of the visualization of vector-field topology and the different types of critical points can be found in [8] and [10]. In addition to critical points, we also use the divergence of the vector field. The user specifies a percentage of the highest divergence value which will be used as a threshold to select new seed points. By varying this parameter, one can generate an entire hierarchy of skeletons of various complexities and select the best one for a given application. In the experiments presented in Section 4, we used a 40% divergence threshold for all our skeletons.
Skeleton segments are discovered using a force-following algorithm on the underlying vector field, starting at each of the identified seed points. The force following process evaluates the vector (force) value at the current point and moves in the direction of the vector with a small pre-defined step. At critical points, where the force vanishes, the initial directions are determined by evaluating the eigenvalues and eigenvectors of the Jacobian at the critical point. For more details on computing the curve-skeleton see [6]. Figure 1 shows a few examples of 3D objects and their respective skeletons.
The skeleton obtained using the above algorithm consists of a set of points sampled by the force following algorithm. Each skeleton point is then equipped with a distance-transform value [7], a real number specifying the distance to the closest point on the surface of the object. This additional information is used by the many-to-many matching process.
|
Computing the EMD is based on a solution to the well-known
transportation
problem [2], whose
optimal value determines the minimum amount of ``work'' required to
transform one distribution into the other. More formally, let
be the first distribution
with
points, and let
be the second
distribution with
points. Let
be the ground
distance matrix, where
is the ground distance between
points
and
. Our objective is to find a flow matrix
, with
being the flow between points
and
, that minimizes the overall cost
subject to
the following list of constraints:

The optimal value of the objective function
defines the Earth Mover's Distance between the two distributions.
An extension of the original EMD approach
[17] allows us to
match point sets that are ``non-rigidly'' embedded into the Euclidean
space by allowing sets to undergo transformations. Assuming that a
transformation is applied to the second distribution, distances
are defined as
and the
objective function becomes
The minimal value of the objective function
defines the Earth Mover's Distance between
the two distributions that are allowed to undergo a transformation.
An iterative process (which called FT, short for ``an optimal
Flow and an optimal Transformation''), that achieves
a local minimum of the objective function was suggested in
[17]. Starting with
an initial transformation
from a given
, they compute the optimal flow
that minimizes the objective function
, and from a given optimal flow
they compute an optimal transformation
that minimizes the objective function
. The iterative process stops when
the improvement in the objective function value falls below a
threshold. The resulting optimal pair
depends on the
initial transformation
.
We will use the Earth Mover's Distance under transformation between two skeletons as a measure of their similarity: a small value of this distance indicating high similarity between the two skeletons. Figure 6 shows the distances between the query and the top matched objects.
Figure 2 shows an example of matching between two objects: in step 1, the curve-skeleton for each object is computed while in step 2, the many-to-many matching establishes the distance and the correspondence between the two skeletal representations. The skeleton regions that were matched to each other are shown in the same color in Figure 2.
We first tested our proposed approach to retrieving similar objects on a subset of 1081 objects from the Princeton Shape Benchmark Database [16], grouped into 99 non-empty classes from both the test and training classifications [19] (the 3D base classification task). In our experiments, we first created 3D skeletons for each object. We used a divergence threshold of 40% for all our skeletons. We then computed the distance from each object to the remaining database entries using our many-to-many matching algorithm. If the conceptual classes correspond to bodies which vary only in scale, or by articulated transformation, our algorithm should return an object that belongs to the same class as the query. We will classify this as a ``correct matching''. Based on the overall matching statistics, we observe that in 71.1% of the experiments, the overall best match selected by our algorithm belonged to the same class as the query (also known as the nearest neighbor criterion [19]). In 74.3% of the experiments, the best match belonged to the same parent class as that of query.
In a second experiment, we asked how many of the models in the query's
class appear within the top
matches, where
is the size of
the query's class (first tier [19]). This number was
17.2%. Repeating the same experiment, but considering the top
matches (second tier [19]) covers 22.7% of
the members of the class.
Comparing these results with those reported by Shilane et. al [19] in Table 4 of their work, it should be noted that our method outperforms all methods on the nearest neighbor criterion, but does not do as well on the first and second tier criteria. The sharp drop in the precision-recall curve in Figure 3 illustrates the loss of precision for the first and second tier criteria. The precision-recall plot shows the relation between recall (the ratio of models from the class of the query returned within the top N matches) and the precision (the ratio of the top N matches that belong to the query class) [19]. Figure 3 shows the precision-recall plot averaged over all models and looking at the first 20 best matches only.
In Figure 4, we have presented the matching results for a small subset of objects. The first column of each row shows the query object; the remaining elements of each row represent the top 10 closest objects of the database determined by our matching algorithm. These are all instances where the closest object is an object from a similar class. In some cases, while the algorithm has identified an object with similar structure as best match, it was still penalized for selecting an object from an incorrect category. The query object (race car) in row one and its best matched object are an example of such a case. They can be attributed to the particular hierarchy of categories used by the Princeton Shape Benchmark Database [16]. When similarity of shape is desired, a method which relies on shape would help retrieve objects not normally associated with the exemplar and not typically categorized with it.
An important aspect of the matching is the computation of correspondence between the matched objects. Our many-to-many matching algorithm provides a direct correspondence between the skeleton points of the two matched objects. This allows one to register the query object to the database objects and aid in visualization and a better understanding of the match. The correspondences between the matched objects are illustrated with color coded regions in Figures 2 and 5. Global shape descriptors perform poorly at this task because global information cannot preserve local correspondences.
Matching of a part within a complex whole is useful for CAD-type databases and also for recognition in laser-scanned images, which tend to cluster objects together. It is presumably also central to medical applications in which a particular configuration is to be found somewhere in a larger object. Specifically, given a part of an object as a query, one attempts to locate objects containing similar subparts. Here, the difficulty lies in the fact that none of the database objects contains an exact copy of the query.
In our next experiment, we used a query part (a torso) and matched it against several simple and composite objects in the database, some containing the query part. The composite objects were obtained by a union operation applied to two simple objects - the kind of composition one would expect to encounter in laser-scanned scenes. Figure 5 shows the query object (and the corresponding skeleton) in (a) and some of the objects that was matched against in (b). In (c) we show the computed correspondence between the query skeleton and the skeletons of the objects in (b). Points that match the query skeleton are shown in red.
![]() |
Compared to our previous skeletonization method described in [21], the potential field-based skeletonization algorithm is more robust to reasonable amounts of noise on the boundary of the object (see [6]). However, the curve-skeleton may not be the best abstraction for all types of objects. For example thin disk-like objects or a mushroom shape are not well represented by a curve-skeleton. The EMD-based matching compensates for a certain amount of differences in skeletonization by looking at loose relationships among the skeleton points.
In this paper, we have presented a matching framework that is an extension to our many-to-many matching algorithm in [12]. In addition to new skeletonization and matching approaches, we have demonstrated the performance of the method on a database of over 1000 objects, with retrieval results comparable to the global shape descriptor methods presented in [19].
The skeleton-based approach has a number of advantages over the global shape descriptor methods. It is an intuitive representation of 3D objects that can be easily used to understand the similarities present in the matched objects. Since our many-to-many matching algorithm provides a direct correspondence between skeleton points in two matched objects, one can use this correspondence for registration and juxtaposition. The skeleton captures both global and local properties of the shape, so it can be used for many different matching tasks. This is demonstrated in part matching, where only a portion of the skeleton is matched. Part matching is useful in CAD environments or segmentation of laser-scanned scenes.
As future work, we plan on improving the skeletonization code further so that it can correctly handle objects with thin regions. Our part matching examples have shown many-to-many matching can be used to locate a part in a database of composite objects. The inverse problem is also of interest, where given a composite object, one would like to identify its component parts among the objects of a database.
This document was generated using the LaTeX2HTML translator Version 2002 (1.62)
Copyright © 1993, 1994, 1995, 1996,
Nikos Drakos,
Computer Based Learning Unit, University of Leeds.
Copyright © 1997, 1998, 1999,
Ross Moore,
Mathematics Department, Macquarie University, Sydney.
The command line arguments were:
latex2html -split 0 -nonavigation -address 'Nicu D. Cornea: cornea@caip.rutgers.edu' smi2005.tex
The translation was initiated by Nicu Cornea on 2005-07-13
|
This paper appeared in the IEEE International Conference on Shape Modeling and Applications
(SMI05), 15-17 June 2005, Boston, USA.
Download [PDF] here. |