The idea of using parallel coordinates as a visualization method originated with the work of A. Inselberg. For an overview based on Inselberg's work, see the MS thesis "The Analysis of T48 Low Pressure Turbine Inlet Temperatures Using Parallel Coordinates" by F. S. Bundy. It is important to keep in mind that parallel coordinates are really a generalization of the simple bar graph. In that sense, a parallel coordinate plot represents a collection of data values as y-axis coordinate values arrayed along the x-axis. The name "parallel coordinates" derives from the fact that a specific point in N-dimensional Euclidean space can be represented by N y-axis values arrayed along the x-axis. In order to formalize the mathematical representation, Inselberg introduced two additional representational features: the spacing between adjacent axes and the method of connecting adjacent data points.
These two additional features actually have nothing to do with the data, per se. Indeed, such features can be used to distort the meaning of data. It is common in newspapers to see data represented by bar graphs in which thicker bars are used as an "editorial" feature to lead the reader to focus on a given value. (For example, a bar graph might use space rocket icons of different height and thickness to represent the space program budgets of different countries. A tall rocket drawn wider than the adjacent shorter rockets gives visual emphasis to one data point, namely the value represented by the tall rocket, at the expense of the others.) By the same token, connecting the data points can also be misleading. Consider a low and a high value adjacent to each other on the x-axis. Drawing a straight line between these two points, as opposed to connecting them with, say, a hyperbolic curve, can create two very different perspectives. Since one of the purposes of the parallel coordinate idea is to search out "patterns", it is important to recognize the role that spacing and the interconnection function play in any perceived pattern. (Of course, certain bar graph representations attach quantitative meaning to the width, as well as the height, of bars; histograms are a case in point.)
While the interaxis spacing and the method of interpoint connection have no direct relation to the data per se, both are necessary to provide a mathematical formalism for the parallel coordinate concept. This formalism can be used to deduce topological and geometric features which might be reflected in visual patterns observed in data sets.
A given data set consisting of N data values, regarded as a point in N-dimensional Euclidean space, is represented as a collection of N-1 line segments connecting the N data values, each of which is plotted as a y-axis value on one of N equally spaced vertical lines arrayed along the x-axis.
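The mapping just described can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `to_vertices` and `to_segments`, the `spacing` parameter, and the sample point are assumptions introduced here, not part of Inselberg's formalism.

```python
def to_vertices(point, spacing=1.0):
    """Place the i-th data value on the i-th vertical axis.

    Axis i sits at x = i * spacing (equally spaced axes, an assumed
    convention); the data value becomes the y-coordinate on that axis.
    """
    return [(i * spacing, float(value)) for i, value in enumerate(point)]


def to_segments(vertices):
    """Connect adjacent vertices with straight line segments."""
    return list(zip(vertices[:-1], vertices[1:]))


# Hypothetical point in 4-dimensional space.
point = [3.0, 1.5, 4.0, 2.0]
vertices = to_vertices(point)
segments = to_segments(vertices)

# N data values give N vertices and N-1 connecting segments.
print(len(vertices), len(segments))
```

Changing `spacing` or replacing the straight segments with another curve alters the picture without altering the data, which is the sense in which these two features are independent of the data.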
Many such data sets (points in Euclidean space) will map to many of these "broken" lines in the parallel coordinate representation. Viewed as a whole, these lines might well exhibit coherent patterns that could be associated with an inherent correlation among the data sets involved. Attempts to view these data sets (and identify correlation) using traditional Euclidean N-space visualization are fraught with difficulty, not the least of which is that practical visualization methodology restricts us to N = 4 (three space coordinates and one time coordinate). To deal with N > 4, we need to project views onto an N = 3 space, and this results in ambiguous data point identification; the ambiguity is also present for N = 4 if two or more data sets are being observed. That is, there are views of N = 3 space in which one point can mask another.
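The masking problem can be illustrated with a minimal Python sketch. The two points and the choice of viewing direction (straight down the z-axis) are hypothetical, chosen only to show one point hiding another.

```python
def project_along_z(point):
    """View a 3-D point from directly along the z-axis (drop z)."""
    x, y, _z = point
    return (x, y)


# Two distinct hypothetical points that differ only in z.
a = (1.0, 2.0, 0.0)
b = (1.0, 2.0, 5.0)

# In this view the points coincide, so one masks the other.
masked_in_view = project_along_z(a) == project_along_z(b)

# In parallel coordinates each point keeps all of its axis values,
# so the two broken lines differ on the third axis.
distinct_in_parallel = list(a) != list(b)

print(masked_in_view, distinct_in_parallel)
```

Rotating the view removes this particular coincidence but can introduce others, whereas the parallel coordinate representation never discards an axis.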
In the light of this discussion, one can identify three reasons for considering parallel coordinate representation for data set visualization.
As an example, consider the "air traffic control" problem. Assume there are three airplanes occupying a region of air space. These airplanes (shown as the colored circles) can have trajectories that cross in space; if they cross in space at the same time, there will be a collision. The three-dimensional space view is clearly ambiguous as to whether a collision will occur. Even rotating the view cannot entirely eliminate the ambiguity.
Now consider the parallel coordinate representation. Each airplane is represented by a broken line connecting its X, Y, and Z positions. In addition, a fourth parallel coordinate representing time is provided. (We show only three times: the initial time and the two times at which the red airplane appears to cross the trajectories of the other two.) The y-axis values represent physical position and time measures.
The parallel coordinate view shows that at times t = 1 and t = 2 the airplanes are not going to collide; the apparent crossing of the red and green trajectories in the three-dimensional space view is a graphical artifact. However, the crossing of the red and blue trajectories is indicative of a possible collision, because at t = 3 the red and blue lines in the parallel coordinate space show that the two airplanes are very close in physical space. This example illustrates the three aspects of parallel coordinates previously discussed.
It is important to note that this particular example has no serious scaling problem; spatial position and time are scaled similarly for the three data sets (three airplanes). As a result, apparent correlation (here shown by the close proximity of the red and blue lines at t = 3) can be interpreted in physical terms. If this scaling similarity were not present, patterns such as the lines' proximity at t = 3 would be difficult to interpret. When parallel coordinates are applied to more complex data sets, such as those associated with power plants, the scaling issues that arise when one attempts to visualize global plant data will obscure any meaningful correlation. However, parallel coordinates can be a useful paradigm for power plant data if the data sets themselves are chosen in a way that preserves meaningful scaling (e.g., all the inlet temperatures of a given unit such as a turbine). Based on this reasoning, we expect that the parallel coordinate representation will be a more meaningful paradigm when applied to the data sets associated with power dispatch and distribution, since these data are more easily partitioned into sets that have similar scaling for data values.
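The reading of the airplane example can be mimicked numerically. The sketch below is illustrative only: the positions in `tracks`, the tolerance `tol`, and the function name `close_pairs` are hypothetical and are not taken from the figures. Two airplanes are flagged when, at the same time value, their broken lines lie close together on every position axis. The check is meaningful only because X, Y, and Z share a common scale, which is the scaling condition discussed above.

```python
from itertools import combinations


def close_pairs(tracks, tol):
    """Return (time, plane_a, plane_b) for planes close on every axis.

    tracks maps a plane name to {time: (x, y, z)}. A pair is reported
    when all three position values differ by at most tol at the same
    time, i.e. their parallel coordinate lines nearly coincide.
    """
    hits = []
    for a, b in combinations(sorted(tracks), 2):
        shared_times = sorted(set(tracks[a]) & set(tracks[b]))
        for t in shared_times:
            diffs = (abs(u - v) for u, v in zip(tracks[a][t], tracks[b][t]))
            if all(d <= tol for d in diffs):
                hits.append((t, a, b))
    return hits


# Hypothetical positions (same units on all axes) at t = 1, 2, 3.
tracks = {
    "red":   {1: (0.0, 0.0, 3.0), 2: (2.5, 2.5, 3.0), 3: (5.0, 5.0, 3.0)},
    # Green shares X and Y with red at t = 2 but is far away in Z,
    # so the crossing seen from above is only an artifact of the view.
    "green": {1: (5.0, 0.0, 9.0), 2: (2.5, 2.5, 9.0), 3: (0.0, 5.0, 9.0)},
    # Blue is near red on all three axes at t = 3.
    "blue":  {1: (10.0, 0.0, 3.0), 2: (7.5, 2.5, 3.0), 3: (5.1, 4.9, 3.0)},
}

conflicts = close_pairs(tracks, tol=0.5)
print(conflicts)
```

If the axes carried quantities with very different ranges, a single tolerance would no longer be interpretable, which mirrors the difficulty noted above for global plant data.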