A method frequently used in statistical learning to extract, as the name suggests, the principal components of variation in the data distribution.

Let's say we have a set of *n* vectors in three-dimensional space. They are just points in that space. If you apply PCA, you will get:

- the mean of the distribution (the point with the smallest average squared distance from all the data points).
- a set of orthogonal three-dimensional unit vectors (at most three of them in this example, since the space is three-dimensional), called **principal components**, that represent the directions along which the points are spread.
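
As a rough illustration of the two outputs listed above, here is a minimal NumPy sketch. It assumes the components are obtained from a singular value decomposition of the mean-centered points, which is one common way to compute them; the function name `pca` and the random test data are hypothetical and only serve the example.

```python
import numpy as np

def pca(points):
    """Return (mean, components, variances) for an (n, d) array of points."""
    points = np.asarray(points, dtype=float)
    mean = points.mean(axis=0)
    centered = points - mean
    # Rows of vt are orthonormal directions, sorted by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    # Sample variance of the points' projections on each direction.
    variances = singular_values**2 / (len(points) - 1)
    return mean, vt, variances

rng = np.random.default_rng(0)
# 200 hypothetical 3-D points, stretched differently along each axis.
points = rng.normal(size=(200, 3)) * np.array([5.0, 2.0, 0.5])
mean, components, variances = pca(points)
```

Each row of `components` is one principal component, and `variances` gives the spread of the points along it.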

The set is ordered by the variance of the points' projections on each p.c., from the largest to the smallest. In this way you can get a dimensionality reduction of your set of points, taking into account only the first *m* p.c.s and discarding the others.

By construction, the p.c.s you keep are the most significant, and if you don't discard too many of the less significant p.c.s, you can get an accurate (at least from a statistical point of view) description of the starting set.
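
The reduction described above can be sketched as follows: keep only the first *m* p.c.s, describe each point by its *m* projections, and map those back to the original space to see how much is lost. This is a sketch under assumptions, not a reference implementation: the helper name `reduce_and_reconstruct` and the sample data are hypothetical, and the p.c.s are again taken from an SVD of the centered points.

```python
import numpy as np

def reduce_and_reconstruct(points, m):
    """Project (n, d) points on the first m p.c.s and rebuild an approximation."""
    points = np.asarray(points, dtype=float)
    mean = points.mean(axis=0)
    centered = points - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    kept = vt[:m]                  # the m most significant p.c.s
    coords = centered @ kept.T     # reduced description, shape (n, m)
    approx = mean + coords @ kept  # approximation in the original space
    return coords, approx

rng = np.random.default_rng(1)
# Hypothetical data: almost flat along the third axis.
points = rng.normal(size=(100, 3)) * np.array([4.0, 1.0, 0.1])
coords, approx = reduce_and_reconstruct(points, m=2)
# Root-mean-square distance between each point and its approximation.
rms_error = np.sqrt(np.mean(np.sum((points - approx) ** 2, axis=1)))
```

With data like this, dropping the least significant p.c. should leave only a small `rms_error`, while keeping all three p.c.s reproduces the points exactly (up to rounding).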

Note that, even if the p.c.s are geometrically orthogonal, that doesn't mean that the projections on them are statistically independent: they are uncorrelated, which is a weaker property. Also, given that you compute the mean point from the same data, *n* points can give at most *n-1* meaningful p.c.s.
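
The *n-1* limit is easiest to see when there are fewer points than dimensions. The sketch below assumes 4 random points in a 10-dimensional space (hypothetical sizes chosen only for illustration) and counts how many p.c.s carry a non-negligible variance after subtracting the mean.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 4, 10  # hypothetical sizes: fewer points than dimensions
points = rng.normal(size=(n, d))
centered = points - points.mean(axis=0)

# The centered rows sum to zero, so they span at most n - 1 directions.
_, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
variances = singular_values**2 / (n - 1)

# Count the components whose singular value is not numerically zero.
meaningful = int(np.sum(singular_values > 1e-10 * singular_values[0]))
```

The SVD still returns `n` orthogonal directions here, but the last one has essentially zero variance, so `meaningful` comes out as `n - 1` for generic points.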