CART: A Brief Description



CART stands for "Classification and Regression Trees", and, as it's name implies, it is a method of categorizing very large sets of data, very quickly. You often hear the term "Data-mining" used these days to describe the process of analyzing data to find relationships between elements, or building a predictive system to explain the data. Most of the data-mining done today involves classification, and therefore can benefit greatly from the use of CART.

CART was introduced in 1984 by Leo Breiman, Jerome Friedman, Richard Olshen, and Charles Stone; all UC Berkeley and Stanford statisticians. Their CART methodology has vastly improved various performance bottlenecks, as well as problems in accuracy. Some of this was achieved by using only binary splitting , and by introducing automatic tree testing. They also came up with a completely different, and improved method for dealing with missing values.

An example of a CART tree
It is probably easiest to start to understand CART by first looking at an example. Take the digit recognition model described in "Classification and regression Trees" by Breiman, Friedman, Olshen and Stone: On electrical clocks, digits are usually shown as a set of lighted lines: three horizontal lines, and four vertical lines. Given this representation, each of the 10 digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 can be described by a set of 1's and 0's, each of which correspond to whether a particular line in the display is on or off. As an example, the digit 2 could be represented as the set (1 0 1 1 1 0 1), since line 1 is `on', line 2 is `off', line 3 is `on', etc. Using these sets for each digit, we can create a classification tree by asking yes or no questions at each node, and sending the object left or right depending upon the answer. In this example, a position is picked as the first node (in this case position 5), and if that position is `on' for a particular object, it is sent right, if it is `off', it is sent left.

P(2) = (1 0 1 1 1 0 1) = (p1 p2 p3 . . . p7)

The classification tree that is created is clearly a Binary Search Tree, but there are some very strong similarities to tries. More specifically, the fact that this tree has been pruned. If we had included every single possible permutations of a 7-digit binary string, the tree would have been enormously huge. This reduces the overhead by an incredible amount, and a program need only ask itself simple questions such as "Does this object have an `on' line in position 1" to quickly deduce which character is actually being represented.

Applications of CART are numerous. As stated earlier, the CART methodology can be used for a wide range of applications. CART can by used in many situations where a set of data is given, and you would like to form relationships and/or find a predictive model for the data. It is a technique that can be used to find structure in data. It is especially useful when there is a large amount of data, there are a number of explanatory variables or factors, and it is uncertainty how the variables and their interactions should appear in the model.

CART is also fairly nice to look at. A sort of `flow chart' is created (called a decision tree) with questions at each node. It is very easy for a person to look at this and be able to understand quickly what is going on. For this reason, many models can be represented with CART. Given a set of variables, such as average number of shoppers, and average amount of money spent, a store could create a decision tree that predicts how much money it will make.

Some classification problems that could definitely be solved with CART include sorting profitable from unprofitable, finding fraudulent claims, picking out repeat buyers, profiling high-value customers, and flagging high credit-risk applications. It's quite clear that companies aren't relying on psychologists alone anymore...