Near Matches
Ignore Exact
Full Text
Everything
2
Classification and Regression Trees (thing)
See all of Classification and Regression Trees
, no other writeups in this node.
(
thing
)
by
britishcoal
Wed Jun 21 2000 at 15:56:42
CART: A Brief Description
CART stands for "Classification and Regression Trees", and, as it's name implies, it is a method of categorizing very large sets of data, very quickly. You often hear the term "
Data-mining
" used these days to describe the process of analyzing data to find
relationships
between elements, or building a predictive system to explain the data. Most of the data-mining done today involves classification, and therefore can benefit greatly from the use of CART.
CART was introduced in 1984 by Leo Breiman, Jerome Friedman, Richard Olshen, and Charles Stone; all UC
Berkeley
and
Stanford
statistician
s. Their CART methodology has vastly improved various performance
bottleneck
s, as well as problems in accuracy. Some of this was achieved by using only binary splitting , and by introducing automatic tree testing. They also came up with a completely different, and improved method for dealing with missing values.
An example of a CART tree
It is probably easiest to start to understand CART by first looking at an example. Take the digit recognition model described in "Classification and regression Trees" by Breiman, Friedman, Olshen and Stone: On electrical clocks, digits are usually shown as a set of lighted lines: three horizontal lines, and four vertical lines. Given this representation, each of the 10 digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 can be described by a set of 1's and 0's, each of which correspond to whether a particular line in the display is on or off. As an example, the digit 2 could be represented as the set (1 0 1 1 1 0 1), since line 1 is `on', line 2 is `off', line 3 is `on', etc. Using these sets for each digit, we can create a classification tree by asking yes or no questions at each node, and sending the object left or right depending upon the answer. In this example, a position is picked as the first node (in this case position 5), and if that position is `on' for a particular object, it is sent right, if it is `off', it is sent left.
P(2) = (1 0 1 1 1 0 1) = (p1 p2 p3 . . . p7)
The classification tree that is created is clearly a
Binary Search Tree
, but there are some very strong similarities to tries. More specifically, the fact that this tree has been pruned. If we had included every single possible
permutation
s of a 7-digit binary string, the tree would have been enormously huge. This reduces the
overhead
by an incredible amount, and a program need only ask itself simple questions such as "Does this object have an `on' line in position 1" to quickly deduce which character is actually being represented.
Applications of CART are numerous. As stated earlier, the CART methodology can be used for a wide range of applications. CART can by used in many situations where a set of data is given, and you would like to form relationships and/or find a predictive model for the data. It is a technique that can be used to find structure in data. It is especially useful when there is a large amount of data, there are a number of explanatory variables or factors, and it is uncertainty how the variables and their interactions should appear in the model.
CART is also fairly nice to look at. A sort of `flow chart' is created (called a decision tree) with questions at each node. It is very easy for a person to look at this and be able to understand quickly what is going on. For this reason, many models can be represented with CART. Given a set of variables, such as average number of shoppers, and average amount of money spent, a store could create a decision tree that predicts how much money it will make.
Some classification problems that could definitely be solved with CART include sorting profitable from unprofitable, finding fraudulent claims, picking out repeat buyers, profiling high-value customers, and flagging high credit-risk applications. It's quite clear that companies aren't relying on
psychologist
s alone anymore...
binary search tree
Put Palm Trees in Your Stories
Shark biscuit
NSA Handbook
BSP Tree
Infinite regress
bottleneck
decision tree
Sign In
Login
Password
Remember me
Lost password
Sign Up
Need help?
accounthelp@everything2.com
Recommended Reading
About Everything2
User Picks
twin halves of one august event
Dang-Blasted False Advertising!
Things required of practical people.
Things required of practical people.
Dang-Blasted False Advertising!
A poem for when I loved him
like fireflies in the dark
like fireflies in the dark
(she's a monster) she ate my heart
Perceived the scene and foretold the rest
the secret of death
Sleepovers at 31
In the event of an emergency, we're all screwed.
If Thy Eye Offend Thee
It's Midnight up in Harlem
Another gray face for your new war
Sleepovers at 31
In the event of an emergency, we're all screwed.
June 15, 2025
There was a time when I was a better person
Lollapalooza
June 15, 2025
Diplomacy
Nemo me impune lacessit
Drink coffee. Smoke cigarettes. Eat fire.
mushroom coffee
This is the concept you will use to make yourself invisible
mushroom coffee
Sumer
Gödel's theorem applied to God
Editor Picks
America
Planetary Linguistics
Hunter S. Thompson on George W. Bush
B
heyoka
Invisible Man
Collateral
Eight Myths of Economic Globalization
R.E.M. ordering breakfast at Denny's at 3 AM
Evil is not the same as Cool
Les Fleurs du Mal
Act of Union
extreme ironing
To anyone contemplating suicide
Scissors that shed their safety
New Writeups
Things required of practical people.
(
how-to
)
by
etouffee
Dang-Blasted False Advertising!
(
fiction
)
by
Chord
the time loop and the lifetime
(
idea
)
by
raincomplex
Sleepovers at 31
(
thing
)
by
ame_9001
In the event of an emergency, we're all screwed.
(
fiction
)
by
gate
June 15, 2025
(
event
)
by
JD
mushroom coffee
(
thing
)
by
wertperch
Silence of ... Forgotten
(
poetry
)
by
EmrysK:?
killer application
(
thing
)
by
wertperch
June 13, 2025
(
log
)
by
wertperch
(
more
)