Bill James is a statistician, a baseball fan, and a writer. Those three qualities are in no particular order, and the fact that he wears multiple hats should not imply that he does not give his full attention to each of them. In fact, perhaps the crowning achievement of his career is in validating that it is quite possible to be a statistician, and a baseball fan, and a writer.

Prior to James' first Baseball Abstract in 1977 (the first of an annual series and his best-known body of work), few attempted James' trinity, and certainly none had tried since the 1950s. What we now know as "basic" baseball statistics, figures such as batting average, home runs and caught stealing, had been invented long ago. Since these were already out for public consumption, it was assumed that there was no reason for a statistician to seriously undertake baseball. Moreover, attempting to put that work in a book ... who would want to read a scholarly book about baseball numbers? Baseball was a leisure sport; the only way people would want to hear the term "deviation" would be in describing Dick Allen's behavior, right?

None of those assumptions were true. Bill James disproved them, as did fellow baseball statistician Pete Palmer, and as did the Society for American Baseball Research. But James stands head and shoulders above the others. Not only were his formulae robust, but his writing was clear, concise and fun to read. He brought statistics down from the Mount and to the masses.

His most famous formula, and the one from which all his other hitting formulae spring, was Runs Created:

                A  x  B
         RC =  ----------
                   C

A = The get-on-base factor
B = The advancement factor
C = The context, or the opportunity, factor
RC = Runs Created

In the simple version of the formula, A = (hits + walks), B = (total bases) and C = (At-Bats + Walks). These statistics have been readily available since the era of Babe Ruth. However, until James came around, it was assumed that they were only relevant in and of themselves — that is, people knew that the four total bases from a home run were good, but they didn't know exactly how good they were. It was assumed to be an unsolvable mystery.

But James noticed something funny. Take a team's season-ending totals (say, those of the 1984 Baltimore Orioles) and run them through the formula above.

A = (1374 + 620) = 1994
B = 2134
C = (5456 + 620) = 6076

So, RC = (1994 x 2134)/6076 = 700, rounded to the nearest whole number.

The Baltimore Orioles scored 681 runs in the 1984 season. Coincidence? No way.
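
For anyone who would rather let a computer do the arithmetic, here is a minimal Python sketch of the simple version of the formula. The function name and parameter names are my own labels for illustration, not anything James published; the inputs are the 1984 Orioles totals quoted above.

```python
def runs_created_basic(hits, walks, total_bases, at_bats):
    """Simple Runs Created: (A x B) / C."""
    a = hits + walks       # A: the get-on-base factor
    b = total_bases        # B: the advancement factor
    c = at_bats + walks    # C: the context, or opportunity, factor
    return a * b / c


# 1984 Baltimore Orioles team totals, as quoted in the text
orioles_rc = runs_created_basic(hits=1374, walks=620, total_bases=2134, at_bats=5456)
actual_runs = 681

print(round(orioles_rc))                 # estimated runs
print(round(orioles_rc) - actual_runs)   # gap between estimate and actual runs
```

The estimate comes out at 700, which is 19 runs (a little under 3 percent) above the 681 the Orioles actually scored.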

Here's what's happening: Baseball teams score runs by either getting on base, or by driving in people who are already on base. (Or by hitting home runs, which does both.) So runs should be dependent on the product of the on-base factor (A) and the advancement factor (B), right? C, then, is the context of how many plate appearances those on-bases and those advancements happened in.

The formula works marvelously. Refine it by adding things like the advancement of a stolen base and the negative on-base repercussion of hitting into a double play, and the Runs Created for a team approaches the actual runs output. It's amazing that so simple a formula can work so well. It would be as if one could look at NFL stats, multiply passing yards by rushing yards, divide the product by the number of plays, and suddenly come up with touchdowns.
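
As a rough illustration of what such a refinement looks like, here is a Python sketch of the widely quoted "stolen base" version of Runs Created, which subtracts caught-stealings from A and adds 0.55 times stolen bases to B. The 0.55 weight comes from that commonly cited version, not from anything stated in this writeup, and the double-play adjustment is left out entirely. The function names are my own, and the stat line is made up purely for illustration.

```python
def runs_created_basic(hits, walks, total_bases, at_bats):
    """Simple version: (hits + walks) * total_bases / (at_bats + walks)."""
    return (hits + walks) * total_bases / (at_bats + walks)


def runs_created_sb(hits, walks, total_bases, at_bats, stolen_bases, caught_stealing):
    """Stolen-base version; the 0.55 weight is an assumption (see lead-in)."""
    a = hits + walks - caught_stealing       # getting caught removes a baserunner
    b = total_bases + 0.55 * stolen_bases    # a steal counts as partial advancement
    c = at_bats + walks                      # opportunities are unchanged
    return a * b / c


# Hypothetical player season, invented for this example only
season = dict(hits=150, walks=60, total_bases=250, at_bats=550)

basic = runs_created_basic(**season)
refined = runs_created_sb(**season, stolen_bases=20, caught_stealing=5)

print(f"basic:   {basic:.1f}")
print(f"refined: {refined:.1f}")
```

For this invented player, 20 steals against 5 times caught nudges the estimate from about 86 runs to about 88. The adjustment is small, which is why the simple version already lands so close.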

The avenues opened up by James were tremendous. For instance — now we know how many runs one particular player creates. In 1921, Babe Ruth had an amazing year; in 2001, Barry Bonds had an amazing year. Who created more runs for his team? Thanks to James, we can answer that. Who was the more valuable player? Well ... that gets trickier; it depends on things like the relative value of a run in 1921 and 2001, as well as other factors ... but with enough work, we can give an answer. It may not be the definitive answer, but it's pretty damned good.

James saw the power and importance of his work. As can be expected, the rest of the country took some time.

* * *
Some people think Bill James is God. They overstate, but not by much.
Vanity Fair
* * *

James couldn't find anyone to publish his Baseball Abstract at first; he self-published it from 1977 to 1981. Finally, he convinced Ballantine Books to take over in 1982; to say the least, it was a gamble for the publisher. There wasn't any hard evidence that his books would succeed on a national scale.

Thankfully, they did. James followed up with new Abstracts through the decade, and sales got better and better. Part of the reason would have to be James' writing style: he could spell out a formula in a clear and concise way, much better than I did above (I would have cut-and-pasted, if such writeups did not carry a death wish). He was funny: in the Historical Baseball Abstract, a voluminous book that applied his methods to all of baseball history instead of just the past year, he convinced his wife to pick the cutest players from each decade. He was readable.

Though sales were strong, there was a backlash, like a bastardization of the plot of "Revenge of the Nerds." According to the stereotype, the statisticians were Seamheads and RotoGeeks and pencil-necked doofuses who might prove a unifying formula between pop flies and foul balls but couldn't hit a curveball to save their lives. An even more damaging allegation was that while their formulae were correct to a degree, the absence of "intangibles" left them woefully incomplete and thus fit only for the rubbish bin. As in: because we don't have a good way to know how many baserunners Roberto Clemente prevented from tagging up, we should trash all statistics developed since 1960. This type of argument sounds ludicrous, but it's often made; many in the traditional media would suggest that Rickey Henderson's greatness stems from how his base-stealing ability immeasurably "unnerved" opposing pitchers. Actually, Henderson's greatness stems from the fact that he had (as of 2001) 3,000 career hits and a major-league record 2,141 walks. That gives him a simple A factor of 5,141, one of the highest in baseball history.

But the world came to learn. Others followed in his footsteps; "post-modern" statistics such as Total Average and Batting Runs would be inconceivable without James. A generation of high-school students (myself included) found an immediate use for algebra. Baseball teams eventually came around, as some teams (though they don't admit it) have come to understand that the most valuable players are those who can get on base and drive other runners around the bases. In the late 90s, the Oakland Athletics decreed that none of their minor-leaguers would be honored as organizational players of the month unless they either walked 10 percent of the time or had an on-base percentage of .400. OPS is now listed as a basic statistic on some baseball player pages. OPS stands for (On-Base%) Plus (Slugging%), which is essentially a variant of Runs Created: it adds the A and B factors, each expressed as a rate, instead of multiplying them.
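
To make the adding-versus-multiplying point concrete, here is a rough Python sketch. It uses simplified definitions: on-base percentage as (hits + walks) / (at-bats + walks), and slugging as total bases / at-bats. The official on-base percentage also counts hit-by-pitches and sacrifice flies, which are ignored here, and the function names are my own.

```python
def on_base_pct(hits, walks, at_bats):
    """Simplified OBP: ignores hit-by-pitches and sacrifice flies."""
    return (hits + walks) / (at_bats + walks)


def slugging_pct(total_bases, at_bats):
    """Slugging percentage: total bases per at-bat."""
    return total_bases / at_bats


# 1984 Baltimore Orioles team totals used earlier in this writeup
hits, walks, total_bases, at_bats = 1374, 620, 2134, 5456

obp = on_base_pct(hits, walks, at_bats)     # A / C
slg = slugging_pct(total_bases, at_bats)    # B / at-bats
ops = obp + slg                             # OPS adds the two rates
runs_created = obp * total_bases            # (A / C) * B, same as (A * B) / C

print(f"OBP {obp:.3f}  SLG {slg:.3f}  OPS {ops:.3f}")
print(f"Runs Created {runs_created:.0f}")
```

Under these simplified definitions the Orioles come out at an OPS of roughly .719, while multiplying the same on-base rate by total bases gives back the familiar 700 Runs Created.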

Then, after the 2002 season, the Boston Red Sox hired James as a senior adviser. The Red Sox' new general manager, Theo Epstein, wanted James to conduct statistical analysis of players and managerial tactics. Epstein, just 28 years old, does not represent the "old guard" of Major League Baseball, but the hiring does show that Bill James is finally gaining a measure of acceptance. (Unfortunately, James' first suggestion — a bullpen-by-committee — blew up after all the Red Sox relievers stunk in April of 2003.)

Bill James

Important works:
The Baseball Abstract, 1977-1988
The Bill James Historical Baseball Abstract, 1986
The Politics of Glory (later titled, "Whatever Happened to the Hall of Fame?"), 1994

Other works:
The Baseball Book, 1990-1992
The Bill James Player Ratings Book, 1993-1994
This Time Let's Not Eat the Bones, 1989
Bill James Guide to Baseball Managers, 1997

Updated 5/31/2003: Added information regarding James' employment by the Red Sox. (Thanks to keops for this.)