Being the statistics nut that I am, I was curious about the relative contributions of e2 users. I mean, on one hand you have a ton of users who contributed one daylog and then bolted for the door. On the other, you have pingouin's 1883 writeups, Segnbora-t's 1715, and the Jargon File's 2254 contributions. Not to mention that one guy with all the words.

So, being somewhat learned in the ways of web programming, I conjured up a PHP script to surf e2 for users and record the number of writeups each one has. I limited my search to users who have contributed at least one writeup.

So far, I've pulled 5,340 users and over 350,000 writeups (including Webby and user aliases) into my data set, roughly 77% of all writeups on e2. Throwing out the esteemed Mr. Webster as a major outlier, here are some general statistics for the data:

Users Found: 5348
Total Writeups: 252295
Mean: 47.184402468674
Standard Deviation: 19.657106235794

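That is to say, the average user contributes 47 writeups here. Not too shabby. But the mean alone is not nearly as promising as it sounds. If writeup counts were normally distributed (the classic bell curve), a standard deviation of 19 would put about 68% of our users between 28 and 67 writeups, and about 95% between 8 and 86. They aren't, though: since nobody in the data can have fewer than 1 writeup, while a handful of users have over a thousand, the distribution is heavily skewed to the right. Three standard deviations below the mean (the band that would hold 99.7% of a normal distribution) already falls below zero writeups.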
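For reference, a normal distribution puts about 68.3%, 95.4% and 99.7% of its values within one, two and three standard deviations of the mean. The original calculations were done in an unpublished PHP script, so the Python sketch below is only an illustration: it recomputes those ranges from the reported mean and standard deviation. It also adds a rough consistency check that is not part of the original analysis. The top row of the breakdown table further down (267 of 5,340 users sharing 135,572 writeups, about 508 each) on its own forces the population standard deviation above 100, which suggests the reported 19.66 may come from a different calculation than the usual population standard deviation. The function names are illustrative.

```python
import math

# Figures as reported above (Webster excluded).
MEAN = 47.184402468674
SD = 19.657106235794

def interval(k: float) -> tuple[float, float]:
    """The range mean +/- k standard deviations."""
    return MEAN - k * SD, MEAN + k * SD

def normal_coverage(k: float) -> float:
    """Fraction of a *normal* distribution lying within k SDs of its mean."""
    return math.erf(k / math.sqrt(2))

def sd_lower_bound(group_total: int, group_size: int,
                   population: int, mean: float) -> float:
    """Smallest population SD consistent with one group's average.

    Each member of the group adds (x - mean)^2 to the variance, and the
    average of squares is never below the square of the average, so
    variance >= (group_size / population) * (group_avg - mean)^2.
    """
    group_avg = group_total / group_size
    return math.sqrt(group_size / population) * abs(group_avg - mean)

for k in (1, 2, 3):
    lo, hi = interval(k)
    print(f"{k} SD: {lo:.1f} to {hi:.1f} writeups "
          f"({normal_coverage(k):.1%} of a normal distribution)")

# Top row of the breakdown table: 267 users, 135,572 writeups, 5,340 users in all.
print(f"SD must be at least {sd_lower_bound(135572, 267, 5340, MEAN):.0f}")
```

The bound only uses the single heaviest slice; every other user can only push the variance higher.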

Of course, all of this makes sense. We have a lot of users, past and present, who contributed 10-50 writeups and then cooled off, posting maybe one writeup a month, if they're still around at all. Meanwhile, users like Pseudo_intellectual, who posted 1600+ writeups, account for a large share of the total. Disproportionately so, it would seem.

And now it's time for a breakdown.

So, bearing in mind my original goal of seeing who contributes what to e2, here is a breakdown of the writeups contributed, with users ranked from most to fewest writeups and split into twenty 5% slices. (I threw out 8 users with only 1 writeup to get a number of users divisible by 20, leaving 5,340 users, or 267 per slice. Their addition to the data would be negligible.)

Writeups by 5% slice of users, ranked from most to fewest writeups (n = 267 users per slice)

WRITEUPS   % OF ALL   CUMULATIVE   CUM. %
  135572      53.74       135572    53.74  (Top 5%)
   43733      17.33       179305    71.07
   25458      10.09       204763    81.16
   16330       6.47       221093    87.63
   10480       4.15       231573    91.79  (Top 25%)
    6493       2.57       238066    94.36
    3933       1.56       241999    95.92
    2517       1.00       244516    96.92
    1781       0.71       246297    97.62
    1284       0.51       247581    98.13  (Top 50%)
     997       0.40       248578    98.53
     804       0.32       249382    98.85
     577       0.23       249959    99.07
     527       0.21       250486    99.28
     472       0.19       250958    99.47  (Top 75%)
     267       0.11       251225    99.58  (1 writeup per user)
     267       0.11       251492    99.68   |
     267       0.11       251760    99.79   |
     267       0.11       252027    99.89   |
     260       0.10       252287   100.00   v
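The breakdown itself is mechanical: rank users by writeup count, cut the ranking into twenty equal slices, and sum each slice. Here is a minimal Python sketch, assuming the counts are already in a list; the function name and the optional `denominator` parameter are illustrative. The cumulative percentages in the table appear to have been taken against the full 252,295 writeups (including the 8 dropped users), which is what `denominator` allows; it only moves a few figures in the second decimal place.

```python
from typing import List, Optional, Tuple

Row = Tuple[int, float, int, float]  # (total, % of all, cumulative, cumulative %)

def breakdown(counts: List[int], slices: int = 20,
              denominator: Optional[int] = None) -> List[Row]:
    """Rank users from most to fewest writeups, split them into equal
    slices, and report each slice's total and cumulative share."""
    ranked = sorted(counts, reverse=True)
    if not ranked or len(ranked) % slices:
        raise ValueError("number of users must be a positive multiple of `slices`")
    size = len(ranked) // slices          # e.g. 5340 // 20 = 267
    base = denominator or sum(ranked)     # percentages are taken against this
    rows: List[Row] = []
    running = 0
    for i in range(slices):
        total = sum(ranked[i * size:(i + 1) * size])
        running += total
        rows.append((total, 100 * total / base, running, 100 * running / base))
    return rows

# Tiny made-up example: 8 users in 4 slices of 2.
for row in breakdown([1, 1, 2, 4, 8, 16, 32, 64], slices=4):
    print("%6d  %6.2f  %6d  %6.2f" % row)
```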

Looking at this data (which comes from a large sample of 5,340 users), we see that the top 5% of users contribute over 50% of the total nodegel. We also see that the Pareto principle (the 80/20 rule) more than holds, with the top 20% of users contributing 87.63% of the writeups.
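Reading the Pareto figure off the table is just a matter of adding up the top slices. A small sketch using the first five rows of the table and the 252,295-writeup total reported earlier; `top_share` is an illustrative name, and it only handles whole multiples of 5%.

```python
# Writeup totals for the top five 5% slices (267 users each), from the table above.
TOP_SLICES = [135572, 43733, 25458, 16330, 10480]
TOTAL_WRITEUPS = 252295  # all writeups in the data set, Webster excluded

def top_share(percent: int) -> float:
    """Percent of all writeups contributed by the top `percent` of users."""
    if percent % 5 or not 5 <= percent <= 5 * len(TOP_SLICES):
        raise ValueError("percent must be a multiple of 5 covered by TOP_SLICES")
    return 100 * sum(TOP_SLICES[:percent // 5]) / TOTAL_WRITEUPS

for p in (5, 20, 25):
    print(f"top {p}% of users: {top_share(p):.2f}% of writeups")
```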

It's important to note here that we're talking about 5,340 users, not the 78,000 listed in Everything Statistics. 20% of that is 1,068 users, and the lowest-ranked person in that group has 49 writeups. So if you've hit 50 or more, congratulations: you're a major E2 contributor! (If you've reached 123 writeups, you've broken into the top 10%.)
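Finding that threshold means ranking users and reading off the count at a given rank: 20% of 5,340 users is 1,068, so the cutoff is the writeup count of the 1,068th-ranked user. A minimal sketch, assuming a plain list of per-user counts; `cutoff` is a hypothetical helper, and the usage line ranks 100 made-up users with 1 to 100 writeups.

```python
def cutoff(counts: list[int], percent: int) -> int:
    """Writeup count of the lowest-ranked user still inside the top `percent`%."""
    if not counts or not 0 < percent <= 100:
        raise ValueError("need at least one user and 0 < percent <= 100")
    ranked = sorted(counts, reverse=True)
    # Ceiling division in integers, e.g. 5340 * 20 / 100 = rank 1068.
    rank = -(-len(ranked) * percent // 100)
    return ranked[rank - 1]

print(cutoff(list(range(1, 101)), 20))  # 81
```

Integer ceiling division avoids floating-point surprises when the rank works out to an exact whole number.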

As I accumulate more data, I notice two trends. One is that both the mean and the standard deviation are moving downwards: it seems I've already collected most of the users with more than 50 writeups, though there are still a few floating around. The second is that the number of users with just 1 writeup is astounding, almost to the point of ludicrousness. I wonder why so many people stuck around for a cup of coffee and then exited forever ...

To emphasize this, I remembered Benford's Law: in many large, naturally occurring data sets that span several orders of magnitude, leading digits are distributed logarithmically rather than evenly, so 1 is the most common first digit (about 30% of the time) and 9 the least common. The formula for the probability that the first digit is d is

P(d) = log_10(1 + 1/d)

where d is the leading digit, from 1 to 9. Anyway, here is a comparison of the expected and actual first-digit frequencies of users' writeup counts:

DIGIT  EXPECTED  ACTUAL  ACTUAL/EXPECTED
    1    0.3010  0.4042            1.343
    2    0.1761  0.1923            1.092
    3    0.1249  0.1139            0.912
    4    0.0969  0.0814            0.840
    5    0.0792  0.0561            0.708
    6    0.0669  0.0514            0.768
    7    0.0580  0.0368            0.634
    8    0.0512  0.0335            0.654
    9    0.0458  0.0305            0.666

Notice that the ratios in the right-hand column should, in theory, all be close to 1 if our data followed Benford's Law. But we have so many users with just 1 writeup that the results are skewed heavily toward the digit 1. Over half (56%) of the users whose writeup count begins with a 1 have, in fact, just 1 writeup. Taking them out brings the numbers much closer to Benford's ideal.
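The expected column comes straight from the Benford formula above, and the ratio column is the observed share of each leading digit divided by the expected one. A minimal Python sketch, assuming per-user writeup counts in a list (all at least 1); the helper names and the sample counts are made up, and passing `min_count=2` reruns the comparison without the single-writeup users.

```python
import math
from collections import Counter

def benford_expected(d: int) -> float:
    """Benford probability that a number's leading digit is d: log10(1 + 1/d)."""
    return math.log10(1 + 1 / d)

def leading_digit(n: int) -> int:
    """First decimal digit of a positive integer."""
    return int(str(n)[0])

def benford_table(counts: list[int],
                  min_count: int = 1) -> list[tuple[int, float, float, float]]:
    """Rows of (digit, expected, actual, actual/expected) for users with at
    least `min_count` writeups; min_count=2 drops single-writeup users."""
    kept = [n for n in counts if n >= min_count]
    if not kept:
        raise ValueError("no users left after filtering")
    tally = Counter(leading_digit(n) for n in kept)
    rows = []
    for d in range(1, 10):
        expected = benford_expected(d)
        actual = tally[d] / len(kept)
        rows.append((d, expected, actual, actual / expected))
    return rows

sample = [1, 1, 1, 12, 2, 30, 47, 150, 9]  # made-up writeup counts
for row in benford_table(sample, min_count=2):
    print("%5d  %.4f  %.4f  %.3f" % row)
```

The nine expected probabilities sum to 1, since the product of (d + 1)/d for d = 1..9 is 10 and log_10(10) = 1.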

Most of this is just data for the curious. It doesn't address issues of quality, reputation, how long a user has been with e2, or any of the other data that might prove more compelling or interesting. Maybe some day I'll go get some of that data, but for now, you can check these numbers out and maybe come up with an interesting conclusion of your own. Ciao!