(A clientdev document intended for mass-consumption. See also clientdev: Homenode List Generator.)

Introduction:

Ever since reaching level 2, I have been wondering how long it would take for my username to ascend to the top half of the Other Users nodelet. It seemed to me that one could easily write a client to poll the Other Users ticker and, given a user's XP, use the data to predict that user's expected position on the Other Users list. This sort of problem really appeals to me, as E2 is a large database with a lot of information, and one can coax salient information out of the data mass with minimal effort merely by asking the right questions.

In describing my methodology and by showing you how simple the Everything XML interfaces are, I hope to spur would-be client developers to write E2 client code. My total coding time, including figuring out that the Other Users XML Ticker II exists, was half an hour. If one can get useful data from E2 after half an hour of hacking, imagine what a serious effort could produce! E2 users who know a scripting language (Perl, Python, etc.), take this to heart and write your own clients. Go unto everything and give them source.


Methodology:

The steps I followed to obtain my results are as follows:

  • Write a client, other-users.pl, to poll the Other Users nodelet every fifteen minutes and save the data to other-users.out.
  • Write a program, stats.pl, to analyze the data and produce statistics.
  • Collect data over a day that indicates the position on the Other Users nodelet, on a scale from 0.0 (lowest on the list) to 1.0 (highest on the list), given the user's XP. For example, in one instance, an XP of 262 had Other Users position 0.333, that is, one-third of the way from the bottom.
  • Run stats.pl on other-users.out and analyze results.

Bear in mind that I know nothing about the internals of the everything2 code base, but such knowledge is not necessary for client development. Additionally, you will not need to mess with parsing HTML, as the Everything XML interfaces make it a snap to obtain real-time data from E2.

The Other Users XML Ticker II will spew the current Other Users list as a list of the following XML items:

<user e2god="0" ce="0" edev="1" xp="204" borged="0" >
<e2link node_id="1290327">wick</e2link>

Given the simplicity of the application, XML parsing was not necessary. The script extracts the XPs using a simple regex. The order of the Other Users nodelet is the same as the order of the XML list, so the position is obtained by dividing the entry's index in the XML list, counted from the last entry, by the number of entries minus one. The last entry thus gets position 0.0 and the first gets 1.0.
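
As a small worked illustration of that calculation, the sketch below pulls the XP values out of a ticker-like string and assigns positions. The helper name positions_from_xml and the sample input are invented for the example (only the xp="204" line follows the ticker output shown above); the regex is the one used in other-users.pl.

```perl
use strict;
use warnings;

# Hypothetical helper: returns [position, xp] pairs, lowest position first.
sub positions_from_xml {
    my ($xml) = @_;
    # Collect every xp="..." attribute in the order it appears
    my @xp = $xml =~ m/xp="(-?[0-9]+)"/g;
    my $last = $#xp;
    # With fewer than two users there is no scale to place them on
    return () if $last < 1;
    # First XML entry is the top of the list (1.0), last is the bottom (0.0)
    return map { [ ($last - $_) / $last, $xp[$_] ] } reverse 0 .. $last;
}

# Made-up sample input in the ticker's format (values are invented)
my $sample = <<'XML';
<user e2god="0" ce="0" edev="0" xp="5100" borged="0" >
<user e2god="0" ce="0" edev="1" xp="204" borged="0" >
<user e2god="0" ce="0" edev="0" xp="-3" borged="0" >
XML

printf("%.3f %d\n", @$_) for positions_from_xml($sample);
# prints:
# 0.000 -3
# 0.500 204
# 1.000 5100
```

Dividing by the number of entries minus one is what makes the scale run all the way from 0.0 to 1.0.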

Results:

The Other Users nodelet was polled every fifteen minutes from 5:00 EST 28 August 2002 for twenty-four hours. Although the standard deviations stabilized after a few hours (indicating that further data would provide no greater accuracy), I ran the script for a day.

For all the data collected, I split the output into XP ranges based upon level requirements. (This was purely arbitrary.) Narrower ranges are possible, but their results are not much more accurate or informative.

In the following table, mid is the average (expected) Other Users position for that XP range, and dev is the standard deviation of the position. low and high are two standard deviations below and above mid, clamped to the 0.0 to 1.0 scale. In other words, for the given XP range, if the positions are roughly normally distributed, there is about a 95% chance that the Other Users position will be between low and high.
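
To make the columns concrete, here is a minimal sketch of how one row could be computed from a list of position samples. The helper name summarize and the sample values are invented for illustration. It uses the sample standard deviation (n - 1 in the denominator), assumed here to match what Statistics::Descriptive reports, and it clamps low and high to the 0.0 to 1.0 scale, as the table appears to do.

```perl
use strict;
use warnings;
use List::Util qw(sum min max);

# Hypothetical helper: returns (low, mid, high, dev) for a list of positions.
sub summarize {
    my @pos = @_;
    my $n   = @pos;
    die "need at least two samples\n" if $n < 2;
    my $mid = sum(@pos) / $n;
    # Sample standard deviation (n - 1 denominator)
    my $dev = sqrt(sum(map { ($_ - $mid) ** 2 } @pos) / ($n - 1));
    # Two deviations either side of the mean, clamped to the 0.0 - 1.0 scale
    my $low  = max(0, $mid - 2 * $dev);
    my $high = min(1, $mid + 2 * $dev);
    return ($low, $mid, $high, $dev);
}

# Invented sample positions for one XP range
my @row = summarize(0.30, 0.35, 0.40, 0.35);
printf("%.3f %.3f %.3f (%.3f)\n", @row);
# prints: 0.268 0.350 0.432 (0.041)
```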

      XP range        low     mid     high      dev
-----------------------------------------------------
  thefez -    50     0.000   0.114   0.269    (0.078)
      50 -   200     0.143   0.275   0.406    (0.066)
     200 -   400     0.233   0.350   0.467    (0.059)
     400 -   800     0.294   0.409   0.523    (0.057)
     800 -  1350     0.359   0.490   0.621    (0.065)
    1350 -  2100     0.446   0.570   0.694    (0.062)
    2100 -  2900     0.515   0.636   0.757    (0.060)
    2900 -  4000     0.575   0.704   0.832    (0.064)
    4000 -  7500     0.681   0.803   0.926    (0.061)
    7500 - 13000     0.809   0.887   0.966    (0.039)
   13000 - 23000     0.885   0.940   0.994    (0.027)
   23000 - 38000     0.943   0.974   1.000    (0.016)
   38000 - nate      0.976   0.994   1.000    (0.009)

As you can see, the position of noders with XP<50 is the most variable, which can be attributed to the fickleness of newbie usage patterns. Between XP=50 and XP=7500, position variability remains relatively constant. Above that it drops off sharply, and a high position is assured with little variance.

The average XP of someone with position 0.5, halfway up the list, is 1075. (Standard deviation is 0.064.)
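
The table can also be used directly for the prediction posed in the introduction. The following sketch hard-codes the mid column and looks up the expected position for a given XP. The function name expected_position is invented here, and treating each range's lower bound as inclusive is an assumption that follows the bucketing in stats.pl.

```perl
use strict;
use warnings;

# Lower bound of each XP range and the "mid" value from the table above,
# highest range first
my @ranges = (
    [ 38000, 0.994 ], [ 23000, 0.974 ], [ 13000, 0.940 ], [ 7500, 0.887 ],
    [ 4000,  0.803 ], [ 2900,  0.704 ], [ 2100,  0.636 ], [ 1350, 0.570 ],
    [ 800,   0.490 ], [ 400,   0.409 ], [ 200,   0.350 ], [ 50,   0.275 ],
);
my $lowest_mid = 0.114;    # everything below 50 XP

# Hypothetical helper: expected Other Users position for a given XP
sub expected_position {
    my ($xp) = @_;
    for my $r (@ranges) {
        return $r->[1] if $xp >= $r->[0];
    }
    return $lowest_mid;
}

printf("%d XP -> %.3f\n", $_, expected_position($_)) for (262, 1075, 40000);
# prints:
# 262 XP -> 0.350
# 1075 XP -> 0.490
# 40000 XP -> 0.994
```

This is only a step function over the published ranges; it makes no attempt to interpolate within a range.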


Code:

Given the nature of this project, comments were added post hoc.

The following falls under the GNU General Public License.

other-users.pl:

#!/usr/bin/perl
use strict;
use warnings;

# Hard-code the URL of the Other Users XML Ticker II
my $url = 'http://www.everything2.com/index.pl?node_id=1291746';
# Download the XML ticker output (empty string if curl returns nothing)
my $users = `curl -s '$url'` // '';

# Collect the XPs into a list, in the order they appear in the XML
my @u = $users =~ m/xp="(-?[0-9]+)"/g;

# Output a list of values:
# position    XP
# The last XML entry gets position 0.000, the first gets 1.000
my $un = scalar(@u) - 1;
# With fewer than two users there is no scale (and $un would be 0 or -1)
exit 0 if $un < 1;
for (my $i = $un; $i >= 0; $i--) {
        printf("%.3f %d\n", ($un - $i) / $un, $u[$i]);
}

Edit your crontab (using crontab -e) to include the following line. This will poll the Other Users XML Ticker II every fifteen minutes until you disable the line. You will have to edit the directories to match those on your system:

0,15,30,45 * * * *   /usr/bin/perl /home/wick/e2/other-users.pl >> /home/wick/e2/other-users.out

stats.pl:

#!/usr/bin/perl
use strict;
use warnings;

# Use Statistics::Descriptive module to calculate the standard deviations
use Statistics::Descriptive;
use List::Util qw(min max);

my $big = 9999999;
# Put data in buckets according to level XP qualifications
my @xp = (-$big, 50, 200, 400, 800, 1350, 2100, 2900,
        4000, 7500, 13000, 23000, 38000, $big);

# Read and chop up data from other-users.out, sorted
# numerically by XP (the second field)
my @us = split(/[\r\n]+/, `sort -n -k2,2 other-users.out`);

# Create one accumulator per bucket before adding any data
my @stat;
push @stat, Statistics::Descriptive::Sparse->new() for 0 .. $#xp - 1;

my $i = 0;
# Depending on which bucket the XP belongs, store
# the position data
foreach my $line (@us) {
        if ($line =~ m/([0-9.]+) (-?[0-9]+)/) {
                my ($pos, $points) = ($1, $2);
                # The input is sorted by XP, so $i only moves forward
                $i++ while $i < $#stat && $points >= $xp[$i+1];
                $stat[$i]->add_data($pos);
        }
}

# For each bucket, calculate and output the mean
# and standard deviation of the position values
for ($i = 0; $i < scalar(@xp) - 1; $i++) {
        # Skip buckets that received no data
        next unless $stat[$i]->count();
        my $m = $stat[$i]->mean();
        my $d = $stat[$i]->standard_deviation();
        # low and high are clamped to the 0.0 - 1.0 scale
        printf("%d - %d\t%.3f\t%.3f\t%.3f\t(%.3f)\n",
                $xp[$i], $xp[$i+1], max(0, $m - 2 * $d), $m,
                min(1, $m + 2 * $d), $d);
}