10/17/2003 Removed a semicolon that had crept into this copy, fixed the useragent string, and added the version to the useragent string.
12/3/2002 Revised to use displaytype=xmltrue, and now allows login again.
7/23/2001 Ok, I fixed the login bit. As long as you DON'T use the login feature, backup will work fine. This also allows you to back up another user's writeups. Furthermore, I've added a delay (I'd love to use LWP::RobotUA, but given that all robots are disallowed, it's suicide).
7/17/2001 Apparently, the login is broken again, but so is XML display for one's own writeups. I'll try to figure something out.
3/16/2001 Updated to work with the new login.
Ever want to have the text of your writeups handy? Want to revise a bunch of similar nodes? Worried since you're not big enough to get into Node Heaven yet? Worry no longer! This Perl program will back up your writeups to an autonodeable single file containing all writeups, or to individual files, each of which contains exactly what you entered. It can also dump in XML format.
Like Cow of Doom's E2 node tracker (which, coincidentally, it was obviously based upon -- gotta love the GPL), it requires the LWP and HTTP::Cookies modules in libwww-perl, which you can get very easily from CPAN. It also requires Getopt::Long (which should be part of the standard distribution) and the CGI module (for one function that undoes the translation to entities in the XML displaytype).
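If you are not sure whether these modules are installed, a quick check along the following lines may save a failed first run. This is only a sketch: module_available is a made-up helper name, not part of e2back.pl, and the module list is simply copied from the use lines in the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# module_available is a hypothetical helper, not part of e2back.pl.
# It tries to load a module by name and reports whether that worked.
sub module_available {
    my ($module) = @_;
    my $ok = eval "require $module; 1";
    return $ok ? 1 : 0;
}

# The modules named in the script's "use" lines.
for my $module (qw(LWP::UserAgent HTTP::Cookies CGI Getopt::Long
                   File::Spec::Functions)) {
    printf "%-22s %s\n", $module,
        module_available($module) ? "found" : "MISSING";
}
```

Anything reported as MISSING can be fetched from CPAN before running the backup.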
Basic usage is simple -- run it followed by the username whose writeups you want (quoted if it contains spaces), and it will save all of that user's writeups to individually named files. There are a few options to modify its behavior:
usage: $0 [--spaced-names | --nospaced-names ] [--dump-to=dirname] [--xml]
[--regexp=expr] [--single=name] [--login=username,pass] <username>
<username> should be quoted if it contains spaces
(single for UNIX, double for DOS/Windows).
--spaced-names and --nospaced-names control whether or not spaces will appear
in the filenames. The default is --nospaced-names.
--dump-to will place the files in dirname.
--regexp will only dump nodes fitting the regular expression expr. For those
not familiar with regular expressions, one way to use it is to wrap a
case-sensitive substring for titles you want in quotation marks.
--single will output everything to the named filename, rather than separate
filenames. The format is compatible with kaatanut's autonoder.
--login will log you in (the username and password might need quoting). This currently breaks things when backing up your own nodes due to a bug in E2's XML display.
--xml will output in raw XML format, based on E2's exact XML output, which
escapes characters.
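For the curious, the options above can be tried out without touching E2 at all. The sketch below feeds argument lists to Getopt::Long using the same option specs the script passes to GetOptions. The parse_args helper and the %opt hash are made up for illustration, and the script itself uses plain variables instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# parse_args is a hypothetical helper, not part of e2back.pl.
# It parses a list of arguments with the script's option specs.
sub parse_args {
    my @args = @_;
    my %opt = (spaced => 0, dir => '.', regexp => '',
               single => '', xml => 0, login => '');
    GetOptionsFromArray(\@args,
        'spaced-names!' => \$opt{spaced},  # "!" also allows --nospaced-names
        'dump-to=s'     => \$opt{dir},
        'regexp=s'      => \$opt{regexp},
        'single=s'      => \$opt{single},
        'xml'           => \$opt{xml},
        'login=s'       => \$opt{login},
    ) or die "bad options\n";
    $opt{username} = $args[0];  # whatever is left over is the username
    return \%opt;
}

my $opt = parse_args('--spaced-names', '--dump-to=backups',
                     '--regexp=^Perl', 'sleeping wolf');
print "spaces in filenames: $opt->{spaced}\n";
print "directory:           $opt->{dir}\n";
print "title filter:        $opt->{regexp}\n";
print "user:                $opt->{username}\n";
```

The "!" at the end of spaced-names is what makes the --nospaced-names form available.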
I suggest you cut and paste the following to a file, presumably e2back.pl. My preferred method is to view source and run the code through the E2 Source code formatter in deformat mode.
#!/usr/bin/perl -w
# e2back.pl - gathers user nodes from everything2.com.
# Portions Copyright (C) 2001,2002,2003 Arthur Shipkowski aka "sleeping wolf"
# <Art_Kowolf at yahoo.com>
# Portions Copyright (C) 2000,2001 Will Woods <wwoods@cowofdoom.com>
# Distributed under the terms of the GNU General Public License,
# included here by reference.
#
# send comments, questions, and stories to the top address above, or just
# /msg me.
#
# To-Do:
# * Perhaps a switch to just route it all to stdout would be nice, rather than having
# to figure out the appropriate filename.
# * Figure out if my XML is Everything Core compatible.
#
# history:
# v1.0.0: (initial release)
# v1.0.1: Added single non-XML and XML file output.
# Made the handling of command-line directory setup less braindead.
# Replaced the 'delete chars from filename' approach with the
# 'substitute chars into filename' approach.
# v1.0.2: Updated the login routine from Cow of Doom's latest since it stopped
# working.
# v1.0.3: Added usersearch option to save someone else's WUs instead of your
# own.
# v1.0.4: Fixed login (again) and added sleeps, since the robots.txt or whatnot
# prohibits me from using the robot agent.
# v1.1.0: Changeover to use displaytype=xmltrue.
# v1.1.1: Bugfix to getXMLwu and the LWP->UA instantiation
# v1.1.2: Formatted to fit in 80 columns per request.
$0="$0"; # Perl magic to clean the commandline from the process list
my $version = "1.1.2";
use LWP::UserAgent; # these are both part of libwww-perl, available
use HTTP::Cookies; # at your friendly local CPAN mirror
# The CGI package should be available at your friendly local CPAN mirror, too.
use CGI qw/unescapeHTML/;
use Getopt::Long; # This should be part of the standard distribution.
use File::Spec::Functions; # Should also be part of the standard distribution
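# Declare all the option variables in one list; "my $a = 0, $b = 0" would
# only apply "my" to the first of them.
my ($spaces_ok, $XML_mode, $dump_to_dir, $regexp, $singlefileMode, $loginpass)
= (0, 0, curdir(), "", "", "");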
GetOptions('spaced-names!' => \$spaces_ok, 'dump-to=s' => \$dump_to_dir,
'regexp=s' => \$regexp, 'single=s' => \$singlefileMode, 'xml' => \$XML_mode,
'login=s' => \$loginpass);
$baseurl="http://www.everything2.com/index.pl";
$|=1;
my $ua = LWP::UserAgent->new(agent => "e2backup/$version");
$ua->env_proxy();
$cookies = HTTP::Cookies->new();
$ua->cookie_jar($cookies);
if ($#ARGV < 0) {
print "
usage: $0 [--spaced-names | --nospaced-names ] [--dump-to=dirname]
[--xml] [--regexp=expr] [--single=name] [--login=username,pass] <username>
<username> should be quoted if it contains spaces
(single for UNIX, double for DOS/Windows).
--spaced-names and --nospaced-names control whether or not spaces will appear
in the filenames. The default is --nospaced-names.
--dump-to will place the files in dirname.
--regexp will only dump nodes fitting the regular expression expr. For those
not familiar with regular expressions, one way to use it is to wrap a
case-sensitive substring for titles you want in quotation marks.
--single will output everything to the named filename, rather than separate
filenames. The format is compatible with kaatanut's autonoder.
--login will log you in (the username and password might need quoting)
--xml will output in raw XML format, based on E2's exact XML output, which
escapes characters.\n";
exit(1);}
#
# Overwrite the old file if in single-file-mode
#
if ($singlefileMode) {
$fullnodeoutputfilename = catfile($dump_to_dir, $singlefileMode);
open(NODEFILE, ">$fullnodeoutputfilename") or
die "Couldn't open $fullnodeoutputfilename: $!";
close(NODEFILE);
}
$usersearch = $ARGV[0];
if ($loginpass)
{
($login, $pass) = split(/,/, $loginpass);
$username = $ARGV[0];
$usersearch = $username unless $usersearch;
print "Logging in...";
login($login, $pass) or die "failed";
print "ok.\n";
sleep(10);
}
# get the User Search XML page, and array-ify it
print "Doing user search...";
@data = split(/\n/,&getUserSearchXMLTicker) or die "failed";
print "ok.\n";
sleep(10);
# Read the info out of the User Search page.
foreach (@data) { # loop over each line in the page
if (/^<writeup/g) { # if this line is about a writeup..
while (/ (\w+)=\"(.*?)\"/gc) { $n{$1}=$2; } # get node info
($name, $type) = />(.*) \(([a-z]+)\)<\/writeup>/gc;
next unless (($regexp eq "") or ($name =~ m{$regexp}));
print "\rCurrent node title: ", substr($name.' 'x59,0,59);
$nodecontent = &getXMLwu($n{node_id});
sleep(10);
unless ($XML_mode) {
# With /s, "." matches newlines too; \C is gone from current Perls (5.24+).
if ( $nodecontent =~ m{<doctext>(.*)</doctext>}is ) {
if ($singlefileMode)
{ singlefileDump($dump_to_dir, $singlefileMode,
"$name\n$type\n" . &unescapeHTML($1) . "\n----\n"); }
else
{ multifileDump($dump_to_dir, $name, &unescapeHTML($1)); }
}
else
{ print "Warning! Unable to get content for $name!\n"; }
}
else
{
if ($singlefileMode)
{
singlefileDump($dump_to_dir, $singlefileMode, $nodecontent );
}
else
{ multifileDump($dump_to_dir, $name, $nodecontent ); }
}
}
}
#----- end of main program ------------------------------
#----- subroutines --------------------------------------
sub getnode {
# takes one argument: $node_id
# assumes that $ua is a valid LWP::UserAgent object
# returns the contents of the page in a scalar variable
# example: $page = getnode($node_id);
my $req = HTTP::Request->new('GET', "$baseurl?node_id=$_[0]");
return($ua->request($req)->content());
}
sub getUserSearchXMLTicker {
# 762826 = User Search XML Ticker
my $req = HTTP::Request->new('GET', "$baseurl?node_id=762826&usersearch=$usersearch");
return ($ua->request($req)->content());
}
sub getXMLwu {
# takes one argument: $node_id
# assumes that $ua is a valid LWP::UserAgent object
# returns the contents of the XML writeup page in a scalar variable
# example: $page = getXMLwu($node_id);
my $req =
HTTP::Request->new('GET',
"$baseurl?node_id=$_[0]&displaytype=xmltrue");
return($ua->request($req)->content());
}
sub login {
# takes two arguments: $username, $password
# assumes that $ua is a valid LWP::UserAgent object
# returns true on success, false on failure
# example: login($username, $password) or die "failed";
my $req = HTTP::Request->new('POST', "$baseurl");
$req->content_type('application/x-www-form-urlencoded');
$req->content("op=login&user=$_[0]&passwd=$_[1]&displaytype=null");
my $response = $ua->request($req);
return($cookies->as_string() ne "");
}
sub multifileDump {
# Takes three arguments, $dump_to_dir, $nodetitle, and $nodecontent
# Dumps $nodecontent to a file named based on $nodetitle.
# Example: multifileDump("/home/sleepingwolf", "Fred the Node",
# $nodecontent);
my $dump_to_dir = $_[0];
my $nodetitle = $_[1];
my $nodecontent = $_[2];
$nodefilename = $nodetitle;
# Need to escape /:\?'*"<>;&! and \0
$nodefilename =~ s,\/,(slash),g;
$nodefilename =~ s,\:,(colon),g;
$nodefilename =~ s,\\,(backslash),g;
$nodefilename =~ s,\?,(questionmark),g;
$nodefilename =~ s,\',(singlequot),g;
$nodefilename =~ s,\*,(asterix),g;
$nodefilename =~ s,\",(doublequot),g;
$nodefilename =~ s,\<,(lessthan),g;
$nodefilename =~ s,\>,(greaterthan),g;
$nodefilename =~ s,\;,(semicolon),g;
$nodefilename =~ s,\&,(ampersand),g;
$nodefilename =~ s,\!,(bang),g;
$nodefilename =~ s,\0,(null),g;
if ($nodefilename eq ".")
{ $nodefilename = "(dot)" }
elsif ($nodefilename eq "..")
{ $nodefilename = "(dot)(dot)" }
$nodefilename =~ s/ /(space)/g unless ($spaces_ok);
$fullnodefilename = catfile($dump_to_dir, $nodefilename);
open(NODEFILE,">$fullnodefilename") or die
"Couldn't open $fullnodefilename: $!";
print NODEFILE $nodecontent;
close(NODEFILE);
}
sub singlefileDump {
# Takes three arguments: $dump_to_dir, $backupfilename, and $nodecontent
# Dumps it all to one file, using append mode.
# Example: singlefileDump("/home/sleepingwolf", "nodebackups", "Fred the Node"
# . "\n" . $nodecontent);
my $dump_to_dir = $_[0];
my $backupfilename = $_[1];
my $nodecontent = $_[2];
$fullnodeoutputfilename = catfile($dump_to_dir, $backupfilename);
open(NODEFILE, ">>$fullnodeoutputfilename") or
die "Couldn't open $fullnodeoutputfilename: $!";
print NODEFILE $nodecontent;
close(NODEFILE);
}
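As an aside, the chain of substitutions in multifileDump could also be written as one lookup table and a single substitution. The following is only a sketch of that idea, not a drop-in replacement: safe_filename and %name_for are made-up names, and the replacement tokens are copied from the script as-is.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The same replacements as the s,,,g chain in multifileDump, in one table.
my %name_for = (
    '/'  => '(slash)',        ':'  => '(colon)',
    '\\' => '(backslash)',    '?'  => '(questionmark)',
    "'"  => '(singlequot)',   '*'  => '(asterix)',
    '"'  => '(doublequot)',   '<'  => '(lessthan)',
    '>'  => '(greaterthan)',  ';'  => '(semicolon)',
    '&'  => '(ampersand)',    '!'  => '(bang)',
    "\0" => '(null)',
);

# safe_filename is a hypothetical helper, not part of e2back.pl.
# It takes a node title and a flag saying whether spaces may stay.
sub safe_filename {
    my ($title, $spaces_ok) = @_;
    my $name = $title;
    # Replace every troublesome character with its spelled-out token.
    $name =~ s{([/:\\?'*"<>;&!\0])}{$name_for{$1}}g;
    $name = '(dot)'      if $name eq '.';
    $name = '(dot)(dot)' if $name eq '..';
    $name =~ s/ /(space)/g unless $spaces_ok;
    return $name;
}

print safe_filename("AC/DC: Back in Black", 0), "\n";
# prints AC(slash)DC(colon)(space)Back(space)in(space)Black
```

Since none of the replacement tokens contains a character that is itself replaced, doing it in one pass should give the same names as the original chain.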