
YeS! Yet another HUGE slithering Perl horror!

This night (2002-08-20/21), my eyes felt a bit sore. During the day I had been unable to read anything without 150% zoom in Mozilla.

And in the night, someone on Slashdot said something like "no one was worried about what games Hitler played", in reference to some study about a correlation between violence and games, or the lack thereof. I remembered hearing of Hitler's games at some point, and wished to find the answer from E2. I typed "Adolf Hitler", and boom, I was greeted with pages and pages and pages of stuff about Aatu himself. Not good for my aching eyes!

...and then, in my feeble little brain, an idea formed.

"I have a speech synthesizer here. Let's make the computer read it for me."

Earlier, I had written an autovoter using LWP and XML::Twig. Since an autovoter would be a bad-ass tool in the hands of the Irresponsible, I was feeling pretty bad for not coming up with something less controversial. This script takes many big chunks of code from the autovoter, but does something less controversial: it reads the content to the user.

Some bugs... well, it sometimes halts, and [misleading|pipelinks] don't work, and some pronunciations are pretty odd. I need to work on those. But it works.
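
The pipelink bug comes from the link stripping, which just deletes the square brackets, so [misleading|pipelinks] gets read aloud with both halves and the pipe in it. A minimal sketch of one possible fix, assuming E2's [target|shown text] link syntax; strip_links is a hypothetical helper name and is not part of the script below:

```perl
use strict;
use warnings;

# Hypothetical helper (not in the script): turn E2 bracket links into
# the text a listener should actually hear.
sub strip_links {
    my ($text) = @_;
    # Pipelinks first: [target|shown text] becomes "shown text".
    $text =~ s/\[[^\[\]|]*\|([^\[\]]*)\]/$1/g;
    # Then plain links: [target] becomes "target".
    $text =~ s/\[([^\[\]]*)\]/$1/g;
    return $text;
}

print strip_links("Some [misleading|pipelinks] and [plain links] here."), "\n";
```

Unlike the two substitutions in the script, this leaves unbalanced brackets alone, so a stray "[" would still reach the speech synth.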

This script needs the following:

  • Festival and some sound set. Basically, if you say echo "This is loud and clear." | festival --tts and it croaks back something, it's okay. Any other speech synth that understands SABLE 0.2 markup may work, but as far as I know only Festival supports it.
  • Perl 5.6 or later
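  • Modules from CPAN, also packaged in Debian as libwww-perl, libxml-twig-perl and libhtml-format-perl (see the NOTES section of the script's documentation)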
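
A quick way to check the Perl side of this list is to try loading each module. This is only a sketch: missing_modules is a hypothetical helper, and the module names are the ones on the script's "use" lines.

```perl
use strict;
use warnings;

# Hypothetical helper: return the names of the modules that fail to load.
sub missing_modules {
    my @missing;
    for my $module (@_) {
        # Turn Foo::Bar into Foo/Bar.pm so that require() takes it as a file name.
        (my $file = "$module.pm") =~ s{::}{/}g;
        push @missing, $module unless eval { require $file; 1 };
    }
    return @missing;
}

# Module names taken from the "use" lines of the script.
my @needed = qw(LWP::UserAgent HTTP::Cookies XML::Twig
                HTML::TreeBuilder HTML::FormatText);
my @missing = missing_modules(@needed);
print @missing ? "Missing: @missing\n" : "All modules found.\n";
```

This says nothing about Festival itself; the echo test above is still the way to check that.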

NOTE: This is not the final version. This works, and it does an adequate job in many cases, but it's FAR from perfect.

The script is, as shown, distributed under the GPL.



#!/usr/bin/perl
# $Id: everylecture.pl,v 1.3 2002/08/21 23:25:49 wwwwolf Exp wwwwolf $
# ==========================================================================
# Everylecture - read articles from everything2.com.
# Copyright (C) 2002 Urpo Lankinen
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
# ==========================================================================
# See the "perlpod" manual page on how to view the documentation. The
# easiest way is "pod2html everylecture.pl > everylecture.html", which
# generates the documentation in HTML format.

require v5.6.0;

use strict;
use warnings;

use Getopt::Long;
use LWP;
use LWP::UserAgent;
use HTTP::Request;
use URI;
use HTTP::Cookies;
use XML::Twig;
use HTML::TreeBuilder;
use HTML::FormatText;

use constant VERSION => '1.0.1';
use constant TEMPPREFIX => '/tmp';
use constant DEFSPEECHSPEED => '-10%';

our ($nodename, $cookiefile, $speed, $nospeech, $dbgout, $dbgleave, $sablefile);
my ($cookies, $browser);

print "\nEverylecture (WWWWolf's Node Reader) v", VERSION, "\n";
print "(c) Urpo Lankinen 2002-08-21\n\n";

# Get options
# --cookiefile is the documented name; --cookies is kept as an alias.
GetOptions( "debugdump=s" => \$dbgout,
	    "cookiefile|cookies=s" => \$cookiefile,
	    "speed=s" => \$speed,
	    "nospeech" => \$nospeech,
	    "leavesable" => \$dbgleave );
die "Usage: $0\n".
  "\t [--debugdump=filename]\n".
  "\t [--cookiefile=filename]\n".
  "\t [--speed=percentage]\n".
  "\t [--nospeech]\n".
  "\t [--leavesable]\n".
  "\t node [node...]\n"
  if(scalar(@ARGV) < 1);

# Speed is default if it's not said on command line.
$speed = DEFSPEECHSPEED unless $speed;

# Set defaults if not set on the command line.
$cookiefile = $ENV{HOME}."/.mozilla/default/cookies.txt" unless $cookiefile;

print "cookiefile = $cookiefile\n";
die "No cookiefile!\n" unless (-e $cookiefile);
# Inexact version number.
if($cookiefile =~ /mozilla/ && eval("v$HTTP::Cookies::VERSION") le v1.24) {
  warn "NOTE: To use Mozilla cookies, you *may* need a hacked\n".
    "HTTP::Cookies module. See the documentation for details.\n";
}

$cookies = HTTP::Cookies::Netscape->new(
					File => $cookiefile,
					AutoSave => 0,
				       )
  or die "Error in opening the cookie file...\n";

# Make the User-Agent ready to roll.
$browser = new LWP::UserAgent;
$browser->agent("EveryLecture/".VERSION." (LWP ".$LWP::VERSION.")");
$browser->cookie_jar($cookies);
#print $browser->agent(), " ready.\n";

# Open the SABLE output file.

$sablefile = TEMPPREFIX . "/everylecture.$$.sable";
open (SABLE, '>', $sablefile)
  or die "Can't open SABLE output file: $!\n";

print SABLE <<"SABLEHEAD";
<?xml version="1.0"?>
<!DOCTYPE SABLE PUBLIC "-//SABLE//DTD SABLE speech mark up//EN" 
      "Sable.v0_2.dtd"
[]>
<SABLE>
SABLEHEAD


# Loop over each node.
NODE: foreach $nodename (@ARGV) {
  print "Reading node $nodename\n";

  # Formulate the URL.
  my $uri = URI->new("http://www.everything2.com/index.pl?node=$nodename");
  my $url = $uri->canonical()->as_string();
  $url =~ s/&/%26/g;
  $url .= "&displaytype=xmltrue&links_noparse=1&no_findings=1";
  print "URL: $url\n";

  # Formulate the request, and then... request the page.
  my $request = new HTTP::Request(GET => $url);

  { $| = 1;
    print "Requesting page... ";
  }
  my $response = $browser->request($request);
  print "done.\n";
  die "Damn! Didn't work: ", $response->status_line, "\n"
    unless($response->is_success);
  my $content = $response->content;

  # See if we really got it in XML.
  unless($content =~ /^<\?xml version=/) {
    print STDERR "Doesn't look like XML to me!\n";
    next NODE;
  }

  # Now we have the node in neat XML form.
  # Append the content to debug file if so requested.
  if($dbgout) {
    open(DEBUGOUT, '>>', $dbgout) or die "Couldn't open debug output: $!\n";
    print DEBUGOUT "\n\nWeb content\nRequested from $url\n\n";
    print DEBUGOUT "-" x 70, "\n$content\n", "-" x 70, "\n";
    close DEBUGOUT;
  }

  # Create our XML parser and parse the XML.
  my $xmltree = new XML::Twig;
  $xmltree->parse($content, ProtocolEncoding => 'ISO-8859-1');
  my $node = $xmltree->root();
  my $mynodeid = $node->att('node_id');
  if(!$mynodeid) {
    print STDERR "Hmm, couldn't parse the nodeid out of this mess.\n";
    next NODE;
  }

  my @wus = $node->children('writeup');
  my $node_title = $node->first_child('title')->text();
  my $wu_count = scalar(@wus);

print SABLE <<"SABLEINFO";

<RATE SPEED="-30%">
This is a lecture done with EveryLecture. The information is taken from
Everything<BREAK LEVEL="small" />2 website.
The article is titled ${node_title}.
This article has ${wu_count} parts.
</RATE>
SABLEINFO

  my $wu;
  foreach $wu (@wus) {
    my ($title, $type, $author, $node_id, $text);
    my ($htmltree, $formatter, $formatted);

    $type = $wu->first_child("writeuptype")->text();

    $title = $wu->first_child('title')->text();
    $node_id = $wu->att('node_id');
    $author = $wu->first_child("author")->text();

    $text = $wu->first_child('doctext')->text();

    # Remove links.
    $text =~ s/\[//g;
    $text =~ s/\]//g;

    # Add stuff.
    $text = "<html>\n<head><title></title></head>\n<body>\n".
      $text . "\n</body>\n</html>\n";

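    # Parse HTML. eof() tells the parser that the input is complete, so
    # that any text it is still buffering ends up in the tree.
    $htmltree = HTML::TreeBuilder->new;
    $htmltree->parse($text);
    $htmltree->eof;
    # ...and format it.
    $formatter = HTML::FormatText->new(leftmargin => 4,
				       rightmargin => 65);
    $formatted = $formatter->format($htmltree);
    # Free the tree, we only need the formatted text from here on.
    $htmltree->delete;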

print SABLE <<"SABLENODE";

<BREAK LEVEL="large" />
<RATE SPEED="-20%">
Writeup<BREAK LEVEL="small" /><RATE SPEED="-40%">${title}</RATE>,
of type<BREAK LEVEL="small" />${type}.
Written by <RATE SPEED="-20%">${author}</RATE><BREAK />
(that is, <RATE SPEED="+20%"><SAYAS MODE="literal">${author}</SAYAS></RATE>).
Node <SAYAS MODE="literal">ID</SAYAS>
<SAYAS MODE="literal">${node_id}</SAYAS>.
<BREAK LEVEL="large" />
<PITCH BASE="high">Beginning of a writeup.</PITCH>
<BREAK LEVEL="large" />
<RATE SPEED="${speed}">
${formatted}
</RATE>
</RATE>
<BREAK LEVEL="large" />
<PITCH BASE="high">End of a writeup.</PITCH>
<BREAK MSEC="2000" />

SABLENODE

  }
}

# Print the end of the SABLE file.
print SABLE <<"SABLEFOOT";

<BREAK MSEC="1000" />
This affordable entertainment was brought to you by
The Everything Development Company, through the website
<SAYAS MODE="net" MODETYPE="url">http://www.everything2.com/</SAYAS>.

</SABLE>
SABLEFOOT

# We're done with the SABLE file.
close SABLE;

print "Done generating speech content.\n";

# Do the speech thing.
unless($nospeech) {
  print "Starting text-to-speech system.\n";
  # List form: the file name goes to festival as-is, without a shell.
  system("festival", "--tts", $sablefile);
  print "Done with TTS.\n";
}

# Remove the tempfile.
unless($dbgleave) {
  unlink $sablefile or die "Couldn't remove $sablefile: $!\n";
}

print "Finished.\n";

__END__
######################################################################
=pod

=head1 NAME

everylecture - Reader of content in Everything2.

=head1 SYNOPSIS

everylecture [options] nodetitle [nodetitle...]

=head1 DESCRIPTION

Everylecture fetches the content of an article ("node") from
Everything2.com, a Web encyclopedia/community/Thing, converts the
article to a speakable format and feeds it to the Festival text-to-speech
system, which in turn (predictably) converts the text to speech.

=head1 OPTIONS

=head2 Normal options

=over 4

=item --cookiefile=I<filename>

The location of the Netscape/Mozilla cookie file in which the
everything2.com user cookie can be found. Default is
F<$HOME/.mozilla/default/cookies.txt>.

=item --speed=I<speed>

Speed of the bulk of the text, given in SABLE speed format.
Default is "-10%" (since I am not a native speaker and need things to be
told to me slowwwly and clearrrly andnottoofastlikefestivalnormallydoes.)

=back

=head2 Debug options

=over 4

=item --debugdump=I<filename>

The information actually retrieved from everything2.com - the node
content in XML - is appended to this file.

=item --leavesable

Do not delete the SABLE markup file generated for feeding to Text-to-Speech.
It is left as F</tmp/everylecture.*.sable>, where * is the Process ID.

=item --nospeech

Do not actually start the Text-to-Speech program. (Handy with B<--leavesable>)


=back

=head1 FILES

The program uses no configuration files. If told to leave the SABLE
files, they are saved as F</tmp/everylecture.*.sable>.

=head1 NOTES

Everylecture was written in Perl, and requires Perl v5.6.0 or later
(it may work with earlier 5.x versions with some changes).

In Debian GNU/Linux, this program requires libwww-perl,
libxml-twig-perl, libhtml-format-perl, and, of course, every other
module and program and package that is needed for proper operation of
Perl and Festival. Other platforms need to consult their own packages
and/or CPAN.

If you use Mozilla, you may need to "hack" your HTTP::Cookies module.
Find the HTTP/Cookies.pm file, and in it the line that mentions C<# Netscape
HTTP Cookie File>. Change this to C<# (Netscape )?HTTP Cookie File>
- one space after Netscape, no space between )? and HTTP. Or, bug the
author of that module.

=head1 DISTRIBUTION LICENSE

Everylecture is distributed under the GNU General Public License, version 2
or later. You should have received the license with the Perl distribution;
if not, see I<http://www.gnu.org/licenses/gpl.html>. No warranty expressed
or implied.

=head1 BUGS

Sometimes finishes sort of early. Odd.

A better HTML-to-SABLE conversion should be done. At the moment,
it completely ignores E<lt>ACRONYME<gt> and E<lt>ABBREVE<gt> and such,
and also sometimes does silly things.

=head1 HISTORY

The initial idea came up on 2002-08-21. I did not want to read about Adolf
Hitler with my sore eyes, so I made the computer give me a detailed
lecture on this disrespectable figure, based on the long article about him
on the encyclopedic website that I visit frequently.

The script itself is largely based on my Everything2 autovoter script,
with some additions. With great power came great responsibility, and I
finally found some application for this code that is
Less Disastrous In Wrong Hands.

=head1 AUTHOR

Weyfour WWWWolf (aka Urpo Lankinen)

=over 4

=item E-mail

E<lt>wwwwolf@iki.fiE<gt>

=item Home page

E<lt>http://www.iki.fi/wwwwolf/E<gt>

=back

=head1 SEE ALSO

festival(1)

SABLE home page at I<http://www.cstr.ed.ac.uk/projects/sable/>.


=cut
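
One more thing worth trying: the script puts $node_title, $author and $formatted into the SABLE file as they are. SABLE is XML, so a stray & or < in a writeup would presumably make the file malformed. Whether that has anything to do with the bugs listed above is a guess, not something I have confirmed. A minimal sketch of escaping the text first, where sable_escape is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper (not in the script): escape the characters that
# are special in XML before the text goes into the SABLE file.
sub sable_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # ampersand first, so the entities below stay intact
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

print sable_escape('AT&T <rocks> "loudly"'), "\n";
```

In the script this would be applied to each value just before it is interpolated into the SABLE here-documents, for example $formatted = sable_escape($formatted).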