XML::Twig is a nice Perl module for processing XML files, written by Michel Rodriguez. It is fast even with large files, and doesn't use much CPU or memory. From the programmer's viewpoint, the XML file looks like a tree, so it's easy to say "Find the first branch named 'foo' and give me its children called 'bar'..."
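
To make the "first 'foo', then its 'bar' children" idea concrete, here is a minimal sketch. The sample document and the helper name bars_of_first_foo are made up for illustration, and 'foo' and 'bar' are just the placeholder names from the sentence above; only new, parse, root, first_child, children and text come from XML::Twig itself.

```perl
use strict;
use warnings;

use XML::Twig;

# Hypothetical sample document, invented for this sketch.
my $xml = <<'XML';
<root>
  <foo>
    <bar>one</bar>
    <baz>skipped</baz>
    <bar>two</bar>
  </foo>
  <foo>
    <bar>three</bar>
  </foo>
</root>
XML

# Hypothetical helper: return the text of every <bar> child of the
# first <foo> element directly under the root.
sub bars_of_first_foo {
    my ($xml_string) = @_;

    my $twig = XML::Twig->new();
    $twig->parse($xml_string);    # parse() takes the XML as a string

    # first_child() returns undef when no matching child exists.
    my $foo = $twig->root->first_child('foo');
    return () unless defined $foo;

    # children('bar') returns only the direct children named <bar>.
    return map { $_->text } $foo->children('bar');
}

print join(", ", bars_of_first_foo($xml)), "\n";
```

With the sample above this should print "one, two": the <baz> element is filtered out by name, and the second <foo> is never visited because first_child stops at the first match.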

Home page: http://www.xmltwig.com/
Also available as a Debian package (libxml-twig-perl).

Here's a small example of how to use it to navigate an XML tree. The tree was fetched using Everything2's xmltrue displaytype. (http://www.everything2.com/?node=nodetitle&displaytype=xmltrue...)


use strict;
use warnings;

use XML::Twig;

my $twig = XML::Twig->new();

# Parse the XML from a file.
$twig->parsefile("node.xml", ProtocolEncoding => 'ISO-8859-1');

# <node> is the root.
my $node = $twig->root();

# Print node title and ID. (Find the child element <title>, and <node>'s
# attribute 'node_id'.)
print "Node title: ", $node->first_child('title')->text(), "\n";
print "Node id: ", $node->att('node_id'), "\n";

# Get all of <node>'s child <writeup> elements.
my @wus = $node->children('writeup');

print "Node has ", scalar(@wus), " writeups.\n";

# Iterate over each writeup.
foreach my $wu (@wus) {
  my ($title, $type, $author, $id, $rep, $vote);

  # Here we get writeup's information...
  $title = $wu->first_child('title')->text();
  $id = $wu->att('node_id');
  $type = $wu->first_child("writeuptype")->text();
  $author = $wu->first_child("author")->text();

  $rep = $wu->first_child("reputation");
  if(defined $rep) {
    $vote = "<voted: " . $rep->att('cast') . ">";
  } else {
    $vote = "<unvoted>";
  }

  # Print out writeup's information.
  print "$title [$id] [$type] by $author $vote\n";
}