A Perl module for processing HTML, by Gisle Aas and (today) Sean Burke.

It allows you to manipulate tree structures representing HTML documents.

For parsing it relies on HTML::Parser; for manipulating individual nodes, on HTML::Element. The resulting trees are therefore typically the product of ad hoc, forgiving HTML parsing rather than of a strict or XML-based approach; other modules exist for that.

It's worth noting that heuristics to deal with 'real-world' HTML are present in all of the modules mentioned here. There is no separation between generic tree handling functionality and features that are specifically useful in the context of HTML processing.
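As a rough illustration of how parsing and node handling fit together, here is a minimal sketch that builds a tree from deliberately sloppy markup and pulls out the links. The helper name extract_links and the sample markup are made up for this example; new_from_content, look_down, attr, as_text and delete are methods documented for HTML::TreeBuilder and HTML::Element. Exactly how the missing tags get filled in depends on the parser's heuristics and on the installed version.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: return [href, text] pairs for every <a href=...>
# found in an HTML string.
sub extract_links {
    my ($html) = @_;

    # Parse the string into a tree; missing <html>, <head> and <body>
    # elements are supplied by the parser.
    my $tree = HTML::TreeBuilder->new_from_content($html);

    my @links;
    # look_down returns every element below $tree matching the criteria.
    for my $anchor ($tree->look_down(_tag => 'a')) {
        my $href = $anchor->attr('href');
        next unless defined $href;    # skip named anchors without href
        push @links, [ $href, $anchor->as_text ];
    }

    # Release the tree explicitly once we are done with it.
    $tree->delete;
    return @links;
}

# Sloppy input: unclosed <p> and <b>, no <html> or <body> wrapper.
my $html = '<p>See <a href="http://example.com/">the <b>example</a> page';
printf "%s => %s\n", @$_ for extract_links($html);
```

The nodes returned by look_down are ordinary HTML::Element objects, so the same loop could just as well rewrite attributes or remove elements before serialising the tree again with as_HTML.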

HTML::TreeBuilder has an unusual class relationship with HTML::Element, which makes subclassing difficult. Sean continues to add features and to improve both the API and the implementation, so if you use this software it is worth following developments on the LWP mailing list.
