Demonstrates the use of the Renderer class that performs a simple text rendering of HTML markup, similar to the way Mozilla Thunderbird and other email clients provide an automatic conversion of HTML content to text in their alternative MIME encoding of emails.
At that time I evaluated 6 other parsers, none of which were capable of achieving my aims.
Most couldn't reproduce a typical HTML document without change, none could reproduce a source document containing badly formatted or non-HTML components without change, and none provided a means to track the positions of nodes in the source text.
Jericho HTML Parser is a java library allowing analysis and manipulation of parts of an HTML document, including server-side tags, while reproducing verbatim any unrecognised or invalid HTML.
It also provides high-level HTML form manipulation functions.
The javadocs provide comprehensive documentation of the entire API, as well as being a very useful reference on aspects of HTML and XML in general.