The best way to de-bastardize Microsoft HTML (MS-HTML), or any crappy HTML, is to use the wonderful open-source, W3C-approved program HTML Tidy.

Tidy can now perform wonders on HTML saved from Microsoft Word 2000! Word bulks out its HTML files with extra markup whose only job is to round-trip the document's presentation between HTML and Word. If you care more about using the HTML on the Web, check out Tidy's "word-2000" config option. Of course, Tidy does a good job on Word 97 files as well.

To use it from the command line, just add --word-2000 yes to your tidy command.
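If you'd rather drive Tidy from a script, here is a minimal Python sketch. It assumes the `tidy` executable is on your PATH; the function names and file names are hypothetical. Only `--word-2000 yes` comes from the tip above. `-o` is Tidy's switch for writing output to a file instead of stdout.

```python
import shutil
import subprocess


def build_tidy_command(src: str, dest: str) -> list[str]:
    """Build the argv for running HTML Tidy with Word 2000 cleanup on (hypothetical helper)."""
    return [
        "tidy",
        "--word-2000", "yes",  # strip the round-tripping markup Word 2000 adds
        "-o", dest,            # write the cleaned HTML to dest instead of stdout
        src,                   # the HTML file saved from Word
    ]


def tidy_word_html(src: str, dest: str) -> int:
    """Run tidy on src, writing the result to dest; return tidy's exit status."""
    if shutil.which("tidy") is None:
        raise FileNotFoundError("HTML Tidy ('tidy') was not found on the PATH")
    # tidy exits with 0 when clean, 1 when it reported warnings and 2 on errors,
    # so a non-zero status does not always mean the output is unusable.
    result = subprocess.run(build_tidy_command(src, dest))
    return result.returncode
```

For example, `tidy_word_html("report.htm", "report-clean.htm")` leaves the original untouched and writes the cleaned copy next to it. If you tidy Word files often, the same setting can live in a Tidy config file as the line `word-2000: yes`, passed to tidy with `-config`.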