How to get clean HTML from Microsoft Word

When you paste formatted content from Word into a Web-based HTML editing system, tons of Microsoft-specific tags come along for the ride. They are there for round-tripping, so that you can put the content back into Word without it losing its Word-ness.

If your goal is to serve the content online and never take it back into Word, those extra tags need to be eliminated. If you leave them in, your entry will look one way in Internet Explorer (which is surprisingly friendly to Microsoft tags) and quite different in all other browsers. Word's class attributes and invalid tags will defeat many of your site styles, giving your visitors a mouthful of Verdana and Arial when you expect them to partake of Georgia or Trebuchet MS.
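If you would rather scrub the markup locally, the cleanup boils down to a few rules applied to every tag. Here is a minimal sketch in Python using only the standard library's `html.parser`. It is not Dean Allen's cleaner, and `WordHTMLCleaner` and `clean_word_html` are hypothetical names. The rules are assumptions about typical Word markup: drop namespaced tags such as `<o:p>` (keeping their text), `Mso*` class names, `mso-` style declarations, all comments (which covers Word's `<!--[if gte mso 9]>` blocks), and the contents of `<style>` and `<script>` elements.

```python
from html import escape
from html.parser import HTMLParser

# Assumed: elements whose contents should be dropped entirely. Word's export
# puts its Mso* class definitions in a <style> block in the <head>.
SKIP_CONTENT = {"style", "script"}


class WordHTMLCleaner(HTMLParser):
    """Hypothetical helper that rebuilds HTML minus common Word-specific markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a <style> or <script> element

    def _clean_attrs(self, attrs):
        kept = []
        for name, value in attrs:
            if ":" in name:  # namespaced attributes such as xmlns:o or v:shapes
                continue
            if value is None:  # boolean attribute with no value
                kept.append(name)
                continue
            if name == "class":
                # Keep only class names that don't start with "Mso".
                classes = [c for c in value.split() if not c.startswith("Mso")]
                if not classes:
                    continue
                value = " ".join(classes)
            elif name == "style":
                # Keep only CSS declarations that aren't "mso-" properties.
                decls = [d.strip() for d in value.split(";")]
                decls = [d for d in decls if d and not d.lower().startswith("mso-")]
                if not decls:
                    continue
                value = "; ".join(decls)
            kept.append(f'{name}="{escape(value)}"')
        return "".join(" " + a for a in kept)

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or ":" in tag:  # namespaced tags like o:p, st1:place
            return
        self.out.append(f"<{tag}{self._clean_attrs(attrs)}>")

    def handle_startendtag(self, tag, attrs):
        if self.skip_depth or ":" in tag or tag in SKIP_CONTENT:
            return
        self.out.append(f"<{tag}{self._clean_attrs(attrs)} />")

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or ":" in tag:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Text arrives with entities decoded, so re-escape it on output.
            self.out.append(escape(data, quote=False))

    def handle_comment(self, data):
        pass  # drop all comments, including Word's conditional comments


def clean_word_html(html_text):
    parser = WordHTMLCleaner()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


print(clean_word_html(
    '<p class="MsoNormal" style="margin:0; mso-bidi-font-size:12pt; color:red">'
    "Hello<o:p></o:p></p>"
))
```

The usage line above prints `<p style="margin:0; color:red">Hello</p>`. Ordinary inline styles such as `font-family:Verdana` are deliberately left alone, so stray font choices may still need a pass by hand.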

When Jason added the BloggingSundance Deal Tracker table, it was possessed by MS tags. Scrubbing the HTML by hand was going too slowly, so I did a quick search for an exorcist. Dean Allen's Word HTML Cleaner came to the rescue. It could not be easier to use: just upload an HTML file of 20 KB or less that you exported from Word, and he will expel the tag demons for you.

From: Microsoft Expel – The Unofficial Microsoft Weblog