Microsoft Word has its place, but that place isn't the web. If you've ever tried to go from a Word document to an HTML document, you know that Word's tools are a disaster: bloated files, proprietary markup and exposed personal information are among the gems you'll get with the Convert to HTML function.

So if Word isn't up to the job, how do you go about turning a .doc file into a web page? The answer depends on how many .doc files you have to convert. If you've got a client who just needs a few .doc files turned into web pages, there are a number of ways to go about it. If you have a lot of files, or very large, complex files to deal with, consider one of the batch processing options listed below. For the simpler case of converting just a few documents, read on.

==Working With Word==

To get to a semi-sane starting point, try using Word's "Save As: Web Page, Filtered" rather than the regular web page option. This will strip out many of the proprietary tags and won't include the potentially personal and revealing info contained in the File Properties dialog (the regular HTML converter in Word appends anything in the File Properties dialog to the top of the HTML code).

It's a start, but your HTML can be made even better with some outside tools.

===Textism===

When you use Word's built-in Convert to HTML tool you'll get an HTML file, but it will include enough markup that Microsoft Word can still open it as a native file. That means your code will be full of proprietary tags and tons of markup that is unnecessary from a show-it-on-the-web point of view.

But fear not, you're not the first person to encounter this mess of so-called HTML. The good folks over at [http://textism.com/wordcleaner/ Textism] have a tool that will "strip Microsoft’s proprietary tags and other superfluous noise from Word-generated HTML documents." The results are not only much closer to standards-compliant markup, they also make for much smaller pages.
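
To give a feel for what "stripping proprietary tags" involves, here is a minimal Python sketch. It is not how Textism works internally (the source doesn't say); the function name <code>strip_word_markup</code> and the particular patterns are assumptions chosen for illustration, and regular expressions like these will miss plenty of what Word emits.

```python
import re


def strip_word_markup(html):
    """Remove a few common kinds of Word-specific noise from an HTML string.

    Hypothetical helper for illustration only; it covers a handful of
    patterns, not everything Word produces.
    """
    # Conditional comments such as <!--[if gte mso 9]> ... <![endif]-->
    html = re.sub(r"<!--\[if.*?<!\[endif\]-->", "", html,
                  flags=re.DOTALL | re.IGNORECASE)
    # Namespaced Office tags such as <o:p> and </o:p> (their text is kept)
    html = re.sub(r"</?[a-z]\w*:[^>]*>", "", html, flags=re.IGNORECASE)
    # class attributes that start with "Mso", e.g. class="MsoNormal"
    html = re.sub(r"""\s+class=("Mso[^"]*"|'Mso[^']*'|Mso[^\s>]*)""", "",
                  html, flags=re.IGNORECASE)
    # Quoted inline style attributes
    html = re.sub(r"""\s+style=("[^"]*"|'[^']*')""", "", html,
                  flags=re.IGNORECASE)
    return html


if __name__ == "__main__":
    sample = '<p class="MsoNormal" style="margin:0in">Hello<o:p></o:p></p>'
    print(strip_word_markup(sample))
```

A real cleaner would parse the HTML rather than pattern-match it, but even this much shows why the cleaned pages come out so much smaller.
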

Keep in mind, of course, that Textism is not intended to convert massive, complexly styled documents. In such cases you're probably going to have to resort to at least some hand coding. But if you do most of your writing in Word and you just want a way to generate a nice slim web page from your documents, Textism fits the bill. It even does a nice job of handling typographer’s quotes, dashes, and other non-ASCII characters, which are converted into their respective HTML entities.

===HTMLTidy===

Another way to process the HTML that Word generates is to use [http://tidy.sourceforge.net/ HTMLTidy], a tool for cleaning up HTML. Although Tidy was not designed specifically for handling Word's skewed HTML, it can help. The only catch is that it will require a bit of command line know-how on your part.

There are some graphical tools that use Tidy in the background (many text editors offer Tidy plugins) and there's a [https://addons.mozilla.org/en-US/firefox/addon/249 Firefox plugin] as well (Windows and Linux). However, the main way to use Tidy is from the command line.

To clean up Word docs, turn on Tidy's word-2000 option, which should handle some of Word's bloated HTML output. For instance, the following, when entered in your terminal, will process the file named myWordHTML.html and write any warnings and errors to errs.txt:

 tidy -f errs.txt --word-2000 yes myWordHTML.html

While Tidy sounds like it would be great (and indeed it is for many things), its handling of Word-generated HTML sometimes leaves a lot to be desired. Still, it can get you started on the road to cleaner HTML.

==Other Options==

If Word's HTML export options don't strike you as a good starting point, there are some other ways to go about the conversion process. One way would be to take advantage of the work others have already done.

===Gmail===

For instance, Gmail offers the ability to view Word attachments as HTML files. When you click the "View as HTML" link at the bottom of your Gmail message, Google will spit out a converted page.
Just use your browser's view source tool to copy and paste the results. Gmail will do some things you may not like, such as using font tags to specify text colors and heading attributes, but you can always clean those up later.

===TinyMCE===

Another viable option is to use a tool like [http://tinymce.moxiecode.com/ TinyMCE], a JavaScript rich text editor that offers a "Paste from Word" option. Paste from Word is intended to be used by those who would like to just select all in Word and paste the content into TinyMCE. Depending on the complexity of your document, TinyMCE may be able to fix some of Word's styling quirks and output usable HTML.

To use this feature, look for the TinyMCE icon that has a small Word graphic on a clipboard. Click that and then paste your Word doc into the resulting window. Click Insert, then click the source button in the TinyMCE interface to see your results. Copy and paste that code into your web document.

There's even a [http://tinymce.moxiecode.com/example_full.php?example=true demo version of TinyMCE available] through the website that you can use.

Again, neither of these options is going to handle really complex documents, but for the simple case they may take care of some grunt work, meaning that all you need to do is a little cleanup.

==Batch Processing==

If you've got a significant number of files to convert, your best option is probably to throw down the cash for a dedicated converter. Tools like [http://www.zapadoo.com/ Word Cleaner] or [http://www.clicktoconvert.com/pages/convert_word_document_to_html/index.htm Click to Convert] can batch process your files and generate acceptable HTML. You'll also get some extra features like automatic PDF creation and other niceties.
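
If you'd rather experiment before spending money, the Tidy approach from earlier can also be looped over a folder. The Python sketch below is only a rough starting point: the function names <code>build_tidy_command</code> and <code>clean_folder</code> and the folder layout are made up for illustration, and it assumes the <code>tidy</code> program is installed and that you have already saved your documents as HTML from Word. It won't give you the extras the dedicated converters offer.

```python
import shutil
import subprocess
from pathlib import Path


def build_tidy_command(source, out_dir):
    """Build the Tidy command line for one file (hypothetical helper)."""
    return [
        "tidy",
        "-q",                                              # suppress non-essential output
        "-f", str(out_dir / (source.stem + ".errs.txt")),  # warnings and errors go here
        "-o", str(out_dir / source.name),                  # cleaned markup goes here
        "--word-2000", "yes",                              # Tidy's Word cleanup option
        str(source),
    ]


def clean_folder(in_dir, out_dir):
    """Run Tidy over every .html file in in_dir and return the cleaned paths."""
    if shutil.which("tidy") is None:
        raise RuntimeError("tidy was not found on the PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    cleaned = []
    for source in sorted(in_dir.glob("*.html")):
        # Tidy exits with 1 when it only has warnings and 2 on errors,
        # so treat anything below 2 as usable output.
        result = subprocess.run(build_tidy_command(source, out_dir))
        if result.returncode < 2:
            cleaned.append(out_dir / source.name)
    return cleaned


if __name__ == "__main__":
    # "word_exports" and "cleaned" are placeholder folder names.
    for path in clean_folder(Path("word_exports"), Path("cleaned")):
        print(path)
```

Each file gets its own error log next to the cleaned copy, so you can see which documents still need hand coding.
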