Markdown for Microsoft Word
August 17, 2019
If you prefer text-based documentation (markdown, etc.) over WYSIWYG (MS Word), then this is for you! If you work in modern corporate America, Microsoft Word is unavoidable. Even though I wish it weren’t so, most engineering organizations use it for technical documentation.
I want to use markdown as the source and then automatically render Microsoft Word. Most folks online recommend pandoc for this kind of thing. The problem is that pandoc renders docx using its own style.
But there’s usually a standard corporate template that you’re required to use.
At this point I usually stare blankly at my screen for a couple of minutes, then begrudgingly load up the corporate template in Word and start copy-and-pasting, then manually formatting. But... no... longer.
I came across a little hack that works pretty well. If you insert unstyled HTML into a Word template using Insert->Object->Text from File, Word will automatically apply the template styles.
So, I generate HTML from markdown with pandoc -s my.md -o my.html, then insert the HTML using the Text from File menu item in Word and voila! I have a perfectly formatted Word docx using the corporate template automatically.
However, it’s still a hassle to have to perform these manual menu selections every time I want to re-render the docx. So here’s a little Python script to automate that part using COM:
    import sys
    import os
    import argparse
    import win32com.client as win32  # provided by the pywin32 package
    from win32com.client import constants as c


    def main(argv=None):
        if argv is None:
            argv = sys.argv
        parser = argparse.ArgumentParser(description="convert html to docx")
        parser.add_argument("htmlfile", help="the html file path for input")
        parser.add_argument("docxfile", help="the docx file path for output")
        parser.add_argument("-t", "--template", default="template.dotx", help="the template file")
        args = parser.parse_args(argv[1:])

        # Resolve to absolute paths before handing them to Word.
        template_path = os.path.realpath(args.template)
        html_path = os.path.realpath(args.htmlfile)
        docx_path = os.path.realpath(args.docxfile)

        word = win32.gencache.EnsureDispatch('Word.Application')
        try:
            doc = word.Documents.Open(template_path)
            word.Visible = True
            # Insert the HTML into the second section of the template
            # (this is where the body goes in my template; yours may differ).
            section = doc.Sections(2)
            section.Range.InsertFile(html_path)
            # Regenerate the table of contents now that the text is in place.
            doc.TablesOfContents(1).Update()
            # Embed the images instead of leaving them as links
            # (COM collections are 1-indexed).
            for i in range(1, doc.InlineShapes.Count + 1):
                doc.InlineShapes(i).LinkFormat.BreakLink()
            doc.SaveAs2(docx_path, FileFormat=c.wdFormatXMLDocument)
        finally:
            # Always close Word, even if one of the steps above fails.
            word.Quit()


    if __name__ == '__main__':
        sys.exit(main())
It also regenerates the table of contents and embeds the images after the text is inserted. I hope this helps any other text-based documentation lovers out there.
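If you want to go one step further and run both steps (pandoc, then the COM script) with a single call, something like the following might work. This is only a rough sketch: the filename html2docx.py is a made-up name for wherever you saved the script above, and build_commands and render are helper names I invented for illustration. It assumes pandoc is on your PATH.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(md_path, docx_path, template="template.dotx",
                   converter="html2docx.py"):
    """Return the two commands that turn a markdown file into a styled docx.

    `converter` is a hypothetical filename for the COM script shown above.
    """
    # The intermediate HTML sits next to the markdown source: my.md -> my.html
    html_path = str(Path(md_path).with_suffix(".html"))
    return [
        # Step 1: standalone, unstyled HTML from markdown (same call as in the post).
        ["pandoc", "-s", str(md_path), "-o", html_path],
        # Step 2: insert that HTML into the Word template via the COM script.
        [sys.executable, converter, html_path, str(docx_path), "-t", template],
    ]


def render(md_path, docx_path, **kwargs):
    """Run both steps in order, stopping at the first one that fails."""
    for cmd in build_commands(md_path, docx_path, **kwargs):
        subprocess.run(cmd, check=True)
```

Calling render("my.md", "my.docx") would then regenerate the docx in one go. Keeping the command construction separate from the part that runs them makes it easy to print the commands and check them before letting Word loose on your files.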
2019/9/13 Update: We’re now embedding all the images instead of leaving them as links.
2020/6/15 Update: It turns out that this method doesn’t work in the general case. It happens to work with my template because the template fonts are close enough to the default Word fonts that they match.
2024/9/14 Update: Perhaps this is the best answer now?