Pandoc Word to Markdown: Command-Line Conversion Guide

How to use Pandoc to convert Word documents to Markdown on the command line. Includes installation, basic commands, output customization, and comparison with browser tools.

WordToMD Team

Pandoc is one of the most capable document conversion tools available. It converts between dozens of formats, including .docx to Markdown, and gives you fine-grained control over the output. This guide covers installation, the basic commands, and the options that matter most for Word to Markdown conversion.

What Is Pandoc?

Pandoc is a free, open-source command-line tool that converts documents between formats. It supports over 40 input and output formats. For Word to Markdown conversion, Pandoc is the gold standard — more configurable than any browser-based tool, including WordToMD.

When to use Pandoc vs. WordToMD:

| Scenario | Pandoc | WordToMD |
|---|---|---|
| One-off conversion | ❌ Requires install | ✅ Instant, browser-based |
| Batch conversion | ✅ Script it | ❌ One file at a time |
| Custom output format | ✅ Extensive options | ❌ Standard GFM only |
| Privacy-sensitive docs | ✅ Local processing | ✅ Browser-only, no upload |
| No terminal experience | ❌ CLI knowledge needed | ✅ Drag and drop |
| CI/CD pipeline | ✅ Automatable | ❌ Not automatable |

Installing Pandoc

Windows

Download the installer from pandoc.org/installing.html, or use a package manager:

# Winget
winget install pandoc

# Chocolatey
choco install pandoc

# Scoop
scoop install pandoc

macOS

# Homebrew
brew install pandoc

# MacPorts
sudo port install pandoc

Linux

# Ubuntu/Debian
sudo apt-get install pandoc

# Fedora
sudo dnf install pandoc

# Arch
sudo pacman -S pandoc

Verify installation:

pandoc --version

Basic Conversion Command

pandoc document.docx -o output.md

That’s it. Pandoc infers the input format from .docx and the output format from .md.

More explicit version:

pandoc -f docx -t markdown document.docx -o output.md
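
In a script, it can help to fail early when Pandoc is missing or the input file does not exist. The following is a minimal sketch of that idea; convert_docx is a hypothetical helper name, not something Pandoc provides.

```shell
# convert_docx is a hypothetical helper, not part of Pandoc.
# Usage: convert_docx INPUT.docx [OUTPUT.md]
convert_docx() {
    local input="$1"
    # Default output name: same path with .md in place of .docx
    local output="${2:-${input%.docx}.md}"

    if ! command -v pandoc >/dev/null 2>&1; then
        echo "pandoc not found on PATH" >&2
        return 127
    fi
    if [ ! -f "$input" ]; then
        echo "input file not found: $input" >&2
        return 1
    fi

    pandoc -f docx -t markdown "$input" -o "$output"
}
```

Calling convert_docx document.docx would then write document.md next to the input, and convert_docx document.docx notes.md would write notes.md instead.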

Output Format Options

Pandoc offers several Markdown variants as output:

# GitHub Flavored Markdown (GFM)
pandoc document.docx -t gfm -o output.md

# CommonMark
pandoc document.docx -t commonmark -o output.md

# Pandoc's extended Markdown (most features)
pandoc document.docx -t markdown -o output.md

# Original, unextended Markdown (markdown_strict)
pandoc document.docx -t markdown_strict -o output.md

GFM is recommended for most use cases (GitHub, Obsidian, GitBook, Notion).
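
If you are unsure which variant suits a particular document, one option is to generate all four and compare the results. The sketch below assumes a hypothetical helper named compare_variants; the output file naming (document.gfm.md and so on) is likewise just an illustration.

```shell
# compare_variants is a hypothetical helper: it writes one file per
# Markdown variant and prints each output path.
# Usage: compare_variants document.docx
compare_variants() {
    local input="$1" base fmt
    base="${input%.docx}"
    for fmt in gfm commonmark markdown markdown_strict; do
        pandoc "$input" -t "$fmt" -o "$base.$fmt.md" || return 1
        echo "$base.$fmt.md"
    done
}
```

After running compare_variants report.docx, a command such as diff report.gfm.md report.markdown.md shows where the two variants differ for that document.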

Useful Conversion Options

Extract Media Files

Images embedded in .docx are extracted to a directory:

pandoc document.docx --extract-media=./media -o output.md

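This saves the images under ./media/ and adds Markdown image references to the output. Pandoc keeps the archive's internal media/ folder, so the files typically end up in ./media/media/.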
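
Image links are written relative to the directory Pandoc runs in, which matters when the Markdown file is stored somewhere else. One way to keep the links valid is to run Pandoc inside the output directory and pass --extract-media=. so the images land in a media/ folder beside the Markdown file. This is a sketch under those assumptions; convert_with_media is a hypothetical helper name.

```shell
# convert_with_media is a hypothetical helper name.
# Usage: convert_with_media INPUT.docx OUTPUT_DIR
convert_with_media() {
    local input="$1" outdir="$2" src_dir input_abs name
    # Resolve the input to an absolute path before changing directory
    src_dir="$(cd "$(dirname "$input")" && pwd)" || return 1
    input_abs="$src_dir/$(basename "$input")"
    name="$(basename "$input" .docx)"
    mkdir -p "$outdir" || return 1
    # Run Pandoc inside OUTPUT_DIR so image links are relative to the .md file
    ( cd "$outdir" && pandoc "$input_abs" --extract-media=. -t gfm -o "$name.md" )
}
```

For example, convert_with_media drafts/report.docx site/docs would be expected to produce site/docs/report.md with its images under site/docs/media/.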

Wrap Lines

Control line wrapping in the output:

# No wrapping (one paragraph = one line)
pandoc document.docx --wrap=none -o output.md

# Wrap at 80 characters
pandoc document.docx --wrap=auto --columns=80 -o output.md

--wrap=none is recommended for Markdown that will be version-controlled — it produces cleaner Git diffs.

Table of Contents

pandoc document.docx -s --toc -o output.md

--toc only takes effect together with -s (standalone), so the two flags are combined here.

Standalone Document with Metadata

pandoc document.docx -s -o output.md

The -s (standalone) flag generates YAML frontmatter from the document’s metadata.
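
Word files often have empty or stale document properties, so you may want to set a metadata field from the command line. The sketch below uses Pandoc's -M/--metadata option for the title; convert_with_title is a hypothetical helper name, and the choice of -t markdown is an assumption made for this example.

```shell
# convert_with_title is a hypothetical helper name.
# Usage: convert_with_title INPUT.docx "Title text"
convert_with_title() {
    local input="$1" title="$2"
    # -s emits the YAML metadata block; -M sets one metadata field
    pandoc "$input" -s -t markdown --wrap=none \
        -M "title=$title" \
        -o "${input%.docx}.md"
}
```

A call such as convert_with_title report.docx "Quarterly Report" writes report.md with a title field in its frontmatter.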

Batch Conversion

Convert all .docx files in a directory:

Windows PowerShell

Get-ChildItem -Filter "*.docx" | ForEach-Object {
    $output = [System.IO.Path]::ChangeExtension($_.Name, ".md")
    pandoc $_.FullName -t gfm --wrap=none -o $output
    Write-Host "Converted: $($_.Name) -> $output"
}

Bash (macOS/Linux)

for file in *.docx; do
    output="${file%.docx}.md"
    pandoc "$file" -t gfm --wrap=none -o "$output"
    echo "Converted: $file -> $output"
done

For more batch conversion options, see Batch Convert Word to Markdown.
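
The loops above handle a single directory. For nested folders, a possible approach is to walk the tree with find and mirror the folder structure in a separate output directory. This is a sketch, not a definitive implementation: md_path_for and convert_tree are hypothetical helper names, and the code assumes file names without newlines and a source directory given without a trailing slash.

```shell
# md_path_for is a hypothetical helper: it maps SRC_DIR/sub/file.docx
# to OUT_DIR/sub/file.md.
md_path_for() {
    local src_dir="$1" out_dir="$2" file="$3" rel
    rel="${file#"$src_dir"/}"
    printf '%s\n' "$out_dir/${rel%.docx}.md"
}

# convert_tree is a hypothetical helper.
# Usage: convert_tree SRC_DIR OUT_DIR   (SRC_DIR without a trailing slash)
convert_tree() {
    local src_dir="$1" out_dir="$2" file target
    # Skip Word's "~$name.docx" lock files, which are not real documents
    find "$src_dir" -type f -name '*.docx' ! -name '~$*' |
    while IFS= read -r file; do
        target="$(md_path_for "$src_dir" "$out_dir" "$file")"
        mkdir -p "$(dirname "$target")"
        pandoc "$file" -t gfm --wrap=none -o "$target" ||
            echo "Failed: $file" >&2
    done
}
```

For example, convert_tree docs markdown would convert docs/guide/intro.docx to markdown/guide/intro.md.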

Customizing Output with Lua Filters

Pandoc supports Lua filters that transform the document AST during conversion:

pandoc document.docx --lua-filter=my-filter.lua -o output.md

Example Lua filter that sets metadata fields, which appear as YAML frontmatter when the output is standalone (-s):

-- add-frontmatter.lua
function Meta(meta)
  meta.draft = false
  meta.author = "WordToMD Team"
  return meta
end

This is the kind of customization that browser tools can’t match.

Comparing Pandoc and mammoth.js Output

WordToMD uses mammoth.js under the hood. Pandoc uses its own parser. Key differences:

| Feature | Pandoc | mammoth.js (WordToMD) |
|---|---|---|
| Image extraction | --extract-media | ⚠️ Noted in Conversion Notes |
| Custom styles | ✅ Via -f docx+styles | ⚠️ Logged as warnings |
| Math equations | ✅ OMML → LaTeX | ❌ Not supported |
| Track changes | ✅ Configurable (--track-changes) | ❌ Stripped |
| Footnotes | ✅ Preserved | ⚠️ Inline only |
| Comments | ✅ Optional (--track-changes=all) | ❌ Stripped |

For complex documents, Pandoc gives more complete output. For simple documents, WordToMD is faster with zero setup.
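
Tracked changes are a good illustration of that extra control. Pandoc's --track-changes option takes accept, reject, or all; with all, insertions, deletions, and comments are kept in the output as marked-up spans. The sketch below wraps that option; convert_tracked is a hypothetical helper name, and writing one output file per mode is an assumption made for this example.

```shell
# convert_tracked is a hypothetical helper name.
# Usage: convert_tracked INPUT.docx [accept|reject|all]
convert_tracked() {
    local input="$1" mode="${2:-accept}"
    case "$mode" in
        accept|reject|all) ;;
        *) echo "mode must be accept, reject, or all" >&2; return 2 ;;
    esac
    # Pandoc's own Markdown is used so the change markup can be written as spans
    pandoc "$input" --track-changes="$mode" -t markdown --wrap=none \
        -o "${input%.docx}.$mode.md"
}
```

Running convert_tracked contract.docx all would write contract.all.md, which can then be compared against contract.accept.md to review what changed.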

Pandoc Defaults Files

For repeated conversions with the same settings, create a defaults file:

# my-defaults.yaml
from: docx
to: gfm
wrap: none
extract-media: ./media
standalone: true

Then run:

pandoc document.docx -d my-defaults.yaml -o output.md
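
Options given on the command line after -d override the same settings in the defaults file. That is useful when several documents share one defaults file: embedded images tend to have generic names such as image1.png, so a single shared media folder can lead to collisions. The sketch below gives each document its own media subfolder; convert_with_defaults is a hypothetical helper name, and the ./media/NAME layout is an assumption.

```shell
# convert_with_defaults is a hypothetical helper name.
# Usage: convert_with_defaults DEFAULTS.yaml FILE.docx [FILE.docx ...]
convert_with_defaults() {
    local defaults="$1" file name
    shift
    for file in "$@"; do
        name="$(basename "$file" .docx)"
        # --extract-media here replaces the extract-media value from the defaults file
        pandoc -d "$defaults" --extract-media="./media/$name" \
            "$file" -o "${file%.docx}.md" || return 1
    done
}
```

For example, convert_with_defaults my-defaults.yaml docs/a.docx docs/b.docx keeps the images from a.docx and b.docx in separate folders.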

Using Pandoc in CI/CD

For automated pipelines (GitHub Actions, etc.):

# .github/workflows/convert-docs.yml
- name: Install Pandoc
  run: sudo apt-get update && sudo apt-get install -y pandoc

- name: Convert Word docs to Markdown
  run: |
    for f in docs/*.docx; do
      pandoc "$f" -t gfm --wrap=none -o "${f%.docx}.md"
    done
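
In a pipeline it is often more useful to attempt every file and report all failures than to stop at the first one. The sketch below does that and also tolerates a directory with no .docx files. convert_docs_dir is a hypothetical helper name; the ::error line uses the GitHub Actions workflow-command syntax to annotate the failing file, and the script is assumed to be called from a run step.

```shell
# convert_docs_dir is a hypothetical helper name.
# Usage: convert_docs_dir DIR
# Returns non-zero if any conversion failed.
convert_docs_dir() {
    local dir="$1" f failed=0
    for f in "$dir"/*.docx; do
        # If the glob matched nothing, $f is the literal pattern; skip it
        [ -e "$f" ] || continue
        if ! pandoc "$f" -t gfm --wrap=none -o "${f%.docx}.md"; then
            echo "::error file=$f::Pandoc conversion failed"
            failed=1
        fi
    done
    return "$failed"
}
```

Calling convert_docs_dir docs as the last command of the step makes the job fail when any document could not be converted.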

FAQ

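Pandoc outputs \ line continuations in my Markdown. How do I remove them? These backslashes mark hard line breaks carried over from the Word file (for example, Shift+Enter), not wrapped lines, so --wrap=none alone does not remove them. With Pandoc's own Markdown output, -t markdown-escaped_line_breaks writes them as two trailing spaces instead. To get rid of them entirely, replace the manual line breaks in the source document with paragraph breaks.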

My tables look garbled in Pandoc output. Try -t gfm for GFM table syntax instead of the default Pandoc Markdown tables.

Images aren’t showing up in the output. Add --extract-media=./images to extract embedded images. Then reference them correctly in your target environment.

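Pandoc converts smart quotes to " characters. How do I keep them? This comes from the smart extension, which is enabled by default for Pandoc's own Markdown output (-t markdown): curly quotes are written as straight quotes. Disable it with -t markdown-smart to keep the curly quotes. The gfm and commonmark outputs do not enable smart by default, so they preserve curly quotes as-is.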

How do Pandoc and WordToMD compare for DOCX to Markdown? Both work well for standard documents. WordToMD requires zero setup and produces clean GFM output. Pandoc requires installation but handles complex documents (images, math, comments, footnotes) more completely.

Conclusion

Pandoc is the most capable tool for Word to Markdown conversion, especially for batch processing, custom output formats, and CI/CD pipelines. For quick, one-off conversions without installation, WordToMD gets the job done instantly. The two tools complement each other well.