Converting a single Word document to Markdown takes seconds with WordToMD. But what if you have 50, 100, or 500 .docx files to convert? That’s where batch conversion scripts come in. This guide covers the fastest approaches for bulk Word to Markdown conversion.
When You Need Batch Conversion
If you have:
- A legacy documentation library in Word format
- Weekly reports that need to become Markdown pages
- A content migration project (Word → static site, wiki, or knowledge base)
- Multiple authors submitting Word docs that publish to Markdown
If any of these apply, you need batch conversion. For single files, WordToMD remains the easiest option.
Pandoc: The Batch Conversion Backbone
Pandoc is the best tool for batch conversion. Install it once, then script it for any volume of files.
Windows PowerShell Script
# convert-all.ps1
# Converts all .docx files in the current directory and its subdirectories to .md
$docxFiles = Get-ChildItem -Path "." -Filter "*.docx" -Recurse
foreach ($file in $docxFiles) {
    # Write each .md file next to its source .docx
    $outputPath = [System.IO.Path]::ChangeExtension($file.FullName, ".md")
    pandoc `
        $file.FullName `
        -t gfm `
        --wrap=none `
        --extract-media="./media" `
        -o $outputPath
    Write-Host "✓ $($file.Name) → $([System.IO.Path]::GetFileName($outputPath))"
}
Write-Host "Done. Processed $($docxFiles.Count) files."
Run it:
.\convert-all.ps1
Bash Script (macOS / Linux)
#!/bin/bash
# convert-all.sh
# Converts all .docx files recursively. Output is flattened into OUTPUT_DIR,
# so files with the same name in different folders overwrite each other.
OUTPUT_DIR="./markdown-output"
mkdir -p "$OUTPUT_DIR"
# IFS= keeps leading and trailing whitespace in file names intact
find . -name "*.docx" | while IFS= read -r docx_file; do
    filename=$(basename "$docx_file" .docx)
    output="$OUTPUT_DIR/$filename.md"
    pandoc \
        "$docx_file" \
        -t gfm \
        --wrap=none \
        --extract-media="$OUTPUT_DIR/media" \
        -o "$output"
    echo "✓ $docx_file → $output"
done
echo "Done."
Run it:
chmod +x convert-all.sh
./convert-all.sh
Python Script for Advanced Use Cases
Driving Pandoc from Python through subprocess gives you more control:
#!/usr/bin/env python3
"""
batch_convert.py: Convert all .docx files to Markdown with frontmatter injection
"""
import subprocess
import sys
from datetime import datetime
from pathlib import Path
from typing import Optional

INPUT_DIR = Path("./word-docs")
OUTPUT_DIR = Path("./markdown-output")
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)


def convert_file(docx_path: Path) -> Optional[Path]:
    """Convert a .docx file to GFM Markdown using Pandoc; return None on failure."""
    output_path = OUTPUT_DIR / docx_path.with_suffix(".md").name
    result = subprocess.run(
        [
            "pandoc",
            str(docx_path),
            "-t", "gfm",
            "--wrap=none",
            f"--extract-media={OUTPUT_DIR}/media",
            "-o", str(output_path),
        ],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        print(f"  ERROR: {result.stderr}", file=sys.stderr)
        return None
    return output_path


def add_frontmatter(md_path: Path, title: str) -> None:
    """Prepend YAML frontmatter to a Markdown file."""
    frontmatter = f"""---
title: "{title}"
date: {datetime.now().strftime('%Y-%m-%d')}
draft: false
---
"""
    content = md_path.read_text(encoding="utf-8")
    md_path.write_text(frontmatter + content, encoding="utf-8")


def main():
    docx_files = list(INPUT_DIR.glob("**/*.docx"))
    if not docx_files:
        print(f"No .docx files found in {INPUT_DIR}")
        return
    print(f"Converting {len(docx_files)} files...")
    success = 0
    for docx_path in docx_files:
        print(f"  Converting: {docx_path.name}")
        output_path = convert_file(docx_path)
        if output_path:
            # Use filename (without extension) as title, replace hyphens/underscores
            title = docx_path.stem.replace("-", " ").replace("_", " ").title()
            add_frontmatter(output_path, title)
            print(f"  ✓ → {output_path.name}")
            success += 1
        else:
            print(f"  ✗ Failed: {docx_path.name}")
    print(f"\nDone: {success}/{len(docx_files)} files converted.")


if __name__ == "__main__":
    main()
Run it:
python3 batch_convert.py
Adding Frontmatter Automatically
Static site generators need frontmatter. The Python script above adds basic frontmatter. For a more complete approach, extract the document title from the first H1 heading:
def extract_title(md_content: str) -> str:
    """Extract the first H1 heading as the title."""
    for line in md_content.splitlines():
        if line.startswith("# "):
            return line[2:].strip()
    return "Untitled"
Then update the add_frontmatter call to use the extracted title.
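One way to wire the extracted title into the frontmatter step is sketched below. It is a minimal illustration, not the original script: `with_frontmatter` is a hypothetical helper, `extract_title` is given an assumed `default` parameter so the filename-based title can serve as a fallback, and the title is passed through `json.dumps` so that embedded quotes remain valid in a YAML double-quoted string.

```python
import json
from datetime import date


def extract_title(md_content: str, default: str = "Untitled") -> str:
    """Return the text of the first H1 heading, or `default` if there is none."""
    for line in md_content.splitlines():
        if line.startswith("# "):
            return line[2:].strip()
    return default


def with_frontmatter(md_content: str, fallback_title: str, day: date) -> str:
    """Prepend YAML frontmatter, preferring the first H1 over the fallback title."""
    title = extract_title(md_content, default=fallback_title)
    # json.dumps escapes quotes and backslashes inside the title.
    frontmatter = (
        "---\n"
        f"title: {json.dumps(title, ensure_ascii=False)}\n"
        f"date: {day.isoformat()}\n"
        "draft: false\n"
        "---\n\n"
    )
    return frontmatter + md_content


sample = '# Q3 "Draft" Report\n\nBody text.\n'
result = with_frontmatter(sample, "q3-report", date(2024, 1, 15))
print(result)
```

In the batch script, this would mean reading the converted file, calling the helper with the filename-derived title as the fallback, and writing the result back.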
Preserving Directory Structure
When converting a nested folder structure, maintain the hierarchy:
# Bash: preserve subdirectory structure
find ./word-docs -name "*.docx" | while IFS= read -r docx; do
    # Path of the file relative to the input root
    rel_path="${docx#./word-docs/}"
    output_dir="./markdown-output/$(dirname "$rel_path")"
    mkdir -p "$output_dir"
    output="$output_dir/$(basename "$docx" .docx).md"
    pandoc "$docx" -t gfm --wrap=none -o "$output"
    echo "✓ $rel_path"
done
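The same mapping can be expressed in Python with pathlib, which is useful if you are extending the Python script rather than the Bash one. This is a sketch under assumptions: `mirrored_output_path` is a hypothetical helper name, and the directory names simply reuse the ones from the examples above.

```python
from pathlib import Path


def mirrored_output_path(docx_path: Path, input_root: Path, output_root: Path) -> Path:
    """Map input_root/a/b/file.docx to output_root/a/b/file.md."""
    relative = docx_path.relative_to(input_root)
    return (output_root / relative).with_suffix(".md")


input_root = Path("word-docs")
output_root = Path("markdown-output")
target = mirrored_output_path(
    input_root / "guides" / "setup.docx", input_root, output_root
)
print(target.as_posix())  # markdown-output/guides/setup.md
```

Before handing `target` to Pandoc's `-o` option, create its parent folder with `target.parent.mkdir(parents=True, exist_ok=True)`.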
Image Handling in Batch Conversion
With --extract-media, Pandoc extracts all images to a folder. The challenge: multiple .docx files may produce files with identical names (e.g., image1.png). Use a per-document subdirectory:
for docx_file in *.docx; do
    slug="${docx_file%.docx}"
    mkdir -p "./media/$slug"
    pandoc "$docx_file" \
        -t gfm \
        --wrap=none \
        --extract-media="./media/$slug" \
        -o "$slug.md"
done
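For the Python route, the per-document media folder can be built into the Pandoc argument list. The sketch below only assembles the command; `pandoc_args` is a hypothetical helper, and the folder layout (`media/<document name>` under the output directory) is an assumption modeled on the Bash loop above.

```python
from pathlib import Path
from typing import List


def pandoc_args(docx_path: Path, output_dir: Path) -> List[str]:
    """Build a Pandoc command that extracts images into media/<document name>/."""
    slug = docx_path.stem
    media_dir = output_dir / "media" / slug
    return [
        "pandoc", str(docx_path),
        "-t", "gfm",
        "--wrap=none",
        f"--extract-media={media_dir.as_posix()}",
        "-o", str(output_dir / f"{slug}.md"),
    ]


# Two documents get two separate media folders, so image1.png cannot collide.
for name in ("report.docx", "handbook.docx"):
    args = pandoc_args(Path("word-docs") / name, Path("markdown-output"))
    print(" ".join(args))
```

The list can be passed straight to `subprocess.run`. Pandoc rewrites image references to point at the path given to `--extract-media`, so it is worth checking that the links resolve from wherever the Markdown files end up.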
Performance for Large Sets
For large document sets (100+ files):
- Each Pandoc invocation produces one output document, so the scripts above run it once per file. Running them sequentially is fine for up to ~100 files
- For 500+ files, use GNU Parallel for concurrent processing:
find . -name "*.docx" | \
parallel pandoc {} -t gfm --wrap=none -o {.}.md
Install GNU Parallel with brew install parallel (macOS) or sudo apt install parallel (Debian/Ubuntu).
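If you would rather stay in Python than install GNU Parallel, a thread pool gives similar concurrency. The following is a rough sketch, not a tuned implementation: `convert_one` and `convert_many` are hypothetical names, the worker count of 4 is an arbitrary starting point, and each output file is written next to its source as in the parallel command above.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict, Iterable


def convert_one(docx_path: Path) -> bool:
    """Run Pandoc on one file, writing the .md next to it; True on success."""
    result = subprocess.run(
        ["pandoc", str(docx_path), "-t", "gfm", "--wrap=none",
         "-o", str(docx_path.with_suffix(".md"))],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


def convert_many(
    paths: Iterable[Path],
    convert: Callable[[Path], bool] = convert_one,
    workers: int = 4,
) -> Dict[Path, bool]:
    """Convert files concurrently and report success per path."""
    paths = list(paths)
    # Threads are enough here: each worker mostly waits on a pandoc subprocess.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(paths, pool.map(convert, paths)))
```

A typical call would be `results = convert_many(Path(".").rglob("*.docx"))`, after which any path mapped to `False` can be retried or inspected by hand.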
Validating Output Quality
After batch conversion, spot-check your output:
- Random sample — Open 5-10 converted files and compare to the originals
- Table count — Count tables in a document and verify they all converted
- Heading levels — Ensure heading hierarchy is preserved
- Broken links — Search for Markdown links and verify they resolve
A simple check script:
# List files with no headings (might indicate conversion failure)
for md in ./markdown-output/*.md; do
    if ! grep -q "^#" "$md"; then
        echo "No headings found: $md"
    fi
done
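The other checks in the list can be roughed out the same way. The sketch below counts headings, tables, and links in a converted file so the numbers can be compared with the original document. It is a heuristic under assumptions: `summarize` and `is_table_rule` are hypothetical helpers, HTML tables are counted separately because Pandoc may fall back to HTML for tables that pipe syntax cannot express, and lines inside code blocks that start with `# ` will be miscounted as headings.

```python
import re
from typing import Dict

HEADING = re.compile(r"#{1,6} ")
HTML_TABLE = re.compile(r"<table\b", re.IGNORECASE)
# Matches [text](target) and also the link part of ![alt](image).
LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)")


def is_table_rule(line: str) -> bool:
    """True for a pipe-table delimiter row such as |---|:--:|."""
    stripped = line.strip()
    return "|" in stripped and "-" in stripped and set(stripped) <= set("|-: ")


def summarize(md_text: str) -> Dict[str, int]:
    """Count headings, tables, and links in a Markdown string."""
    lines = md_text.splitlines()
    return {
        "headings": sum(1 for line in lines if HEADING.match(line)),
        "pipe_tables": sum(1 for line in lines if is_table_rule(line)),
        "html_tables": len(HTML_TABLE.findall(md_text)),
        "links": len(LINK.findall(md_text)),
    }


sample = "\n".join([
    "# Title",
    "",
    "| A | B |",
    "|---|---|",
    "| 1 | 2 |",
    "",
    "See [the guide](guide.md) and ![chart](media/image1.png).",
])
print(summarize(sample))
```

Running `summarize` over every file in the output folder and flagging files with zero headings, or fewer tables than expected, gives a quick list of documents to review manually.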
FAQ
Can I batch convert .doc files (old Word format)?
Save them as .docx first. Word’s macro feature can batch-save: Tools → Macros → run a SaveAs macro across all open files. LibreOffice can also batch-convert via command line: soffice --headless --convert-to docx *.doc.
How long does batch conversion take?
Pandoc converts a typical 10-page document in under a second, so 100 files take roughly 1-2 minutes. Very large documents (100+ pages) take longer.
I’m getting “pandoc: command not found” in my script.
Make sure Pandoc is installed and on your PATH. Run which pandoc (macOS/Linux) or Get-Command pandoc (PowerShell) to check.
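In the Python script, the same check can run before any files are processed, so a missing install fails once with a clear message instead of once per file. This is a small sketch; `require_tool` is a hypothetical helper built on the standard library's `shutil.which`.

```python
import shutil


def require_tool(name: str = "pandoc") -> str:
    """Return the full path of an executable on PATH, or raise if it is missing."""
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(f"{name} not found on PATH; install it or adjust PATH.")
    return path
```

Calling `require_tool()` at the top of `main()` would stop the run early when Pandoc is unavailable.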
Can I run batch conversion on Windows without PowerShell?
Yes — use a .bat file or install Git Bash to run the bash scripts.
Related Guides
- Pandoc Word to Markdown — Complete Pandoc reference
- Word to Static Site Generator — Publishing converted content
- Word to GitBook — Migrating to GitBook
Conclusion
Batch Word to Markdown conversion is straightforward with Pandoc and a few lines of scripting. The PowerShell and Bash scripts above handle most scenarios. For custom frontmatter, directory structure preservation, or image handling, the Python script gives you full control. For one-off conversions, WordToMD remains the fastest option.