Markdown Auto-Discovery
Making WordPress sites AI-crawler friendly with auto-generated markdown
Overview
This system generates AI-friendly Markdown versions of WordPress website pages. It processes content locally and uploads static .md files to the server, so that AI crawlers such as ClaudeBot, GPTBot, and PerplexityBot can discover and consume the content in a clean, structured format.
Deployed across two production WordPress sites, the system extracts content from WordPress, generates clean markdown files using a Node.js script, and deploys them alongside server rewrite rules so any page's markdown version is accessible by appending .md to the URL. Discovery tags in the WordPress theme link HTML pages to their markdown equivalents.
The system generated 289 markdown files for the UK site and 119 for the USA site, with a documented refresh process for when content changes.
UK Files Generated: 289
USA Files Generated: 119
Sites Deployed: 2
AI Crawlers Served: 5+
Key Features
Automated MD Generation
Node.js script processes WordPress database exports (TSV format) and generates clean markdown files for every published post and page.
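The generation step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the column names (slug, title, content), the backslash-escaped newlines inside fields, and the helper names are all assumptions, and the HTML conversion handles only headings, links, and paragraphs.

```javascript
// Hypothetical column layout: slug <TAB> title <TAB> content (HTML).
// Assumes the export escapes newlines and tabs inside fields as \n and \t.
function unescapeField(field) {
  return field.replace(/\\(.)/g, (_, ch) =>
    ch === "n" ? "\n" : ch === "t" ? "\t" : ch
  );
}

// Deliberately tiny HTML-to-markdown pass; a real script would need to
// handle lists, images, entities, shortcodes, and more.
function htmlToMarkdown(html) {
  return html
    .replace(/<h([1-6])[^>]*>([\s\S]*?)<\/h\1>/gi, (_, level, text) =>
      `\n\n${"#".repeat(Number(level))} ${text.trim()}\n\n`
    )
    .replace(/<a\s[^>]*href="([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi, "[$2]($1)")
    .replace(/<\/p>|<br\s*\/?>/gi, "\n\n")
    .replace(/<[^>]+>/g, "") // drop any remaining tags
    .replace(/\n{3,}/g, "\n\n") // collapse runs of blank lines
    .trim();
}

// Turns the TSV text into { filename, body } records, one per row.
function tsvToMarkdownFiles(tsv) {
  const [headerLine, ...rows] = tsv
    .split("\n")
    .filter((line) => line.length > 0);
  const columns = headerLine.split("\t");
  return rows.map((row) => {
    const cells = row.split("\t").map(unescapeField);
    const record = Object.fromEntries(
      columns.map((name, i) => [name, cells[i] ?? ""])
    );
    return {
      filename: `${record.slug}.md`,
      body: `# ${record.title}\n\n${htmlToMarkdown(record.content)}\n`,
    };
  });
}
```

Each returned record could then be written to disk with fs.writeFileSync before upload. Returning plain objects keeps the conversion testable without touching the file system.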
Apache URL Routing
Custom .htaccess rewrite rules serve markdown files at /slug.md URLs, making every page's markdown version accessible by appending .md.
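The real rules live in .htaccess and are not reproduced here. As a rough model of the mapping they perform, the JavaScript sketch below shows how a /slug.md request might resolve to a static file and how a page URL relates to its markdown URL. The directory path, the allowed slug characters, the trailing-slash handling, and both function names are assumptions made for illustration.

```javascript
const path = require("node:path");

// Hypothetical directory holding the uploaded .md files.
const MARKDOWN_ROOT = "/var/www/site/markdown";

// Models the rewrite: /slug.md or /parent/child.md maps to a file under
// MARKDOWN_ROOT. Anything else (including ".." segments) is rejected.
function resolveMarkdownRequest(requestPath) {
  const match = /^\/((?:[a-z0-9-]+\/)*[a-z0-9-]+)\.md$/i.exec(requestPath);
  if (!match) return null;
  return path.posix.join(MARKDOWN_ROOT, `${match[1]}.md`);
}

// Derives the markdown URL for a page by appending .md to its path.
// Assumption: a trailing slash is dropped first, and the home page
// (empty path) has no markdown URL under this scheme.
function markdownUrlFor(pageUrl) {
  const url = new URL(pageUrl);
  const trimmed = url.pathname.replace(/\/+$/, "");
  if (trimmed === "") return null;
  url.pathname = `${trimmed}.md`;
  return url.toString();
}
```

Restricting the slug pattern to letters, digits, and hyphens is one simple way to keep a rule like this from serving files outside the markdown directory.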
Discovery Tags
WordPress theme hook adds <link rel="alternate" type="text/markdown"> tags so AI crawlers can automatically find the markdown version of any page.
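One way to check that the discovery tag is present, or to see how a crawler might use it, is to scan a page's HTML for the alternate link. The sketch below is an assumption-laden illustration: the function name is hypothetical, it only reads double-quoted attributes, and a regex scan is a stand-in for a proper HTML parser.

```javascript
// Returns the absolute URL of the page's markdown alternate, or null if
// no <link rel="alternate" type="text/markdown"> tag is found.
function findMarkdownAlternate(html, pageUrl) {
  const linkTags = html.match(/<link\b[^>]*>/gi) ?? [];
  for (const tag of linkTags) {
    // Collect double-quoted attributes into a simple lookup object.
    const attrs = {};
    for (const [, name, value] of tag.matchAll(/([a-z-]+)\s*=\s*"([^"]*)"/gi)) {
      attrs[name.toLowerCase()] = value;
    }
    if (
      attrs.rel === "alternate" &&
      attrs.type === "text/markdown" &&
      attrs.href
    ) {
      // Resolve relative hrefs against the page URL.
      return new URL(attrs.href, pageUrl).toString();
    }
  }
  return null;
}
```

Run against the HTML of a deployed page, a null result would suggest the theme hook is not emitting the tag for that template.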
Multi-Site Deployment
Identical system deployed across two production WordPress installations with site-specific content.
Documented Refresh Process
Complete workflow for re-extracting content, regenerating markdown, and deploying when content changes.
CDN Compatibility
Works correctly with CDN caching layers, including URL rewriting through CDN proxies.
Technology Stack
Scripts
Server
WordPress
Protocol