---
title: "Get 50% more traffic by automating content versions (not new content) that LLMs love!"
date: 2026-02-26T11:26:36.637+00:00
url: https://hurricane.works/blog/md-files-llmstext-llm-content
description: "A practical guide to making your website content accessible to AI systems like ChatGPT, Claude, Perplexity, and other LLM-powered tools."
author: "Alastair"
categories: ["AI"]
---

# Get 50% more traffic by automating content versions (not new content) that LLMs love!

## How to Make Your Website LLM-Friendly

A practical guide to making your website content accessible to AI systems like ChatGPT, Claude, Perplexity, and other LLM-powered tools.

---

## Table of Contents

1. [Why: The Third Audience](#why-the-third-audience)
2. [What: The Three Components](#what-the-three-components)
3. [When: Should You Implement This?](#when-should-you-implement-this)
4. [How: Implementation Overview](#how-implementation-overview)
5. [Part 1: Markdown Auto-Discovery](#part-1-markdown-auto-discovery)
6. [Part 2: llms.txt (AI Sitemap)](#part-2-llmstxt-ai-sitemap)
7. [Part 3: Bot Tracking](#part-3-bot-tracking)
8. [Platform-Specific Instructions](#platform-specific-instructions)
   - [WordPress](#wordpress)
   - [Next.js / React](#nextjs--react)
   - [Hugo / Static Site Generators](#hugo--static-site-generators)
   - [Custom / Other Frameworks](#custom--other-frameworks)
9. [Testing & Validation](#testing--validation)
10. [Real-World Results](#real-world-results)
11. [Using AI Assistants to Implement](#using-ai-assistants-to-implement)
12. [Quick Reference](#quick-reference)
13. [Resources](#resources)

---

## Why: The Third Audience

Websites have traditionally served two audiences:

1. **Humans** -- who read and interact with your pages
2. **Search engines** -- who crawl and index for Google, Bing, etc.

There is now a **third audience: AI systems**.

Large Language Models (LLMs) like GPT-4, Claude, and others are increasingly used to answer questions, summarise information, and assist with research. When these systems reference your content, they benefit from clean, structured formats rather than complex HTML with navigation, ads, and scripts.

**Reference:** [The Third Audience](https://dri.es/the-third-audience) by Dries Buytaert (creator of Drupal)

### Benefits

- **Accurate AI responses** -- Clean content means fewer hallucinations when AI references your site
- **Brand mentions** -- AI tools may cite your site more often when they can easily parse your content
- **Future-proofing** -- As AI search grows, optimised sites will have an advantage
- **No SEO downside** -- this complements traditional SEO rather than replacing it

---

## What: The Three Components

A fully LLM-optimised site has three things:

### 1. Markdown Auto-Discovery

For every page on your site, provide a Markdown version at a predictable URL and tell crawlers about it:

```
https://yoursite.com/about        --> HTML page (for humans)
https://yoursite.com/about.md     --> Markdown version (for AI)
```

Plus a discovery link in your HTML `<head>`:

```html
<link rel="alternate" type="text/markdown" href="/about.md" />
```

### 2. llms.txt (AI Sitemap)

A single file at your site root following the [llmstxt.org](https://llmstxt.org/) standard. Think of it as a sitemap specifically for AI crawlers -- it describes your site and links to all your `.md` pages:

```
https://yoursite.com/llms.txt
```

### 3. Bot Tracking (Optional)

Log which AI crawlers visit your `.md` files so you can measure whether it's working.

---

## When: Should You Implement This?

### Good Candidates

- **Content-heavy sites** -- Blogs, documentation, news, educational content
- **B2B websites** -- Product pages, pricing, features that people research
- **Reference sites** -- APIs, technical docs, knowledge bases
- **Sites that want AI visibility** -- If you want AI to accurately represent your brand

### Less Critical

- **E-commerce product listings** -- Structured data (JSON-LD) may be more valuable
- **Highly interactive apps** -- Where content is dynamic/personalised
- **Private/gated content** -- Unless you want AI to access it

### Time Investment

- **Simple sites**: 2-4 hours
- **WordPress/CMS**: 4-8 hours
- **Large custom sites**: 1-2 days

---

## How: Implementation Overview

Regardless of platform, the implementation follows these steps:

1. **Generate Markdown files** -- Convert your HTML content to clean Markdown with YAML frontmatter
2. **Serve them** -- Configure your web server to serve `.md` files at predictable URLs
3. **Add discovery links** -- Inject `<link rel="alternate" type="text/markdown">` into every page
4. **Create llms.txt** -- Build an AI sitemap listing your key pages
5. **Add bot tracking** -- (Optional) Log AI crawler visits to measure adoption
6. **Test everything** -- Verify endpoints, headers, and discovery links

---

## Part 1: Markdown Auto-Discovery

### What Your Markdown Files Should Look Like

Each `.md` file should have YAML frontmatter followed by clean content:

```markdown
---
title: "About Us"
date: 2024-01-15
url: https://yoursite.com/about
type: page
description: "Learn about our company and mission"
---

# About Us

Your content here in clean Markdown format...
```

**Key points:**
- Strip all HTML boilerplate (navigation, footers, ads, scripts)
- Keep the actual content -- headings, paragraphs, lists, links
- Remove decorative images (icons, logos) but keep meaningful ones
- Remove duplicate CTAs and testimonial sections
- The goal is clean, readable text that an LLM can consume efficiently
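One pitfall when templating that frontmatter: titles containing double quotes produce invalid YAML. A minimal escaping helper (a sketch; the field names simply mirror the frontmatter format shown above) avoids this:

```javascript
// Build YAML frontmatter safely: escape backslashes and double quotes
// so a title like `The "Best" Guide` doesn't produce invalid YAML.
function yamlString(value) {
  return '"' + String(value).replace(/\\/g, '\\\\').replace(/"/g, '\\"') + '"';
}

function buildFrontmatter({ title, date, url, type, description }) {
  const lines = [
    '---',
    `title: ${yamlString(title)}`,
    `date: ${date}`,
    `url: ${url}`,
    `type: ${type}`,
  ];
  if (description) lines.push(`description: ${yamlString(description)}`);
  lines.push('---', '');
  return lines.join('\n');
}
```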

### Content-Type Header

Always serve `.md` files with:

```
Content-Type: text/markdown; charset=utf-8
```

### Discovery Link Format

Add this to the `<head>` of every HTML page that has a Markdown counterpart:

```html
<link rel="alternate" type="text/markdown" href="/page-slug.md" />
```

This is how AI crawlers find the Markdown version -- similar to how RSS feeds are discovered.
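To see it from the crawler's side, here is a sketch of extracting that alternate link from a page's HTML. The regex assumes the attribute order shown in the tag above; a real crawler would use a proper HTML parser:

```javascript
// Find the Markdown alternate URL declared in an HTML page, if any.
// Assumes attributes appear in the order rel, type, href as shown above.
function findMarkdownAlternate(html) {
  const match = html.match(
    /<link\s+rel=["']alternate["']\s+type=["']text\/markdown["']\s+href=["']([^"']+)["']/i
  );
  return match ? match[1] : null;
}
```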

---

## Part 2: llms.txt (AI Sitemap)

The [llmstxt.org](https://llmstxt.org/) standard defines a file at `/llms.txt` that helps AI systems understand your site at a glance.

### Format

```markdown
# Your Company Name

> A one-paragraph summary of what your company does and what this site contains.

## Core Pages

- [About](https://yoursite.com/about.md): Company background and mission
- [Pricing](https://yoursite.com/pricing.md): Plans and pricing details
- [Contact](https://yoursite.com/contact.md): How to get in touch

## Products

- [Product One](https://yoursite.com/product-one.md): Description of product one
- [Product Two](https://yoursite.com/product-two.md): Description of product two

## Blog

- [Recent Article](https://yoursite.com/recent-article.md): Article description
```

**Rules (from the spec):**
- H1 with your project/company name (required)
- Blockquote summary (required)
- H2 sections grouping related pages
- Markdown list items with links to `.md` endpoints
- Optional descriptions after each link

### Where to Put It

- **Primary:** `https://yoursite.com/llms.txt` (root of your site)
- **Alternative:** `https://yoursite.com/.well-known/llms.txt`

### Generating It

For small sites, write it by hand. For larger sites, generate it from your existing `.md` files by reading their frontmatter (title, description, type) and grouping them into sections.

---

## Part 3: Bot Tracking

Knowing which AI crawlers visit your content helps you measure whether the implementation is working.

### Known AI Bot User Agents

| Bot | Company | User Agent String |
|-----|---------|-------------------|
| GPTBot | OpenAI | `GPTBot/1.x` |
| ChatGPT-User | OpenAI | `ChatGPT-User` |
| ClaudeBot | Anthropic | `ClaudeBot/1.0` |
| Claude-Web | Anthropic | `Claude-Web` |
| PerplexityBot | Perplexity | `PerplexityBot` |
| Google-Extended | Google | `Google-Extended` |
| Applebot-Extended | Apple | `Applebot-Extended` |
| Meta-ExternalAgent | Meta | `meta-externalagent` |
| Bytespider | ByteDance | `Bytespider` |
| CCBot | Common Crawl | `CCBot` |
| YouBot | You.com | `YouBot` |

### Simple Tracking Approach

Log requests to `.md` URLs where the user agent matches a known AI bot. Write each hit to a CSV file:

```
timestamp,bot_name,page,ip,user_agent
2026-01-22 13:32:03,GPTBot,/about.md,74.7.243.200,"Mozilla/5.0 ... GPTBot/1.3 ..."
2026-01-23 20:38:27,Meta-AI,/pricing.md,2a03:2880:f80e:5b::,"meta-externalagent/1.1 ..."
```

### What to Look For

- **Are bots visiting?** Any hits at all mean your discovery links and/or sitemap are working
- **Which bots?** GPTBot and ClaudeBot are the most common
- **Which pages?** See what content AI systems find most interesting
- **Frequency?** Daily visits vs occasional crawls

---

## Platform-Specific Instructions

### WordPress

WordPress is one of the easiest platforms to implement this on.

#### Step 1: Export Content and Generate Markdown

Export your posts/pages from the WordPress database, then convert HTML to Markdown locally using [Turndown](https://github.com/mixmark-io/turndown) (Node.js):

```javascript
// generate-markdown.js
const fs = require('fs');
const TurndownService = require('turndown');
const turndown = new TurndownService({ headingStyle: 'atx', codeBlockStyle: 'fenced' });

// For each post/page from your database export:
function processPost(post) {
    const markdown = turndown.turndown(post.content);

    return `---
title: "${post.title}"
date: ${post.date}
url: https://yoursite.com/${post.slug}
type: ${post.type}
---

# ${post.title}

${markdown}`;
}

// Write one file per post (`posts` is the array loaded from your export)
for (const post of posts) {
    fs.writeFileSync(`output/md/${post.slug}.md`, processPost(post));
}
```

**Tip:** Export content via MySQL query as a TSV file -- more reliable than JSON for content with special characters:

```sql
SELECT ID, post_title, post_name, post_date, post_type, post_excerpt,
REPLACE(REPLACE(REPLACE(post_content, '\t', ' '), '\r', ''), '\n', '{{NEWLINE}}')
FROM wp_posts WHERE post_status = 'publish' AND post_type IN ('post', 'page')
ORDER BY post_date DESC;
```

#### Step 2: Upload to Server

```bash
scp -r output/md/* user@server:/var/www/html/md/
```

For Bitnami WordPress on Lightsail, the path is `/opt/bitnami/wordpress/md/`.

#### Step 3: Add .htaccess Rules

Add these **before** the WordPress rewrite rules:

```apache
# BEGIN Markdown Auto-Discovery
<IfModule mod_mime.c>
    AddType text/markdown .md
</IfModule>

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /

RewriteCond %{REQUEST_URI} ^/([^/]+)\.md$
RewriteCond %{DOCUMENT_ROOT}/md/%1.md -f
RewriteRule ^([^/]+)\.md$ /md/$1.md [L]
</IfModule>
# END Markdown Auto-Discovery
```

This routes `yoursite.com/about.md` to `yoursite.com/md/about.md` if the file exists.

#### Step 4: Add Discovery Link (functions.php)

Add to your child theme's `functions.php`:

```php
function add_markdown_discovery_link() {
    if (is_singular(array('post', 'page'))) {
        $slug = get_post_field('post_name', get_post());
        if ($slug) {
            echo '<link rel="alternate" type="text/markdown" href="/' . esc_attr($slug) . '.md" />' . "\n";
        }
    }
}
add_action('wp_head', 'add_markdown_discovery_link', 2);
```

#### Step 5: Add Bot Tracker (functions.php)

```php
add_action('init', 'track_llm_bot_visits');

function track_llm_bot_visits() {
    $request_uri = $_SERVER['REQUEST_URI'] ?? '';
    if (!preg_match('/\.md$/i', $request_uri)) return;

    $user_agent = $_SERVER['HTTP_USER_AGENT'] ?? '';
    $ai_bots = [
        'GPTBot' => 'GPTBot', 'ChatGPT-User' => 'ChatGPT-User',
        'ClaudeBot' => 'ClaudeBot', 'Claude-Web' => 'Claude-Web',
        'anthropic-ai' => 'Anthropic', 'Google-Extended' => 'Google-Extended',
        'Applebot-Extended' => 'Applebot-Extended', 'PerplexityBot' => 'PerplexityBot',
        'Bytespider' => 'Bytespider', 'CCBot' => 'CCBot', 'cohere-ai' => 'Cohere',
        'YouBot' => 'YouBot', 'Meta-ExternalAgent' => 'Meta-AI',
    ];

    $detected_bot = null;
    foreach ($ai_bots as $pattern => $name) {
        if (stripos($user_agent, $pattern) !== false) { $detected_bot = $name; break; }
    }
    if (!$detected_bot) return;

    $csv_path = ABSPATH . 'md/llm-visits.csv';
    if (!file_exists($csv_path)) {
        file_put_contents($csv_path, "timestamp,bot_name,page,ip,user_agent\n");
    }

    $log_entry = sprintf("%s,%s,%s,%s,\"%s\"\n",
        date('Y-m-d H:i:s'), $detected_bot,
        parse_url($request_uri, PHP_URL_PATH),
        $_SERVER['REMOTE_ADDR'] ?? 'unknown',
        str_replace('"', '""', $user_agent)
    );
    file_put_contents($csv_path, $log_entry, FILE_APPEND | LOCK_EX);
}
```

#### Step 6: Create llms.txt

Write your `llms.txt` file and upload it to the web root:

```bash
scp llms.txt user@server:/var/www/html/llms.txt
```

No server config changes needed -- Apache serves static files from the web root by default.

---

### Next.js / React

#### Markdown API Route

```typescript
// app/api/md/[slug]/route.ts (App Router) -- exposed at /:slug.md via a rewrite
import { NextRequest, NextResponse } from 'next/server';
import TurndownService from 'turndown';
import { getPostBySlug } from '@/lib/posts';

const turndown = new TurndownService();

export async function GET(
  request: NextRequest,
  { params }: { params: { slug: string } }
) {
  const post = await getPostBySlug(params.slug);
  if (!post) return new NextResponse('Not found', { status: 404 });

  const markdown = `---
title: "${post.title}"
date: ${post.date}
url: https://yoursite.com/${post.slug}
description: "${post.description}"
---

# ${post.title}

${turndown.turndown(post.content)}`;

  return new NextResponse(markdown, {
    headers: { 'Content-Type': 'text/markdown; charset=utf-8' },
  });
}
```

#### Discovery Link

```tsx
// In your page component's <head> or metadata
<link rel="alternate" type="text/markdown" href={`/${post.slug}.md`} />
```

#### Rewrites (next.config.js)

```javascript
module.exports = {
  async rewrites() {
    // Map the public /:slug.md URL to the API route
    return [{ source: '/:slug.md', destination: '/api/md/:slug' }];
  },
};
```
```

#### llms.txt

For Next.js, create `public/llms.txt` -- it will be served automatically at `/llms.txt`.

---

### Hugo / Static Site Generators

Hugo can output Markdown alongside HTML natively:

```toml
# config.toml
[outputFormats.MD]
mediaType = "text/markdown"
baseName = "index"
isPlainText = true

[outputs]
page = ["HTML", "MD"]
```

Create a template at `layouts/_default/single.md`:

```
---
title: "{{ .Title }}"
date: {{ .Date.Format "2006-01-02" }}
url: {{ .Permalink }}
---

# {{ .Title }}

{{ .RawContent }}
```

Add discovery link to `layouts/_default/baseof.html`:

```html
{{ if .IsPage }}
<link rel="alternate" type="text/markdown" href="{{ .RelPermalink }}index.md" />
{{ end }}
```

For `llms.txt`, create `static/llms.txt` and it will be copied to the build output.

---

### Custom / Other Frameworks

The pattern is the same regardless of framework:

1. **Create a `/slug.md` endpoint** that returns Markdown with `Content-Type: text/markdown`
2. **Add `<link rel="alternate" type="text/markdown">` to your HTML `<head>`**
3. **Put `llms.txt` at your web root**
4. **(Optional) Log AI bot visits to a file**

**Server config for Nginx:**

```nginx
location ~ ^/(.+)\.md$ {
    alias /var/www/html/md/$1.md;
    types { }  # clear inherited MIME map so default_type always applies
    default_type "text/markdown; charset=utf-8";
}
```

**Server config for Apache:**

```apache
AddType text/markdown .md
```

---

## Testing & Validation

Run these checks after implementation:

```bash
# 1. Markdown endpoint returns 200
curl -I https://yoursite.com/about.md
# Expected: HTTP/2 200

# 2. Content-Type is correct
curl -I https://yoursite.com/about.md | grep -i content-type
# Expected: content-type: text/markdown

# 3. Content has frontmatter
curl https://yoursite.com/about.md | head -10
# Expected: starts with ---

# 4. Discovery link exists in HTML
curl -s https://yoursite.com/about | grep 'text/markdown'
# Expected: <link rel="alternate" type="text/markdown" href="/about.md" />

# 5. llms.txt is accessible
curl -I https://yoursite.com/llms.txt
# Expected: HTTP/2 200

# 6. llms.txt content looks right
curl https://yoursite.com/llms.txt | head -10
# Expected: starts with # Your Company Name
```

---

## Real-World Results

We implemented this on two production WordPress sites in January 2026. Within 5 weeks:

**UK Site (289 pages):**
- 701 AI bot visits to `.md` endpoints
- ClaudeBot: 275 visits (39%)
- GPTBot: 231 visits (33%)
- Meta-AI: 195 visits (28%)

**US Site (119 pages):**
- 187 AI bot visits to `.md` endpoints
- GPTBot: 95 visits (51%)
- ClaudeBot: 91 visits (49%)
- PerplexityBot: 1 visit

AI crawlers started hitting the `.md` endpoints within hours of deployment. GPTBot in particular crawled aggressively once it discovered the first few pages, often visiting dozens of pages in a single session.

---

## Using AI Assistants to Implement

This entire implementation can be done with help from AI coding assistants like Claude Code or Cursor.

### Example Prompts

**For WordPress:**
```
I have a WordPress site hosted on [provider]. I want to implement
markdown auto-discovery for AI crawlers. Please:
1. Export my posts/pages content
2. Generate markdown files with frontmatter
3. Configure .htaccess to serve them
4. Add discovery links to my theme
5. Create an llms.txt file
6. Add bot tracking

My SSH access is [details]. My theme is [theme-name].
```

**For Next.js:**
```
Add LLM-friendly content to my Next.js site. I want:
1. An API route serving markdown versions of pages at /[slug].md
2. Discovery <link> tags in the <head> of each page
3. An llms.txt file in public/
4. Proper Content-Type headers

My content comes from [CMS/database/files].
```

### Tips

1. **Share your project structure** so the AI understands your codebase
2. **Go step by step** -- don't try to do everything at once
3. **Test after each change** -- verify before moving on
4. **Keep backups** of any files you modify (especially `.htaccess` and `functions.php`)

---

## Quick Reference

### Files You'll Create/Modify

| Platform | Markdown Files | Server Config | Discovery Link | llms.txt |
|----------|---------------|---------------|----------------|----------|
| WordPress | `/md/*.md` | `.htaccess` | `functions.php` | Web root |
| Next.js | API route | `next.config.js` | `<Head>` component | `public/llms.txt` |
| Gatsby | `public/md/*.md` | `_redirects` | `<Helmet>` component | `public/llms.txt` |
| Hugo | Built-in output | N/A | Template | `static/llms.txt` |
| Custom | `/md/*.md` | nginx/apache conf | Template | Web root |

### Key URLs to Verify

```
https://yoursite.com/llms.txt              -- AI sitemap
https://yoursite.com/about.md              -- Example markdown page
https://yoursite.com/.well-known/llms.txt  -- Alternative AI sitemap location
```

---

## Resources

- [The Third Audience](https://dri.es/the-third-audience) -- The concept article by Dries Buytaert
- [llmstxt.org](https://llmstxt.org/) -- The llms.txt standard specification
- [Turndown](https://github.com/mixmark-io/turndown) -- HTML to Markdown converter (JavaScript)
- [markdownify](https://github.com/matthewwithanm/python-markdownify) -- HTML to Markdown (Python)

---

*Last updated: February 2026*

