
AI Crawlers Ignore llms.txt — But AI Agents Don't

9 min read
Bart Waardenburg

AI Agent Readiness Expert & Founder

Dries Buytaert, founder of Drupal, recently published a data-driven analysis of llms.txt and markdown adoption by AI crawlers. His conclusion: zero AI crawlers accessed his llms.txt file, markdown pages increased total crawl traffic by 7%, and no crawler used HTTP content negotiation. He called llms.txt "a solution looking for a problem."

The data is solid. The conclusion is wrong, because he measured the wrong thing.

What the Data Actually Shows

Dries analyzed his Cloudflare logs after making all his pages available as markdown files. The findings are worth taking seriously:

  • AI crawlers accessing llms.txt: 0
  • Crawl traffic increase from .md pages: +7%
  • Crawlers using content negotiation: 0
  • Pages crawled per citation sent back: 1,241

Across Acquia's entire hosting infrastructure, one of the largest Drupal hosting platforms, llms.txt represented just 0.001% of 400 million requests. All 52 requests to llms.txt came from SEO audit tools, not AI systems.

Leon Furze ran a similar experiment on his WordPress blog. Same result: markdown and HTML pages crawled at roughly the same rate, no measurable traffic difference, and llms.txt made no visible impact on crawler behavior.

The data is clear: AI crawlers don't use llms.txt. But that's like measuring how many trucks use your bike lane and concluding bike lanes are useless.

Crawlers and Agents Are Fundamentally Different

Dries' analysis has a blind spot: it looks at only one half of the equation. Crawling for training data is not the only way AI systems interact with web content. The distinction that matters:

|                  | AI Crawlers                              | AI Agents                                      |
|------------------|------------------------------------------|------------------------------------------------|
| Purpose          | Scrape content for training data         | Complete a task for a specific user            |
| Behavior         | Mass crawl, grab everything              | Targeted fetch, get what's needed              |
| Token efficiency | Irrelevant: data is preprocessed offline | Critical: every token costs time and money     |
| Content format   | HTML is fine, they strip it anyway       | Markdown saves 80% of tokens                   |
| Discovery        | Sitemaps, link crawling                  | llms.txt, content negotiation, tool manifests  |
| Examples         | GPTBot, ClaudeBot, Google-Extended       | Claude Code, Cursor, Windsurf, Bun             |

AI crawlers are built to hoover up the web. They have established pipelines optimized for HTML scraping, built years ago. They'd be silly to change that setup just because a few sites now offer raw markdown.

AI agents are the opposite. They fetch specific pages to solve a specific task, and every token counts. A blog post that's 20% content and 80% navigation HTML? Wasteful. Markdown and llms.txt solve that problem directly.

Coding Agents Are Already Using These Standards

Look beyond crawler logs and there's already concrete agent-side adoption happening:

Claude Code

Anthropic's coding agent sends Accept headers that prefer markdown when fetching documentation. It also looks for llms.txt to discover relevant content on a site.
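Conceptually, a markdown-preferring fetch looks like the exchange below. This is an illustrative sketch of HTTP content negotiation, not a capture of Claude Code's actual headers; the path, host, and q-values are placeholders:

```http
GET /docs/getting-started HTTP/1.1
Host: docs.example.com
Accept: text/markdown;q=1.0, text/html;q=0.8

HTTP/1.1 200 OK
Content-Type: text/markdown; charset=utf-8
Vary: Accept
```

A server that doesn't understand the markdown preference simply serves HTML as usual, which is what makes this pattern safe to adopt incrementally.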

Bun

The JavaScript runtime started sending content negotiation headers when fetching documentation pages, preferring markdown when available.

Cursor & Windsurf

AI-powered code editors fetch documentation to help developers. They benefit directly from markdown versions that preserve structure without HTML noise.

Cloudflare

Now offers content negotiation and markdown transformation in its paid tiers — a clear signal that platform providers see demand from the agent side.

Some documentation platforms have already started putting "agent directives" on pages pointing agents to llms.txt for content discovery. The pattern is clear: content negotiation and llms.txt adoption is being driven by the agentic developer tooling space. Not by the training pipeline.

Adoption Is Industry-Specific

Another factor Dries' analysis misses: llms.txt and markdown adoption is heavily skewed toward developer documentation. Dries runs a personal blog, not a docs site. The use case is different.

Developer documentation is where coding agents spend most of their time. When Claude Code needs to understand a library API, or Cursor needs to look up a framework's configuration options, they're fetching documentation pages. Exactly the pages where:

  • Markdown versions save the most tokens (docs pages are heavy on navigation and sidebars)
  • llms.txt provides a curated entry point to the most relevant pages
  • Content negotiation allows agents to get clean content without the UI chrome

Vercel, Cloudflare, Stripe, and other developer-facing companies have already implemented these standards. The Vercel State of AEO report explicitly recommends llms.txt as part of a comprehensive AI visibility strategy. Vercel even built AEO tracking for coding agents to measure this adoption.

Why Crawlers Will Probably Never Use llms.txt

Understanding why crawlers ignore llms.txt makes the distinction even clearer:

  • Scale economics. Crawlers process billions of pages. Adding a curated discovery step per domain adds complexity for minimal gain. They already have sitemaps and link graphs
  • Training incentives. More data is better for training. A curated llms.txt that points to 20 key pages is the opposite of what a training pipeline wants
  • Existing infrastructure. HTML scraping pipelines are mature and battle-tested. There's no business case to rebuild them for markdown
  • Content control concerns. Why would they bother with a curated list? They get more context if they take everything. The incentives are misaligned

This is not a failure of llms.txt. It's confirmation that llms.txt was never meant for crawlers in the first place.

Readiness Is Not About Today's ROI

Dries' article concludes with practical advice: focus on "clear writing, authoritative content, and timely publishing" rather than llms.txt. That advice isn't wrong. But it's incomplete.

The same argument was made about mobile optimization in 2010, about HTTPS in 2014, and about structured data in 2018. Every time, early adopters who invested before the wave hit were rewarded when adoption tipped. The sites that waited got to scramble.

The agent ecosystem is growing fast. Coding agents are becoming the default way developers interact with documentation, and AI-powered browsing agents like ChatGPT Search and Claude Search are maturing. Sites that are already machine-readable will have a structural advantage.

What You Should Actually Implement

Based on where agent adoption actually is, not where crawler adoption is, here's what matters:

1. llms.txt

Create a curated entry point for agents. List your most important pages with brief descriptions. Low effort, high signal for any agent that looks for it.
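A minimal llms.txt, following the format from the llms.txt proposal (an H1 title, a blockquote summary, and H2 sections of annotated links). The URLs and descriptions below are placeholders:

```markdown
# Example Project

> Short one-line description of what this site or product does.

## Docs

- [Getting Started](https://example.com/docs/getting-started.md): Installation and first steps
- [API Reference](https://example.com/docs/api.md): Complete endpoint reference

## Optional

- [Changelog](https://example.com/changelog.md): Release history
```

The file lives at the site root (`/llms.txt`), and linking to markdown versions of each page gives agents the token-efficient variant directly.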

2. Content Negotiation

Serve markdown when agents request it via Accept headers. Cloudflare offers this out of the box. Saves agents 80% of token overhead.
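The server-side decision reduces to comparing q-values in the Accept header. A minimal sketch in Python, assuming a simplified q-value parse (a production implementation should use a full Accept-header parser per RFC 9110):

```python
def prefers_markdown(accept_header: str) -> bool:
    """Return True if the client ranks text/markdown at least as
    high as text/html (simplified q-value parsing)."""
    weights = {}
    for part in accept_header.split(","):
        fields = part.strip().split(";")
        media = fields[0].strip()
        q = 1.0  # default quality per the HTTP spec
        for param in fields[1:]:
            name, _, value = param.strip().partition("=")
            if name == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        weights[media] = max(q, weights.get(media, 0.0))
    md = weights.get("text/markdown", weights.get("*/*", 0.0))
    html = weights.get("text/html", weights.get("*/*", 0.0))
    return md > 0 and md >= html

# An agent that prefers markdown:
print(prefers_markdown("text/markdown, text/html;q=0.8"))  # True
# A typical browser header:
print(prefers_markdown("text/html,application/xhtml+xml;q=0.9,*/*;q=0.8"))  # False
```

When this returns True, serve the `.md` variant with `Content-Type: text/markdown` and a `Vary: Accept` header so caches keep the two representations apart.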

3. Structured Data

JSON-LD, Schema.org types, and FAQPage schema help both crawlers and agents understand your content. This is table stakes: structured data correlates with an 8x visibility difference for ChatGPT.
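A minimal FAQPage block in JSON-LD, embedded in the page head; the question and answer text here are placeholders:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "What does this product do?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "A one-paragraph answer an agent can quote directly."
    }
  }]
}
</script>
```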

4. Crawler Access

Allow AI crawlers in robots.txt. Block training bots if you want, but keep search bots open. This is the baseline. No access means no visibility.
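An illustrative robots.txt sketch of that split, blocking one training bot while leaving search bots and everything else open. The user-agent tokens shown are examples; check each vendor's current documentation before relying on them:

```text
# Block a training crawler (example policy)
User-agent: GPTBot
Disallow: /

# Keep search-oriented bots open
User-agent: OAI-SearchBot
Allow: /

# Default: allow everything else
User-agent: *
Allow: /
```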

The first two are agent-specific. The last two help both crawlers and agents. Together, they cover the full spectrum of how AI systems interact with your content.

The Bottom Line

Dries' data is accurate: AI crawlers don't use llms.txt. But measuring llms.txt adoption by crawler behavior is like measuring the success of an API by how many web browsers access it. The audience is different.

AI agents (coding assistants, browsing agents, task automation tools) are the actual consumers of llms.txt and content negotiation. They're smaller in volume than crawlers but growing fast. They represent the future of how software interacts with web content.

"Do AI crawlers use llms.txt today?" is the wrong question. The right one: when agents become the primary way users interact with your content, will your site be ready?
