AI Crawlers Ignore llms.txt — But AI Agents Don't
Dries Buytaert, founder of Drupal, recently published a data-driven analysis of llms.txt and markdown adoption by AI crawlers. His conclusion: zero AI crawlers accessed his llms.txt file, markdown pages increased total crawl traffic by 7%, and no crawler used HTTP content negotiation. He called llms.txt "a solution looking for a problem."
The data is solid. The conclusion is wrong, because he measured the wrong thing.
What the Data Actually Shows
Dries analyzed his Cloudflare logs after making all his pages available as markdown files. The findings are worth taking seriously:
Across Acquia's entire hosting infrastructure, one of the largest Drupal hosting platforms, llms.txt represented just 0.001% of 400 million requests. The requests that did reach llms.txt, 52 in total, came from SEO audit tools, not AI systems.
Leon Furze ran a similar experiment on his WordPress blog. Same result: markdown and HTML pages crawled at roughly the same rate, no measurable traffic difference, and llms.txt made no visible impact on crawler behavior.
The data is clear: AI crawlers don't use llms.txt. But that's like measuring how many trucks use your bike lane and concluding bike lanes are useless.
Crawlers and Agents Are Fundamentally Different
Dries' analysis has a blind spot: it only looks at one half of the equation. Crawling for training data is not the only way AI systems interact with web content. The distinction that matters:
| | AI Crawlers | AI Agents |
|---|---|---|
| Purpose | Scrape content for training data | Complete a task for a specific user |
| Behavior | Mass crawl, grab everything | Targeted fetch, get what's needed |
| Token efficiency | Irrelevant — data is preprocessed offline | Critical — every token costs time and money |
| Content format | HTML is fine, they strip it anyway | Markdown saves 80% of tokens |
| Discovery | Sitemaps, link crawling | llms.txt, content negotiation, tool manifests |
| Examples | GPTBot, ClaudeBot, Google-Extended | Claude Code, Cursor, Windsurf, Bun |
AI crawlers are built to hoover up the web. They have established pipelines optimized for HTML scraping, built years ago. They'd be silly to change that setup just because a few sites now offer raw markdown.
AI agents are the opposite. They fetch specific pages to solve a specific task, and every token counts. A blog post that's 20% content and 80% navigation HTML? Wasteful. Markdown and llms.txt solve that problem directly.
Coding Agents Are Already Using These Standards
Look beyond crawler logs and there's already concrete agent-side adoption happening:
Claude Code
Anthropic's coding agent sends Accept headers that prefer markdown when fetching documentation. It also looks for llms.txt to discover relevant content on a site.
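To make the mechanism concrete, here is a minimal sketch of the kind of markdown-preferring request such an agent might send. The exact header values real agents use are not documented here, so the `Accept` string and agent name below are assumptions for illustration:

```python
import urllib.request

# Hypothetical agent headers: the q-values rank markdown above HTML,
# so a server that supports content negotiation can respond with markdown.
AGENT_HEADERS = {
    "Accept": "text/markdown, text/html;q=0.8",
    "User-Agent": "example-docs-agent/0.1",  # placeholder agent name
}

def build_request(url: str) -> urllib.request.Request:
    """Build a request that asks the server for markdown first, HTML second."""
    return urllib.request.Request(url, headers=AGENT_HEADERS)
```

A server that ignores the header simply returns HTML as usual, which is what makes this pattern safe to adopt incrementally.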
Bun
The JavaScript runtime started sending content negotiation headers when fetching documentation pages, preferring markdown when available.
Cursor & Windsurf
AI-powered code editors fetch documentation to help developers. They benefit directly from markdown versions that preserve structure without HTML noise.
Cloudflare
Now offers content negotiation and markdown transformation in its paid tiers — a clear signal that platform providers see demand from the agent side.
Some documentation platforms have already started putting "agent directives" on pages pointing agents to llms.txt for content discovery. The pattern is clear: content negotiation and llms.txt adoption is being driven by the agentic developer tooling space, not by the training pipeline.
Adoption Is Industry-Specific
Another factor Dries' analysis misses: llms.txt and markdown adoption is heavily skewed toward developer documentation. Dries runs a personal blog, not a docs site. The use case is different.
Developer documentation is where coding agents spend most of their time. When Claude Code needs to understand a library API, or Cursor needs to look up a framework's configuration options, they're fetching documentation pages. Exactly the pages where:
- Markdown versions save the most tokens (docs pages are heavy on navigation and sidebars)
- llms.txt provides a curated entry point to the most relevant pages
- Content negotiation allows agents to get clean content without the UI chrome
Vercel, Cloudflare, Stripe, and other developer-facing companies have already implemented these standards. The Vercel State of AEO report explicitly recommends llms.txt as part of a comprehensive AI visibility strategy. Vercel even built AEO tracking for coding agents to measure this adoption.
Why Crawlers Will Probably Never Use llms.txt
Understanding why crawlers ignore llms.txt makes the distinction even clearer:
- Scale economics. Crawlers process billions of pages. Adding a curated discovery step per domain adds complexity for minimal gain. They already have sitemaps and link graphs
- Training incentives. More data is better for training. A curated llms.txt that points to 20 key pages is the opposite of what a training pipeline wants
- Existing infrastructure. HTML scraping pipelines are mature and battle-tested. There's no business case to rebuild them for markdown
- Content control concerns. Why would they bother with a curated list? They get more context if they take everything. The incentives are misaligned
This is not a failure of llms.txt. It's confirmation that llms.txt was never meant for crawlers in the first place.
Readiness Is Not About Today's ROI
Dries' article concludes with practical advice: focus on "clear writing, authoritative content, and timely publishing" rather than llms.txt. That advice isn't wrong. But it's incomplete.
The same argument was made about mobile optimization in 2010, about HTTPS in 2014, and about structured data in 2018. Every time, early adopters who invested before the wave hit were rewarded when adoption tipped. The sites that waited got to scramble.
The agent ecosystem is growing fast. Coding agents are becoming the default way developers interact with documentation, and AI-powered browsing agents like ChatGPT Search and Claude Search are maturing. Sites that are already machine-readable will have a structural advantage.
What You Should Actually Implement
Based on where agent adoption actually is, not where crawler adoption is, here's what matters:
1. llms.txt
Create a curated entry point for agents. List your most important pages with brief descriptions. Low effort, high signal for any agent that looks for it.
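For reference, a minimal llms.txt following Jeremy Howard's proposed format: an H1 title, a blockquote summary, then sections of annotated links. The site name and URLs below are placeholders:

```markdown
# Example Docs

> Hypothetical documentation site for a JavaScript build tool.

## Docs

- [Getting started](https://example.com/docs/start.md): install and run a first build
- [Configuration](https://example.com/docs/config.md): every supported config option

## Optional

- [Changelog](https://example.com/changelog.md): release history
```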
2. Content Negotiation
Serve markdown when agents request it via Accept headers. Cloudflare offers this out of the box, and clean markdown can cut an agent's per-page token usage by roughly 80%.
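Server-side, the decision reduces to parsing the Accept header's q-values and checking whether the client ranked markdown at least as high as HTML. A simplified sketch (real Accept parsing has more edge cases than this):

```python
def prefers_markdown(accept_header: str) -> bool:
    """Return True if the Accept header ranks text/markdown at least as
    high as text/html. Simplified q-value parsing for illustration."""
    weights = {}
    for part in accept_header.split(","):
        fields = part.strip().split(";")
        media_type = fields[0].strip().lower()
        q = 1.0  # per HTTP semantics, a missing q-value defaults to 1.0
        for param in fields[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        weights[media_type] = q
    if "text/markdown" not in weights:
        return False
    return weights["text/markdown"] >= weights.get("text/html", 0.0)
```

A request handler would then serve the `.md` rendition when `prefers_markdown()` is true and fall back to HTML otherwise.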
3. Structured Data
JSON-LD, Schema.org types, and FAQPage schema help both crawlers and agents understand your content. This is table stakes: pages with structured data showed an 8x visibility difference for ChatGPT.
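A minimal FAQPage sketch in JSON-LD, using standard Schema.org types; the question and answer text are placeholders:

```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "Does this site support markdown content negotiation?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "Yes. Send an Accept: text/markdown header and the server returns the markdown rendition."
    }
  }]
}
```

This goes in a `<script type="application/ld+json">` tag in the page head.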
4. Crawler Access
Allow AI crawlers in robots.txt. Block training bots if you want, but keep search bots open. This is the baseline. No access means no visibility.
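A robots.txt expressing that policy might look like the following. The user-agent tokens are real, but which bots you block is your call, so treat this as one possible configuration rather than a recommendation:

```txt
# Default: allow everything, including search and answer bots
User-agent: *
Allow: /

# Opt out of training crawlers while staying visible elsewhere
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```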
The first two are agent-specific. The last two help both crawlers and agents. Together, they cover the full spectrum of how AI systems interact with your content.
The Bottom Line
Dries' data is accurate: AI crawlers don't use llms.txt. But measuring llms.txt adoption by crawler behavior is like measuring the success of an API by how many web browsers access it. The audience is different.
AI agents (coding assistants, browsing agents, task automation tools) are the actual consumers of llms.txt and content negotiation. They're smaller in volume than crawlers but growing fast, and they represent the future of how software interacts with web content.
"Do AI crawlers use llms.txt today?" is the wrong question. The right one: when agents become the primary way users interact with your content, will your site be ready?
Sources
- Dries Buytaert: Markdown, llms.txt, and AI Crawlers — Original analysis of crawler behavior with Cloudflare log data
- Leon Furze: Letting the Robots In — Independent replication on WordPress with similar findings
- llms.txt Specification — The original proposal by Jeremy Howard
- How Vercel Built AEO Tracking for Coding Agents — Vercel's approach to measuring agent adoption
- Cloudflare: Markdown for Agents — Content negotiation and markdown transformation feature
- IsAgentReady: The State of AEO — Key Insights from Vercel's 2026 Report