
How Claude Search Selects Sources to Cite

9 min read
Bart Waardenburg

AI Agent Readiness Expert & Founder

Claude's web search has something ChatGPT and Google AI Overviews don't: it relies almost entirely on a single search backend, Brave Search, with 86.7% alignment between Claude's citations and Brave's organic results. Understand that relationship and you understand how to get cited by Claude.

I dug through Anthropic's crawler documentation, API specs, third-party studies, and technical disclosures to figure out how Claude picks its sources and what makes it different from the rest.

The Brave Search Backbone

In March 2025, TechCrunch confirmed that Claude's web search runs on Brave Search. That changes quite a bit about how you optimize for Claude visibility.

The BrightEdge analysis put numbers on it: 86.7% of Claude's cited results overlap with Brave's top non-sponsored organic results. For comparison, ChatGPT shows only 26.7% alignment with Bing's top results. Claude trusts its search backend way more than ChatGPT trusts Bing.


Bottom line: ranking well in Brave Search is ranking well in Claude. But Brave's index has a threshold. Content needs visits from at least 20 unique Brave browser users with data-sharing enabled before becoming eligible for indexing. That gives established domains with diverse traffic an automatic head start.

Three Crawlers, Three Purposes

Anthropic operates three separate crawlers, each with its own purpose. The documentation was last updated February 20, 2026, when the newest crawler was added.

CLAUDE-SEARCHBOT

Indexes and evaluates content quality for search results. Blocking this reduces your visibility and accuracy in Claude-powered search.

CLAUDEBOT

Crawls content for AI model training data. Can be blocked independently from search without affecting visibility.

CLAUDE-USER

Fetches pages when users explicitly ask Claude to read a specific URL. Still honors robots.txt, unlike OpenAI's equivalent.

The big difference from OpenAI: all three of Anthropic's crawlers still honor robots.txt, including Claude-User. OpenAI stopped respecting robots.txt for their ChatGPT-User bot in December 2025. Anthropic still plays by the rules. They also support the non-standard Crawl-delay directive and don't try to bypass CAPTCHAs.

As Search Engine Journal reported, this three-bot system gives site owners finer-grained control than any other AI platform. You can allow search indexing while blocking training, or allow user browsing while restricting automated crawling.

The recommended robots.txt for granular Anthropic control:

robots.txt - Allow search and browsing, block training
# Allow Claude search indexing
User-agent: Claude-SearchBot
Allow: /

# Allow Claude user-initiated browsing
User-agent: Claude-User
Allow: /

# Block AI model training (optional)
User-agent: ClaudeBot
Disallow: /
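You can sanity-check these rules before deploying them. A minimal sketch using Python's standard-library robots.txt parser, with the rules above inlined:

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules from above, inlined for a self-contained check
rules = """
User-agent: Claude-SearchBot
Allow: /

User-agent: Claude-User
Allow: /

User-agent: ClaudeBot
Disallow: /
""".strip()

parser = RobotFileParser()
parser.parse(rules.splitlines())

for agent in ("Claude-SearchBot", "Claude-User", "ClaudeBot"):
    allowed = parser.can_fetch(agent, "https://example.com/some-page")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Running this confirms the intended split: search indexing and user browsing stay open while training crawls are refused.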

How Claude Actually Selects Sources

Claude's source selection is a multi-step process, documented in Anthropic's API documentation:

  1. Decision to search: Claude autonomously decides whether to search based on three criteria - freshness (does the query need current info?), specificity (how targeted is the question?), and intent (what's the underlying purpose?)
  2. Query execution: The Brave Search API returns the top organic results
  3. Content evaluation: Claude filters and evaluates results based on relevance, clarity, and extractability
  4. Iteration: This cycle can repeat up to ten times in a single conversation turn, refining the search as Claude learns more
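That loop maps onto the web search tool in Anthropic's Messages API. A minimal request sketch, where the tool type string and the `max_uses` cap (which bounds the iteration described in step 4) reflect Anthropic's published tool docs at the time of writing; verify both against the current API reference:

```python
# Hedged sketch: builds the request body for a Messages API call
# with server-side web search enabled. Not a live API call.
def build_search_request(question: str, max_searches: int = 10) -> dict:
    return {
        "model": "claude-sonnet-4-5",
        "max_tokens": 1024,
        "tools": [{
            "type": "web_search_20250305",  # server-side web search tool
            "name": "web_search",
            "max_uses": max_searches,       # caps the search/refine loop
        }],
        "messages": [{"role": "user", "content": question}],
    }

request = build_search_request(
    "What changed in Claude's crawler documentation this year?"
)
print(request["tools"][0]["max_uses"])
```

The `max_uses` field is the developer-facing knob for the up-to-ten-searches cycle: Claude decides when to stop, but never exceeds the cap.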

An interesting detail from the Groundy analysis: Claude favors content that is "concise, current, and aligns closely with the user's phrasing and intent." Pages need to match conversational query patterns, and content written in a natural, question-answering style performs better than keyword-stuffed copy. That fits right into the shift from traditional SEO toward AI Engine Optimization (AEO), where writing for how people ask questions matters more than keyword density.

Dynamic Filtering: Why Clean HTML Matters

In February 2026, Anthropic shipped something that caught my attention: dynamic filtering. Claude can now write and execute Python code to post-process raw HTML before it reaches the context window.

In practice, Claude actively strips away:

  • Navigation menus and sidebars
  • Footer content and boilerplate
  • Advertising and tracking markup
  • Irrelevant metadata
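To make the idea concrete, here is a toy version of that stripping step using only Python's standard-library HTML parser. This illustrates the concept, not Anthropic's actual filter, and the boilerplate tag list is my own assumption:

```python
from html.parser import HTMLParser

# Tags treated as boilerplate in this sketch (an assumption)
BOILERPLATE = {"nav", "aside", "footer", "script", "style"}

class ContentExtractor(HTMLParser):
    """Collects text that sits outside boilerplate regions."""
    def __init__(self):
        super().__init__()
        self.depth = 0      # nesting depth inside boilerplate tags
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in BOILERPLATE:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in BOILERPLATE and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())

page = """
<nav>Home | Blog | About</nav>
<main><h1>Dynamic filtering</h1><p>Only the article body survives.</p></main>
<footer>&#169; 2026 Example Inc.</footer>
"""
extractor = ContentExtractor()
extractor.feed(page)
print(" ".join(extractor.chunks))
```

A page whose main content lives in semantic elements like `<main>` and `<article>` makes this kind of extraction trivial; content buried in generic `<div>` soup makes it error-prone.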

Anthropic reports that this filtering delivers both an accuracy boost and token savings, since raw pages no longer flood the context window with markup.

Dynamic filtering is currently available only on Opus 4.6 and Sonnet 4.6 via the Claude API and Azure (not Vertex AI), and requires the code execution tool to be enabled alongside web search.

How Claude Cites Sources

Claude uses inline citations with clickable source links, similar to ChatGPT but different from Perplexity's footnote-heavy approach. Every web-sourced claim includes:

  • URL: The source page URL
  • Title: The source page title
  • Cited text: Up to 150 characters of the specific content being cited
  • Encrypted index: A reference for maintaining citations in multi-turn conversations

Nice detail from the API docs: citation metadata (cited_text, title, url) does not count toward input or output token usage. That saves money when building applications with Claude's web search.
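A minimal sketch of pulling that metadata out of a response. The field names (url, title, cited_text) follow the list above; the sample payload itself is fabricated for illustration:

```python
# Walk response content blocks and collect citation metadata.
# Block shape follows the fields described above; sample data is made up.
def extract_citations(content_blocks: list) -> list:
    cites = []
    for block in content_blocks:
        for c in block.get("citations") or []:
            cites.append({
                "url": c["url"],
                "title": c["title"],
                "snippet": c["cited_text"][:150],  # cited text caps at 150 chars
            })
    return cites

sample = [{
    "type": "text",
    "text": "Claude's search runs on Brave.",
    "citations": [{
        "url": "https://example.com/claude-brave",
        "title": "Claude and Brave Search",
        "cited_text": "Claude's web search runs on Brave Search.",
    }],
}]

for cite in extract_citations(sample):
    print(cite["url"], "-", cite["title"])
```

Because this metadata is token-free, collecting and displaying it adds no cost to the request.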

Claude "only cites what it can verify" and avoids hallucinated citations. Can't verify a claim against search results? It'll either omit the citation or qualify the statement. No making up plausible-looking references.

The Citations API: A Separate Feature

A distinction that's easy to miss. Claude's web search citations (discussed above) are different from the separate Citations API launched in January 2025. That API lets developers ground Claude's responses in user-provided documents (PDFs, plain text, custom content) with precise character-level references.

Internal evaluations show the Citations API increases recall accuracy by up to 15% compared to custom prompt-based implementations. But it's a developer tool for supplied documents. It has no impact on how your website gets discovered or cited in Claude's web search.
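For contrast with web search, here is a hedged request sketch for that document-grounding flow. The document content-block shape follows Anthropic's published examples, but treat the exact field names as assumptions and check the current API reference:

```python
# Hypothetical sketch: ground Claude in a supplied plain-text document
# with citations enabled, per the Citations API. Not a live API call.
def build_grounded_request(doc_text: str, question: str) -> dict:
    return {
        "model": "claude-sonnet-4-5",
        "max_tokens": 1024,
        "messages": [{
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "text",
                        "media_type": "text/plain",
                        "data": doc_text,          # the supplied document
                    },
                    "citations": {"enabled": True},  # character-level refs
                },
                {"type": "text", "text": question},
            ],
        }],
    }

req = build_grounded_request(
    "Brave confirmed the partnership in March 2025.",
    "When was the partnership confirmed?",
)
print(req["messages"][0]["content"][0]["citations"])
```

Note that nothing here touches the open web: the grounding source is whatever document you pass in, which is exactly why this feature doesn't affect website visibility.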


No Publisher Licensing Deals

OpenAI has formal licensing deals with AP, Conde Nast, Financial Times, News Corp, The Atlantic, Springer, and Washington Post. Anthropic? No announced publisher licensing partnerships.

What Anthropic does have is a $1.5 billion copyright settlement (September 2025). The largest in U.S. history, covering roughly 500,000 copyrighted works. The settlement covers only past use (before August 25, 2025) and explicitly is not a licensing deal for future use.

For website owners, this means there's no "preferred publisher" list for Claude citations. Every site competes on equal footing through Brave Search rankings and content quality. That makes the technical optimization covered in this article all the more relevant.

LEVEL PLAYING FIELD

No publisher licensing deals means every website competes on equal terms. Your visibility in Claude depends entirely on Brave Search rankings and content quality, not on corporate partnerships.

How Claude Compares to ChatGPT and Google

Dimension | Claude | ChatGPT | Google AI Overviews
Search backend | Brave Search | Bing (+ Google for paid) | Google Search
Backend alignment | 86.7% with Brave | 26.7% with Bing | Native integration
User-browsing bot | Honors robots.txt | Ignores robots.txt (since Dec 2025) | N/A
Content processing | Dynamic filtering (strips boilerplate) | Direct content ingestion | Full index processing
Publisher deals | None | AP, Conde Nast, FT, News Corp | Various licensing agreements
Citation style | Inline clickable links | Inline links in text | Source cards with URLs
Crawl-delay support | Yes | Not documented | No

What You Can Do Today

Based on Claude's architecture, these are the things that make the biggest difference:

1. ALLOW CLAUDE-SEARCHBOT

This is the gate to Claude visibility. Blocking this bot reduces your presence in Claude's search answers.

2. OPTIMIZE FOR BRAVE

With 86.7% alignment, ranking in Brave Search is effectively ranking in Claude. Ensure Brave can index your content.

3. CLEAN YOUR HTML

Claude's dynamic filtering strips boilerplate. Clean semantic HTML with content-first structure gives you an edge.

4. WRITE CONVERSATIONALLY

Claude favors content that matches conversational query patterns. Write naturally, not keyword-stuffed.

  • Use semantic HTML (<article>, <main>, <section>) to help Claude's filtering understand your content structure
  • Server-side render your content. Claude's crawlers cannot execute client-side JavaScript
  • Keep content concise and current. Claude filters for relevance and freshness
  • Add structured data. While Claude relies on Brave, structured data improves Brave rankings which flows through to Claude. Across platforms, schema markup shows a +73% selection rate in AI Overviews and sites with FAQPage schema are 8× more likely to be cited by ChatGPT
  • Maintain an XML sitemap. Aids content discovery for all crawlers including Claude-SearchBot
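The sitemap point is easy to automate. A minimal sketch that generates a standards-conformant XML sitemap with Python's standard library (the URLs and dates are placeholders):

```python
import xml.etree.ElementTree as ET

# Sitemaps protocol namespace (sitemaps.org)
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages: list) -> str:
    """Render (url, lastmod) pairs as an XML sitemap string."""
    urlset = ET.Element("urlset", xmlns=NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod  # freshness signal
    return ET.tostring(urlset, encoding="unicode")

sitemap = build_sitemap([
    ("https://example.com/", "2026-02-20"),
    ("https://example.com/blog/claude-search", "2026-02-21"),
])
print(sitemap)
```

Keeping `lastmod` accurate matters here: Claude filters for freshness, and the sitemap is one of the cheapest places to signal it.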

Wrapping Up

Of the three major AI platforms, Claude's source selection is the most transparent. 86.7% alignment with Brave Search. No mystery about how to get cited: rank well in Brave, allow Claude-SearchBot, and write clean, well-structured content that matches how people naturally ask questions.

The advantages of optimizing for Claude: Anthropic respects all robots.txt directives (including for user-initiated browsing), offers the most fine-grained crawler control, and has no preferred publisher list. A level playing field where content quality and technical execution determine visibility.

For the full picture across AI platforms, read our analyses of how ChatGPT chooses which websites to cite and how Google AI Overviews selects sources . For broader trends, see key insights from Vercel's 2026 AEO report .
