
How Claude Search Selects Sources to Cite

9 min read
Bart Waardenburg

AI Agent Readiness Expert & Founder

Claude's web search has something ChatGPT and Google AI Overviews don't: it relies almost entirely on a single search backend, Brave Search, with 86.7% alignment between Claude's citations and Brave's organic results. Understand that relationship and you understand how to get cited by Claude.

I dug through Anthropic's crawler documentation, API specs, third-party studies, and technical disclosures to figure out how Claude picks its sources and what makes it different from the rest.

The Brave Search Backbone

In March 2025, TechCrunch confirmed that Claude's web search runs on Brave Search. That changes quite a bit about how you optimize for Claude visibility.

The BrightEdge analysis put numbers on it: 86.7% of Claude's cited results overlap with Brave's top non-sponsored organic results. For comparison, ChatGPT shows only 26.7% alignment with Bing's top results. Claude trusts its search backend way more than ChatGPT trusts Bing.


Bottom line: ranking well in Brave Search is ranking well in Claude. But Brave's index has a threshold. Content needs visits from at least 20 unique Brave browser users with data-sharing enabled before becoming eligible for indexing. That gives established domains with diverse traffic an automatic head start.

Three Crawlers, Three Purposes

Anthropic operates three separate crawlers, each with its own purpose. The documentation was last updated February 20, 2026, when the newest crawler was added.

CLAUDE-SEARCHBOT

Indexes and evaluates content quality for search results. Blocking this reduces your visibility and accuracy in Claude-powered search.

CLAUDEBOT

Crawls content for AI model training data. Can be blocked independently from search without affecting visibility.

CLAUDE-USER

Fetches pages when users explicitly ask Claude to read a specific URL. Still honors robots.txt, unlike OpenAI's equivalent.

The big difference from OpenAI: all three of Anthropic's crawlers still honor robots.txt, including Claude-User. OpenAI stopped respecting robots.txt for their ChatGPT-User bot in December 2025. Anthropic still plays by the rules. They also support the non-standard Crawl-delay directive and don't try to bypass CAPTCHAs.

As Search Engine Journal reported, this three-bot system gives site owners finer-grained control than any other AI platform. You can allow search indexing while blocking training, or allow user browsing while restricting automated crawling.

The recommended robots.txt for granular Anthropic control:

robots.txt - Allow search and browsing, block training
# Allow Claude search indexing
User-agent: Claude-SearchBot
Allow: /

# Allow Claude user-initiated browsing
User-agent: Claude-User
Allow: /

# Block AI model training (optional)
User-agent: ClaudeBot
Disallow: /
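You can sanity-check these rules before deploying them. A minimal sketch using Python's standard-library robots.txt parser, with the rules above inlined:

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules from above, inlined for a self-contained check
rules = """
User-agent: Claude-SearchBot
Allow: /

User-agent: Claude-User
Allow: /

User-agent: ClaudeBot
Disallow: /
""".strip()

parser = RobotFileParser()
parser.parse(rules.splitlines())

for agent in ("Claude-SearchBot", "Claude-User", "ClaudeBot"):
    allowed = parser.can_fetch(agent, "https://example.com/some-page")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Running this confirms the intended split: search indexing and user browsing stay open while training crawls are refused.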

How Claude Actually Selects Sources

Claude's source selection is a multi-step process, documented in Anthropic's API documentation:

  1. Decision to search: Claude autonomously decides whether to search based on three criteria - freshness (does the query need current info?), specificity (how targeted is the question?), and intent (what's the underlying purpose?)
  2. Query execution: The Brave Search API returns the top organic results
  3. Content evaluation: Claude filters and evaluates results based on relevance, clarity, and extractability
  4. Iteration: This cycle can repeat up to ten times in a single conversation turn, refining the search as Claude learns more
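That loop maps onto the web search tool in Anthropic's Messages API. A minimal request sketch, where the tool type string and the `max_uses` cap (which bounds the iteration described in step 4) reflect Anthropic's published tool docs at the time of writing; verify both against the current API reference:

```python
# Hedged sketch: builds the request body for a Messages API call
# with server-side web search enabled. Not a live API call.
def build_search_request(question: str, max_searches: int = 10) -> dict:
    return {
        "model": "claude-sonnet-4-5",
        "max_tokens": 1024,
        "tools": [{
            "type": "web_search_20250305",  # server-side web search tool
            "name": "web_search",
            "max_uses": max_searches,       # caps the search/refine loop
        }],
        "messages": [{"role": "user", "content": question}],
    }

request = build_search_request(
    "What changed in Claude's crawler documentation this year?"
)
print(request["tools"][0]["max_uses"])
```

The `max_uses` field is the developer-facing knob for the up-to-ten-searches cycle: Claude decides when to stop, but never exceeds the cap.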

An interesting detail from the Groundy analysis: Claude favors content that is "concise, current, and aligns closely with the user's phrasing and intent." Pages need to match conversational query patterns, and content written in a natural, question-answering style performs better than keyword-stuffed copy. That fits right into the shift from traditional SEO toward AI Engine Optimization (AEO), where writing for how people ask questions matters more than keyword density.

Dynamic Filtering: Why Clean HTML Matters

In February 2026, Anthropic shipped something that caught my attention: dynamic filtering. Claude can now write and execute Python code to post-process raw HTML before it reaches the context window.

In practice, Claude actively strips away:

  • Navigation menus and sidebars
  • Footer content and boilerplate
  • Advertising and tracking markup
  • Irrelevant metadata
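To make the idea concrete, here is a toy version of that stripping step using only Python's standard-library HTML parser. This illustrates the concept, not Anthropic's actual filter, and the boilerplate tag list is my own assumption:

```python
from html.parser import HTMLParser

# Tags treated as boilerplate in this sketch (an assumption)
BOILERPLATE = {"nav", "aside", "footer", "script", "style"}

class ContentExtractor(HTMLParser):
    """Collects text that sits outside boilerplate regions."""
    def __init__(self):
        super().__init__()
        self.depth = 0      # nesting depth inside boilerplate tags
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in BOILERPLATE:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in BOILERPLATE and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())

page = """
<nav>Home | Blog | About</nav>
<main><h1>Dynamic filtering</h1><p>Only the article body survives.</p></main>
<footer>&#169; 2026 Example Inc.</footer>
"""
extractor = ContentExtractor()
extractor.feed(page)
print(" ".join(extractor.chunks))
```

A page whose main content lives in semantic elements like `<main>` and `<article>` makes this kind of extraction trivial; content buried in generic `<div>` soup makes it error-prone.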

Anthropic reports that this filtering delivers both an accuracy boost and token savings, since raw pages no longer flood the context window with markup.

Dynamic filtering is currently available only on Opus 4.6 and Sonnet 4.6 via the Claude API and Azure (not Vertex AI), and requires the code execution tool to be enabled alongside web search.

How Claude Cites Sources

Claude uses inline citations with clickable source links, similar to ChatGPT but different from Perplexity's footnote-heavy approach. Every web-sourced claim includes:

  • URL: The source page URL
  • Title: The source page title
  • Cited text: Up to 150 characters of the specific content being cited
  • Encrypted index: A reference for maintaining citations in multi-turn conversations

Nice detail from the API docs: citation metadata (cited_text, title, url) does not count toward input or output token usage. That saves money when building applications with Claude's web search.
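A minimal sketch of pulling that metadata out of a response. The field names (url, title, cited_text) follow the list above; the sample payload itself is fabricated for illustration:

```python
# Walk response content blocks and collect citation metadata.
# Block shape follows the fields described above; sample data is made up.
def extract_citations(content_blocks: list) -> list:
    cites = []
    for block in content_blocks:
        for c in block.get("citations") or []:
            cites.append({
                "url": c["url"],
                "title": c["title"],
                "snippet": c["cited_text"][:150],  # cited text caps at 150 chars
            })
    return cites

sample = [{
    "type": "text",
    "text": "Claude's search runs on Brave.",
    "citations": [{
        "url": "https://example.com/claude-brave",
        "title": "Claude and Brave Search",
        "cited_text": "Claude's web search runs on Brave Search.",
    }],
}]

for cite in extract_citations(sample):
    print(cite["url"], "-", cite["title"])
```

Because this metadata is token-free, collecting and displaying it adds no cost to the request.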

Claude "only cites what it can verify" and avoids hallucinated citations. Can't verify a claim against search results? It'll either omit the citation or qualify the statement. No making up plausible-looking references.

The Citations API: A Separate Feature

A distinction that's easy to miss. Claude's web search citations (discussed above) are different from the separate Citations API launched in January 2025. That API lets developers ground Claude's responses in user-provided documents (PDFs, plain text, custom content) with precise character-level references.

Internal evaluations show the Citations API increases recall accuracy by up to 15% compared to custom prompt-based implementations. But it's a developer tool for supplied documents. It has no impact on how your website gets discovered or cited in Claude's web search.
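For contrast with web search, here is a hedged request sketch for that document-grounding flow. The document content-block shape follows Anthropic's published examples, but treat the exact field names as assumptions and check the current API reference:

```python
# Hypothetical sketch: ground Claude in a supplied plain-text document
# with citations enabled, per the Citations API. Not a live API call.
def build_grounded_request(doc_text: str, question: str) -> dict:
    return {
        "model": "claude-sonnet-4-5",
        "max_tokens": 1024,
        "messages": [{
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "text",
                        "media_type": "text/plain",
                        "data": doc_text,          # the supplied document
                    },
                    "citations": {"enabled": True},  # character-level refs
                },
                {"type": "text", "text": question},
            ],
        }],
    }

req = build_grounded_request(
    "Brave confirmed the partnership in March 2025.",
    "When was the partnership confirmed?",
)
print(req["messages"][0]["content"][0]["citations"])
```

Note that nothing here touches the open web: the grounding source is whatever document you pass in, which is exactly why this feature doesn't affect website visibility.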


No Publisher Licensing Deals

OpenAI has formal licensing deals with AP, Conde Nast, Financial Times, News Corp, The Atlantic, Springer, and Washington Post. Anthropic? No announced publisher licensing partnerships.

What Anthropic does have is a $1.5 billion copyright settlement (September 2025). The largest in U.S. history, covering roughly 500,000 copyrighted works. The settlement covers only past use (before August 25, 2025) and explicitly is not a licensing deal for future use.

For website owners, this means there's no "preferred publisher" list for Claude citations. Every site competes on equal footing through Brave Search rankings and content quality. That makes the technical optimization covered in this article all the more relevant.

LEVEL PLAYING FIELD

No publisher licensing deals means every website competes on equal terms. Your visibility in Claude depends entirely on Brave Search rankings and content quality, not on corporate partnerships.

How Claude Compares to ChatGPT and Google

Dimension | Claude | ChatGPT | Google AI Overviews
Search backend | Brave Search | Bing (+ Google for paid) | Google Search
Backend alignment | 86.7% with Brave | 26.7% with Bing | Native integration
User-browsing bot | Honors robots.txt | Ignores robots.txt (since Dec 2025) | N/A
Content processing | Dynamic filtering (strips boilerplate) | Direct content ingestion | Full index processing
Publisher deals | None | AP, Conde Nast, FT, News Corp | Various licensing agreements
Citation style | Inline clickable links | Inline links in text | Source cards with URLs
Crawl-delay support | Yes | Not documented | No

What You Can Do Today

Based on Claude's architecture, these are the things that make the biggest difference:

1. ALLOW CLAUDE-SEARCHBOT

This is the gate to Claude visibility. Blocking this bot reduces your presence in Claude's search answers.

2. OPTIMIZE FOR BRAVE

With 86.7% alignment, ranking in Brave Search is effectively ranking in Claude. Ensure Brave can index your content.

3. CLEAN YOUR HTML

Claude's dynamic filtering strips boilerplate. Clean semantic HTML with content-first structure gives you an edge.

4. WRITE CONVERSATIONALLY

Claude favors content that matches conversational query patterns. Write naturally, not keyword-stuffed.

  • Use semantic HTML (<article>, <main>, <section>) to help Claude's filtering understand your content structure
  • Server-side render your content. Claude's crawlers cannot execute client-side JavaScript
  • Keep content concise and current. Claude filters for relevance and freshness
  • Add structured data. While Claude relies on Brave, structured data improves Brave rankings which flows through to Claude. Across platforms, schema markup shows a +73% selection rate in AI Overviews and sites with FAQPage schema are 8× more likely to be cited by ChatGPT
  • Maintain an XML sitemap. Aids content discovery for all crawlers including Claude-SearchBot
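The sitemap point is easy to automate. A minimal sketch that generates a standards-conformant XML sitemap with Python's standard library (the URLs and dates are placeholders):

```python
import xml.etree.ElementTree as ET

# Sitemaps protocol namespace (sitemaps.org)
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages: list) -> str:
    """Render (url, lastmod) pairs as an XML sitemap string."""
    urlset = ET.Element("urlset", xmlns=NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod  # freshness signal
    return ET.tostring(urlset, encoding="unicode")

sitemap = build_sitemap([
    ("https://example.com/", "2026-02-20"),
    ("https://example.com/blog/claude-search", "2026-02-21"),
])
print(sitemap)
```

Keeping `lastmod` accurate matters here: Claude filters for freshness, and the sitemap is one of the cheapest places to signal it.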

Wrapping Up

Of the three major AI platforms, Claude's source selection is the most transparent. 86.7% alignment with Brave Search. No mystery about how to get cited: rank well in Brave, allow Claude-SearchBot, and write clean, well-structured content that matches how people naturally ask questions.

The advantages of optimizing for Claude: Anthropic respects all robots.txt directives (including for user-initiated browsing), offers the most fine-grained crawler control, and has no preferred publisher list. A level playing field where content quality and technical execution determine visibility.

For the full picture across AI platforms, read our analyses of how ChatGPT chooses which websites to cite and how Google AI Overviews selects sources . For broader trends, see key insights from Vercel's 2026 AEO report .
