
How Google AI Overviews Selects Sources to Cite

11 min read
Bart Waardenburg

AI Agent Readiness Expert & Founder

Google AI Overviews now appear for roughly 30% of U.S. desktop searches. On mobile, frequency is growing 475% year-over-year. These AI-generated summaries sit above traditional search results and cite specific websites. That changes the value of organic rankings quite a bit.

But Google's official documentation says "there are no additional requirements" to appear in AI Overviews. Is that the whole story? I analyzed official sources, large-scale studies covering 500+ million keywords, and 16 months of longitudinal data to find out what actually gets you cited. This is part of our series on how AI platforms select sources, alongside our guides on how ChatGPT selects sources and how Claude selects sources.

The Query Fan-Out Technique

The most distinctive thing about Google AI Overviews is query fan-out. Instead of running a single search, the Gemini-powered system breaks your question into subtopics and runs multiple searches simultaneously.

With the Gemini 3 upgrade, the technique became even more sophisticated. According to the official Google blog, the latest model "more intelligently understands user intent" and "can find new content that it may have previously missed."

A study analyzing 173,902 URLs and 33,000 fan-out queries quantified the impact:

  • Pages ranking for both the main query and at least one fan-out query accounted for 51% of AI Overview citations
  • Pages ranking for fan-out queries are 49% more likely to earn a citation than pages ranking for the head term alone
  • Spearman correlation of 0.77 between fan-out breadth and citation likelihood

What this means in practice: comprehensive content that covers related subtopics performs significantly better. A page that answers the main question and naturally covers related angles is far more likely to be cited than one narrowly focused on a single keyword. That's the exact opposite of the hyper-focused, single-keyword content that dominated traditional SEO.

The Organic Ranking Connection

Multiple large-scale studies show a strong but nuanced relationship between traditional organic rankings and AI Overview citations.

What the Studies Found

seoClarity analyzed 500+ million keywords and found that 97% of AI Overviews cite at least one source from the top 20 organic results. The #1 position appears more than half the time.

Originality.ai found that 52% of AI Overview citations come from top-10 Google results. The top-ranked document alone has a 58% chance of being cited. By the top 30, nearly 90% of all citations are covered.

BrightEdge's 16-month longitudinal study across 9 industries found that AI Overview citations from organically-ranking pages grew from 32.3% to 54.5% (a 69% relative increase). Only 16.7% of citations came from top-10 results, meaning pages ranking below the top 10 drive most of the overlap growth.


The surprising finding: 68% of cited pages didn't rank in the top 10 for either the main query or any fan-out query. AI Overviews give deeper-ranking, authoritative content a platform it never had in traditional search.

Industry Variation Is Significant

The BrightEdge study showed major differences by industry:

  • Healthcare, Education, B2B Tech, Insurance: 68-75% overlap between AI Overview citations and organic rankings (trust-sensitive YMYL content)
  • E-commerce: Only 22.9% overlap with virtually no change over 16 months
  • Restaurants and Travel: Under 24% overlap

What Google Says vs. What the Data Shows

Google's official documentation keeps things deliberately vague about what it takes to appear in AI Overviews:

  • "There are no additional requirements to appear in AI Overviews or AI Mode."
  • "You don't need to create new machine readable files, AI text files, or markup."
  • "There's also no special schema.org structured data that you need to add."
  • To be eligible, a page must be indexed and eligible to be shown in Google Search with a snippet.

But third-party research keeps showing the same thing: certain factors dramatically improve your chances. There's a gap between what's required (nothing special) and what actually works (quite a lot):

Structured Data: +73% Selection Rate

Google says no special schema is needed. But Wellows' analysis found that schema markup correlates with a +73% selection rate for AI Overview citations. A Search Engine Land analysis found that "only the page with well-implemented schema appeared in an AI Overview and achieved the best organic ranking, suggesting that schema quality, not just its presence, may play a role."

So how does that work? Google says no special structured data is required. But structured data improves organic rankings (the primary pathway to AI Overview citations), helps Google understand entities, and makes content more machine-parseable, all of which improve citation likelihood indirectly. The pattern holds across platforms: ChatGPT research shows sites with FAQPage schema are 8× more likely to be cited than those without. More on why structured data matters for all AI platforms in our guide to AI agent readiness.
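To make the schema point concrete, here's a minimal FAQPage JSON-LD sketch built with Python's standard library. The question and answer text are placeholders, not a prescribed template; validate any real markup with Google's Rich Results Test.

```python
import json

# Minimal FAQPage JSON-LD sketch. The question/answer content below is
# placeholder text; swap in your page's real Q&A pairs.
faq_jsonld = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
        {
            "@type": "Question",
            "name": "Do AI Overviews require special markup?",
            "acceptedAnswer": {
                "@type": "Answer",
                "text": "Google says no, but schema correlates with higher citation rates.",
            },
        }
    ],
}

# Emit as the payload for a <script type="application/ld+json"> tag.
print(json.dumps(faq_jsonld, indent=2))
```

The output goes into a single `<script type="application/ld+json">` element in your page's `<head>` or `<body>`.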


E-E-A-T: 96% of Citations Come from Strong Signals

Wellows' research found that 96% of AI Overview citations come from sources with strong E-E-A-T signals (Experience, Expertise, Authoritativeness, Trustworthiness).

  • Pages with expert authorship are 3.2x more likely to be cited than generic staff-written content (Relixir)
  • Author Schema that connects content to real human experts in the Knowledge Graph strengthens these signals
  • For YMYL topics (healthcare, finance), E-E-A-T verification is especially important. These industries show the highest overlap with organic rankings
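As a sketch of what that Author schema can look like, here's Article markup tied to a person entity. The name, job title, and sameAs URLs are all placeholders; point sameAs at the author's real profiles so Google can reconcile the person in the Knowledge Graph.

```python
import json

# Sketch of Article markup with an author entity. Every value below is a
# placeholder; the sameAs links are what let Google connect the author
# to an existing person entity.
article_jsonld = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "How Google AI Overviews Selects Sources to Cite",
    "author": {
        "@type": "Person",
        "name": "Jane Example",
        "jobTitle": "Medical Reviewer",
        "sameAs": [
            "https://www.linkedin.com/in/jane-example",
            "https://orcid.org/0000-0000-0000-0000",
        ],
    },
}

print(json.dumps(article_jsonld, indent=2))
```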

Entity Richness: 4.8x Higher Selection

Pages with 15+ recognized entities show 4.8x higher AI Overview selection probability. Google's Knowledge Graph stores entities and their relationships: people, products, organizations, concepts. Content with lots of well-connected entities is simply easier for Google's AI to verify and cite.

Google's Livegraph system assigns confidence weights to every identified triple (subject-predicate-object). Pages without strong entity signals get filtered out early, even if they're well-written. The AI needs to connect content to verified entities in the Knowledge Graph.
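To picture the triple idea, here's a toy sketch of confidence-weighted subject-predicate-object triples. This is purely conceptual: neither the data model nor the threshold reflects Google's actual system.

```python
from dataclasses import dataclass

# Toy illustration of confidence-weighted subject-predicate-object
# triples. Conceptual only -- not Google's internal representation.
@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str
    confidence: float  # 0.0-1.0: how well-supported the extracted fact is

triples = [
    Triple("Aspirin", "treats", "headache", 0.95),
    Triple("Aspirin", "manufactured_by", "Bayer", 0.80),
    Triple("Aspirin", "cures", "cancer", 0.10),  # weak claim, filtered out
]

# Only well-supported facts survive the (illustrative) confidence cutoff.
verified = [t for t in triples if t.confidence >= 0.5]
print(len(verified))  # 2
```

The practical takeaway: content built around clear, verifiable entity relationships gives the extractor high-confidence triples to work with.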


Content Format: Answer Units of 134-167 Words

Content that matches how AI Overviews are structured performs best:

  • Self-contained answer units of 134-167 words perform best
  • Pages using lists, tables, or FAQs align with how AI summaries are structured
  • 44.2% of citations come from the first 30% of text. Front-load key information
  • Multi-modal content (text + images + video) shows +156% selection rate
  • Content scoring 8.5/10+ on semantic completeness is 4.2x more likely to be cited
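A rough self-audit against these numbers can be scripted. The 134-167 word band and the first-30% heuristic come from the studies above; the splitting and matching logic here is a deliberate simplification (real pages need HTML-aware parsing).

```python
# Rough audit against the cited numbers: answer units of 134-167 words,
# and key information inside the first 30% of the text.
def units_in_band(paragraphs: list[str], lo: int = 134, hi: int = 167) -> int:
    """Count paragraphs whose word count falls in the cited sweet spot."""
    return sum(lo <= len(p.split()) <= hi for p in paragraphs)

def key_info_front_loaded(text: str, key_answer: str) -> bool:
    """True if the key answer string appears in the first 30% of the text."""
    idx = text.find(key_answer)
    return 0 <= idx <= 0.3 * len(text)

page = "The short answer: yes. " + "Background detail. " * 50
print(units_in_band(["word " * 150, "word " * 40]))           # 1
print(key_info_front_loaded(page, "The short answer: yes."))  # True
```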

The Google-Extended Confusion

Let's clear up a common misconception: blocking Google-Extended in robots.txt does not affect AI Overviews.

  • Google-Extended is a robots.txt user-agent token specifically for AI model training data collection
  • AI Overviews use standard Googlebot for crawling, not Google-Extended
  • Blocking Google-Extended has no impact on search rankings, indexation, or AI Overview visibility
  • Only blocking Googlebot itself would remove you from search entirely

As Playwire's analysis confirms, you can safely block AI training crawlers without hurting your AI search visibility. The opt-out situation is still evolving, though. As of early 2026, Google is exploring ways to let sites opt out of AI Overviews specifically, separate from traditional search.
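You can verify this split yourself with Python's stdlib robots.txt parser. The policy below is an illustration of the pattern, not a recommendation for every site:

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that blocks the AI-training token (Google-Extended) while
# leaving search and AI Overview crawling (Googlebot) untouched.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: Googlebot
Allow: /

User-agent: *
Allow: /
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Googlebot (search indexing and AI Overviews) may still fetch everything;
# only the training-data token is shut out.
print(rp.can_fetch("Googlebot", "https://example.com/article"))         # True
print(rp.can_fetch("Google-Extended", "https://example.com/article"))   # False
```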

Most Cited Domains in AI Overviews

The SurferSEO AI Citation Report (36 million AI Overviews, 46 million citations) shows that video content dominates:

  1. YouTube (~23.3%), the most cited domain across every vertical
  2. Wikipedia (~18.4%)
  3. Google.com (~16.4%)
  4. Reddit, LinkedIn, Facebook round out the top tier

Domain-specific experts like NIH, Shopify, and ScienceDirect show up as trusted names within their niches. AI Overviews distribute citations more evenly among niche sites than ChatGPT does.

The Semrush study (150,000+ citations) found that the top 20 domains account for 66% of all citations. Still concentrated, but leaving real room for specialized, authoritative content.

The Semrush AI Overviews study (10+ million keywords tracked from January-November 2025) found that 84% of AI Overviews appear for informational queries, 12.5% for transactional keywords (rising trend), and just 0.01% for local keywords.

When Do AI Overviews Appear?

AI Overviews don't appear for every search. Knowing the trigger patterns helps you focus your optimization:

  • seoClarity : 30% of U.S. desktop keywords trigger AI Overviews (September 2025)
  • Mobile AI Overview frequency grew 475% year-over-year
  • In Semrush's tracking data, AI Overviews peaked at ~25% in July 2025, then declined to 15.69% by November. Google is being more selective
  • Average AI Overview text length dropped 70% (from ~5,300 to ~1,600 characters), producing shorter, more focused summaries

Google also claims positive engagement: according to the official Google blog , "when people click from search results pages with AI Overviews, these clicks are higher quality (meaning, users are more likely to spend more time on the site)."

What You Can Do Today

1. ADD QUALITY SCHEMA

Implement JSON-LD with Organization, Article, FAQPage, and Author schema. Quality matters more than quantity: schema with errors can hurt.

2. COVER SUBTOPICS

Write comprehensive content that covers related angles. The fan-out technique rewards breadth, not just depth on a single keyword.

3. BUILD E-E-A-T

Add author bios, expert quotes, and connect content to real entities. Expert authorship provides a 3.2x citation boost.

4. FORMAT FOR EXTRACTION

Use lists, tables, FAQs, and self-contained answer units of 134-167 words. These formats align with how AI Overviews are structured.

  • Front-load key information. 44% of citations come from the first third of content
  • Use specific entities. Pages with 15+ recognized entities show 4.8x higher selection
  • Add images and video. Multi-modal content shows +156% selection rate
  • Don't block Googlebot. Google-Extended is separate and safe to block
  • Server-side render your content. Make sure important information is in the HTML source. See our deep dive on how AI agents see your website
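For that last point, here's a quick stdlib sketch that checks whether a key phrase is present in the raw server-rendered HTML, i.e. visible without executing JavaScript. The `raw_html` string stands in for an HTTP response body you've fetched yourself:

```python
from html.parser import HTMLParser

# Extracts visible text from raw HTML so we can confirm a key phrase is
# served by the server, not injected client-side by JavaScript.
class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)

def rendered_server_side(raw_html: str, key_phrase: str) -> bool:
    parser = TextExtractor()
    parser.feed(raw_html)
    return key_phrase in " ".join(parser.chunks)

raw_html = "<html><body><h1>Pricing</h1><p>Plans start at $9/month.</p></body></html>"
print(rendered_server_side(raw_html, "Plans start at $9/month."))  # True
```

An empty JavaScript mount point like `<div id="app"></div>` would fail this check, which is exactly the signal you want.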

Wrapping Up

Google officially says "just do good SEO." The data says that's necessary but not sufficient. Structured data, E-E-A-T signals, entity richness, comprehensive content, and the right formatting all measurably improve citation likelihood, especially outside YMYL topics, where organic ranking overlap is lowest.

The biggest opportunity? 68% of cited pages don't rank in the top 10. AI Overviews are giving deeper-ranking authoritative content a platform. If you've been stuck on page two of Google, AI Overviews might be your way onto page one's equivalent. More on the shift from traditional SEO to AI optimization in our SEO vs AEO comparison.
