
How Google AI Overviews Selects Sources to Cite

11 min read
Bart Waardenburg

AI Agent Readiness Expert & Founder

Google AI Overviews now appear for roughly 30% of U.S. desktop searches. On mobile, frequency is growing 475% year-over-year. These AI-generated summaries sit above traditional search results and cite specific websites. That changes the value of organic rankings quite a bit.

But Google's official documentation says "there are no additional requirements" to appear in AI Overviews. Is that the whole story? I analyzed official sources, large-scale studies covering 500+ million keywords, and 16 months of longitudinal data to find out what actually gets you cited. This is part of our series on how AI platforms select sources, alongside our guides on how ChatGPT selects sources and how Claude selects sources.

The Query Fan-Out Technique

The most distinctive thing about Google AI Overviews is query fan-out. Instead of running a single search, the Gemini-powered system breaks your question into subtopics and runs multiple searches simultaneously.

With the Gemini 3 upgrade, the technique became even more sophisticated. According to the official Google blog, the latest model "more intelligently understands user intent" and "can find new content that it may have previously missed."

A study analyzing 173,902 URLs and 33,000 fan-out queries quantified the impact:

  • Pages ranking for both the main query and at least one fan-out query accounted for 51% of AI Overview citations
  • Pages ranking for fan-out queries are 49% more likely to earn a citation than pages ranking for the head term alone
  • Spearman correlation of 0.77 between fan-out breadth and citation likelihood

What this means in practice: comprehensive content that covers related subtopics performs significantly better. A page that answers the main question and naturally covers related angles is far more likely to be cited than one narrowly focused on a single keyword. That's the exact opposite of the hyper-focused, single-keyword content that dominated traditional SEO.

The Organic Ranking Connection

Multiple large-scale studies show a strong but nuanced relationship between traditional organic rankings and AI Overview citations.

What the Studies Found

seoClarity analyzed 500+ million keywords and found that 97% of AI Overviews cite at least one source from the top 20 organic results. The #1 position appears more than half the time.

Originality.ai found that 52% of AI Overview citations come from top-10 Google results. The top-ranked document alone has a 58% chance of being cited. By the top 30, nearly 90% of all citations are covered.

BrightEdge's 16-month longitudinal study across 9 industries found that AI Overview citations from organically-ranking pages grew from 32.3% to 54.5% (a 69% relative increase). Only 16.7% of citations came from top-10 results, meaning pages ranking below the top 10 drive most of the overlap growth.


The surprising finding: 68% of cited pages didn't rank in the top 10 for either the main query or any fan-out query. AI Overviews give deeper-ranking, authoritative content a platform it never had in traditional search.

Industry Variation Is Significant

The BrightEdge study showed major differences by industry:

  • Healthcare, Education, B2B Tech, Insurance: 68-75% overlap between AI Overview citations and organic rankings (trust-sensitive YMYL content)
  • E-commerce: Only 22.9% overlap with virtually no change over 16 months
  • Restaurants and Travel: Under 24% overlap

What Google Says vs. What the Data Shows

Google's official documentation keeps things deliberately vague about what it takes to appear in AI Overviews:

  • "There are no additional requirements to appear in AI Overviews or AI Mode."
  • "You don't need to create new machine readable files, AI text files, or markup."
  • "There's also no special schema.org structured data that you need to add."
  • To be eligible, a page must be indexed and eligible to be shown in Google Search with a snippet.

But third-party research keeps showing the same thing: certain factors dramatically improve your chances. There's a gap between what's required (nothing special) and what actually works (quite a lot):

Structured Data: +73% Selection Rate

Google says no special schema is needed. But Wellows' analysis found that schema markup correlates with a +73% selection rate for AI Overview citations. A Search Engine Land analysis found that "only the page with well-implemented schema appeared in an AI Overview and achieved the best organic ranking, suggesting that schema quality, not just its presence, may play a role."

So how does that work? Google says no special structured data is required. But structured data improves organic rankings (the primary pathway to AI Overview citations), helps Google understand entities, and makes content more machine-parseable, all of which improve citation likelihood indirectly. The pattern holds across platforms: ChatGPT research shows sites with FAQPage schema are 8× more likely to be cited than those without. More on why structured data matters for all AI platforms in our guide to AI agent readiness.
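To make the schema point concrete, here's a minimal FAQPage JSON-LD sketch built with Python's standard library. The question and answer text are placeholders, not a prescribed template; validate any real markup with Google's Rich Results Test.

```python
import json

# Minimal FAQPage JSON-LD sketch. The question/answer content below is
# placeholder text; swap in your page's real Q&A pairs.
faq_jsonld = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
        {
            "@type": "Question",
            "name": "Do AI Overviews require special markup?",
            "acceptedAnswer": {
                "@type": "Answer",
                "text": "Google says no, but schema correlates with higher citation rates.",
            },
        }
    ],
}

# Emit as the payload for a <script type="application/ld+json"> tag.
print(json.dumps(faq_jsonld, indent=2))
```

The output goes into a single `<script type="application/ld+json">` element in your page's `<head>` or `<body>`.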


E-E-A-T: 96% of Citations Come from Strong Signals

Wellows' research found that 96% of AI Overview citations come from sources with strong E-E-A-T signals (Experience, Expertise, Authoritativeness, Trustworthiness).

  • Pages with expert authorship are 3.2x more likely to be cited than generic staff-written content (Relixir)
  • Author Schema that connects content to real human experts in the Knowledge Graph strengthens these signals
  • For YMYL topics (healthcare, finance), E-E-A-T verification is especially important. These industries show the highest overlap with organic rankings
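As a sketch of what that Author schema can look like, here's Article markup tied to a person entity. The name, job title, and sameAs URLs are all placeholders; point sameAs at the author's real profiles so Google can reconcile the person in the Knowledge Graph.

```python
import json

# Sketch of Article markup with an author entity. Every value below is a
# placeholder; the sameAs links are what let Google connect the author
# to an existing person entity.
article_jsonld = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "How Google AI Overviews Selects Sources to Cite",
    "author": {
        "@type": "Person",
        "name": "Jane Example",
        "jobTitle": "Medical Reviewer",
        "sameAs": [
            "https://www.linkedin.com/in/jane-example",
            "https://orcid.org/0000-0000-0000-0000",
        ],
    },
}

print(json.dumps(article_jsonld, indent=2))
```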

Entity Richness: 4.8x Higher Selection

Pages with 15+ recognized entities show 4.8x higher AI Overview selection probability. Google's Knowledge Graph stores entities and their relationships: people, products, organizations, concepts. Content with lots of well-connected entities is simply easier for Google's AI to verify and cite.

Google's Livegraph system assigns confidence weights to every identified triple (subject-predicate-object). Pages without strong entity signals get filtered out early, even if they're well-written. The AI needs to connect content to verified entities in the Knowledge Graph.
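To picture the triple idea, here's a toy sketch of confidence-weighted subject-predicate-object triples. This is purely conceptual: neither the data model nor the threshold reflects Google's actual system.

```python
from dataclasses import dataclass

# Toy illustration of confidence-weighted subject-predicate-object
# triples. Conceptual only -- not Google's internal representation.
@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str
    confidence: float  # 0.0-1.0: how well-supported the extracted fact is

triples = [
    Triple("Aspirin", "treats", "headache", 0.95),
    Triple("Aspirin", "manufactured_by", "Bayer", 0.80),
    Triple("Aspirin", "cures", "cancer", 0.10),  # weak claim, filtered out
]

# Only well-supported facts survive the (illustrative) confidence cutoff.
verified = [t for t in triples if t.confidence >= 0.5]
print(len(verified))  # 2
```

The practical takeaway: content built around clear, verifiable entity relationships gives the extractor high-confidence triples to work with.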


Content Format: Answer Units of 134-167 Words

Content that matches how AI Overviews are structured performs best:

  • Self-contained answer units of 134-167 words perform best
  • Pages using lists, tables, or FAQs align with how AI summaries are structured
  • 44.2% of citations come from the first 30% of text. Front-load key information
  • Multi-modal content (text + images + video) shows +156% selection rate
  • Content scoring 8.5/10+ on semantic completeness is 4.2x more likely to be cited
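A rough self-audit against these numbers can be scripted. The 134-167 word band and the first-30% heuristic come from the studies above; the splitting and matching logic here is a deliberate simplification (real pages need HTML-aware parsing).

```python
# Rough audit against the cited numbers: answer units of 134-167 words,
# and key information inside the first 30% of the text.
def units_in_band(paragraphs: list[str], lo: int = 134, hi: int = 167) -> int:
    """Count paragraphs whose word count falls in the cited sweet spot."""
    return sum(lo <= len(p.split()) <= hi for p in paragraphs)

def key_info_front_loaded(text: str, key_answer: str) -> bool:
    """True if the key answer string appears in the first 30% of the text."""
    idx = text.find(key_answer)
    return 0 <= idx <= 0.3 * len(text)

page = "The short answer: yes. " + "Background detail. " * 50
print(units_in_band(["word " * 150, "word " * 40]))           # 1
print(key_info_front_loaded(page, "The short answer: yes."))  # True
```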

The Google-Extended Confusion

Let's clear up a common misconception: blocking Google-Extended in robots.txt does not affect AI Overviews.

  • Google-Extended is a robots.txt user-agent token specifically for AI model training data collection
  • AI Overviews use standard Googlebot for crawling, not Google-Extended
  • Blocking Google-Extended has no impact on search rankings, indexation, or AI Overview visibility
  • Only blocking Googlebot itself would remove you from search entirely

As Playwire's analysis confirms, you can safely block AI training crawlers without hurting your AI search visibility. The opt-out situation is still evolving, though. As of early 2026, Google is exploring ways to let sites opt out of AI Overviews specifically, separate from traditional search.
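You can verify this split yourself with Python's stdlib robots.txt parser. The policy below is an illustration of the pattern, not a recommendation for every site:

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that blocks the AI-training token (Google-Extended) while
# leaving search and AI Overview crawling (Googlebot) untouched.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: Googlebot
Allow: /

User-agent: *
Allow: /
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Googlebot (search indexing and AI Overviews) may still fetch everything;
# only the training-data token is shut out.
print(rp.can_fetch("Googlebot", "https://example.com/article"))         # True
print(rp.can_fetch("Google-Extended", "https://example.com/article"))   # False
```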

Most Cited Domains in AI Overviews

The SurferSEO AI Citation Report (36 million AI Overviews, 46 million citations) shows that video content dominates:

  1. YouTube (~23.3%), the most cited domain across every vertical
  2. Wikipedia (~18.4%)
  3. Google.com (~16.4%)
  4. Reddit, LinkedIn, Facebook round out the top tier

Domain-specific experts like NIH, Shopify, and ScienceDirect show up as trusted names within their niches. AI Overviews distribute citations more evenly among niche sites than ChatGPT does.

The Semrush study (150,000+ citations) found that the top 20 domains account for 66% of all citations. Still concentrated, but leaving real room for specialized, authoritative content.

The Semrush AI Overviews study (10+ million keywords tracked from January-November 2025) found that 84% of AI Overviews appear for informational queries, 12.5% for transactional keywords (rising trend), and just 0.01% for local keywords.

When Do AI Overviews Appear?

AI Overviews don't appear for every search. Knowing the trigger patterns helps you focus your optimization:

  • seoClarity : 30% of U.S. desktop keywords trigger AI Overviews (September 2025)
  • Mobile AI Overview frequency grew 475% year-over-year
  • In Semrush's tracking data, AI Overviews peaked at ~25% in July 2025, then declined to 15.69% by November. Google is being more selective
  • Average AI Overview text length dropped 70% (from ~5,300 to ~1,600 characters), producing shorter, more focused summaries

Google also claims positive engagement: according to the official Google blog , "when people click from search results pages with AI Overviews, these clicks are higher quality (meaning, users are more likely to spend more time on the site)."

What You Can Do Today

1. ADD QUALITY SCHEMA

Implement JSON-LD with Organization, Article, FAQPage, and Author schema. Quality matters more than quantity: schema with errors can hurt.

2. COVER SUBTOPICS

Write comprehensive content that covers related angles. The fan-out technique rewards breadth, not just depth on a single keyword.

3. BUILD E-E-A-T

Add author bios, expert quotes, and connect content to real entities. Expert authorship provides a 3.2x citation boost.

4. FORMAT FOR EXTRACTION

Use lists, tables, FAQs, and self-contained answer units of 134-167 words. These formats align with how AI Overviews are structured.

  • Front-load key information. 44% of citations come from the first third of content
  • Use specific entities. Pages with 15+ recognized entities show 4.8x higher selection
  • Add images and video. Multi-modal content shows +156% selection rate
  • Don't block Googlebot. Google-Extended is separate and safe to block
  • Server-side render your content. Make sure important information is in the HTML source. See our deep dive on how AI agents see your website
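For that last point, here's a quick stdlib sketch that checks whether a key phrase is present in the raw server-rendered HTML, i.e. visible without executing JavaScript. The `raw_html` string stands in for an HTTP response body you've fetched yourself:

```python
from html.parser import HTMLParser

# Extracts visible text from raw HTML so we can confirm a key phrase is
# served by the server, not injected client-side by JavaScript.
class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)

def rendered_server_side(raw_html: str, key_phrase: str) -> bool:
    parser = TextExtractor()
    parser.feed(raw_html)
    return key_phrase in " ".join(parser.chunks)

raw_html = "<html><body><h1>Pricing</h1><p>Plans start at $9/month.</p></body></html>"
print(rendered_server_side(raw_html, "Plans start at $9/month."))  # True
```

An empty JavaScript mount point like `<div id="app"></div>` would fail this check, which is exactly the signal you want.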

Wrapping Up

Google officially says "just do good SEO." The data says that's necessary but not sufficient. Structured data, E-E-A-T signals, entity richness, comprehensive content, and the right formatting all measurably improve citation likelihood, especially outside YMYL topics, where organic ranking overlap is lowest.

The biggest opportunity? 68% of cited pages don't rank in the top 10. AI Overviews are giving deeper-ranking authoritative content a platform. If you've been stuck on page two of Google, AI Overviews might be your way onto page one's equivalent. More on the shift from traditional SEO to AI optimization in our SEO vs AEO comparison.
