Does Schema Markup Get You Cited by AI? What the Data Actually Shows

9 min read
Bart Waardenburg

AI Agent Readiness Expert & Founder

Open almost any guide to AI search optimization and the same advice sits near the top: add JSON-LD, mark up your FAQs, and the AI engines will cite you. It sounds intuitive. Structured data is machine-readable, AI is a machine, so structured data must help AI find and cite you. In 2026 that assumption finally got tested at scale, and the result is uncomfortable for anyone selling schema as a citation lever.

This is the post that reconciles the data. We separate what structured data is actually proven to do from what it is merely correlated with, using a controlled study of 1,885 pages, a retrieval experiment across five AI systems, and the citation research everyone keeps quoting. If you want the broader picture first, start with the guide on SEO vs AEO.

The experiment: schema barely moved citations

In 2026, Ahrefs ran the test the industry had been avoiding. They tracked 1,885 pages that added JSON-LD schema between August 2025 and March 2026, matched them against roughly 4,000 control pages that did not, and measured the change in citations across Google AI Overviews, Google AI Mode, and ChatGPT. If schema were a citation lever, the pages that added it should have pulled ahead.

They did not.

  • Google AI Overviews: -4.6% (citations after adding schema, versus controls)
  • Google AI Mode: +2.4% (too small to separate from random noise)
  • ChatGPT: +2.2% (too small to separate from random noise)

AI Overviews actually dipped, and the small positive moves on AI Mode and ChatGPT were within the range of random variation. Adding schema produced no meaningful lift on any platform.
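
To make "versus controls" concrete, here is a minimal sketch of one way a treated-versus-control comparison can be computed: take the relative change in citations for the pages that added schema and subtract the relative change for the matched controls. This is an assumption about the general shape of such a comparison, not a description of Ahrefs' actual method, and every count below is invented for illustration. The function names are hypothetical.

```python
def relative_change(before: float, after: float) -> float:
    """Fractional change from `before` to `after` (0.10 means +10%)."""
    if before == 0:
        raise ValueError("before must be non-zero")
    return (after - before) / before


def lift_vs_controls(treated_before, treated_after, control_before, control_after):
    """Relative change in the treated group minus that of the control group."""
    return relative_change(treated_before, treated_after) - relative_change(
        control_before, control_after
    )


# Hypothetical citation counts, invented purely for illustration.
lift = lift_vs_controls(
    treated_before=200, treated_after=210,  # pages that added schema: +5%
    control_before=400, control_after=440,  # matched control pages: +10%
)
print(f"{lift:+.1%}")
```

In this made-up case the schema pages gained citations in absolute terms, yet they still trail the controls. That is why a raw before-and-after number on its own says little about whether the markup did anything.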

Why correlation looked like causation

If schema does not cause citations, why does every case study show cited sites covered in structured data? Because the sites that get cited tend to be well-built, and well-built sites tend to have schema. The markup rides along with everything that actually earns the citation: authority, depth, freshness, clean structure. Three figures get quoted constantly, and all three are correlations, not levers.

  • FAQPage presence: 6.2% vs 0.8%, cited vs non-cited ChatGPT sites (Insightland), an 8x gap
  • Schema selection rate: +73%, a correlation in Google AI Overviews (Wellows)
  • Strong E-E-A-T: 96%, the share of AI Overview citations (Wellows)

Read these as descriptions of what cited pages look like, not as instructions that produce citations. The 8x FAQPage gap from Insightland means cited sites carry FAQPage schema far more often, not that bolting on FAQPage schema makes you 8x more likely to be cited. The Wellows +73% and 96% figures were measured in Google AI Overviews specifically, and E-E-A-T is a Google quality concept, not proof that any single markup signal causes a citation.

What AI crawlers actually read

Here is the mechanism most guides skip. When an AI assistant fetches your page in real time to answer a question, it does not parse your JSON-LD. A 2025 searchVIU experiment tested ChatGPT, Claude, Perplexity, Gemini, and Google AI Mode, and during direct retrieval every one of them extracted only the visible HTML. JSON-LD, hidden Microdata, and hidden RDFa were all ignored.

So structured data does not reach the model through the page it fetches. Its path is indirect: schema feeds the search index (rich results in Google and Bing), and those indexes feed AI search products. A real pathway, but second-hand, which is exactly why adding schema to an already-indexed page does so little.
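
The difference between "what is in the HTML" and "what a retrieval step keeps" can be sketched with Python's standard library. The extractor below is a hypothetical stand-in for a text-only fetch, not the pipeline any of the tested systems actually uses. It keeps text a reader would see and drops the contents of `script` and `style` elements, which is where JSON-LD lives. The sample page is made up.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text outside script/style/noscript/template elements."""

    SKIPPED = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page: the answer exists both as visible copy and as JSON-LD.
SAMPLE_PAGE = """<html><head>
<script type="application/ld+json">{"@type": "FAQPage", "name": "Hidden answer"}</script>
</head><body>
<h2>How much does it cost?</h2>
<p>Plans start at 10 euros.</p>
</body></html>"""

print(visible_text(SAMPLE_PAGE))
```

Only the heading and paragraph survive. Anything you want a real-time fetch to pick up has to exist in the visible copy, not just in the markup.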

So is schema useless? No.

Dropping structured data would be the wrong lesson. It still does three concrete jobs, none of which is "directly cause AI citations."

  • Rich results. Schema earns rich results in Google and Bing, the indexes that feed AI search products. The benefit is indirect but real.
  • Agent parseability. When an agent does parse structured data, typed entities and Q&A pairs let it extract facts without guessing them out of prose.
  • Discovery. For pages AI systems have not seen yet, schema can help them get crawled, parsed, and indexed in the first place.
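
The agent parseability point can be illustrated with a short sketch. It assumes an agent that has chosen to read a page's JSON-LD and that the markup follows the schema.org `FAQPage` shape (`mainEntity` holding `Question` items, each with an `acceptedAnswer`). The function name and sample data are hypothetical.

```python
import json


def faq_pairs(jsonld_text: str) -> list[tuple[str, str]]:
    """Return (question, answer) pairs from a FAQPage JSON-LD document."""
    data = json.loads(jsonld_text)
    if not isinstance(data, dict) or data.get("@type") != "FAQPage":
        return []

    entities = data.get("mainEntity", [])
    if isinstance(entities, dict):  # a single question may not be wrapped in a list
        entities = [entities]

    pairs = []
    for item in entities:
        if item.get("@type") != "Question":
            continue
        answer = item.get("acceptedAnswer") or {}
        if isinstance(answer, list):
            answer = answer[0] if answer else {}
        pairs.append((item.get("name", ""), answer.get("text", "")))
    return pairs


# Hypothetical markup for illustration.
SAMPLE_JSONLD = json.dumps({
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [{
        "@type": "Question",
        "name": "Does schema cause AI citations?",
        "acceptedAnswer": {"@type": "Answer", "text": "Not directly."},
    }],
})

for question, answer in faq_pairs(SAMPLE_JSONLD):
    print(question, "->", answer)
```

The typed structure hands the agent each question and answer as separate fields, so nothing has to be inferred from prose. That is a convenience for agents that read the markup, and it says nothing about whether the page gets cited.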

What actually drives AI citations

If schema is a minor, indirect factor, where should your effort go? Toward the signals the citation research keeps surfacing. These are still mostly correlational, but they describe cited content far more reliably than markup does.

  • Authority and brand. Cited domains skew heavily toward established, frequently referenced sites. This is the strongest pattern across every study.
  • Comprehensive content. Pages that answer the main question and naturally cover related angles are cited more often than narrow, single-keyword pages.
  • Question-led structure. In Kevin Indig / Growth Memo's analysis of 3M ChatGPT responses, among citations tied to a question, 78.4% came from a heading, and 44.2% of citations came from the first 30% of the page. AI tends to treat an H2 as a prompt and the text beneath it as the answer.
  • Freshness. Recently updated content is associated with substantially more citations. Treat it as a strong correlate, not a guaranteed multiplier.
  • Being in the index. Allow the right crawlers (OAI-SearchBot and friends) so you can be retrieved at all. This is the one true gate.
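
The crawler gate in the last bullet is easy to check locally. The sketch below uses Python's standard `urllib.robotparser` against a made-up robots.txt. `OAI-SearchBot` is the crawler named above. `GPTBot` appears only as an example of a bot a site might block, and the rules, URL, and helper name are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, not taken from any real site.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: *
Disallow: /admin/
"""


def allowed_agents(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Map each user agent to whether robots.txt lets it fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


result = allowed_agents(
    ROBOTS_TXT,
    "https://example.com/blog/post",
    ["OAI-SearchBot", "GPTBot"],
)
for agent, allowed in result.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

With these rules the search crawler can reach the post and the other bot cannot. Running the same check against your own robots.txt is a quick way to confirm that a broad `Disallow` is not shutting out the crawlers you want.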

Where to invest your effort

A practical priority order that matches the evidence:

  1. Make sure the right AI crawlers can reach you. No access means no citation, no debate.
  2. Write comprehensive, well-structured content with question-led headings and the answer up top.
  3. Keep important pages fresh, with clear dateModified signals.
  4. Build authority the slow way: become the source that other sources reference.
  5. Add structured data, but for the right reasons: rich results and agent parseability, not a citation multiplier.
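
For step 3, a `dateModified` signal might look like the sketch below. It assumes a schema.org `Article` with `datePublished` and `dateModified`. Because real-time retrieval reads only visible HTML, it also produces a matching on-page "Last updated" line. The helper names, headline, and dates are hypothetical.

```python
import json
from datetime import date


def article_jsonld(headline: str, published: date, modified: date) -> str:
    """Build Article JSON-LD carrying publication and modification dates."""
    if modified < published:
        raise ValueError("modified date cannot precede published date")
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "datePublished": published.isoformat(),
        "dateModified": modified.isoformat(),
    }
    return json.dumps(data, indent=2)


def visible_updated_line(modified: date) -> str:
    """Text to show on the page so the date is also in the visible HTML."""
    return f"Last updated: {modified.isoformat()}"


# Hypothetical values for illustration.
markup = article_jsonld("Example headline", date(2025, 8, 1), date(2026, 3, 1))
print(markup)
print(visible_updated_line(date(2026, 3, 1)))
```

Generating both strings from the same date keeps the markup and the visible copy from drifting apart.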

The bottom line

Schema markup is table stakes for machine-readability and a sensible investment for rich results and agent parseability. It is not a switch you flip to get cited by AI. The controlled data is clear: adding it to a page does not move citations. So keep your structured data clean, then spend the rest of your energy on the things that actually correlate with being cited: authority, comprehensiveness, structure, and freshness. Our scanner scores all of these, and it is honest about which are levers and which are correlates.

Curious how the individual AI systems choose sources? Read the companion posts on how ChatGPT chooses which websites to cite, how Google AI Overviews selects sources, and what AI agent readiness means.
