A GEO audit checklist identifies the 10 most common reasons AI engines skip your content: blocked crawlers, missing schema, no llms.txt, and weak authority signals. Fix these and your site becomes citable. According to Singh et al. (Princeton/Georgia Tech/IIT Delhi, 2024), adding statistics and citations to existing content increases AI visibility by up to 41% — often with zero new writing required.
Your website gets crawled every week by at least six AI engines: Googlebot (AI Overviews), OAI-SearchBot (ChatGPT), GPTBot (OpenAI training), PerplexityBot, ClaudeBot (Anthropic), and Bingbot (Copilot).
Each one is evaluating whether your content is worth citing in AI-generated answers.
Most of them leave without citing anything — not because your content is bad, but because you have technical blockers, structural gaps, or trust signals that are missing or wrong. According to Otterly.AI’s 2026 analysis of over 1 million AI citations, the majority of fixable citation gaps fall into the 10 categories below.
Here are the 10 most common issues we find when auditing sites for GEO readiness. Fix these before anything else.
What this checklist covers: Every item below addresses a specific signal that AI search engines use to decide whether to cite your content. We have ranked them by impact-to-effort ratio — the first items deliver the biggest gains with the least work. Each item includes why it matters, how to check your current status, and step-by-step implementation instructions.
Time estimate: A technical marketing team can implement all 10 items in 4–8 hours. Items 1–4 can typically be completed in under an hour and address the most common blockers.
The Checklist
1. AI Crawlers Are Allowed in robots.txt

Why: If your `robots.txt` blocks ChatGPT, Perplexity, or other AI crawlers, your content doesn't get indexed and can't be cited. Full stop. This is the single most common reason sites get zero AI citations despite having high-quality content. Otterly.AI's 2026 analysis of over 1 million AI citations found that 73% of websites inadvertently block at least one major AI crawler.

How to check: Visit yourdomain.com/robots.txt. Look for `Disallow` rules affecting `GPTBot`, `OAI-SearchBot`, `PerplexityBot`, `ClaudeBot`, `anthropic-ai`, and `Google-Extended`. Also check for wildcard rules like `User-agent: *` / `Disallow: /` that block all unrecognized bots.

How to fix: Add explicit `User-agent: [BotName]` / `Allow: /` rules for each AI crawler. If you have a blanket `Disallow: /` for unrecognized bots, whitelist AI crawlers individually above the wildcard block. The six critical user agents: GPTBot, OAI-SearchBot, PerplexityBot, ClaudeBot, anthropic-ai, and Google-Extended.

Impact: Immediate. AI crawlers typically re-index allowed content within 1–3 weeks. Perplexity reflects changes fastest (often within days).
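As a sketch, a robots.txt that whitelists the six AI crawlers ahead of a blanket rule might look like this (the `/admin/` path is a placeholder for whatever you actually restrict):

```text
# Explicitly allow the six major AI crawlers
User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: Google-Extended
Allow: /

# Blanket rule for all other bots
User-agent: *
Disallow: /admin/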
2. You Have an llms.txt File

Why: `llms.txt` gives AI models a curated, machine-readable overview of your most important content. Pages listed there are indexed and cited more reliably than pages discovered through standard crawling alone. Anthropic officially endorsed the llms.txt standard in November 2024, making it the first major AI company to formally support a website-to-AI communication protocol.

How to check: Visit yourdomain.com/llms.txt. A 404 means you don't have one.

How to fix: Create a Markdown-formatted text file at your domain root with: (1) your company name and a one-line description, (2) a 2–3 sentence about section, (3) a bulleted list of your 10–20 most important URLs with one-line descriptions, (4) contact information. Keep the total file under 2,000 words. Update it whenever you publish significant new content.

Best practices: List your most important pages, not every page. Curate for quality. Implementation takes under 30 minutes and carries zero downside risk.
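A minimal llms.txt following the structure above might look like this; every name, URL, and figure below is a placeholder:

```text
# Acme Analytics
> Acme Analytics builds self-serve dashboards for e-commerce teams.

## About
Acme Analytics has served online stores since 2018. We publish
original research on conversion benchmarks and AI search visibility.

## Key pages
- [GEO Audit Checklist](https://example.com/geo-checklist): 10 fixes for AI citation readiness
- [Pricing](https://example.com/pricing): Plans and feature comparison
- [Research](https://example.com/research): Original conversion benchmark data

## Contact
support@example.com
```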
3. Organization Schema Is Present and Complete

Why: Organization schema tells AI engines what your business is, not just what your website says. It is the minimum viable trust signal for any business site. Without it, AI engines may not associate your content with a specific business entity — reducing the authority weight your content receives.

How to check: Use Google's Rich Results Test or search your homepage source for `"@type": "Organization"`. Check that it includes all required fields: `name`, `url`, `description`, `logo`, `contactPoint`, and `sameAs`.

How to fix: Add JSON-LD Organization schema to your homepage `<head>`. Include: `name`, `url`, `description`, `logo`, `contactPoint`, `sameAs` (social profiles), `knowsAbout` (topics you are authoritative on), and `foundingDate`. See our schema markup guide for full implementation details.

Key fields most sites miss: `knowsAbout` tells AI what topics you are authoritative on, `sameAs` links to verified social profiles, and `foundingDate` establishes longevity. Including these fields helps AI engines build an entity graph for your brand.
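A sketch of the Organization markup described above, placed inside a `<script type="application/ld+json">` tag in your homepage `<head>`. All names, URLs, and dates are placeholders:

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Acme Analytics",
  "url": "https://example.com",
  "description": "Self-serve analytics dashboards for e-commerce teams.",
  "logo": "https://example.com/logo.png",
  "foundingDate": "2018-03-01",
  "contactPoint": {
    "@type": "ContactPoint",
    "contactType": "customer support",
    "email": "support@example.com"
  },
  "sameAs": [
    "https://www.linkedin.com/company/example",
    "https://x.com/example"
  ],
  "knowsAbout": ["e-commerce analytics", "conversion optimization"]
}
```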
4. FAQPage Schema on Key Pages

Why: Pages with FAQPage schema are 3.2× more likely to appear in Google AI Overviews — the highest citation multiplier of any schema type. Note: incomplete schema creates an 18-point citation penalty versus no schema, so implement it fully or not at all.

How to check: Look for `FAQPage` in your page source or validate with schema.org.

How to fix: Add an FAQ section to your most important landing and service pages. Mark it up with `FAQPage` + `Question` + `Answer` JSON-LD. Keep answers concise (50–150 words) and genuinely helpful.

Critical details: Each answer should be self-contained — no "click here to learn more." Minimum 3 FAQ items per page, maximum 10. The FAQ content in your JSON-LD must match the visible content on the page; Google penalizes hidden-schema FAQ content. Sites in our audit sample that added complete FAQPage schema saw an average 28% increase in Google AI Overview appearances within 6 weeks.
5. Content Follows Answer-First Structure

Why: AI models extract answers programmatically. Content that buries the answer in paragraph 4, after 300 words of preamble, gets skipped. The first sentence after an H2 heading should answer the question that heading poses.

How to check: Read your top 5 pages. Does each major heading pose a question? Does the first sentence after each heading answer it directly?

How to fix: Rewrite section headings as questions. Rewrite the opening sentence of each section to answer directly before expanding with context. Sections of 120–180 words receive 70% more citations than sections under 50 words.
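A hypothetical before/after illustrating the rewrite; the heading, answer, and figures are invented for the example:

```markdown
Before:
## Our Approach to Pricing
Pricing in this industry is complicated. Over the years, many
factors have shaped how vendors think about cost...

After:
## How Much Does a GEO Audit Cost?
A typical GEO audit costs $1,500–$3,000 and takes one week to
deliver. Pricing depends on site size, the number of AI platforms
audited, and whether implementation support is included...
```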
6. Author and E-E-A-T Signals Are Visible

Why: 96% of Google AI Overview citations come from sources with strong E-E-A-T signals, and Perplexity rarely cites anonymous content. AI engines need to attribute content to real, credible humans or organizations.

How to check: Do your articles show an author name with a bio? Does the bio link to an author page? Is the author's expertise relevant to the topic?

How to fix: Add author bylines to every piece of content. Create author profile pages with credentials and links to professional profiles. Mark them up with `Person` schema including `jobTitle`, `affiliation`, and `url`.

Minimum viable E-E-A-T: (1) An author byline on every article with name, title, and a one-sentence credential. (2) A dedicated author profile page with bio, LinkedIn/Twitter links, and links to published articles. (3) `Person` schema with `name`, `jobTitle`, `affiliation`, `url`, and `sameAs`. (4) Publication dates displayed prominently near the title.
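A sketch of the Person markup for an author profile page; the name, title, and URLs are placeholders:

```json
{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Jane Doe",
  "jobTitle": "Head of Research",
  "affiliation": {
    "@type": "Organization",
    "name": "Acme Analytics"
  },
  "url": "https://example.com/authors/jane-doe",
  "sameAs": [
    "https://www.linkedin.com/in/janedoe",
    "https://x.com/janedoe"
  ]
}
```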
7. Publish Dates Are Visible and Schema-Confirmed

Why: Freshness is a primary signal for Perplexity and important for Google. AI engines de-prioritize content with no visible publish date or with outdated dates.

How to check: Are publish dates visible on your blog posts? Is `datePublished` present in your Article schema?

How to fix: Display publish and update dates near the article title. Add `datePublished` and `dateModified` to your Article JSON-LD. Update `dateModified` whenever you refresh old content.
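A minimal Article block showing the two date fields in ISO 8601 format; the headline, dates, and author are placeholders:

```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "GEO Audit Checklist",
  "datePublished": "2026-01-15",
  "dateModified": "2026-03-02",
  "author": { "@type": "Person", "name": "Jane Doe" }
}
```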
8. Page Speed Under 3 Seconds

Why: Pages with a First Contentful Paint under 0.4 seconds average 6.7 AI citations per page. Pages with FCP over 1.13 seconds average only 2.1 citations — a 3× difference from load time alone. Perplexity treats load time under 3 seconds as a direct ranking signal.

How to check: Run your URL through Google PageSpeed Insights. Aim for LCP under 2.5 seconds.

How to fix: Quick wins: compress images (WebP format, lazy loading), enable browser caching, defer non-critical JavaScript, and use a CDN.

JavaScript frameworks note: If your site uses React, Next.js, Vue, or Angular, AI crawlers may see a blank page because most of them cannot execute JavaScript. Ensure your content is server-rendered (SSR or SSG) and present in the initial HTML response rather than rendered client-side. This is the most common blocker for modern web applications.
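A quick way to spot the client-side-rendering problem is to check whether your key content appears in the raw HTML response, before any JavaScript runs. The sketch below checks a fetched HTML string for expected phrases; the sample pages and the phrase are invented for illustration (in practice you would fetch your live page with curl or a HTTP library):

```python
def content_in_initial_html(html: str, phrases: list[str]) -> dict[str, bool]:
    """Return which phrases appear in the raw (pre-JavaScript) HTML."""
    lowered = html.lower()
    return {p: p.lower() in lowered for p in phrases}

# A client-side-rendered React shell: crawlers see only an empty div.
csr_html = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'
# A server-rendered page: the same content is in the initial response.
ssr_html = '<html><body><h1>GEO Audit Checklist</h1><p>Fix robots.txt first.</p></body></html>'

print(content_in_initial_html(csr_html, ["GEO Audit Checklist"]))  # {'GEO Audit Checklist': False}
print(content_in_initial_html(ssr_html, ["GEO Audit Checklist"]))  # {'GEO Audit Checklist': True}
```

If your headline shows up in the rendered browser page but not in the raw response, AI crawlers are seeing a blank page.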
9. Sitemap Is Submitted and Current

Why: A current sitemap ensures AI crawlers can discover all your important pages. Pages not in the sitemap or not indexed by Google are largely invisible to ChatGPT.

How to check: Visit yourdomain.com/sitemap.xml. Confirm it's submitted in Google Search Console.

How to fix: Regenerate your sitemap and submit it in Search Console. Remove thin content and duplicate pages to concentrate crawl budget on your best content.

Sitemap best practices for GEO: Include `<lastmod>` tags with accurate dates — AI engines use these as freshness signals. Exclude paginated archives, tag pages, and low-value URLs. Keep total URLs under 1,000 for small sites. Submit to Bing Webmaster Tools as well — Microsoft Copilot draws from Bing's index.
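A two-URL sitemap fragment showing the `<lastmod>` tags described above; the URLs and dates are placeholders:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/geo-checklist</loc>
    <lastmod>2026-03-02</lastmod>
  </url>
  <url>
    <loc>https://example.com/pricing</loc>
    <lastmod>2026-01-15</lastmod>
  </url>
</urlset>
```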
10. Brand Mentions Exist on Third-Party Sites

Why: Brand search volume has the highest correlation with AI citations (0.334 Pearson) — higher than backlinks, domain authority, or content quality. AI engines infer credibility from how often a brand appears in third-party editorial contexts.

How to check: Google your brand name. Do third-party editorial sites mention you? Are you featured in industry roundups, review sites, or news articles?

How to fix: Write guest articles for industry publications, appear on podcasts, and get quoted in news stories. Focus on earned mentions — paid directories and press releases have near-zero correlation with AI citations.

Brand mention hierarchy for AI citation impact: (1) Editorial news mentions (highest impact), (2) Industry roundup inclusion, (3) Expert quotations, (4) Podcast and video appearances, (5) Professional directory listings like G2/Capterra (moderate), (6) Social media mentions (lower but measurable), (7) Press releases and paid placements (near-zero impact). Building brand mentions takes 30–90 days of consistent effort.
Want a deeper analysis?
Our full GEO Audit goes beyond the score — covering crawl access, schema validation, content structure, and a prioritized fix list.
Request a GEO Audit →

How Many Did You Pass?
For a deeper look at the business case behind AI citations and how they compound over time, see How to Get Your Business Mentioned by ChatGPT. For the technical schema implementation details behind checklist items 3–4, see Schema Markup for AI: Why JSON-LD Is the New SEO.
Beyond the Checklist: Advanced GEO Tactics
Once you have completed all 10 items, consider these advanced optimizations:
Content block optimization
The optimal content block length for AI citation is 75–150 words. Blocks shorter than 50 words lack enough context for AI to cite meaningfully. Blocks longer than 200 words get truncated or passed over in favor of more concise sources. Audit your top pages and restructure sections to hit the 75–150 word sweet spot.
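The restructuring pass above can be partly automated. The sketch below splits a Markdown document on its headings and flags each block outside the 75–150 word range; the sample text and thresholds are from this article, everything else is an assumption:

```python
import re

def audit_block_lengths(markdown: str, lo: int = 75, hi: int = 150) -> list[tuple[str, int, str]]:
    """Report the word count of each heading-delimited block and flag outliers."""
    # Split on ATX headings, keeping each heading paired with the text after it.
    pieces = re.split(r"^(#{1,6}\s.*)$", markdown, flags=re.MULTILINE)
    results = []
    for i in range(1, len(pieces), 2):
        heading = pieces[i].strip()
        body = pieces[i + 1] if i + 1 < len(pieces) else ""
        words = len(body.split())
        verdict = "ok" if lo <= words <= hi else ("too short" if words < lo else "too long")
        results.append((heading, words, verdict))
    return results

sample = "## What is GEO?\n" + "word " * 100 + "\n## Why it matters\n" + "word " * 20
for heading, words, verdict in audit_block_lengths(sample):
    print(heading, words, verdict)
```

Running it against your top pages gives a prioritized list of sections to expand or split.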
Citation network building
AI engines evaluate not just your content but the ecosystem of content that references you. Pages that are cited by other authoritative pages receive a compounding citation advantage. Focus on creating original research, proprietary data, and novel frameworks that other publishers will reference.
Multi-platform optimization
Different AI engines have different content preferences. Perplexity favors real-time, data-rich content. ChatGPT Search correlates with Google organic rankings. Google AI Overviews favor content with existing Featured Snippet presence. A comprehensive GEO strategy optimizes for all platforms simultaneously by addressing the common citation signals they share.
Entity disambiguation
If your brand name is similar to other entities (common words, shared names), add explicit entity disambiguation to your Organization schema using sameAs, alternateName, and detailed description fields. This helps AI engines correctly identify your brand when generating citations.
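A sketch of disambiguated Organization markup for a brand with an ambiguous name; the company, Wikidata ID, and URLs are all hypothetical placeholders:

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Mercury",
  "alternateName": "Mercury Analytics Inc.",
  "description": "Mercury is an e-commerce analytics platform, not the planet, the element, or the bank.",
  "url": "https://example.com",
  "sameAs": [
    "https://www.wikidata.org/wiki/Q0000000",
    "https://www.linkedin.com/company/example"
  ]
}
```

A `sameAs` link to an unambiguous identifier such as a Wikidata entry, where one exists, is the strongest disambiguation signal.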
Frequently Asked Questions

Which crawlers does OpenAI use, and how can I tell if they're visiting my site?
GPTBot and OAI-SearchBot. In Cloudflare, navigate to Security → Bots and filter for OpenAI user agents. In Google Search Console, crawl stats don't show third-party bots, so server logs or a CDN like Cloudflare are your best options. If GPTBot appears in logs, it is crawling. If it's absent, either your robots.txt is blocking it or your site hasn't been prioritized yet.

Get Your Full 10-Dimension AI Visibility Score
This checklist covers the fundamentals, but a full GEO audit goes deeper — analyzing your content’s citation-readiness, platform-specific gaps, and competitive positioning across all major AI engines. The free AI Visibility Score tool runs all 10 dimensions against your live site in under 60 seconds. You’ll see exactly which of the items above you’re passing and failing, with a prioritized list of fixes ranked by impact.
According to GEO research from Princeton, Georgia Tech, and IIT Delhi (2024), sites that implement structured data, statistics, and answer-first content formatting see an average 41% increase in AI visibility within one content revision cycle. The score report shows you precisely where to start.
Check Free 10-Dimension AI Visibility Score →

Sources: Princeton/Georgia Tech/IIT Delhi KDD 2024 GEO research (arXiv:2311.09735), BuzzStream 4M citation analysis, CXL 100-page schema study 2024, Otterly.AI 1M+ citation study 2026, Surfer SEO AI Citation Report 2025, Google E-E-A-T documentation, GEORaiser internal audit data (12-site sample, March 2026).