The 99 Signals We Check (and What Each One Actually Predicts)

AI SEO Intelligence

May 25, 2026

14 min read

Most SEO audit tools open with a number out of 100 and a colour-coded ring. The number means almost nothing. It is a weighted average of dozens of signals the tool will not name, calibrated against a benchmark the tool will not publish, refreshed on a schedule the tool will not commit to. You read it, you nod, you do not know what to fix on Monday morning.

We took the other path. HybridRanking is a page-level SEO and GEO audit tool. 99 atomic checks. Every check targets a discrete failure mode, ships with a documented fix, and exposes a public evidence page. Where Ahrefs, Semrush, Screaming Frog and the AI-visibility incumbents (OtterlyAI, Profound, Semrush AI Toolkit) track domain-level visibility — "am I cited, how often, by which engine" — we surface a per-page diagnostic. The output is page-specific: for the URL you ran the audit on, here is the failing signal and the recommended action. Complementary positioning, not competitive. If the visibility tools hand you the scoreboard, we hand you the fix list.

This post is the complete map of those 99 signals. Twenty of them were refreshed in the last sprint based on Q1–Q2 2026 evidence, and five shipped weight changes in May 2026 alone. The matrix is a living document; this post snapshots it as of May 25, 2026.

Key Takeaways

99 atomic checks across 5 categories. Each one predicts a specific failure mode — broken indexing, missing schema, an empty raw HTML shell, a buried lead — not a vibe.
The top 7 signals by penalty magnitude account for roughly 30% of the audit's non-indexing penalty pool. Fix those first; the long tail is real, but no single signal in it carries a comparable penalty.
Five weight-policy changes shipped in May 2026 based on Q2 2026 evidence: FAQ rich result deprecation, AI crawler bot list expansion, Discover image eligibility enforcement, source-citation weight bump, language-attribute downweight.
Pro tier unlocks 15 LLM-based content quality checks — original research detection, firsthand experience signals, citation quality, fact density, front-loading — plus other premium signals, on top of free-tier coverage that hits every page type.
Every check links to its own /checks/<slug> evidence page showing what we look for, how to fix it, and which Q1–Q2 2026 sources back the weight calibration.

The 7 Highest-Impact Signals — Fix These First

If you only have a sprint, work this list top to bottom. These are the seven signals carrying the highest individual penalties in the audit; together they account for roughly 30% of its non-indexing penalty pool, which makes them the most efficient place to start.

1. Indexing Status — the largest single penalty in the entire audit. A page blocked by a meta robots noindex tag, an X-Robots-Tag: noindex response header, or a robots.txt disallow is effectively invisible to Google Search and to most AI crawlers. (Strictly, a robots.txt disallow blocks crawling rather than indexing, so a disallowed URL can still be listed without its content.) Every other signal on the page becomes academic. The check looks for the three block patterns and flags any that resolve to "this page will not appear in any index". Fix: remove the directive on pages you want surfaced; keep it deliberately on staging environments, internal search results, and faceted URLs.

2. JSON-LD Structured Data — the second-largest penalty. JSON-LD is how AI search engines parse "who wrote this, when, about what, on behalf of which organization" without guessing from prose. A page with no JSON-LD at all forces the model to infer all of it from raw HTML, and inference is where citations get attributed to the wrong author or skipped entirely. The check verifies presence of any parseable application/ld+json block. Fix: ship an Article (or appropriate type) schema with at minimum headline, author, datePublished, image.

3. JavaScript Content Gap — among the top three by penalty magnitude. The crawlers in the non-rendering AI cohort (GPTBot, PerplexityBot, ClaudeBot, Bytespider, CCBot) do not execute JavaScript when fetching pages; they see only the raw HTML response. The check detects this risk heuristically: a page that ships with SPA-framework markers (#root, __NEXT_DATA__, data-v-, etc.) and a thin raw-HTML word count is flagged as likely client-side-only. We covered the rendering map in detail in our companion post on AI crawler JavaScript execution; the heuristic catches the common case rather than every edge case. Fix: server-side render, static-generate, or hybrid-render the routes you want cited.

4. Sources & References — ranked among the top five by penalty magnitude, and the weight went up this quarter. Wellows' 2026 analysis of AI Overview ranking factors and the March 2026 Core Update post-mortem both point to the same pattern: pages backed by authority signals (named citations, explicit sources, brand-backed content) are more likely to be selected by AI Overviews, while thin, uncited pages are passed over. The check looks for a dedicated Sources or References section, or in-text citations. Fix: cite sources you actually used; do not pad the page with generic links just to satisfy a heuristic.

5. Schema Types — having JSON-LD is necessary; having the right types is what closes the loop. An Article schema on a product page, a WebPage where Recipe was expected, a missing FAQPage where the layout is clearly a FAQ — all of these confuse rich result eligibility and AI parsing. Fix: match schema type to actual content type, using the Schema.org type that best describes what the page is.

6. XML Sitemap — among the top non-indexing penalties. A homepage with no discoverable sitemap forces crawlers (search and AI) to rely entirely on internal-link traversal for discovery, which delays indexation of new content and misses orphan pages. The check looks for /sitemap.xml, common variants, and any sitemap referenced from robots.txt. Fix: generate and serve a sitemap; reference it from robots.txt.

7. Question-Format Headings — one of the strongest AI search citation signals we track, weighted up in our March 2026 rebalance. AI search engines preferentially extract answer passages from sections whose headings are phrased as questions, because question-shaped headings let the retrieval layer match the user's literal query. The check counts H2 headings phrased as questions on article-like page types. Fix: rephrase two or three H2s as the actual questions a reader would type — What is X?, How does Y work?, When should you Z?
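
The three block patterns named under signal 1 can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the audit's actual implementation: the function name `indexing_blocks`, its return labels, and the idea of passing the HTML, response headers, and robots.txt body in as already-fetched strings are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class _RobotsMetaParser(HTMLParser):
    """Collects the content of <meta name="robots"> / <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            self.directives.append(attr.get("content", "").lower())


def indexing_blocks(url, html, headers, robots_txt, user_agent="*"):
    """Return the block patterns found for this page (empty list = nothing found)."""
    blocks = []

    # Pattern 1: a robots meta tag carrying noindex.
    meta = _RobotsMetaParser()
    meta.feed(html)
    if any("noindex" in directive for directive in meta.directives):
        blocks.append("meta-noindex")

    # Pattern 2: an X-Robots-Tag response header carrying noindex.
    # Header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    if "noindex" in lowered.get("x-robots-tag", "").lower():
        blocks.append("x-robots-tag-noindex")

    # Pattern 3: robots.txt disallows this URL for the given user agent.
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(user_agent, url):
        blocks.append("robots-txt-disallow")

    return blocks
```

An empty list means none of the three patterns matched; it does not prove the page is indexed, only that nothing in these three places blocks it.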

By Category

Technical SEO (~24 checks)

Foundational signals that govern whether a page can be indexed and rendered at all. Two of the top-three penalties in the entire audit live here (Indexing Status, JavaScript Content Gap), and several others — Canonical URL, Self-Referencing Canonical, HTML Lang Attribute, Viewport Meta Tag, SSL Certificate — carry penalties because each represents a load-bearing piece of metadata that downstream crawlers and renderers depend on. Spotlight signals:

Page Title — predicts whether your page can compete in the SERP at all. Missing or generic titles are the cheapest miss in technical SEO.
Mixed Content — predicts whether modern browsers will silently block resources and degrade rendering. An HTTPS page loading HTTP assets is partially-rendered to most crawlers.
Large Image Preview Directive — ships as a new check this quarter (May 2026). Predicts Google Discover eligibility plus rich image preview rendering in Search. Required, not optional, for Discover; documented as a Search-wide control in Google's 2019 "More controls on Search" post. We unpacked the full story in Discover Optimization Is Not SEO Optimization.
Hreflang Tags — predicts whether multilingual pages get served the right language version, or whether Google serves the wrong one and ranks neither.
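
The JavaScript Content Gap heuristic described for this category (SPA-framework markers plus a thin raw-HTML word count) could look roughly like the sketch below. The marker patterns come from the examples named in this post; the 150-word threshold, the choice to ignore only script, style, and template content, and the function name `likely_client_side_only` are assumptions made for illustration.

```python
import re
from html.parser import HTMLParser

# Markers taken from the examples in this post; a real detector would likely track more.
SPA_MARKER_PATTERNS = [
    re.compile(r'id=["\']root["\']'),
    re.compile(r"__NEXT_DATA__"),
    re.compile(r"\bdata-v-[0-9a-f]+"),
]
THIN_WORD_COUNT = 150  # hypothetical threshold, not a published value


class _VisibleText(HTMLParser):
    """Collects text that sits outside <script>, <style>, and <template>."""

    SKIP = {"script", "style", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def raw_html_word_count(raw_html):
    """Count words a non-rendering crawler could read in the raw response."""
    extractor = _VisibleText()
    extractor.feed(raw_html)
    return len(" ".join(extractor.chunks).split())


def likely_client_side_only(raw_html, thin_word_count=THIN_WORD_COUNT):
    """Flag pages that carry an SPA marker AND very little raw-HTML text."""
    has_marker = any(p.search(raw_html) for p in SPA_MARKER_PATTERNS)
    return has_marker and raw_html_word_count(raw_html) < thin_word_count
```

Requiring both conditions keeps server-rendered React or Vue pages from being flagged: they carry the same markers but ship their text in the initial response.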

Discovery & Infrastructure (~11 checks)

Signals that govern how crawlers find your pages in the first place. Discovery is the layer most teams underinvest in because nothing visibly breaks when it fails — pages just take longer to be indexed, and orphan pages never are. Spotlight signals:

XML Sitemap and Sitemap Lastmod Dates — predict crawl frequency and freshness signal accuracy. A sitemap without lastmod dates is half a sitemap.
Page in Sitemap — predicts whether this specific page is reachable through the official discovery channel. A page reachable only via internal links is a page on borrowed time.
Robots.txt and Robots.txt Sitemap Directive — predict crawler entry behavior. A missing robots.txt is fine (default-allow); a missing Sitemap: directive in an existing robots.txt is a missed signal.
AI Bot Access Rules — predicts which AI crawlers can reach your content. This check expanded from 10 to 15 bot sub-keys this quarter to track Anthropic's ClaudeBot split, OpenAI's OAI-AdsBot launch, and Applebot-Extended's growing share of AI-crawler traffic. See our companion piece on AI crawler blocking through Q3 2026 for the share data and the per-bot breakdown.
llms.txt File and RSS/Atom Feed — predict AI discovery surface area. We weight llms.txt low because a year of evidence shows zero AI crawlers actually read it; RSS gets more credit because real crawlers do consume it.
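
For the robots.txt side of this category, Python's standard `urllib.robotparser` is enough to sketch a per-bot access report and to read any `Sitemap:` directives. The bot list below is only the subset of user agents named in this post, and the function name `ai_bot_access` and its return shape are hypothetical. `site_maps()` requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Subset of AI crawler user agents named in this post (not the full 15-bot catalogue).
AI_BOTS = [
    "GPTBot",
    "OAI-AdsBot",
    "ClaudeBot",
    "Claude-User",
    "Claude-SearchBot",
    "PerplexityBot",
    "Perplexity-User",
    "Applebot-Extended",
]


def ai_bot_access(robots_txt, url, bots=AI_BOTS):
    """Report which bots may fetch `url` and which sitemaps robots.txt declares."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return {
        # An empty or missing robots.txt parses to "allow everything".
        "access": {bot: rules.can_fetch(bot, url) for bot in bots},
        # site_maps() returns None when no Sitemap: directive is present.
        "sitemaps": rules.site_maps() or [],
    }


if __name__ == "__main__":
    example = (
        "User-agent: GPTBot\n"
        "Disallow: /\n"
        "\n"
        "User-agent: *\n"
        "Allow: /\n"
        "\n"
        "Sitemap: https://example.com/sitemap.xml\n"
    )
    report = ai_bot_access(example, "https://example.com/post")
    for bot, allowed in report["access"].items():
        print(f"{bot}: {'allowed' if allowed else 'blocked'}")
    print("sitemaps:", report["sitemaps"])
```

One caveat on the design: `robotparser` matches user agents by substring, so a rule written for a short token such as `Applebot` would also apply to `Applebot-Extended`. Individual crawlers may resolve that case differently.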

Content Structure & Quality (~38 checks)

The largest category by check count, and the one most affected by AI search behavior. Spotlight signals:

H1 Heading, H2 Headings, Heading Hierarchy — predict whether crawlers can parse document structure. A page with no H1, ten H2s, and skipped levels reads as unstructured to retrieval models.
Question-Format Headings, FAQ Section, Self-Contained Answer Passages — predict AI citation rate. We covered why FAQ schema alone is dead and what works instead in FAQ Rich Results Are Dead; the checks here look for the patterns that actually get cited, not just the schema markup.
Summary Section and Table of Contents — predict snippet eligibility and TL;DR extraction. AI engines preferentially extract from explicitly-marked summary blocks.
Lists in Content and Tables in Content — predict structured-data extraction. AI engines preferentially cite content with comparison tables, ordered lists, and step-numbered procedures because each row or item is independently extractable as a passage.
Original Research, Firsthand Experience Signals, Fact Density, Content Front-Loading — Pro-tier LLM checks. These predict whether the content has substance an AI engine would actually prefer to cite over a generic round-up. The Firsthand Experience Signals weight was raised in May 2026 to align with the March 2026 Core Update analysis. We made the case for the highest-impact one in Original Research as an AI Citation Signal.
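
Counting question-format H2s, as the Question-Format Headings check is described above, can be approximated with the standard-library HTML parser. The sketch assumes the simplest possible definition of "phrased as a question" (the heading text ends with a question mark); the real check may use a richer rule, and the function name `question_headings` is hypothetical.

```python
from html.parser import HTMLParser


class _H2Collector(HTMLParser):
    """Collects the text of every <h2>, including text inside nested inline tags."""

    def __init__(self):
        super().__init__()
        self._in_h2 = False
        self._buffer = []
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "h2" and self._in_h2:
            # Join the fragments, then collapse runs of whitespace.
            text = " ".join("".join(self._buffer).split())
            self.headings.append(text)
            self._in_h2 = False

    def handle_data(self, data):
        if self._in_h2:
            self._buffer.append(data)


def question_headings(html):
    """Return the H2 headings that end with a question mark (assumed heuristic)."""
    collector = _H2Collector()
    collector.feed(html)
    return [h for h in collector.headings if h.endswith("?")]


if __name__ == "__main__":
    page = "<h2>What is X?</h2><h2>Pricing</h2><h2>How does <em>Y</em> work?</h2>"
    print(question_headings(page))
```

A page where this returns an empty list is a candidate for the fix above: rephrase two or three H2s as the questions a reader would type.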

Structured Data (~12 checks)

Schema is where AI engines extract entity relationships without guessing. Penalty weighting here is bimodal — JSON-LD Structured Data is the second-largest single penalty in the audit, Schema Types is in the top tier, and the rest cluster low because they are quality refinements on a foundation that already exists. Spotlight signals:

Schema Required Fields — predicts rich result eligibility. An Article schema without headline, author, or datePublished is parseable but not surfaceable.
Schema @id Linking and Schema Trust Chain — predict entity disambiguation. Linking Article → Person author → Organization publisher via @id lets AI engines build a citable trust graph rather than treating each entity as a string match.
Schema inLanguage — predicts language-aware retrieval. AI engines preferentially serve language-matched citations.
Article Schema Image and Video Schema — predict media rich-result eligibility. Required, not optional, if you want image cards or video chips to render.
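
The JSON-LD presence and required-field checks can be illustrated together. The sketch below extracts every parseable application/ld+json block and lists which of the four minimum Article fields named earlier in this post (headline, author, datePublished, image) are missing. It assumes only the `Article` type is of interest and that `@graph` wrappers should be flattened; the function names are hypothetical.

```python
import json
from html.parser import HTMLParser

# Minimum Article fields named in this post.
REQUIRED_ARTICLE_FIELDS = ("headline", "author", "datePublished", "image")


class _JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)


def jsonld_nodes(html):
    """Return every JSON-LD object on the page, skipping blocks that fail to parse."""
    collector = _JsonLdCollector()
    collector.feed(html)
    nodes = []
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # unparseable block: treated as absent
        for item in data if isinstance(data, list) else [data]:
            if isinstance(item, dict):
                # Flatten an @graph wrapper into its member nodes.
                members = item.get("@graph", [item])
                nodes.extend(n for n in members if isinstance(n, dict))
    return nodes


def missing_article_fields(html):
    """For each Article node, list the required fields it lacks."""
    report = []
    for node in jsonld_nodes(html):
        types = node.get("@type", [])
        types = types if isinstance(types, list) else [types]
        if "Article" in types:
            report.append([f for f in REQUIRED_ARTICLE_FIELDS if f not in node])
    return report
```

An empty result means no Article node was found at all; a result such as `[[]]` means one Article node was found with all four fields present.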

E-E-A-T & Trust (~18 checks)

The signals that map most directly to Google's E-E-A-T framework and to AI engines' "should I trust this enough to cite it" decision. Spotlight signals:

Author Name, Author Bio Link, Author Credentials, Author E-E-A-T Relevance — predict whether AI engines can identify and verify who wrote what. We covered the full case in E-E-A-T for Non-Article Pages.
Publication Date, Modification Date, Date Visibility, Content Freshness — predict freshness-weighted retrieval. AI engines downweight stale content for time-sensitive queries; missing or hidden dates make freshness impossible to infer.
Sources & References and Citation Quality — predict "safe to cite" gatekeeping. Both weights went up in May 2026 based on the Wellows citation analysis and the March 2026 Core Update pattern showing that content backed by named, verifiable citations is preferentially surfaced.
Trust Page Links, About Page Trust Signals, YMYL Disclaimer — predict whole-site trust inference. AI engines treat the presence of About / Contact / Privacy pages as table-stakes signals; their absence flags the site as suspect.
Language Consistency — the weight was reduced in May 2026 for declarative-only mismatches (where <html lang> and schema inLanguage disagree but the content itself is unambiguous). Google has stated publicly that it does not use <html lang> for language detection; it uses on-page content algorithms. We adjusted accordingly.

What Changed in Q2–Q3 2026

The matrix is not static. Five weight-policy changes shipped in May 2026, all driven by external evidence that crossed a confidence threshold:

FAQ schema lost its weight. Google deprecated FAQ rich results on May 7, 2026. Independent research published in May-June 2026 (including a SERoundTable-reported schema study and SEJ's analysis of schema's AI search value) found that adding schema did not significantly improve AI citation rates. Content quality, E-E-A-T signals, and brand authority are stronger predictors than structured data presence. We zeroed out the FAQPage branch of Schema Types and pivoted FAQ Section to credit alternative patterns (Key Takeaways summary, Sources section, comparison table) for long-form Articles where a literal FAQ structure does not fit. Full reasoning in our FAQ post-mortem.

Experience signals got heavier. The March 2026 Core Update analysis found that authority signals gained significant weight: official sites, brand-backed content, and data-rich sources were boosted, while aggregators and low-quality content sites dropped. We raised the Firsthand Experience Signals penalty to match the weight of Original Research, making them the two heaviest Pro-tier content quality checks.

AI crawler catalogue expanded from 10 to 15 bots. Anthropic split ClaudeBot into ClaudeBot, Claude-User, and Claude-SearchBot in February 2026. OpenAI added OAI-AdsBot in April. Applebot-Extended's share of AI-crawler traffic grew enough to make it a critical-tier bot. We added Claude-SearchBot, Claude-User, Applebot-Extended, Perplexity-User, and OAI-AdsBot to AI Bot Access Rules. Three optional bots (Amazonbot, Meta-ExternalAgent, MistralAI-User) were deferred to Q4 2026 pending adoption signals.

Discover image signals split out and enforced. Google's "Get on Discover" documentation names two hard eligibility requirements: og:image:width >= 1200 and max-image-preview:large. We split the width requirement out of Open Graph Tags as a new sub-result, and added Large Image Preview Directive as a new check. Kirbie's Cravings recorded a significant Discover traffic lift from enabling the directive, per Google's own case study. Full unpacking in Discover Optimization Is Not SEO Optimization.
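
The two Discover requirements as stated above (og:image:width of at least 1200 and max-image-preview:large) can be tested against a page's meta tags with a short sketch. It assumes the directive is delivered in a robots or googlebot meta tag; a directive sent only through an X-Robots-Tag header would not be seen here. The function name `discover_image_signals` and the result keys are hypothetical.

```python
from html.parser import HTMLParser

MIN_OG_IMAGE_WIDTH = 1200  # width requirement as stated in this post


class _MetaCollector(HTMLParser):
    """Collects (key, content) pairs from <meta property=...> and <meta name=...>."""

    def __init__(self):
        super().__init__()
        self.metas = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        key = (attr.get("property") or attr.get("name") or "").lower()
        if key:
            self.metas.append((key, attr.get("content", "")))


def discover_image_signals(html):
    """Check the two image-related Discover requirements on a single page."""
    collector = _MetaCollector()
    collector.feed(html)

    width_ok = False
    preview_large = False
    for key, content in collector.metas:
        if key == "og:image:width":
            try:
                width_ok = width_ok or int(content.strip()) >= MIN_OG_IMAGE_WIDTH
            except ValueError:
                pass  # non-numeric width: treat as not satisfied
        elif key in ("robots", "googlebot"):
            directives = [d.strip().lower() for d in content.split(",")]
            if "max-image-preview:large" in directives:
                preview_large = True

    return {"og_image_width_ok": width_ok, "max_image_preview_large": preview_large}
```

Reporting the two results separately mirrors the split described above, where the width requirement and the directive are surfaced as distinct findings.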

Source citation weight went up. The Sources & References penalty increased based on Wellows' 2026 AI Overview ranking factors analysis, in which pages with stronger entity authority signals show higher AI Overview selection rates, and on the March 2026 Core Update pattern of rewarding content backed by named, verifiable sources.

How We Rank Signal Importance

Three commitments govern how weights get set and refreshed:

Evidence stack. Our research registry catalogues 1,192 sources covering ranking factor studies, vendor documentation, controlled experiments, and Core Update teardowns through Q1 2026, refreshed quarterly. Every weight in the matrix traces back to either a registry entry or — for the freshest signals — to source links cited directly in the corresponding internal decision record.

Five-axis scoring. Each check is scored on SEO impact (Google/Bing ranking influence), GEO impact (AI search citation influence), penalty magnitude (how much it costs to fail), scope (which page types it runs on), and tier (free vs Pro). The five axes are independent — a check can be HIGH GEO and LOW SEO, or all-pages and zero-penalty (info only). The matrix surfaces the full vector, not a collapsed single score.

Validated dates. Every check carries a "last verified against external evidence" stamp. Twenty checks are stamped 2026-06 (freshest, refreshed for the Q3 2026 weight refresh sprint), the rest at 2026-03 with a rolling refresh roadmap. Stamps are public in the policy matrix so you can see what is freshly calibrated and what is due for the next pass.
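
One way to picture the five-axis vector and the validation stamp is as a small record type. Everything in the sketch below is illustrative: the field names, the impact scale, the penalty units, and the two example entries are placeholders made up for this example, not values from the actual policy matrix.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Impact(Enum):
    NONE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class CheckPolicy:
    slug: str                # hypothetical identifier for the /checks/<slug> page
    seo_impact: Impact       # axis 1: Google/Bing ranking influence
    geo_impact: Impact       # axis 2: AI search citation influence
    penalty: float           # axis 3: cost of failing (placeholder units)
    scope: Tuple[str, ...]   # axis 4: page types the check runs on
    tier: str                # axis 5: "free" or "pro"
    last_verified: str       # validation stamp, "YYYY-MM"

    @property
    def info_only(self) -> bool:
        """A zero-penalty check reports a finding without affecting the score."""
        return self.penalty == 0


def due_for_refresh(checks: List[CheckPolicy], freshest: str) -> List[CheckPolicy]:
    """Checks stamped earlier than `freshest`; YYYY-MM strings sort chronologically."""
    return [c for c in checks if c.last_verified < freshest]


# Placeholder entries: the axes are independent, so HIGH GEO can pair with LOW SEO,
# and an all-pages check can carry no penalty at all.
EXAMPLE_CHECKS = [
    CheckPolicy("example-geo-heavy-check", Impact.LOW, Impact.HIGH, 4.0, ("article",), "free", "2026-06"),
    CheckPolicy("example-info-only-check", Impact.LOW, Impact.NONE, 0.0, ("all",), "free", "2026-03"),
]

if __name__ == "__main__":
    for check in due_for_refresh(EXAMPLE_CHECKS, "2026-06"):
        print(check.slug, "last verified", check.last_verified)
```

Keeping the axes as separate fields, rather than folding them into one number, is what lets the matrix surface the full vector for each check.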

The honest caveat: weights are point-in-time estimates, not laws of nature. We ship updates roughly monthly, and we publish the reasoning. If a weight moves, you can read why.

Complete Reference: All 99 Signals

The full catalogue, grouped by category. Each link goes to a dedicated /checks/<slug> page with the check definition, fix steps, and evidence trail. Pro-tier checks are marked (Pro).

Technical SEO. Page Title, Meta Description, Canonical URL, Self-Referencing Canonical, Open Graph Tags, OG Image Dimensions, Large Image Preview Directive, Indexing Status, Semantic HTML, Hreflang Tags, HTML Lang Attribute, Viewport Meta Tag, SSL Certificate, Mixed Content, Image Lazy Loading, LCP Image Dimensions, Favicon, Image Sitemap Entries, Content Density, JavaScript Content Gap, Resource Hints, Last-Modified Header, Snippet Readiness, Heading Keyword Relevance.

Discovery & Infrastructure. XML Sitemap, Sitemap Lastmod Dates, Page in Sitemap, Sitemap Link Coverage, Robots.txt, Robots.txt Sitemap Directive, Conflicting Robots Rules, AI Bot Access Rules, llms.txt File, Page in llms.txt, RSS/Atom Feed, RSS Feed Items, Page in RSS Feed.

Content Structure. H1 Heading, H2 Headings, Question-Format Headings, Heading Hierarchy, Lead Paragraph Length, Paragraph Atomicity, FAQ Section, Summary Section, Image Alt Text, Lists in Content, Tables in Content, External Links, Noscript Fallback, Content Word Count, Table of Contents, Breadcrumb Navigation, Readability Score, Collection Categorization, Tool Page Description, Homepage Internal Links, Article Internal Links, Collection Page Internal Links, Internal Link Anchor Text, Social Proof, Content Above the Fold.

Content Quality (Pro). Original Research, Fact Density, Introduction Quality, Semantic Completeness, Brand Clarity, Tool Description Quality, Collection Quality, FAQ Quality, Medical Content Quality, Firsthand Experience Signals, Content Front-Loading, Self-Contained Answer Passages.

Structured Data. JSON-LD Structured Data, Schema Types, Schema Required Fields, Schema @id Linking, Schema Trust Chain, Schema inLanguage, Article Schema Image, SearchAction Schema, Video Schema, Speakable Schema, Product Schema Quality, Organization Knowledge Links.

E-E-A-T & Trust. Author Name, Author Bio Link, Author Credentials, Author Photo, Author Social Links, Author Bio Content, Author E-E-A-T Relevance (Pro), Author Bio Quality (Pro), Credential Verifiability (Pro), Citation Quality (Pro), Publication Date, Modification Date, Date Visibility, Content Freshness, Editorial Review Signal, Content Update Annotation, Sources & References, YMYL Disclaimer, Trust Page Links, About Page Trust Signals, Language Consistency.

What This Doesn't Mean

Four caveats, because a 99-signal map is easy to read as a 99-item to-do list and it is not.

99 checks does not mean 99 things to fix tomorrow. Fix the 7 high-impact signals first. Most pages have 3–5 actual blockers and a long tail of info-only signals that change nothing if you "fix" them.
A pass on every check does not guarantee Discover traffic, a top-3 ranking, or daily AI citations. The audit measures the signals that are necessary for visibility, not the signals that are sufficient for outperformance. Topical authority, link equity, and brand demand sit outside the audit's scope and dwarf any individual signal in their influence on ranking.
We do not score brand mentions, backlinks, off-page authority, or content quality at scale. Those are different categories of tool — Ahrefs Brand Radar, OtterlyAI, Semrush AI Visibility Toolkit, Profound. Use them alongside this audit, not instead of it. The complementary positioning is intentional.
Weights are point-in-time estimates, not laws of nature. We refresh against quarterly evidence and publish the reasoning. The matrix as of May 25, 2026 is not the matrix six months from now.

Run a free audit at hybridranking.com on your homepage or a top article. The audit runs the full free-tier signal set on every page and previews which Pro-tier LLM-based content quality checks would help most for your content type. Each finding links back to the specific /checks/<slug> page with what to fix and why we weighted it the way we did. Power users: this whole catalogue is also queryable as native tools inside Claude Desktop / Cursor / Continue.dev — see how we shipped our MCP server for the setup.