
How Rankio Measures LLM Visibility

Feb 22, 2026 · 9 min read · Methodology, Transparency

Rankio measures LLM visibility by querying ChatGPT, Gemini, and Perplexity with real-world prompts, analyzing each response for brand citations using 7 weighted metrics (Content Quality, 80% of the score), running a 10-element GEO Content Audit (GEO Readiness, 20% of the score), and blending both into a composite Overall Score (0–100). Every analysis is fully transparent — you can see the raw AI response and exactly how each metric was calculated.

Overall Score = Content Quality (80%) + GEO Readiness (20%). Content Quality is measured from 7 citation metrics across AI models. GEO Readiness is an automated audit of 10 structural elements (Direct Answer, tables, FAQ, JSON-LD, headings, etc.). Both are fully auditable.

Component | Weight | Source
Content Quality | 80% | 7-metric weighted score from AI model responses
GEO Readiness | 20% | 10-element GEO Content Audit score
Content Quality metric | Weight | What it measures | Scoring
Presence | 25% | Is the brand mentioned at all? | Binary per prompt, averaged across set
Citation quality | 20% | How the brand is referenced | URL = 1.0, name = 0.6, contextual = 0.3
Position | 15% | Where in the response it appears | 1st = 1.0, 2nd = 0.7, 3rd+ = 0.4
Recommendation | 15% | Is the brand actively endorsed? | "We recommend" = 1.0, "is an option" = 0.4
Sentiment | 10% | Tone of the mention | Positive = 1.0, neutral = 0.5, negative = 0.1
Consistency | 10% | Cross-model and cross-prompt presence | Higher if cited across multiple models/prompts
Frequency | 5% | Mentions within a single response | Small boost for repeated mentions
Concept | Definition | Why it matters
Overall Score | Two-tier composite (0–100): Content Quality (80%) + GEO Readiness (20%) | Single reliable number that captures both citation strength and structural readiness
Content Quality | Weighted composite of 7 citation metrics measured from AI model responses | Measures how AI models actually respond to your brand across multiple prompts
GEO Readiness | 10-element GEO Content Audit score (Direct Answer, tables, FAQ, JSON-LD, etc.) | Measures how well your page is structured for AI extraction and citation
Citation Detection | Parsing AI responses for brand mentions using exact, fuzzy, and entity matching | Catches direct mentions, product references, and implied brand references
Cross-Model Analysis | Running identical prompts on ChatGPT, Gemini, and Perplexity for comparison | Different models have different knowledge — full coverage requires all three
Full Transparency | Showing the raw AI response alongside extracted metrics for every analysis | Allows manual verification and builds trust in the scoring methodology

Why a transparent methodology matters

When you invest in GEO (Generative Engine Optimization), you need to trust the data. Unlike traditional web analytics where you can verify traffic with server logs, AI visibility is harder to audit. AI models are black boxes — you can't install a tracking pixel inside ChatGPT.

That's why Rankio is built on a principle of full auditability. Every data point traces back to a real AI model response that you can read, verify, and challenge. There is no hidden algorithm — just a structured, reproducible process. The brands in our case studies relied on this transparency to validate their Share of Voice improvements.

The Rankio analysis pipeline

Step 1 — Prompt design

You provide a URL, a brand name, or a topic. Rankio generates (or you define) a set of prompts that represent how your audience interacts with AI. These include:

  • Discovery prompts: "What tools can help with [topic]?"
  • Comparison prompts: "Compare [your brand] vs [competitor]"
  • Branded prompts: "What is [your brand]?"
  • Intent prompts: "Best [category] for [use case]"
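
These prompt categories can be built from simple templates. A minimal sketch, assuming hypothetical template strings and a `generate_prompts` helper (not Rankio's actual prompt generator):

```python
# Minimal sketch of prompt-set generation from a brand, topic, and competitor list.
# The templates and function name are illustrative assumptions, not Rankio's internals.

PROMPT_TEMPLATES = {
    "discovery":  "What tools can help with {topic}?",
    "comparison": "Compare {brand} vs {competitor}",
    "branded":    "What is {brand}?",
    "intent":     "Best {category} for {use_case}",
}

def generate_prompts(brand, topic, category, use_case, competitors):
    prompts = [
        PROMPT_TEMPLATES["discovery"].format(topic=topic),
        PROMPT_TEMPLATES["branded"].format(brand=brand),
        PROMPT_TEMPLATES["intent"].format(category=category, use_case=use_case),
    ]
    prompts += [
        PROMPT_TEMPLATES["comparison"].format(brand=brand, competitor=c)
        for c in competitors
    ]
    return prompts

print(generate_prompts("Rankio", "LLM visibility", "GEO tool", "SaaS brands", ["CompetitorX"]))
```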

Step 2 — Multi-model querying

Each prompt is sent to multiple AI models simultaneously. We use the latest available model versions and configure them with default parameters (temperature, system prompts) to simulate real user interactions. Results are timestamped and stored.
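
A minimal sketch of this fan-out pattern, assuming a generic `query_model` wrapper around each provider's SDK (the wrapper, the model identifiers, and the result fields are illustrative assumptions, not Rankio's code):

```python
# Fan a single prompt out to several models concurrently and timestamp each result.
# `query_model` is a hypothetical wrapper around each provider's SDK.
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timezone

MODELS = ["chatgpt", "gemini", "perplexity"]  # illustrative identifiers

def query_model(model: str, prompt: str) -> str:
    """Hypothetical wrapper: call the provider's API with default parameters."""
    raise NotImplementedError("plug in the provider SDK of your choice")

def run_prompt(prompt: str) -> list[dict]:
    with ThreadPoolExecutor(max_workers=len(MODELS)) as pool:
        futures = {m: pool.submit(query_model, m, prompt) for m in MODELS}
        return [
            {
                "model": m,
                "prompt": prompt,
                "response": f.result(),
                "timestamp": datetime.now(timezone.utc).isoformat(),
            }
            for m, f in futures.items()
        ]
```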

Step 3 — Response parsing

Raw AI responses are analyzed using a multi-layered extraction engine:

  • Brand detection: exact match, fuzzy match, and entity recognition
  • Citation extraction: URLs, domain references, and source attributions
  • Sentiment analysis: is the mention positive, neutral, or negative?
  • Position analysis: where in the response does the mention appear (first, middle, last)?
  • Recommendation strength: is the brand merely mentioned or actively recommended?
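
For the brand-detection layer specifically, here is a minimal sketch of exact plus fuzzy matching with a naive URL check; the threshold and helper name are assumptions, and the entity-recognition layer is omitted:

```python
# Minimal brand-detection sketch: exact match, fuzzy match, and a naive URL check.
# Thresholds and helper names are illustrative; entity recognition is omitted.
import re
from difflib import SequenceMatcher

def detect_brand(response: str, brand: str, fuzzy_threshold: float = 0.85) -> dict:
    text = response.lower()
    brand_l = brand.lower()

    # Exact match (word-boundary aware).
    exact = re.search(rf"\b{re.escape(brand_l)}\b", text) is not None

    # Fuzzy match: compare the brand against each token to catch typos and variants.
    tokens = re.findall(r"[a-z0-9\.\-]+", text)
    fuzzy = any(
        SequenceMatcher(None, brand_l, tok).ratio() >= fuzzy_threshold
        for tok in tokens
    )

    # URL citation: the brand's domain appears in a link or plain URL.
    url = f"{brand_l}.com" in text  # simplistic; a real check would parse URLs

    return {"exact": exact, "fuzzy": fuzzy, "url_citation": url}

print(detect_brand("We recommend Rankio (rankio.com) for GEO tracking.", "Rankio"))
```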

Step 4 — Score computation

The Overall Score (0–100) is a two-tier composite:

Tier 1 — Content Quality (80%): The extracted citation data feeds into 7 weighted metrics:

  • Presence (25%): is the brand mentioned at all?
  • Citation quality (20%): URL citations score higher than name-only mentions
  • Position (15%): first-mentioned brands get a higher weight
  • Recommendation (15%): is the brand actively recommended or just listed?
  • Sentiment (10%): positive mentions score higher
  • Consistency (10%): does the brand appear across multiple models and prompts?
  • Frequency (5%): how many times within the response?

Tier 2 — GEO Readiness (20%): In parallel, a GEO Content Audit checks 10 structural elements — Direct Answer, TL;DR, tables, FAQ, heading hierarchy, lists, JSON-LD, internal links, meta description, and entity clarity — producing a GEO Readiness score (0–100).

Final formula: Overall Score = Content Quality × 0.80 + GEO Readiness × 0.20
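
A minimal sketch of this composite using the published weights; the metric names and the assumption that each metric has already been normalized to a 0–1 range are mine, not Rankio's implementation:

```python
# Composite score sketch using the published weights.
# Assumes each Content Quality metric is already normalized to the 0-1 range.
CQ_WEIGHTS = {
    "presence": 0.25, "citation_quality": 0.20, "position": 0.15,
    "recommendation": 0.15, "sentiment": 0.10, "consistency": 0.10, "frequency": 0.05,
}

def content_quality(metrics: dict) -> float:
    """Weighted Content Quality on a 0-100 scale."""
    return 100 * sum(CQ_WEIGHTS[name] * value for name, value in metrics.items())

def overall_score(content_quality_score: float, geo_readiness_score: float) -> float:
    """Overall Score = Content Quality x 0.80 + GEO Readiness x 0.20."""
    return content_quality_score * 0.80 + geo_readiness_score * 0.20

metrics = {"presence": 1.0, "citation_quality": 0.6, "position": 0.7,
           "recommendation": 0.4, "sentiment": 1.0, "consistency": 0.5, "frequency": 0.2}
cq = content_quality(metrics)  # weighted sum = 0.695 -> 69.5 on a 0-100 scale
print(overall_score(cq, 62))   # 69.5 * 0.8 + 62 * 0.2 = 68.0
```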

For a detailed breakdown, see the full methodology page.

Step 5 — Competitive benchmarking

The same analysis runs for your competitors, allowing Rankio to compute your Share of Voice and rank all brands in your space. This gives you a clear picture of where you stand and where your gaps are.

Step 6 — Actionable insights

Rankio doesn't just show you data — it tells you what to do. The Content Studio identifies prompts where you have low visibility but your competitors rank high, and generates content briefs optimized to fill those gaps. See our case studies for real examples of this loop producing +38% visibility gains.
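
One way to picture this gap-finding step: compare your per-prompt visibility against the strongest competitor and surface the prompts with the largest gap. The data shape and threshold below are illustrative assumptions, not Content Studio's internals:

```python
# Sketch of visibility-gap identification: prompts where a competitor scores well but you don't.
# The input shape ({prompt: {brand: score}}) and the threshold are illustrative assumptions.
def find_gaps(per_prompt_scores: dict, brand: str, min_gap: float = 30.0) -> list[dict]:
    gaps = []
    for prompt, scores in per_prompt_scores.items():
        own = scores.get(brand, 0.0)
        best_rival, rival_score = max(
            ((b, s) for b, s in scores.items() if b != brand),
            key=lambda item: item[1],
            default=(None, 0.0),
        )
        if rival_score - own >= min_gap:
            gaps.append({"prompt": prompt, "own": own,
                         "competitor": best_rival, "competitor_score": rival_score})
    return sorted(gaps, key=lambda g: g["competitor_score"] - g["own"], reverse=True)

scores = {"best endpoint security": {"Acme": 20, "CrowdStrike": 85},
          "what is Acme?": {"Acme": 90, "CrowdStrike": 10}}
print(find_gaps(scores, "Acme"))
```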

How the Overall Score is weighted

The Overall Score (0–100) is a two-tier composite. Each tier contributes a fixed share:

Content Quality (80%)

7 citation metrics

Presence (25%) — Is the brand mentioned at all? Binary per prompt, averaged across the full prompt set.
Citation quality (20%) — URL citations score 1.0, direct name mentions score 0.6, contextual references score 0.3.
Position (15%) — First-mentioned brand gets 1.0, second gets 0.7, third+ gets 0.4. Top-of-response placement matters.
Recommendation strength (15%) — "We recommend X" scores 1.0 vs. "X is an option" at 0.4. Active endorsement is weighted heavily.
Sentiment (10%) — Positive mentions score 1.0, neutral 0.5, negative 0.1. Negative mentions still count but contribute minimally.
Consistency (10%) — Appearing across multiple AI models and diverse prompt types scores higher than appearing in just one model.
Frequency (5%) — Multiple mentions within a single response provide a small additional boost.

GEO Readiness (20%)

The GEO Content Audit automatically checks 10 structural elements that AI models need to extract and cite content reliably: Direct Answer, TL;DR, tables, FAQ, heading hierarchy, lists, JSON-LD, internal links, meta description, and entity clarity. Each element is scored 0–100, and the average produces the GEO Readiness score.
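
A minimal sketch of how such an audit rolls up into a single number; the element keys and the plain average are assumptions based on the description above:

```python
# GEO Readiness sketch: average the ten per-element scores (each 0-100) into one score.
# Element names follow the list above; the plain average is an assumption from the text.
GEO_ELEMENTS = [
    "direct_answer", "tldr", "tables", "faq", "heading_hierarchy",
    "lists", "json_ld", "internal_links", "meta_description", "entity_clarity",
]

def geo_readiness(element_scores: dict) -> float:
    """Mean of the 10 element scores, each on a 0-100 scale."""
    return sum(element_scores.get(e, 0) for e in GEO_ELEMENTS) / len(GEO_ELEMENTS)

audit = {e: 80 for e in GEO_ELEMENTS} | {"direct_answer": 0, "faq": 20}
print(geo_readiness(audit))  # (8 * 80 + 0 + 20) / 10 = 66.0
```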

Example composite calculation

Content Quality = 74 (strong presence and citation quality, moderate recommendation)
GEO Readiness = 62 (has tables and headings, but lacks Direct Answer and FAQ JSON-LD)

Overall Score = 74 × 0.80 + 62 × 0.20 = 59.2 + 12.4 = 71.6 ≈ 72/100

Weights were calibrated against known outcomes — comparing score deltas with actual changes in referral traffic from AI-powered search. They are re-calibrated quarterly as AI search behaviour evolves.

The Content Quality component also powers the AI Share of Voice calculation: SOV is derived from the presence and citation quality dimensions, aggregated across competitors for a given prompt set.
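
A hedged sketch of how a Share of Voice figure could be derived from presence and citation quality across competitors; the exact aggregation Rankio uses is not spelled out here, so this simple normalization is an assumption:

```python
# Share of Voice sketch: each brand's (presence x citation quality) mass as a share of the total.
# The aggregation formula is an illustrative assumption, not Rankio's exact method.
def share_of_voice(brand_metrics: dict) -> dict:
    """brand_metrics: {brand: {"presence": 0-1, "citation_quality": 0-1}} averaged over a prompt set."""
    mass = {b: m["presence"] * m["citation_quality"] for b, m in brand_metrics.items()}
    total = sum(mass.values()) or 1.0
    return {b: round(100 * v / total, 1) for b, v in mass.items()}

print(share_of_voice({
    "Rankio":     {"presence": 0.8, "citation_quality": 0.6},
    "Competitor": {"presence": 0.5, "citation_quality": 0.4},
}))  # {'Rankio': 70.6, 'Competitor': 29.4}
```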

For the full formula, worked examples, and weight calibration details, see the dedicated Methodology page.

Known limitations and caveats

No measurement system is perfect. We believe in transparent disclosure of our methodology's constraints:

  • AI model non-determinism: LLMs produce different responses to the same prompt across runs. Rankio mitigates this by running multiple samples and averaging (see the sketch after this list), but some variance is inherent. Two analyses of the same prompt may yield slightly different scores.
  • Model version changes: When AI providers update their models (e.g., GPT-4 to GPT-4o), response patterns can shift. Historical comparisons should account for model version changes, which Rankio logs.
  • Retrieval vs. parametric knowledge: It is not always possible to distinguish whether an AI model cites your brand from its training data (parametric) or from live retrieval (RAG). Both contribute to visibility, but they respond to different GEO strategies.
  • Prompt coverage: Your Visibility Score is only as representative as your prompt set. A narrow set of 10 prompts will give a less reliable picture than 50–100 diverse prompts. We recommend a minimum of 30 prompts for a reliable baseline.
  • Geographic and language variance: AI models may produce different responses based on inferred user location or language. Current analysis uses English-language, default-locale queries. Multi-language support is on our roadmap.
  • Correlation ≠ causation: A rising Visibility Score after content changes suggests improvement, but external factors (competitor content changes, model updates) can also affect scores. We recommend tracking competitive SOV alongside your own score to isolate your changes from market movements.
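
A minimal sketch of the sampling-and-averaging mitigation mentioned in the first bullet above; the sample count, `query_model`, and `score_response` are placeholders, not Rankio's actual functions:

```python
# Sketch of variance mitigation: score the same prompt several times and report the mean.
# `query_model` and `score_response` are hypothetical placeholders passed in by the caller.
from statistics import mean, stdev

def averaged_score(model: str, prompt: str, query_model, score_response, samples: int = 5) -> dict:
    scores = [score_response(query_model(model, prompt)) for _ in range(samples)]
    return {
        "mean": round(mean(scores), 1),
        "stdev": round(stdev(scores), 1) if samples > 1 else 0.0,
        "samples": samples,
    }
```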

We continuously work to reduce these limitations. Every Rankio analysis includes metadata (model version, timestamp, prompt text, raw response) so you can audit and interpret results in full context.

A real analysis walkthrough

Scenario

A cybersecurity company runs a Rankio analysis on their homepage URL. Rankio generates 15 prompts covering "best endpoint security", "enterprise cybersecurity tools", and "compare CrowdStrike vs [brand]".

Results: The company scores 62/100 on ChatGPT but only 31/100 on Perplexity. Digging into the per-prompt data, they discover that Perplexity consistently retrieves a competitor's comparison page instead of theirs. Their brand is mentioned but never recommended.

Action: They use the Content Studio to generate a targeted comparison article, add FAQPage schema, and re-run the analysis 3 weeks later. Their Perplexity score jumps to 54/100.

Key takeaway

The raw response is always visible — you can verify exactly what each AI model said about your brand and why Rankio scored it that way.

Getting the most out of Rankio's methodology

  • Start with a URL analysis to establish your baseline Visibility Score
  • Add your top 3–5 competitors for benchmark comparison
  • Review the raw AI responses to understand how models perceive your brand
  • Set up prompt monitoring for your most important queries
  • Run Share of Voice tests monthly to track competitive trends
  • Use Content Studio recommendations to fill visibility gaps
  • Compare your score across ChatGPT, Gemini, and Perplexity individually
  • Re-analyze after content changes to measure the impact of your GEO efforts

Frequently asked questions

Which AI models does Rankio analyze?
Rankio queries ChatGPT (OpenAI), Gemini (Google), and Perplexity. Each model is queried with the same prompts to enable fair cross-model comparisons. We continuously add support for new models as they become relevant.

How is the Overall Score calculated?
The Overall Score (0–100) is a two-tier composite: Content Quality (80%) blended with GEO Readiness (20%). Content Quality is computed from 7 weighted citation metrics (Presence, Citation quality, Position, Recommendation, Sentiment, Consistency, Frequency). GEO Readiness is an automated GEO Content Audit checking 10 structural elements. Both are calibrated against real referral traffic from AI search.

How does Rankio detect brand mentions?
Rankio uses a combination of exact name matching, fuzzy matching for variations (typos, abbreviations), URL detection, and entity recognition. This catches direct mentions, product references, and contextual references where a brand is implied but not named directly.

How often is the data refreshed?
Monitoring prompts are queried daily. Full Share of Voice tests can be run on-demand or scheduled. Historical data is stored indefinitely so you can track trends over months and years.

Can I see the raw AI responses?
Yes, always. Every analysis in Rankio shows the full raw AI response alongside the extracted metrics. You can see exactly what ChatGPT, Gemini, or Perplexity said about your brand and verify how Rankio interpreted it. We believe in full auditability.

See the methodology in action

Run your first analysis and explore the raw data behind your Visibility Score.