Guide · April 26, 2026 · 22 min read

The 15-Prompt AI Visibility Audit: Test Your Brand in ChatGPT, Claude & Perplexity

15 prompts across 3 layers. Only 30% of 40 SaaS brands cleanly passed Layer 1 in our April 2026 audit. Find where ChatGPT, Claude & Perplexity lose deals.

By Joao da Silva, Co-Founder of friction AI


TL;DR. AI visibility is three layers, not one. Most marketers test it by asking ChatGPT a single question, which is closer to a vibe check than a diagnostic. This 15-prompt audit (5 prompts per layer: entity recognition, visibility, recommendation) tells you exactly which layer is leaking deals and what to fix first. Free template at the bottom. Run it tonight.

Three frosted-glass panels filtering a stream of particles, a visual metaphor for the 3-layer AI visibility audit

When we audited 40 SaaS brands' Layer 1 visibility across two GPT generations (gpt-4o and gpt-5.2), only 12 of 40 (30%) cleanly passed all three sub-tests. The bottleneck was not the model. The same 12 brands passed under both generations; the model upgrade did not move the number. The binding constraint was the Knowledge Graph entity, which does not change when you upgrade your LLM. Every dollar a team spends on Layer 2 and Layer 3 work is wasted while their Layer 1 entity foundation is missing, because AI cannot recommend a brand it cannot recognize.

This guide walks the audit you can run yourself in under an hour, the 15 prompts you should track quarterly, and the playbook for what to fix once you see the results.

What is an AI visibility audit?

An AI visibility audit is a structured test of how large language models (ChatGPT, Claude, Perplexity, Gemini) recognize, rank, and recommend your brand. The audit measures three sequential layers of AI behavior, because a brand can pass one and silently fail another.

Unlike a Google rank check, AI visibility is a moving target. Search interest in "AI Visibility" grew 11.5x in the 12 months ending April 2026, and "Answer Engine Optimization" grew 5.9x (Google Trends, internal pull, Apr 2026).

The 15-prompt framework breaks the audit into three diagnostic filters that map to a buyer's journey through AI search.

A 3-layer funnel diagram showing existence, ranking, and selection as the three filters of AI visibility

The arc is existence, then ranking, then selection:

  1. Entity Recognition (Foundation): does AI know you exist?
  2. Visibility / ToFu (Leaderboard): who is strongest in your space, and where do you sit?
  3. Recommendation / BoFu (Favorite): does AI pick you when forced to commit, and does it surface concerns when asked directly?

The diagnostic value of separating them is brutal. If you fail Layer 1, fixing your homepage copy will not help. If you win Layer 1 but lose Layer 2, your problem is off-site authority (Reddit, listicles, comparison content), not your website. If you win Layers 1 and 2 but lose Layer 3, you have a validation problem with the buyers who already know you. Diagnose before you fix; the wrong fix on the right problem still loses you another quarter.
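The routing logic above can be sketched as a small decision function. This is an illustrative sketch of the article's sequencing rule, not part of any tool's API; the fix descriptions are paraphrased from this guide.

```python
def diagnose(layer1_pass: bool, layer2_pass: bool, layer3_pass: bool) -> str:
    """Route audit results to the fix this guide prescribes, in order.

    Layer 1 gates everything downstream, so it is checked first even
    if later layers look worse on paper.
    """
    if not layer1_pass:
        return "Fix entity foundation first (Knowledge Graph, schema, Wikipedia)."
    if not layer2_pass:
        return "Build off-site authority (Reddit, listicles, comparison content)."
    if not layer3_pass:
        return "Close the validation gap (PR, fresh reviews, updated case studies)."
    return "All three layers pass; move to quarterly monitoring."
```

The point of encoding it is the ordering: a failing Layer 1 short-circuits everything else, which is exactly the "diagnose before you fix" rule.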

Why isn't one ChatGPT query an audit?

One ChatGPT query is a single roll of a loaded die. AI responses vary across runs, across models, and across phrasings. Even between gpt-4o and gpt-5.2 in our 40-brand audit, individual brands flipped between PASS and FAIL. Linear was unrecognized by gpt-4o but correctly identified by gpt-5.2. Celest's web search ranking dropped on the newer model. A structured audit averaged across multiple runs produces direction. A single query produces anxiety.

The reference work here comes from Omniscient Digital. Their team analyzed 25,755 AI citations across 200 prompts and identified five universal BoFu prompt patterns that generalize across e-commerce, SaaS, services, and healthcare (Omniscient Digital, 2025). The takeaway is not that you need 200 prompts of your own. It is that the shape of a real audit is structured, repeated, and read across runs, not extracted from one screenshot.

If you want to see what these failures look like in the wild before running the audit yourself, we collected the 11 most common AI visibility failure modes we see in audits.

What does the 15-prompt framework cover?

The framework is five prompts per layer, fifteen total. Each prompt within a layer tests a distinct dimension of that layer's failure mode. The prompt construction matches the failure mode, which is why the layers are kept clean: brand-anchored prompts at Layers 1 and 3, category-only prompts at Layer 2.

The 15-prompt framework: 5 prompts per layer across entity recognition, visibility, and recommendation

The framework draws on two pieces of public research. Citation Labs identified four properties that make a BoFu prompt worth tracking: contrastive reasoning ("better," "worth it"), offer-anchoring (a specific brand named), category anchoring, and constraint clauses (Citation Labs, 2025). Omniscient's five universal BoFu patterns slot into the same Layer 3 set: pricing, comparison, social proof, fit, and verdict. Both frameworks are reflected in the prompts below.

| # | Layer | Prompt |
| --- | --- | --- |
| 1.1 | Entity | Who is [your brand]? |
| 1.2 | Entity | What does [your brand] do? |
| 1.3 | Entity | Who founded [your brand]? |
| 1.4 | Entity | What is [your brand] known for? |
| 1.5 | Entity | Tell me about [your brand]'s product. |
| 2.1 | Visibility | What are the best [category] tools for [ICP]? |
| 2.2 | Visibility | Which [category] software is best for [use case]? |
| 2.3 | Visibility | What's the best [category] for [tight niche]? |
| 2.4 | Visibility | What [category] platforms are most popular right now? |
| 2.5 | Visibility | What tools help with [buyer's problem]? |
| 3.1 | Recommendation | How much does [your brand] cost? |
| 3.2 | Recommendation | How does [your brand] compare to [competitor]? |
| 3.3 | Recommendation | What do users say about [your brand]? |
| 3.4 | Recommendation | Is [your brand] good for [my use case]? |
| 3.5 | Recommendation | Is [your brand] worth it? |

Run each prompt three to five times, average the results, then read the patterns. Three runs is the minimum that surfaces variance; five gives you confidence on the top entry. Single-run readings are how marketers convince themselves they are winning when they are not.
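Averaging repeated runs can be done with a few lines. A minimal sketch: the 1.0 / 0.5 / 0.0 weights for pass / partial / fail are an assumed convention for illustration, not part of the framework itself.

```python
from statistics import mean

# Map one run's outcome to a number so repeated runs can be averaged.
# The weights below are an assumption chosen for this sketch.
SCORE = {"pass": 1.0, "partial": 0.5, "fail": 0.0}

def prompt_score(runs: list[str]) -> float:
    """Average 3-5 runs of a single prompt into one reading."""
    if len(runs) < 3:
        raise ValueError("run each prompt at least 3 times before averaging")
    return mean(SCORE[r] for r in runs)

# Example: three runs of one prompt with a single flaky miss.
# prompt_score(["pass", "pass", "fail"]) averages to roughly 0.67.
```

A score that only looks good on its best run is exactly the single-run illusion the paragraph above warns about; the average is what you track quarter over quarter.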

Layer 1: Entity Recognition — Does AI Know Your Brand Exists?

Layer 1 is the foundation. It tests whether AI has your brand in its model at all, whether the description is accurate, and whether the founder, founding date, and product knowledge are right. Fail it, and Layers 2 and 3 are moot. In our 40-brand Layer 1 audit (April 2026, run on both gpt-4o and gpt-5.2), 28 of 40 brands (70%) had at least one Layer 1 failure. The most common failures were entity foundation missing (no Knowledge Graph entry) and a pattern called CONFUSED_IDENTITY: the LLM picking the wrong company with the same name and describing it confidently.

A diagram showing the three sub-levels within Layer 1: entity foundation, training data, and web search

Layer 1 prompts: 5 brand-anchored questions to test if AI recognizes your brand

Layer 1 actually splits into three sub-levels, each with a different fix lever:

  1. Entity foundation: does a Knowledge Graph entry exist for your brand? The lever is structural (Wikipedia, schema markup, consistent founder profiles).
  2. Training data: does the model describe you accurately with web search off? The lever is historical; content has to survive into the next training cycle.
  3. Web search: does live retrieval find and describe you correctly? The lever is fresh, crawlable content and recent press.
For most teams, the audit treats these as one layer. For advanced teams diagnosing root causes, the sub-distinction tells you whether the fix is structural, historical, or live-retrieval. Most fixes start with the first: get a Wikipedia page, add Organization and Founder schema, get founder profiles consistent across LinkedIn, podcasts, and press.
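The Organization and Founder schema fix mentioned above is a JSON-LD block on your homepage. A minimal sketch follows; every value here is a placeholder for your own brand, and the exact property set you need may go beyond this (schema.org defines many more Organization properties).

```python
import json

# Minimal Organization + founder JSON-LD, the structural Layer 1 fix.
# All names and URLs below are hypothetical placeholders.
org_schema = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Brand",
    "url": "https://example.com",
    "sameAs": [
        "https://www.linkedin.com/company/example-brand",
        "https://en.wikipedia.org/wiki/Example_Brand",
    ],
    "founder": {
        "@type": "Person",
        "name": "Jane Founder",
        "sameAs": ["https://www.linkedin.com/in/jane-founder"],
    },
}

# Emit the payload for a <script type="application/ld+json"> tag.
print(json.dumps(org_schema, indent=2))
```

The `sameAs` links are what tie the entity together across surfaces, which is why the founder-profile consistency mentioned above matters: every profile should point back to the same canonical identity.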

What our 40-brand audit found about Layer 1

Three findings from running the Layer 1 audit on 40 SaaS brands across two GPT generations:

1. The Knowledge Graph is the binding constraint, and it does not move with model upgrades. The same 12 brands cleanly passed all three sub-tests on both gpt-4o and gpt-5.2.

All 12 have high-confidence Knowledge Graph entries (resultScore over 100). The model upgrade improved recall, but the Layer 1 strict pass count was identical. For a brand pursuing AI visibility, fixing the Knowledge Graph entity is the highest-leverage Layer 1 action; it gates everything downstream.

The Knowledge Graph bottleneck. Same 12 brands cleanly pass all three Layer 1 sub-tests on both gpt-4o and gpt-5.2. The model upgrade improves recall on borderline cases, but the strict pass count is identical because the binding constraint is upstream of the LLM.

2. As LLMs get better at recall, the failure mode shifts from "never heard of you" to "wrong company with your name." On gpt-4o, NOT_RECOGNIZED ("I'm not familiar with...") accounted for 65% of Layer 1 failures. On gpt-5.2, that dropped to 52%, while CONFUSED_IDENTITY rose to nearly half of all failures. As model recall improves, generic brand naming becomes the dominant failure mode. This is the bad kind of progress: a confidently wrong answer is more dangerous than an honest "I don't know."

3. Generic-named brands stay broken across model upgrades. Four brands in the cohort (Bud, Forge, Roark, Trim) fail CONFUSED_IDENTITY on both training-data and web-search tests, on both gpt-4o and gpt-5.2. The model picks Howard Roark, Atlassian Forge, or the town of Trim in Ireland. Even a Series A startup with strong product traction stays invisible in AI when its name collides with a famous fictional character or an established corporate trademark. Generic naming is a Layer 1 visibility issue that no model upgrade is likely to fix.

The takeaway for your own Layer 1 work: prioritize the Knowledge Graph entry first. And if your brand name is generic (a common English word, a famous person, an existing trademark), assume AI visibility will be structurally harder for you than for a uniquely-named competitor.

The five Layer 1 prompts (1.1 through 1.5) each test a different dimension: identity (1.1), function (1.2), founding story (1.3), reputation (1.4), and product knowledge (1.5). A brand can pass the identity question cleanly and still fail on founder or product details, which is why all five run together.

Layer 2: Visibility — Where Do You Sit on the Leaderboard?

Layer 2 is competitive intelligence in disguise. Every "best [category] for [ICP]" prompt reveals the leaderboard AI sees for that category, problem, or niche. The diagnostic value is not "do I show up?" (the passive question), it is "who does AI consider strongest in this space, and where do I rank against them?" (the active question). The active framing is what makes Layer 2 worth running quarterly.

Illustrative example of a ChatGPT response to 'What are the best AI search visibility tools for SaaS', showing a numbered list of brands
Illustrative example of a Layer 2 audit response. What a ChatGPT answer to a Layer 2 prompt might look like for the AI visibility tooling category. Actual responses vary across runs, models, and prompt phrasings — this is not a verbatim screenshot.

Layer 2 prompts: 5 category-only questions that reveal AI's leaderboard for your space

Each Layer 2 prompt reveals a different leaderboard. These are not five versions of the same ranking. They are five competitive landscapes running in parallel.

A brand can be #1 on the head term and invisible on the problem-led version. The pattern of where you DO rank versus where you do not tells you which content gap to close first. This is also where most AI search volume actually lives. Every "best X for Y" query a buyer types into ChatGPT routes through Layer 2, and most brands have invisible gaps here because they assume Google rankings carry over. They do not. AI sources from Reddit, "best of" listicles, comparison sites, and podcast transcripts in addition to traditional SEO surfaces, and that mix can surface different brands than Google does.

When you run Layer 2 across multiple platforms, the leaderboards diverge significantly. We covered the workflow for tracking this across ChatGPT, Claude, and Perplexity together in a separate guide on multi-platform mention tracking.

Layer 3: Recommendation — Does AI Pick You and Hide Concerns?

Layer 3 is the selection filter and the layer where deals are won or lost in real time. By the time a buyer is at Layer 3, they already know you exist and they have surfaced you in their research. AI is the last filter before they commit. If it hedges, hallucinates pricing, or describes a competitor more favorably, you lose deals you almost won. And you never find out why.

Layer 3 prompts: 5 BoFu questions that test if AI picks you and hides concerns about you

Layer 3 is not a single test. It splits into two distinct lenses, each diagnosing a different AI behavior:

Lens 1 is the Favorite Test. When forced to compare you against a named competitor, what does AI do? Tested by 3.2 ("How does X compare to Y?") and 3.5 ("Is X worth it?").

Lens 2 is the Concerns Test. When asked about you in isolation, what reservations does AI quietly surface? Tested by 3.1, 3.3, 3.4, and 3.5.

Both lenses kill deals through different mechanisms and require different fixes. A brand can win Lens 1 (AI picks you head-to-head) and still lose Lens 2 because AI surfaces outdated complaints every time it describes you. A brand can pass Lens 2 cleanly (no surfaced concerns) and still lose Lens 1 because a competitor is AI's favorite and gets picked over you when forced to choose. A Layer 3 audit should produce two scores, not one.

| # | Prompt | Lens | Diagnoses |
| --- | --- | --- | --- |
| 3.1 | How much does [brand] cost? | Concerns | Pricing accuracy, hidden cost objections |
| 3.2 | How does [brand] compare to [competitor]? | Favorite | Head-to-head competitive framing |
| 3.3 | What do users say about [brand]? | Concerns | Outdated negative reviews surfacing |
| 3.4 | Is [brand] good for [use case]? | Concerns | Fit hedging vs. clean commitment |
| 3.5 | Is [brand] worth it? | Both | Verdict commitment plus surfaced caveats |
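The two-score readout can be computed directly from the lens assignments above. The pass / partial / fail weights are an assumed convention for this sketch; the lens mapping itself follows the table, with 3.5 counting toward both lenses.

```python
from statistics import mean

# Lens assignment per the Layer 3 table; 3.5 feeds both lenses.
LENS = {
    "3.1": {"concerns"},
    "3.2": {"favorite"},
    "3.3": {"concerns"},
    "3.4": {"concerns"},
    "3.5": {"favorite", "concerns"},
}
SCORE = {"pass": 1.0, "partial": 0.5, "fail": 0.0}  # assumed weights

def layer3_scores(results: dict[str, str]) -> dict[str, float]:
    """Collapse the five Layer 3 readings into a Favorite and a Concerns score."""
    return {
        lens: mean(SCORE[results[p]] for p in LENS if lens in LENS[p])
        for lens in ("favorite", "concerns")
    }

# Example: wins head-to-head but AI keeps surfacing stale complaints.
results = {"3.1": "pass", "3.2": "pass", "3.3": "fail", "3.4": "partial", "3.5": "pass"}
```

In the example, the Favorite score is perfect while the Concerns score sags, which is the Lens 1 win / Lens 2 loss pattern described above: two scores, two different fixes.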

The most common Layer 3 failure modes: hallucinated or outdated pricing, stale negative reviews surfacing on every mention, hedged fit answers instead of a clean commitment, and a competitor framed more favorably in head-to-head comparisons.

The second-order point on Layer 3 is that fixing it almost always means more PR, not more on-page SEO. Off-site authority compounds slowly, and the brand reputation AI sees lags reality by 1 to 3 quarters. Search Engine Land made this case directly: PR is becoming more essential for AI search visibility than traditional optimization because AI reads the entire web and weights publication authority.

How often should you re-run the AI visibility audit?

Monthly is the minimum cadence; weekly is right if your category or competitive landscape is moving fast. AI's answers shift every few weeks as content gets indexed and competitors enter, so a quarterly audit catches the inflection too late to act on; by the time you spot the shift, you have lost two quarters of pipeline. The pattern of which prompts shift over time tells you whether your investments are working. If Layer 2 prompts improve month over month while Layer 3 stays flat, your off-site authority work is paying off but your validation surface still has gaps. Manual workflows tap out around one to two brands at this cadence; that is the threshold where automated tracking starts paying for itself.

Quarterly trend in AI search interest, Q2 2025 to Q1 2026. "AI Visibility" grew 11.5× over the period, the fastest-growing of the AI-search-adjacent terms tracked.

You should also run the audit across multiple models. The same 15 prompts in ChatGPT, Claude, Perplexity, and Gemini can produce different leaderboards because each model has a different training mix and different live-retrieval logic. Cross-model agreement is anecdotally low; in our own testing, the top-brand pick diverges substantially across ChatGPT, Perplexity, and Claude. A rigorous cross-model study with published n is on our v3 audit backlog.
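Cross-model agreement can be quantified with a simple set overlap. A sketch under one stated assumption: Jaccard similarity of the top-n brand sets is my illustrative choice of metric, not a measure published with the audit.

```python
def top_n_agreement(leaderboards: dict[str, list[str]], n: int = 5) -> float:
    """Jaccard overlap of the top-n brand sets across models.

    1.0 means every model surfaces the same top-n brands; values near 0
    mean the leaderboards diverge and single-model readings are noisy.
    """
    tops = [set(brands[:n]) for brands in leaderboards.values()]
    union = set.union(*tops)
    inter = set.intersection(*tops)
    return len(inter) / len(union) if union else 1.0

# Hypothetical leaderboards for one Layer 2 prompt across three models.
leaderboards = {
    "chatgpt":    ["A", "B", "C"],
    "claude":     ["B", "A", "D"],
    "perplexity": ["A", "E", "B"],
}
```

A leaderboard that agrees across all three models is a much stronger signal of true rank than one that only shows up in a single model, which is the point of tracking this number over time.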

The category itself is maturing fast enough that tooling has shifted. Peec AI raised $21M Series A in November 2025 to build out an Actions feature. Profound is building enterprise multi-model tracking. Semrush AI Visibility Toolkit and Ahrefs Custom Prompt Tracking shipped to existing customer bases in the same window. The point is not "use a tool" (you can run this audit by hand). The point is that the category has serious money behind it now, which means the methodology is calcifying. Re-run after every major model update (GPT, Claude, or Gemini); earlier results may not transfer.

The free 15-prompt template (copy-paste version)

Copy the block below into a doc and fill in your placeholders: [brand], [category], [ICP], [use case], [tight niche], [buyer's problem], [competitor]. Then run each prompt 3 to 5 times in your model of choice and average the readings.

Layer 1 — Entity Recognition (run with web search OFF first, then ON)
1.1  Who is [your brand]?
1.2  What does [your brand] do?
1.3  Who founded [your brand]?
1.4  What is [your brand] known for?
1.5  Tell me about [your brand]'s product.

Layer 2 — Visibility (no brand name in any prompt)
2.1  What are the best [category] tools for [ICP]?
2.2  Which [category] software is best for [use case]?
2.3  What's the best [category] for [tight niche]?
2.4  What [category] platforms are most popular right now?
2.5  What tools help with [buyer's problem]?

Layer 3 — Recommendation (run head-to-head against your top 3 competitors)
3.1  How much does [your brand] cost?
3.2  How does [your brand] compare to [competitor]?
3.3  What do users say about [your brand]?
3.4  Is [your brand] good for [my use case]?
3.5  Is [your brand] worth it?
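Filling the template programmatically is a one-function job. A minimal sketch: the placeholder syntax matches the `[bracketed]` tokens above, and all the example values are hypothetical.

```python
# A subset of the 15-prompt template, using the guide's [bracket] placeholders.
TEMPLATE = [
    "Who is [your brand]?",
    "What are the best [category] tools for [ICP]?",
    "How does [your brand] compare to [competitor]?",
]

def fill(prompts: list[str], values: dict[str, str]) -> list[str]:
    """Substitute [placeholder] tokens with your own audit inputs."""
    out = []
    for p in prompts:
        for key, val in values.items():
            p = p.replace(f"[{key}]", val)
        out.append(p)
    return out

# Hypothetical inputs for a single-brand run.
values = {
    "your brand": "Acme CRM",
    "category": "CRM",
    "ICP": "seed-stage startups",
    "competitor": "Globex CRM",
}
```

Generating the filled prompts once and reusing the same file each quarter keeps the runs comparable, which is what makes the quarter-over-quarter pattern readable.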

Score each prompt simply: pass, fail, or partial. Aggregate by layer. The pattern is the audit. If you would rather automate the run across ChatGPT, Claude, and Perplexity in parallel and track scores quarterly without doing it by hand, that is the problem friction AI was built to solve. The manual workflow above is genuinely enough to get a first read.
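The aggregate-by-layer step can be sketched in a few lines. Prompt ids follow the template above ("1.1" through "3.5"); the 0-to-1 scale from pass / partial / fail is an assumed convention, not an industry standard.

```python
from statistics import mean

SCORE = {"pass": 1.0, "partial": 0.5, "fail": 0.0}  # assumed weights

def layer_scores(results: dict[str, str]) -> dict[str, float]:
    """Aggregate per-prompt readings into one score per layer.

    Keys like "2.3" are bucketed by the digit before the dot, so the
    output has at most three entries: layers "1", "2", and "3".
    """
    layers: dict[str, list[float]] = {"1": [], "2": [], "3": []}
    for prompt_id, outcome in results.items():
        layers[prompt_id.split(".")[0]].append(SCORE[outcome])
    return {layer: mean(vals) for layer, vals in layers.items() if vals}

# Hypothetical partial run: strong Layer 1, leaky Layer 2.
results = {"1.1": "pass", "1.2": "pass", "2.1": "fail", "2.2": "partial", "3.2": "pass"}
```

Three numbers, one per layer, is the whole readout; the gap between them tells you which fix to sequence first.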

What should you do with your audit results?

Fix Layer 1 first, even if Layers 2 and 3 look worse on paper. The reason is sequencing. Layer 1 fixes (Wikipedia, schema, founder presence) compound forward into Layer 2 and 3 visibility, while Layer 2 and 3 fixes do nothing for Layer 1. Spending on a Reddit content campaign before AI knows your brand exists is a waste.

Once Layer 1 is clean, prioritize Layer 2 work for your three weakest prompts (the ones where you are invisible), not your three strongest (where you are already mentioned). The marginal return on closing a visibility gap is higher than the marginal return on improving an existing rank. Save Layer 3 fixes for last because they are the slowest to compound; PR cycles, fresh reviews, and updated case studies show up in AI's answers on 60 to 90 day lags.

Two adjacent reads if you want to go deeper. We laid out the 5 principles for choosing the prompts that matter: real buyer language beats marketing copy, problem-led queries beat category-led ones, and three more. The tactical companion is where to find the questions your buyers actually ask AI using Reddit, sales call mining, and support ticket pulls.

Frequently Asked Questions

What's the difference between AI visibility, AEO, and GEO?

AI visibility is the broad term for how brands appear in AI answers. Answer Engine Optimization (AEO) is the optimization discipline that improves citation in answer engines like ChatGPT, Perplexity, and Google AI Overviews. Generative Engine Optimization (GEO) is the same thing under a different name, used more often in academic and SEO-tool contexts. All three solve for the same outcome: AI mentions your brand when buyers ask.

How long does fixing Layer 1 entity recognition take?

Structural fixes (schema markup, Wikipedia, knowledge graph submissions) can show up in AI's web-search answers within days to weeks. Training-data fixes (what AI knows without web search) take 12 to 24 months because they only flow into the next training cycle. Web-search fixes (fresh content, recent press) appear fastest. Most teams see meaningful Layer 1 movement within a quarter if they prioritize the structural and live-retrieval levers together.

Should I run this audit on ChatGPT, Claude, or Perplexity first?

Start with whichever model your buyers use most, and that is usually ChatGPT for B2B SaaS audiences. Once you have a baseline, run the same 15 prompts in Claude and Perplexity. Cross-model variance is the second-most useful signal in the audit, because a leaderboard that agrees across all three is a much stronger signal of true rank than one that only shows up in ChatGPT.

Is the 15-prompt audit enough for enterprise brands?

The 15 prompts are the universal core. Enterprise brands with multiple products, multiple ICPs, or multiple geographies should run a separate audit per product or segment, not stack everything into one. Add 1 to 2 vertical-specific prompts on top (free trial questions for SaaS, shipping questions for e-commerce, insurance questions for healthcare). The framework holds; the inputs change.

Can I run this audit for free?

Yes. The manual workflow above costs nothing besides your time and a ChatGPT, Claude, and Perplexity account. Each platform has a free tier that supports the 15 prompts. Tooling automates the run across models, tracks results quarterly, and flags shifts, which matters at scale; it is not required to get a first read.

How does this differ from traditional SEO ranking checks?

Traditional rank tracking watches one thing: your position on Google for a keyword. The AI visibility audit watches three: whether you exist in the model, where you rank in AI's answer mix, and whether AI commits to recommend you. AI sources from Reddit, podcasts, and comparison sites in addition to traditional SEO surfaces, so a brand can rank #1 on Google and be invisible in ChatGPT (and the reverse). Both audits are useful; they measure different things.

What if my brand has multiple products or ICPs?

Run the audit per product, not per parent brand. ChatGPT might know HubSpot the company perfectly and be vague on HubSpot Marketing Hub specifically. Same for multi-ICP brands: if you sell to startups and enterprise, run separate audits because buyer language and AI's recommendation patterns shift completely between segments. The 15 prompts stay the same; the [brand], [category], and [ICP] inputs change per audit.


Methodology footnote. Layer 1 statistics in this post come from a 40-brand cohort audit conducted by friction AI in April 2026. Full methodology, dataset, raw response logs, and limitations live in the case study writeup. Quick version: 40 SaaS brands, two GPT generations (gpt-4o and gpt-5.2), three Layer 1 sub-tests per brand, with a strict pass requiring all three.


About the author. Joao da Silva is co-founder of friction AI alongside Camilla Wirth. friction AI tracks brand visibility across ChatGPT, Claude, Perplexity, and Gemini for SaaS and DTC brands. Joao writes about AI search, entity recognition, and the operational side of getting recommended by LLMs. Connect with him on LinkedIn.
