By Joao da Silva · April 26, 2026
TL;DR. AI visibility is three layers, not one. Most marketers test it by asking ChatGPT a single question, which is closer to a vibe check than a diagnostic. This 15-prompt audit (5 prompts per layer: entity recognition, visibility, recommendation) tells you exactly which layer is leaking deals and what to fix first. Free template at the bottom. Run it tonight.

When we audited 40 SaaS brands' Layer 1 visibility across two GPT generations (gpt-4o and gpt-5.2), only 12 of 40 (30%) cleanly passed all three sub-tests. The bottleneck was not the model. The same 12 brands passed under both generations; the model upgrade did not move the number. The binding constraint was the Knowledge Graph entity, which does not change when you upgrade your LLM. Every dollar a team spends on Layer 2 and Layer 3 work is wasted while their Layer 1 entity foundation is missing, because AI cannot recommend a brand it cannot recognize.
This guide walks the audit you can run yourself in under an hour, the 15 prompts you should track quarterly, and the playbook for what to fix once you see the results.
What is an AI visibility audit?
An AI visibility audit is a structured test of how large language models (ChatGPT, Claude, Perplexity, Gemini) recognize, rank, and recommend your brand. The audit measures three sequential layers of AI behavior, because a brand can pass one and silently fail another.
Unlike a Google rank check, AI visibility is a moving target. Search interest in "AI Visibility" grew 11.5x in the 12 months ending April 2026, and "Answer Engine Optimization" grew 5.9x (Google Trends, internal pull, Apr 2026).
The 15-prompt framework breaks the audit into three diagnostic filters that map to a buyer's journey through AI search.

The arc is existence, then ranking, then selection:
- Entity Recognition (Foundation): does AI know you exist?
- Visibility / ToFu (Leaderboard): who is strongest in your space, and where do you sit?
- Recommendation / BoFu (Favorite): does AI pick you when forced to commit, and does it surface concerns when asked directly?
The diagnostic value of separating them is brutal. If you fail Layer 1, fixing your homepage copy will not help. If you win Layer 1 but lose Layer 2, your problem is off-site authority (Reddit, listicles, comparison content), not your website. If you win Layers 1 and 2 but lose Layer 3, you have a validation problem with the buyers who already know you. Diagnose before you fix; the wrong fix on the right problem still loses you another quarter.
Why isn't one ChatGPT query an audit?
One ChatGPT query is a single roll of a loaded die. AI responses vary across runs, across models, and across phrasings. Even between gpt-4o and gpt-5.2 in our 40-brand audit, individual brands flipped between PASS and FAIL. Linear was unrecognized by gpt-4o but correctly identified by gpt-5.2. Celest's web search ranking dropped on the newer model. A structured audit averaged across multiple runs produces direction. A single query produces anxiety.
The reference work here comes from Omniscient Digital. Their team analyzed 25,755 AI citations across 200 prompts and identified five universal BoFu prompt patterns that generalize across e-commerce, SaaS, services, and healthcare (Omniscient Digital, 2025). The takeaway is not that you need 200 prompts of your own. It is that the shape of a real audit is structured, repeated, and read across runs, not extracted from one screenshot.
If you want to see what these failures look like in the wild before running the audit yourself, we collected the 11 most common AI visibility failure modes we see in audits.
What does the 15-prompt framework cover?
The framework is five prompts per layer, fifteen total. Each prompt within a layer tests a distinct dimension of that layer's failure mode. The prompt construction matches the failure mode, which is why the layers are kept clean: brand-anchored prompts at Layers 1 and 3, category-only prompts at Layer 2.

The framework draws on two pieces of public research. Citation Labs identified four properties that make a BoFu prompt worth tracking: contrastive reasoning ("better," "worth it"), offer-anchoring (a specific brand named), category anchoring, and constraint clauses (Citation Labs, 2025). Omniscient's five universal BoFu patterns slot into the same Layer 3 set: pricing, comparison, social proof, fit, and verdict. Both frameworks are reflected in the prompts below.
| # | Layer | Prompt |
|---|---|---|
| 1.1 | Entity | Who is [your brand]? |
| 1.2 | Entity | What does [your brand] do? |
| 1.3 | Entity | Who founded [your brand]? |
| 1.4 | Entity | What is [your brand] known for? |
| 1.5 | Entity | Tell me about [your brand]'s product. |
| 2.1 | Visibility | What are the best [category] tools for [ICP]? |
| 2.2 | Visibility | Which [category] software is best for [use case]? |
| 2.3 | Visibility | What's the best [category] for [tight niche]? |
| 2.4 | Visibility | What [category] platforms are most popular right now? |
| 2.5 | Visibility | What tools help with [buyer's problem]? |
| 3.1 | Recommendation | How much does [your brand] cost? |
| 3.2 | Recommendation | How does [your brand] compare to [competitor]? |
| 3.3 | Recommendation | What do users say about [your brand]? |
| 3.4 | Recommendation | Is [your brand] good for [my use case]? |
| 3.5 | Recommendation | Is [your brand] worth it? |
Run each prompt three to five times, average the results, then read the patterns. Three runs is the minimum that surfaces variance; five gives you confidence on the top entry. Single-run readings are how marketers convince themselves they are winning when they are not.
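If you would rather script the repetition than paste prompts by hand, here is a minimal sketch of the run-and-tally loop. It assumes the OpenAI Python SDK; the model name, run count, and brand watchlist are illustrative placeholders, not part of the framework.

```python
# Minimal run-and-tally sketch. Assumes the OpenAI Python SDK (pip install openai)
# and an OPENAI_API_KEY in the environment. Model, runs, and watchlist are placeholders.
from collections import Counter
from openai import OpenAI

client = OpenAI()

def run_prompt(prompt: str, runs: int = 5, model: str = "gpt-4o") -> list[str]:
    """Collect one answer per run; the variance across runs is the signal."""
    answers = []
    for _ in range(runs):
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        answers.append(resp.choices[0].message.content or "")
    return answers

answers = run_prompt("What are the best CRM tools for mid-market sales teams?")
watchlist = ["HubSpot", "Pipedrive", "YourBrand"]  # your own competitor set
mentions = Counter(b for a in answers for b in watchlist if b.lower() in a.lower())
print(mentions)  # e.g. Counter({'HubSpot': 5, 'Pipedrive': 4, 'YourBrand': 2})
```

Counting mentions across runs is exactly the averaging step: a brand at 5/5 is stable, a brand at 2/5 is the variance a single screenshot hides.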
Layer 1: Entity Recognition — Does AI Know Your Brand Exists?
Layer 1 is the foundation. It tests whether AI has your brand in its model at all, whether the description is accurate, and whether the founder, founding date, and product knowledge are right. Fail it, and Layers 2 and 3 are moot. In our 40-brand Layer 1 audit (April 2026, run on both gpt-4o and gpt-5.2), 28 of 40 brands (70%) had at least one Layer 1 failure. The most common failures were entity foundation missing (no Knowledge Graph entry) and a pattern called CONFUSED_IDENTITY: the LLM picking the wrong company with the same name and describing it confidently.


Layer 1 actually splits into three sub-levels, each with a different fix lever:
- Entity foundation is structural recognition. Does AI have you in its knowledge graph? Failure here means the fix is schema markup, Wikipedia presence, and structured data on your site.
- Training data is what AI learned historically, before its training cutoff. You cannot retroactively change what current models know, but you can influence future training rounds by publishing high-authority content now (12 to 24 month horizons).
- Web search / live retrieval is what AI fetches when search is enabled (ChatGPT Search, Perplexity, Claude with web). Failure here means a freshness gap: your indexable web footprint is thin or stale.
For most teams, the audit treats these as one layer. For advanced teams diagnosing root causes, the sub-distinction tells you whether the fix is structural, historical, or live-retrieval. Most fixes start with the first: get a Wikipedia page, add Organization and Founder schema, get founder profiles consistent across LinkedIn, podcasts, and press.
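As a concrete starting point for the schema fix, here is a hedged sketch of Organization plus founder markup, rendered as the JSON-LD you would embed on your homepage. Every value below is a placeholder; the vocabulary itself (Organization, founder, foundingDate, sameAs) is standard schema.org.

```python
# Sketch of schema.org Organization + founder markup as JSON-LD.
# All names, URLs, and dates are placeholders; swap in your own.
import json

org_schema = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "YourBrand",
    "url": "https://yourbrand.com",
    "logo": "https://yourbrand.com/logo.png",
    "foundingDate": "2021-03-01",
    "founder": {
        "@type": "Person",
        "name": "Jane Founder",
        "sameAs": ["https://www.linkedin.com/in/janefounder"],
    },
    "sameAs": [
        "https://en.wikipedia.org/wiki/YourBrand",
        "https://www.crunchbase.com/organization/yourbrand",
    ],
}

# Embed the output inside <script type="application/ld+json"> on your homepage.
print(json.dumps(org_schema, indent=2))
```

The sameAs links are what tie your site to the entity; they are the machine-readable version of "founder profiles consistent across LinkedIn, podcasts, and press."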
What our 40-brand audit found about Layer 1
Three findings from running the Layer 1 audit on 40 SaaS brands across two GPT generations:
1. The Knowledge Graph is the binding constraint, and it does not move with model upgrades. The same 12 brands cleanly passed all three sub-tests on both gpt-4o and gpt-5.2:
- HubSpot, Pipedrive (CRM)
- Asana, Monday.com, Notion, ClickUp (project management)
- Mixpanel, Amplitude, Heap, Hotjar (product analytics)
- Copy.ai, Writesonic (AI content)
All 12 have high-confidence Knowledge Graph entries (resultScore over 100). The model upgrade improved recall, but the Layer 1 strict pass count was identical. For a brand pursuing AI visibility, fixing the Knowledge Graph entity is the highest-leverage Layer 1 action; it gates everything downstream.
2. As LLMs get better at recall, the failure mode shifts from "never heard of you" to "wrong company with your name." On gpt-4o, NOT_RECOGNIZED ("I'm not familiar with...") accounted for 65% of Layer 1 failures. On gpt-5.2, that dropped to 52%, while CONFUSED_IDENTITY rose to nearly half of all failures. As model recall improves, generic brand naming becomes the dominant failure mode. This is the bad kind of progress: a confidently wrong answer is more dangerous than an honest "I don't know."
3. Generic-named brands stay broken across model upgrades. Four brands in the cohort (Bud, Forge, Roark, Trim) hit CONFUSED_IDENTITY on both the training-data and web-search tests, on both gpt-4o and gpt-5.2. The model picks Howard Roark, Atlassian Forge, or the town of Trim in Ireland. Even a Series A startup with strong product traction stays invisible in AI when its name collides with a famous fictional character or an established corporate trademark. Generic naming is a Layer 1 visibility issue that no model upgrade is likely to fix.
The takeaway for your own Layer 1 work: prioritize the Knowledge Graph entry first. And if your brand name is generic (a common English word, a famous person, an existing trademark), assume AI visibility will be structurally harder for you than for a uniquely-named competitor.
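You can check where you stand before spending anything. A minimal sketch using Google's public Knowledge Graph Search API, with the resultScore threshold of 100 borrowed from the cohort finding above; the API key and brand name are yours to supply.

```python
# Knowledge Graph entity check via Google's Knowledge Graph Search API.
# Requires the requests library and a GOOGLE_KG_API_KEY environment variable.
import os
import requests

def kg_entity_check(brand: str, threshold: float = 100.0) -> bool:
    resp = requests.get(
        "https://kgsearch.googleapis.com/v1/entities:search",
        params={"query": brand, "key": os.environ["GOOGLE_KG_API_KEY"], "limit": 3},
        timeout=10,
    )
    resp.raise_for_status()
    for item in resp.json().get("itemListElement", []):
        result = item.get("result", {})
        if result.get("name", "").lower() == brand.lower() and item.get("resultScore", 0) > threshold:
            return True  # high-confidence entity under your exact name
    return False  # weak or missing entity: Layer 1 foundation work needed

print(kg_entity_check("Notion"))
```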
The five Layer 1 prompts (1.1 through 1.5) each test a different dimension:
- 1.1 (Who is...) tests misidentification. Watch for AI confusing you with another company, especially if your brand name is short or shared.
- 1.2 (What does...do). The first sentence of AI's answer is what most buyers absorb. If that sentence is wrong, your homepage copy is not doing its job.
- 1.3 (Who founded...). Hallucinations here erode credibility silently. Founders being recognized bleeds into recommendation likelihood downstream.
- 1.4 (What is...known for). Highest-leverage Layer 1 prompt. If "known for X" does not match your positioning, third-party content is shaping the narrative without you.
- 1.5 (Tell me about...product). Note what AI omits. If recent features are missing, that signals stale training data and weak retrieval coverage.
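If you are scoring many Layer 1 answers, a crude triage heuristic helps, assuming you hold ground-truth anchors like your domain and founder name. String matching misses paraphrases, so treat a flag as "review manually," not a verdict; the failure labels mirror the cohort taxonomy above.

```python
# Naive Layer 1 triage: flag the two dominant failure modes from the cohort audit.
# Ground-truth anchors (domain, founder) are assumptions you supply.
def triage_layer1(answer: str, domain: str, founder: str) -> str:
    text = answer.lower()
    refusals = ("not familiar", "don't have information", "no information on")
    if any(phrase in text for phrase in refusals):
        return "NOT_RECOGNIZED"
    if domain.lower() not in text and founder.lower() not in text:
        # Confident answer containing none of your anchors: likely the wrong company.
        return "CONFUSED_IDENTITY?"
    return "PASS"

print(triage_layer1("Roark was the architect hero of The Fountainhead.", "roark.com", "Ada Smith"))
# -> CONFUSED_IDENTITY?
```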
Layer 2: Visibility — Where Do You Sit on the Leaderboard?
Layer 2 is competitive intelligence in disguise. Every "best [category] for [ICP]" prompt reveals the leaderboard AI sees for that category, problem, or niche. The diagnostic value is not "do I show up?" (the passive question), it is "who does AI consider strongest in this space, and where do I rank against them?" (the active question). The active framing is what makes Layer 2 worth running quarterly.

Each Layer 2 prompt reveals a different leaderboard. These are not five versions of the same ranking. They are five competitive landscapes running in parallel.
- 2.1 (Best [category] for [ICP]) tests the head term plus your buyer segment.
- 2.2 (Best for [use case]) tests use-case specificity. A brand can win the head term and still lose use-case-specific queries.
- 2.3 (Best for [tight niche]) is where smaller brands often win. Niche dominance shows up even when the brand is invisible at the head term.
- 2.4 (Most popular right now) tests temporal bias. AI defaults to incumbents. Newer brands fail this even when growing fast; the fix is fresh press cycles and recency content.
- 2.5 (Tools that help with [buyer's problem]) tests problem-led discovery. If your homepage describes you in marketing language ("AI-powered platform for modern teams") instead of buyer language, this prompt fails.
A brand can be #1 on the head term and invisible on the problem-led version. The pattern of where you DO rank versus where you do not tells you which content gap to close first. This is also where most AI search volume actually lives. Every "best X for Y" query a buyer types into ChatGPT routes through Layer 2, and most brands have invisible gaps here because they assume Google rankings carry over. They do not. AI sources from Reddit, "best of" listicles, comparison sites, and podcast transcripts in addition to traditional SEO surfaces, and that mix can surface different brands than Google does.
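One way to read those leaderboards consistently across runs: record the order in which known brands first appear in the answer. A minimal sketch; the competitor set is your own input, and first-mention position is a proxy for rank, not AI's actual internal ordering.

```python
# Extract a rough leaderboard from a Layer 2 answer by first-mention position.
def leaderboard(answer: str, brands: list[str]) -> list[str]:
    lowered = answer.lower()
    hits = [(lowered.find(b.lower()), b) for b in brands if b.lower() in lowered]
    return [brand for _, brand in sorted(hits)]

answer = "For product analytics, Amplitude and Mixpanel lead the pack; Heap is a solid alternative."
print(leaderboard(answer, ["Mixpanel", "Amplitude", "Heap", "YourBrand"]))
# -> ['Amplitude', 'Mixpanel', 'Heap']  (YourBrand absent = a visibility gap to log)
```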
When you run Layer 2 across multiple platforms, the leaderboards diverge significantly. We covered the workflow for tracking this across ChatGPT, Claude, and Perplexity together in a separate guide on multi-platform mention tracking.
Layer 3: Recommendation — Does AI Pick You and Hide Concerns?
Layer 3 is the selection filter and the layer where deals are won or lost in real time. By the time a buyer is at Layer 3, they already know you exist and they have surfaced you in their research. AI is the last filter before they commit. If it hedges, hallucinates pricing, or describes a competitor more favorably, you lose deals you almost won. And you never find out why.

Layer 3 is not a single test. It splits into two distinct lenses, each diagnosing a different AI behavior:
Lens 1 is the Favorite Test. When forced to compare you against a named competitor, what does AI do? Tested by 3.2 ("How does X compare to Y?") and 3.5 ("Is X worth it?").
Lens 2 is the Concerns Test. When asked about you in isolation, what reservations does AI quietly surface? Tested by 3.1, 3.3, 3.4, and 3.5.
Both lenses kill deals through different mechanisms and require different fixes. A brand can win Lens 1 (AI picks you head-to-head) and still lose Lens 2 because AI surfaces outdated complaints every time it describes you. A brand can pass Lens 2 cleanly (no surfaced concerns) and still lose Lens 1 because a competitor is AI's favorite and gets picked over you when forced to choose. A Layer 3 audit should produce two scores, not one.
| # | Prompt | Lens | Diagnoses |
|---|---|---|---|
| 3.1 | How much does [brand] cost? | Concerns | Pricing accuracy, hidden cost objections |
| 3.2 | How does [brand] compare to [competitor]? | Favorite | Head-to-head competitive framing |
| 3.3 | What do users say about [brand]? | Concerns | Outdated negative reviews surfacing |
| 3.4 | Is [brand] good for [use case]? | Concerns | Fit hedging vs. clean commitment |
| 3.5 | Is [brand] worth it? | Both | Verdict commitment plus surfaced caveats |
The most common Layer 3 failure modes:
- Pricing hallucination. A $99 product described as $999 silently kills deals. Make pricing pages crawlable and structured (a markup sketch follows this list); avoid "contact sales" black holes.
- Comparison ordering bias. Your competitor consistently appears first in vs-comparisons, and AI describes their attributes more favorably. Fix with your own published comparison content.
- Outdated negatives. A two-year-old complaint that you have since fixed is still quietly killing deals today. The fastest fix is fresh G2/Capterra reviews and updated case studies that explicitly address the old narrative. We wrote a deeper playbook for improving how AI describes your sentiment when this is your dominant failure mode.
- The hedge. AI uses "It depends..." or "Some users say..." softening when answering 3.4 or 3.5. Soft loss, not a clean recommendation. Indicates weak conviction in the source data; the fix is stronger third-party advocacy (named case studies, podcast appearances, expert quotes).
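On the pricing fix above, a hedged sketch of what "crawlable and structured" can mean in practice: schema.org Product/Offer markup on the pricing page. All values are placeholders; SaaS teams sometimes use SoftwareApplication instead of Product, and either is valid schema.org vocabulary.

```python
# Sketch of schema.org Product + Offer markup for a pricing page.
# Price, currency, and URLs are placeholders.
import json

pricing_schema = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "YourBrand Pro",
    "offers": {
        "@type": "Offer",
        "price": "99.00",
        "priceCurrency": "USD",
        "url": "https://yourbrand.com/pricing",
        "availability": "https://schema.org/InStock",
    },
}

# Embed inside <script type="application/ld+json"> on the pricing page.
print(json.dumps(pricing_schema, indent=2))
```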
The second-order point on Layer 3 is that fixing it almost always means more PR, not more on-page SEO. Off-site authority compounds slowly, and the brand reputation AI sees lags reality by 1 to 3 quarters. Search Engine Land made this case directly: PR is becoming more essential for AI search visibility than traditional optimization because AI reads the entire web and weights publication authority.
How often should you re-run the AI visibility audit?
Monthly is the minimum cadence; weekly is right if your category is moving fast or your competitive landscape is shifting in real time. AI's answers move every few weeks as content gets indexed and competitors enter, so a quarterly audit catches the inflection too late to act on — by the time you spot the shift, you've lost two quarters of pipeline. The pattern of which prompts shift over time tells you whether your investments are working. If Layer 2 prompts improve month over month while Layer 3 stays flat, your off-site authority work is paying off, but your validation surface still has gaps. Manual workflows tap out around one to two brands at this cadence; that's the threshold where automated tracking starts paying for itself.
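Scoring makes the month-over-month read mechanical. A minimal sketch, assuming the pass / partial / fail scoring from the template section below mapped to 1 / 0.5 / 0 and averaged per layer:

```python
# Month-over-month layer deltas, assuming pass=1, partial=0.5, fail=0 per prompt.
SCORES = {"pass": 1.0, "partial": 0.5, "fail": 0.0}

def layer_averages(run: dict[str, str]) -> dict[str, float]:
    layers: dict[str, list[float]] = {}
    for prompt_id, verdict in run.items():  # e.g. "2.3" -> "partial"
        layers.setdefault(prompt_id.split(".")[0], []).append(SCORES[verdict])
    return {layer: sum(vals) / len(vals) for layer, vals in layers.items()}

march = {"1.1": "pass", "2.1": "fail", "2.3": "partial", "3.2": "fail"}
april = {"1.1": "pass", "2.1": "partial", "2.3": "pass", "3.2": "fail"}
prev, curr = layer_averages(march), layer_averages(april)
print({layer: round(curr[layer] - prev.get(layer, 0.0), 2) for layer in curr})
# -> {'1': 0.0, '2': 0.5, '3': 0.0}  Layer 2 moving, Layer 3 flat.
```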
You should also run the audit across multiple models. The same 15 prompts in ChatGPT, Claude, Perplexity, and Gemini can produce different leaderboards because each model has a different training mix and different live-retrieval logic. Cross-model agreement is anecdotally low; in our own testing, the top-brand pick diverges substantially across ChatGPT, Perplexity, and Claude. A rigorous cross-model study with published n is on our v3 audit backlog.
The category itself is maturing fast enough that tooling has shifted. Peec AI raised $21M Series A in November 2025 to build out an Actions feature. Profound is building enterprise multi-model tracking. Semrush AI Visibility Toolkit and Ahrefs Custom Prompt Tracking shipped to existing customer bases in the same window. The point is not "use a tool" (you can run this audit by hand). The point is that the category has serious money behind it now, which means the methodology is calcifying. Re-run after every major model update (GPT, Claude, or Gemini); earlier results may not transfer.
The free 15-prompt template (copy-paste version)
Copy the block below into a doc and fill in your placeholders: [brand], [category], [ICP], [use case], [tight niche], [buyer's problem], [competitor]. Then run each prompt 3 to 5 times in your model of choice and average the readings.
Layer 1 — Entity Recognition (run with web search OFF first, then ON)
1.1 Who is [your brand]?
1.2 What does [your brand] do?
1.3 Who founded [your brand]?
1.4 What is [your brand] known for?
1.5 Tell me about [your brand]'s product.
Layer 2 — Visibility (no brand name in any prompt)
2.1 What are the best [category] tools for [ICP]?
2.2 Which [category] software is best for [use case]?
2.3 What's the best [category] for [tight niche]?
2.4 What [category] platforms are most popular right now?
2.5 What tools help with [buyer's problem]?
Layer 3 — Recommendation (run head-to-head against your top 3 competitors)
3.1 How much does [your brand] cost?
3.2 How does [your brand] compare to [competitor]?
3.3 What do users say about [your brand]?
3.4 Is [your brand] good for [my use case]?
3.5 Is [your brand] worth it?
Score each prompt simply: pass, fail, or partial. Aggregate by layer. The pattern is the audit. If you would rather automate the run across ChatGPT, Claude, and Perplexity in parallel and track scores quarterly without doing it by hand, that is the problem friction AI was built to solve. The manual workflow above is genuinely enough to get a first read.
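If you script the run, fill the placeholders programmatically so every quarter uses identical wording; drift in prompt phrasing quietly invalidates quarter-over-quarter comparisons. A minimal sketch with illustrative inputs (three of the fifteen prompts shown):

```python
# Fill the template placeholders once so each quarterly run uses identical prompts.
INPUTS = {
    "your brand": "YourBrand",
    "category": "product analytics",
    "ICP": "B2B SaaS teams",
    "use case": "funnel analysis",
    "tight niche": "mobile-first PLG startups",
    "buyer's problem": "understanding why trial users churn",
    "competitor": "Amplitude",
    "my use case": "funnel analysis",
}

PROMPTS = {
    "1.1": "Who is [your brand]?",
    "2.1": "What are the best [category] tools for [ICP]?",
    "3.2": "How does [your brand] compare to [competitor]?",
    # ...the remaining 12 prompts from the template above
}

def fill(template: str) -> str:
    for placeholder, value in INPUTS.items():
        template = template.replace(f"[{placeholder}]", value)
    return template

for prompt_id, template in PROMPTS.items():
    print(prompt_id, fill(template))
```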
What should you do with your audit results?
Fix Layer 1 first, even if Layers 2 and 3 look worse on paper. The reason is sequencing. Layer 1 fixes (Wikipedia, schema, founder presence) compound forward into Layer 2 and 3 visibility, while Layer 2 and 3 fixes do nothing for Layer 1. Spending on a Reddit content campaign before AI knows your brand exists is a waste.
Once Layer 1 is clean, prioritize Layer 2 work for your three weakest prompts (the ones where you are invisible), not your three strongest (where you are already mentioned). The marginal return on closing a visibility gap is higher than the marginal return on improving an existing rank. Save Layer 3 fixes for last because they are the slowest to compound; PR cycles, fresh reviews, and updated case studies show up in AI's answers on 60 to 90 day lags.
Two adjacent reads if you want to go deeper. We laid out the 5 principles for choosing the prompts that matter: real buyer language beats marketing copy, problem-led queries beat category-led ones, and three more. The tactical companion is where to find the questions your buyers actually ask AI using Reddit, sales call mining, and support ticket pulls.
Frequently Asked Questions
What's the difference between AI visibility, AEO, and GEO?
AI visibility is the broad term for how brands appear in AI answers. Answer Engine Optimization (AEO) is the optimization discipline that improves citation in answer engines like ChatGPT, Perplexity, and Google AI Overviews. Generative Engine Optimization (GEO) is the same thing under a different name, used more often in academic and SEO-tool contexts. All three solve for the same outcome: AI mentions your brand when buyers ask.
How long does fixing Layer 1 entity recognition take?
Structural fixes (schema markup, Wikipedia, knowledge graph submissions) can show up in AI's web-search answers within days to weeks. Training-data fixes (what AI knows without web search) take 12 to 24 months because they only flow into the next training cycle. Web-search fixes (fresh content, recent press) appear fastest. Most teams see meaningful Layer 1 movement within a quarter if they prioritize the structural and live-retrieval levers together.
Should I run this audit on ChatGPT, Claude, or Perplexity first?
Start with whichever model your buyers use most, and that is usually ChatGPT for B2B SaaS audiences. Once you have a baseline, run the same 15 prompts in Claude and Perplexity. Cross-model variance is the second-most useful signal in the audit, because a leaderboard that agrees across all three is a much stronger signal of true rank than one that only shows up in ChatGPT.
Is the 15-prompt audit enough for enterprise brands?
The 15 prompts are the universal core. Enterprise brands with multiple products, multiple ICPs, or multiple geographies should run a separate audit per product or segment, not stack everything into one. Add 1 to 2 vertical-specific prompts on top (free trial questions for SaaS, shipping questions for e-commerce, insurance questions for healthcare). The framework holds; the inputs change.
Can I run this audit for free?
Yes. The manual workflow above costs nothing besides your time and a ChatGPT, Claude, and Perplexity account. Each platform has a free tier that supports the 15 prompts. Tooling automates the run across models, tracks results quarterly, and flags shifts, which matters at scale; it is not required to get a first read.
How does this differ from traditional SEO ranking checks?
Traditional rank tracking watches one thing: your position on Google for a keyword. The AI visibility audit watches three: whether you exist in the model, where you rank in AI's answer mix, and whether AI commits to recommend you. AI sources from Reddit, podcasts, and comparison sites in addition to traditional SEO surfaces, so a brand can rank #1 on Google and be invisible in ChatGPT (and the reverse). Both audits are useful; they measure different things.
What if my brand has multiple products or ICPs?
Run the audit per product, not per parent brand. ChatGPT might know HubSpot the company perfectly and be vague on HubSpot Marketing Hub specifically. Same for multi-ICP brands: if you sell to startups and enterprise, run separate audits because buyer language and AI's recommendation patterns shift completely between segments. The 15 prompts stay the same; the [brand], [category], and [ICP] inputs change per audit.
Methodology footnote. Layer 1 statistics in this post come from a 40-brand cohort audit conducted by friction AI in April 2026. Full methodology, dataset, raw response logs, and limitations live in the case study writeup. Quick version:
- Sample (n=40): stratified across 20 G2 category leaders (CRM, project management, product analytics, AI content) and 20 Y Combinator W24/W25 B2B SaaS startups. The "average SaaS brand" is neither; the headline 30% pass rate is read against this specific sample frame, not the universe of all SaaS.
- Three sub-tests per brand: entity foundation via the Google Knowledge Graph Search API, training-data recognition via OpenAI gpt-4o and gpt-5.2 (no tools), web-search recognition via the same models with web_search tool use forced.
- Prompt scope: the LLM tests used a single brand-anchored prompt ("Who is [brand]?") corresponding to prompt 1.1 of the 15-prompt framework. Prompts 1.2 through 1.5 were not run in this cohort.
- Scoring: single-rater, no inter-rater reliability check. 40 brands is directional rather than statistically powered for sub-segment claims.
- Out of scope: Layer 2 (visibility) and Layer 3 (recommendation) were not tested. Other LLMs (Anthropic Claude, Google Gemini) were not tested.
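For readers replicating the web-search sub-test: a hedged sketch of how forced tool use can look with the OpenAI Responses API. Tool names and the forcing syntax have shifted across SDK versions, so verify against current OpenAI docs; the model name here is illustrative, not the cohort's exact configuration.

```python
# Forcing web search on a recognition prompt. Assumes the OpenAI Python SDK's
# Responses API with the hosted web-search tool; check current docs for the
# exact tool type string, as it has changed across versions.
from openai import OpenAI

client = OpenAI()
resp = client.responses.create(
    model="gpt-4o",  # illustrative; the cohort also ran a newer generation
    tools=[{"type": "web_search_preview"}],
    tool_choice={"type": "web_search_preview"},  # force the tool rather than leave it optional
    input="Who is YourBrand?",
)
print(resp.output_text)
```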
About the author. Joao da Silva is co-founder of friction AI alongside Camilla Wirth. friction AI tracks brand visibility across ChatGPT, Claude, Perplexity, and Gemini for SaaS and DTC brands. Joao writes about AI search, entity recognition, and the operational side of getting recommended by LLMs. Connect with him on LinkedIn.