Guide · April 26, 2026 · 14 min read

How to Track AI Brand Mentions Across ChatGPT, Claude & Perplexity

A 45-minute manual workflow to track brand mentions across ChatGPT, Claude, and Perplexity. Plus the 2-hour-per-week threshold where automation pays for itself.

By Joao da Silva, co-founder of friction AI


TL;DR. Tracking brand mentions in one model is a tutorial. Tracking across three is a workflow. This guide walks the 45-minute manual process for ChatGPT, Claude, and Perplexity in one round, the spreadsheet that holds it together, and the threshold (about 2 hours per week) where automation pays for itself. No tools required to get started.

[Hero image: three glass spheres on a cream background, each holding a different AI logo, representing multi-platform brand tracking]

When a buyer evaluates your product through AI today, they probably do not stop at one model. They ask ChatGPT, copy the answer to Claude for a second opinion, then double-check on Perplexity because they want sourced citations. Your brand has to show up cleanly across all three, and the leaderboard you sit on in one is rarely identical to the leaderboard in the next.

Most marketers track only one model, and they pick the one their CMO uses. That is a fraction of the surface where buyers are evaluating you. This is the workflow we use to get the full picture in under an hour.

Why track brand mentions across multiple AI platforms?

Cross-model variance is the single biggest blind spot in AI brand tracking. ChatGPT and Perplexity surface different brand leaderboards from the same prompt. They tend to agree on the top recommended brand somewhere between 60% and 80% of the time, and disagree on positions 2 through 5 most of the time. A brand can be mentioned first in ChatGPT, fifth in Claude, and missing entirely in Perplexity. Tracking only one model hides two-thirds of the surface where buyers are evaluating you.

The three models source differently, and each exposes a different failure mode. If your brand is missing from ChatGPT but present in Perplexity, your training-data and entity-recognition surface is the gap. If you are present in ChatGPT but missing in Perplexity, your live web footprint is thin. The diagnostic value of running all three is what tells you which fix to invest in first. The full layered diagnostic methodology lives in the 15-prompt AI visibility audit; this guide focuses on the operational side of running it across multiple platforms.

What do you need before you start tracking?

Five things, none of them paid:

  1. A free account on each platform. ChatGPT, Claude, and Perplexity all have free tiers that work for the audit. Pro Search on Perplexity is not required for a first read.

  2. Your prompt set. Use the 15 prompts from the pillar audit as your starting set. Or build a custom 10 to 15 from sales call transcripts and Reddit thread titles.

  3. A spreadsheet template. Six columns: prompt, model, run number, mentioned (yes/no), position (1-N), notes. If you would rather generate the sheet than type it, see the optional sketch after this list.

  4. About 45 minutes of focused time. Three platforms × 15 prompts × 3 averaged runs = 135 prompt runs. With a copy-paste workflow and parallel browser tabs, the round takes 35 to 50 minutes.

  5. A second monitor or split-screen setup. Not strictly required. It cuts the time roughly in half because you can run the next prompt while the current one is generating.
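
If you would rather generate the sheet than type 135 rows by hand, a few lines of Python can pre-fill it. This is an optional sketch, assuming the sheet lives as a CSV you import into Google Sheets or Excel; the file name and placeholder prompts are illustrative, not part of the workflow.

```python
import csv
from itertools import product

# Pre-fill the tracking sheet: one row per prompt x model x run, with the
# "mentioned", "position", and "notes" columns left blank to fill in by hand.
# File name and placeholder prompts are illustrative -- swap in your frozen 15.
COLUMNS = ["prompt", "model", "run", "mentioned", "position", "notes"]
MODELS = ["ChatGPT", "Claude", "Perplexity"]
RUNS = [1, 2, 3]
PROMPTS = [f"prompt {i}" for i in range(1, 16)]

with open("ai_brand_tracking.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(COLUMNS)
    for prompt, model, run in product(PROMPTS, MODELS, RUNS):
        writer.writerow([prompt, model, run, "", "", ""])
```

The script only saves you the row setup; the audit itself still happens in a browser, one prompt at a time.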

That is the full kit. No subscriptions, no API keys, no tools. The whole thing runs in a browser.

How do you build your prompt set?

Step 1 is to pick your 10 to 15 prompts. The cleanest starting point is the 15-prompt audit from the pillar. That set has five brand-anchored prompts (Layer 1), five category-only prompts (Layer 2), and five comparison and validation prompts (Layer 3). The prompts are intentionally short and natural-language, because that is how buyers actually phrase queries to AI.

If you want to customize, pull thread titles from your category subreddit and questions from your last 50 sales calls. The criterion for a tracking-worthy prompt comes from Citation Labs. It should have contrastive reasoning ("better," "worth it"), category anchoring, and a constraint clause (Citation Labs, 2025). HubSpot's Answer Engine Optimization guide makes the same point in different language: prompts that work for tracking are the ones a real buyer would type, not the ones a marketer would write. Generic "tell me about X" queries waste your run budget. They produce different answers every time without revealing useful patterns.

Whatever set you pick, freeze it. The point of tracking is to read deltas across runs and over time, and you cannot do that if the prompts move every quarter. Lock the list, and only swap a prompt if your category vocabulary genuinely shifts (a new use case becomes mainstream, a new competitor enters the head term, etc.).
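
One lightweight way to enforce the freeze is to keep the prompt set as versioned data rather than a loose list, so every quarterly run compares against exactly the same prompts. A minimal sketch; the prompts below are placeholders illustrating the contrastive-plus-constraint shape, not recommended wording.

```python
# Frozen prompt set: 5 prompts per layer, versioned so quarter-over-quarter
# deltas always compare like with like. Placeholder prompts only -- substitute
# your own 15 from the pillar audit or your sales-call and Reddit mining.
PROMPT_SET = {
    "version": "2026-Q2",
    "prompts": [
        {"layer": 1, "text": "Is <brand> worth it for a 10-person sales team?"},
        {"layer": 2, "text": "What is the best <category> tool for teams under $100/month?"},
        {"layer": 3, "text": "<brand> vs <competitor>: which is better for onboarding speed?"},
        # ...the remaining 12 prompts, 5 per layer in total
    ],
}
```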

For the rest of this guide, assume you are running the standard 15.

How do you track brand mentions in ChatGPT?

Step 2 is the ChatGPT pass. Open chat.openai.com, start a new chat for every prompt (this matters more than people realize because conversation memory contaminates subsequent answers), and paste the first prompt. Wait for the full response. Then log the row in your spreadsheet: whether your brand was mentioned (yes or no), its position in the answer when it was, and any notes, including which other brands appeared and in what order.

Critical setting: run each prompt twice, once with the web search toggle OFF and once with it ON. The two answers will differ, often substantially. Web-search-off shows you what ChatGPT "knows" from training data; web-search-on shows you what it pulls live. This split is what tells you whether your gap is historical content authority or live retrieval. We covered the platform-specific setup in detail in how to track ChatGPT brand visibility, which is worth reading if ChatGPT is your highest-priority platform.

Run each prompt three times total and average across the runs to smooth out variance. If results vary wildly between the three runs, log the variance itself; high variance is a signal that AI is unsure about your brand, which is a Layer 1 finding.
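
Once the pass is logged, a short script over the sheet can flag the high-variance prompts for you. A sketch reusing the column names from the template above; the rule (runs disagree on whether you were mentioned, or logged positions spread by more than one slot) is an illustrative threshold, not part of the framework.

```python
import csv
from collections import defaultdict

# Flag prompts where the three logged runs disagree: either the runs differ on
# whether the brand was mentioned at all, or the positions spread by more than
# one slot. Column names follow the template sketch; thresholds are illustrative.
runs_by_key = defaultdict(list)
with open("ai_brand_tracking.csv", newline="") as f:
    for row in csv.DictReader(f):
        if row["mentioned"]:  # ignore rows you have not filled in yet
            runs_by_key[(row["prompt"], row["model"])].append(row)

for (prompt, model), runs in runs_by_key.items():
    mentioned_values = {r["mentioned"].strip().lower() for r in runs}
    positions = [int(r["position"]) for r in runs if r["position"].strip()]
    spread = max(positions) - min(positions) if positions else 0
    if len(mentioned_values) > 1 or spread > 1:
        print(f"High variance on {model} for: {prompt!r}")
```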

How do you track brand mentions in Claude?

Step 3 is the Claude pass. Open claude.ai, again start a new conversation for every prompt, and paste the same prompt set you used for ChatGPT. Same scoring, same spreadsheet columns.

One Claude-specific caveat to watch for: Claude has the smallest population of public AI search benchmarks, so the third-party data on what "good" looks like here is thinner than for the other two platforms. Track your own deltas quarter over quarter rather than benchmarking against an external number.

How do you track brand mentions in Perplexity?

Step 4 is the Perplexity pass. Open perplexity.ai and use Quick Search (not Pro Search) for the first round so you have a fair baseline against the other two free-tier platforms. Same prompt set, same scoring.

Perplexity is structurally different from ChatGPT and Claude in two ways that matter for tracking:

  1. Answers are retrieval-first. Every response is assembled from live sources rather than primarily from training data, so your showing here reflects your current web footprint more than historical content authority.

  2. Answers carry citations. Because sources are exposed inline, you can log not just whether you were mentioned but which domains drove the mention, the starting point for source-domain analysis.

For deeper coverage of Perplexity-specific tactics (filters, source-domain analysis, Pro Search differences), see our standalone guide on tracking brand visibility in Perplexity.

How do you score and aggregate your results?

Step 5 is aggregation. After three platforms × 15 prompts × 3 runs, you have 135 data points. Reduce them to a per-platform leaderboard score. Count how many of the 15 prompts produced a clean mention (averaged across the 3 runs). Then track your average position when mentioned.

A useful summary table looks like this:

Table 1: Illustrative example — what your output might look like.

Platform     Prompts mentioned (of 15)   Avg position when mentioned   Layer 1 pass   Layer 2 pass   Layer 3 pass
ChatGPT      11                          2.3                           5 of 5         3 of 5         3 of 5
Claude       8                           3.1                           4 of 5         2 of 5         2 of 5
Perplexity   13                          1.9                           5 of 5         4 of 5         4 of 5

The pattern in the table tells you everything. In a hypothetical run like this one, Claude is the weakest surface. The gap concentrates in Layers 2 and 3 (visibility and recommendation). That means the fix is more off-site authority and refreshed third-party content, not on-page entity work. Perplexity is the strongest surface, which suggests live retrieval is healthier than training data.

Calculate one cross-platform metric: the percentage of prompts where all three models mention your brand. That number is your true coverage. From the brands we have looked at, this number sits consistently below 50%. Most brands have meaningful coverage in one or two models but not all three.
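
If the sheet is a CSV, the whole aggregation, true coverage included, is a few lines of Python. A sketch following the template above; counting a prompt as mentioned when the brand appears in at least two of the three runs is an illustrative way to handle the "averaged across the 3 runs" step, not a fixed standard.

```python
import csv
from collections import defaultdict

# Roll the filled-in sheet up into the per-platform summary plus the
# cross-platform "true coverage" number. A prompt counts as mentioned on a
# platform when the brand appears in at least 2 of the 3 runs (illustrative
# rule); column names follow the template sketch above.
by_platform = defaultdict(lambda: defaultdict(list))
with open("ai_brand_tracking.csv", newline="") as f:
    for row in csv.DictReader(f):
        if row["mentioned"]:
            by_platform[row["model"]][row["prompt"]].append(row)

mentioned = {}  # platform -> set of prompts with a clean mention
for model, prompts in by_platform.items():
    hits, positions = set(), []
    for prompt, runs in prompts.items():
        yes_runs = [r for r in runs if r["mentioned"].strip().lower() == "yes"]
        if len(yes_runs) >= 2:
            hits.add(prompt)
            positions += [int(r["position"]) for r in yes_runs if r["position"].strip()]
    mentioned[model] = hits
    avg_pos = round(sum(positions) / len(positions), 1) if positions else None
    print(f"{model}: mentioned on {len(hits)} of {len(prompts)} prompts, avg position {avg_pos}")

# True coverage: share of prompts where all three platforms mention the brand.
all_prompts = set().union(*(prompts.keys() for prompts in by_platform.values()))
covered = set.intersection(*mentioned.values()) if mentioned else set()
print(f"True coverage: {len(covered)} of {len(all_prompts)} prompts")
```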

How do you read the patterns and prioritize fixes?

Step 6 is reading the patterns and choosing what to fix first. The diagnostic logic from earlier in the guide applies directly: the platform split tells you whether the gap is in training data or live retrieval, and the layer split tells you whether the fix is on-page entity work or off-site authority and refreshed third-party content.

We catalogued the specific patterns to look for in our 11 AI visibility failure modes guide. It is a useful companion read for diagnosing what your raw data is telling you.

When should you graduate from manual to automated tracking?

The manual workflow is good. It costs nothing, takes under an hour per round, and produces clean directional data. It also breaks down at three thresholds:

  1. Time. If running the audit is taking more than 2 hours per week (across multiple brands or weekly cadence), the manual workflow is costing you more than tooling would.
  2. Brand count. Once you are tracking more than one brand (yours plus competitors, or multiple products), the manual workflow becomes unmanageable. The data points multiply linearly with brand count, but the spreadsheet hygiene work multiplies faster.
  3. Reporting cadence. If your CMO or board wants weekly or monthly numbers, you need automation. Quarterly is the natural manual cadence; weekly is automation territory.

Automation also unlocks two things manual cannot. The first is historical run-over-run deltas, with every change tracked automatically and timestamped. The second is multi-prompt aggregation across hundreds of variants, which is where the Omniscient and Citation Labs research on prompt patterns becomes operationally useful at scale (Omniscient Digital, 2025).

This is the problem friction AI was built to solve. We track the 15-prompt framework (and your custom prompts) across ChatGPT, Claude, Perplexity, and Gemini on a quarterly, monthly, or weekly schedule, and surface the deltas that matter. If you are already at the 2-hour-per-week threshold, the manual-to-automated math is straightforward. If you are not, run the manual workflow above and revisit when the spreadsheet stops fitting.

Frequently Asked Questions

How long does the full multi-platform tracking workflow take?

About 35 to 50 minutes per round for one brand and 15 prompts run three times across ChatGPT, Claude, and Perplexity. The biggest time drain is opening a new chat for every prompt. Running prompts in parallel browser tabs roughly halves the wall-clock time once you have practiced the workflow.

Can I use ChatGPT Plus, Claude Pro, and Perplexity Pro instead of free tiers?

Yes, but it changes the readings. Pro tiers route to different, typically larger models than the free tiers. Check each platform's settings page for the current lineup if you need to lock that variable for benchmarking. If your buyers are mostly using free tiers, free-tier readings are more representative. If your audience skews technical or pro-tier, run both and track them as separate data sets.

Should I include Gemini in the audit?

Yes if your buyers are heavy Google ecosystem users (Workspace, Android, AI Overviews surfaces). Skip it if your audience skews ChatGPT or Anthropic-first. Gemini's leaderboards differ enough from the other three that adding it is meaningful. The marginal time cost is another 15 to 20 minutes per round, which only pays off when your audience uses it.

What's the minimum useful prompt count?

Ten prompts is the floor. Below that, you cannot meaningfully separate Layer 1, 2, and 3 signals. Fifteen is the standard set we recommend (5 per layer). Above 30 prompts, the diagnostic value plateaus and the time cost becomes a tax on the workflow.

How do I track if my brand isn't mentioned at all in a prompt?

Log it as a "not mentioned" and capture which brands were mentioned, in what order. The competitive map is just as valuable as your own appearance data. Over time, the brands consistently appearing where you are absent are the ones whose third-party content surface you need to study.

Can I just check once a quarter and skip the run-three-times averaging?

You can, but you will get noisier readings. Single runs are volatile: run-to-run agreement on the top recommended brand is typically only 60% to 80%, so averaging three runs gets you closer to the true signal. If time is the constraint, drop to three platforms × 15 prompts × 1 run (45 prompts total) rather than three platforms × 5 prompts × 3 runs.

What's the difference between this audit and just searching Google for my brand?

Google measures click-driven discovery; this audit measures recommendation-driven discovery. The two surfaces share inputs (Reddit, listicles, comparison content) but weight them very differently. A brand can rank #1 on Google for "best CRM" and be missing from ChatGPT, and the reverse. Both audits are useful; they answer different questions about how buyers are finding you.


About the author. Joao da Silva is co-founder of friction AI alongside Camilla Wirth. friction AI tracks brand visibility across ChatGPT, Claude, Perplexity, and Gemini for SaaS and DTC brands. Joao writes about AI search, entity recognition, and the operational side of getting recommended by LLMs. Connect with him on LinkedIn.
