Key takeaways
- Tracking AI visibility and improving it are fundamentally different activities -- most tools only do the first one
- Vanity metrics like citation counts and mention volume rarely connect to business outcomes; the gap between what's measured and what matters is growing
- AI referral traffic has declined roughly 25% since its July 2025 peak, making traffic-based reporting increasingly unreliable
- The teams making real progress share a common pattern: they find content gaps, create content engineered for AI citation, and track whether it works
- A handful of platforms now support the full loop; most stop at the dashboard
There's a moment most marketing teams hit a few months into their AI visibility journey. The dashboard is live. The scores are updating. Someone has built a slide for the monthly report. And then... nothing changes.
The brand's share of voice in ChatGPT stays flat. Competitors keep showing up in Perplexity responses. The AI Overviews still cite the same three industry blogs that have been around since 2019. And the team has no clear idea what to do next.
This is the tracking trap. And in 2026, it's where a lot of otherwise capable teams are stuck.
What "tracking AI visibility" actually means
When people say they're tracking AI visibility, they usually mean one or more of these things:
- Running prompts through ChatGPT, Perplexity, Claude, or Gemini and noting whether their brand appears
- Counting how many times their domain gets cited in AI responses
- Measuring share of voice -- what percentage of responses mention them vs. competitors
- Watching a visibility score trend up or down week over week
These are real measurements. They're not useless. But they share a common limitation: they describe what's happening without telling you why, and they give you no path to changing it.
A citation count going up is good. But which page earned those citations? For which prompts? Against which competitors? And if the number goes down next month, what broke?
Most monitoring dashboards can't answer those questions. They're built to report, not to diagnose.
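To make that concrete, here's a rough sketch in Python of the difference between the reporting number and the diagnostic view. The data model is invented for illustration -- it's not any tool's actual schema -- but the shape of the question is the same:

```python
from collections import Counter

# Hypothetical citation records -- one row per citation observed in a
# sampled AI response. Field names are illustrative, not a real schema.
citations = [
    {"page": "/blog/pricing-guide", "prompt": "best crm for smb", "model": "perplexity"},
    {"page": "/blog/pricing-guide", "prompt": "crm pricing comparison", "model": "chatgpt"},
    {"page": "/docs/integrations", "prompt": "crm with slack integration", "model": "claude"},
]

# The dashboard metric: a single number that can move without explanation.
total = len(citations)

# The diagnostic view: which pages earn citations, for which prompts.
# When the total drops next month, this is what shows you what broke.
by_page = Counter(c["page"] for c in citations)
by_prompt = Counter((c["page"], c["prompt"]) for c in citations)

print(total)                     # 3 -- the reporting number
print(by_page.most_common())     # the pages doing the work
print(by_prompt.most_common())   # the prompts each page wins
```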

The metrics problem is getting worse, not better
Here's something that should make teams rethink their reporting setup: AI referral traffic -- the metric many teams built their early AI visibility dashboards around -- peaked in July 2025 at roughly 498,000 sessions and has since dropped about 25%.
That doesn't mean AI search is less important. It means the behavior has shifted. AI models are including fewer outbound links. Users are staying inside the AI interface longer. And some traffic is landing in embedded browsers that analytics platforms don't capture as referral sessions.
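The undercounting is easy to see in the classification logic itself. Here's a minimal sketch -- the hostnames are examples of known AI interfaces, and the list goes stale quickly, so treat it as an assumption to verify against your own logs:

```python
from urllib.parse import urlparse

# Referrer hostnames commonly associated with AI interfaces. Illustrative
# only -- check your own log data for what actually shows up.
AI_REFERRERS = {"chatgpt.com", "chat.openai.com", "perplexity.ai",
                "www.perplexity.ai", "gemini.google.com", "copilot.microsoft.com"}

def classify_session(referrer):
    """Bucket a session by its referrer header."""
    if not referrer:
        # Embedded and in-app browsers frequently strip the referrer,
        # so genuinely AI-driven visits end up counted as "direct".
        return "direct"
    host = urlparse(referrer).hostname or ""
    return "ai" if host in AI_REFERRERS else "other"

print(classify_session("https://chatgpt.com/"))  # ai
print(classify_session(None))                    # direct -- possibly AI
```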

The vertical-level picture is even more interesting. Finance, legal, and health -- which led in AI traffic penetration through 2025 -- saw the sharpest declines. SaaS nearly quadrupled from 0.14% to 0.54% over the same period. The story isn't uniform, and a single traffic metric won't capture it.
What this means practically: if your AI visibility reporting is anchored to referral traffic as the primary KPI, you're measuring a signal that's becoming less reliable by the quarter. The teams that are ahead of this have moved to page-level citation tracking, prompt-level visibility scores, and -- critically -- content gap analysis that shows where they're losing to competitors before the traffic numbers reflect it.
Why most tools stop at step one
The AI visibility tool market has grown fast. There are now dozens of platforms that will show you a dashboard of your brand's mentions across ChatGPT, Perplexity, Google AI Overviews, and other models. Some are quite good at it.
But most of them stop there.
Otterly.AI and Profound are two of the better-known examples, and both are solid monitoring platforms. They can tell you that a competitor is appearing in 60% of responses to a high-value prompt while you appear in 12%. What they can't tell you is what content that competitor has that you don't, which specific pages are driving their citations, or what you'd need to publish to close the gap.
That's not a knock on monitoring -- you need to know where you stand before you can improve. But monitoring without a path to action is just an expensive way to watch competitors win.
The distinction matters more now because AI search is maturing. In 2024, just showing up in AI responses at all was a novelty. In 2026, brands that have been optimizing for AI citation for 18 months have a meaningful head start, and catching up requires more than a better dashboard.
What "improving AI visibility" actually requires
Improving AI visibility is a different kind of work. It requires understanding three things that monitoring tools typically don't surface:
What content is missing. AI models cite sources because those sources answer questions well. If a model isn't citing your brand for a particular prompt, it's usually because your website doesn't have content that adequately addresses what that prompt is asking. The gap isn't always obvious -- it might be a specific angle, a comparison, a use case, or a question your content technically covers but doesn't answer directly enough.
Which prompts are worth targeting. Not all prompts are equal. Some have high query volume and are currently dominated by competitors. Others are easier to break into. Without volume estimates and difficulty scoring, teams end up guessing -- often targeting the most obvious prompts while ignoring winnable ones.
Whether new content is working. Publishing content and waiting to see if citations improve is slow and imprecise. Page-level tracking -- seeing exactly which pages are being cited, by which models, and for which prompts -- closes the loop. Without it, you're optimizing blind.
This is why the most effective teams in 2026 aren't just running monitoring tools. They're running a cycle: find the gaps, create content that addresses them, track whether citations improve, repeat.
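Here's a minimal sketch of what the gap-finding and prioritization step looks like, assuming you already have per-prompt visibility numbers for your brand and a competitor plus rough volume and difficulty estimates. All field names and the scoring heuristic are invented for illustration:

```python
# Hypothetical per-prompt data: visibility is the share of sampled
# responses citing each brand; volume and difficulty are estimates.
prompts = [
    {"prompt": "best crm for smb",           "ours": 0.12, "theirs": 0.60, "volume": 900, "difficulty": 0.9},
    {"prompt": "crm with slack integration", "ours": 0.05, "theirs": 0.30, "volume": 400, "difficulty": 0.3},
]

def opportunity(p):
    # Simple heuristic: big visibility gap, real volume, low difficulty.
    # The weighting is arbitrary -- tune it against your own data.
    gap = max(p["theirs"] - p["ours"], 0.0)
    return gap * p["volume"] * (1.0 - p["difficulty"])

# Rank gaps by estimated opportunity, not by raw gap size alone.
for p in sorted(prompts, key=opportunity, reverse=True):
    print(f'{p["prompt"]}: opportunity={opportunity(p):.0f}')
```

Note that the biggest raw gap isn't the top target once difficulty is factored in -- which is exactly the trap teams fall into when they guess.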
The tools that support the full loop
A small number of platforms have built toward this full cycle rather than stopping at monitoring.
Promptwatch is the most complete example. Its Answer Gap Analysis shows exactly which prompts competitors are visible for that you're not -- not just the fact that a gap exists, but the specific content your site is missing. From there, a built-in AI writing agent generates articles, listicles, and comparisons grounded in data from more than 880 million analyzed citations. And page-level tracking shows whether the new content actually moves your visibility scores.

The other piece that most teams underestimate: AI crawler logs. Knowing which pages ChatGPT, Claude, and Perplexity are actually reading -- and which ones they're skipping or failing to fetch -- is foundational to any optimization effort. If a crawler isn't reading a page, that page can't be cited. Most monitoring tools don't surface this at all.
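The raw material for this is already in your server access logs. A rough sketch, matching on the bot names the major vendors currently document (GPTBot, OAI-SearchBot, ClaudeBot, PerplexityBot -- verify the current strings before relying on them; the log format here is also a simplified assumption):

```python
import re

# User-agent substrings for the major AI crawlers. These change over
# time -- check each vendor's published bot documentation.
AI_BOTS = ("GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot")

# Minimal combined-log pattern: request path, status code, user agent.
LINE = re.compile(
    r'"(?:GET|POST) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def crawler_hits(log_lines):
    """Yield (bot, path, status) for every AI-crawler request in the log."""
    for line in log_lines:
        m = LINE.search(line)
        if not m:
            continue
        for bot in AI_BOTS:
            if bot in m.group("ua"):
                yield bot, m.group("path"), int(m.group("status"))

sample = ('1.2.3.4 - - [01/Feb/2026:10:00:00 +0000] "GET /docs/api HTTP/1.1" '
          '404 0 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"')
for bot, path, status in crawler_hits([sample]):
    # A 404 here means a crawler tried to read a page it can never cite.
    print(bot, path, status)
```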

AirOps and Search Atlas both take a more content-engineering approach to AI visibility, building workflows around creating content that's structured to get cited. They're worth evaluating if your team's primary bottleneck is content production rather than gap identification.
A comparison of where tools sit on the monitoring-to-optimization spectrum
| Tool | Monitors AI visibility | Gap analysis | Content generation | Page-level tracking | Crawler logs |
|---|---|---|---|---|---|
| Promptwatch | Yes (10 models) | Yes | Yes (AI writing agent) | Yes | Yes |
| Profound | Yes (9+ models) | Limited | No | Limited | No |
| Otterly.AI | Yes | No | No | No | No |
| Peec AI | Yes | No | No | No | No |
| AirOps | Partial | Yes | Yes | Partial | No |
| Search Atlas | Partial | Yes | Yes | Partial | No |
| AthenaHQ | Yes | Limited | No | No | No |
The pattern is clear: the further right you go in the table, the fewer tools support the capability. Most of the market is clustered in the first column.
What the teams making progress are doing differently
Across the brands that have meaningfully improved their AI visibility in 2026, a few patterns show up consistently.
They treat prompts like keywords. Every prompt a potential customer might type into ChatGPT or Perplexity is an opportunity -- or a gap. The teams that are winning have built prompt lists the same way SEO teams built keyword lists: by volume, by intent, by competitive difficulty. They know which prompts they own, which they're losing, and which are worth fighting for.
They publish content specifically for AI citation, not just for Google. This sounds obvious but the execution is different. Content that gets cited by AI models tends to be direct, structured, and comprehensive on a specific question. It's often not the same content that ranks well in traditional search. Teams that are repurposing SEO content and hoping it works in AI search are usually disappointed.
They track at the page level, not just the brand level. Brand-level visibility scores are useful for executive reporting. But the actual optimization work happens at the page level -- knowing that a specific article is being cited by Perplexity for three prompts but not by Claude, or that a competitor's comparison page is outperforming your equivalent page across five models. That granularity is what tells you what to fix.
They connect visibility to revenue. This is the hardest part and where most teams are still figuring things out. The cleanest approaches use a combination of traffic attribution (code snippet or server log analysis) and pipeline data to draw a line from AI citations to actual conversions. It's not perfect, but it's better than reporting visibility scores in isolation and hoping leadership connects the dots.
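In sketch form, that join might look like this -- assuming AI-classified sessions (e.g., from referrer bucketing like the earlier example) and a pipeline export sharing a visitor identifier. Every name here is hypothetical:

```python
# Hypothetical session and pipeline records, joined on a shared visitor id.
sessions = [
    {"visitor": "v1", "source": "ai", "landing": "/blog/pricing-guide"},
    {"visitor": "v2", "source": "other", "landing": "/"},
]
pipeline = [{"visitor": "v1", "stage": "closed-won", "value": 12000}]

won = {p["visitor"]: p["value"] for p in pipeline if p["stage"] == "closed-won"}

# Attribute revenue to AI-sourced sessions by landing page. Crude, and it
# misses zero-click influence entirely -- but it beats an unanchored score.
ai_revenue = {}
for s in sessions:
    if s["source"] == "ai" and s["visitor"] in won:
        ai_revenue[s["landing"]] = ai_revenue.get(s["landing"], 0) + won[s["visitor"]]

print(ai_revenue)  # {'/blog/pricing-guide': 12000}
```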
The honest state of AI visibility measurement in 2026
It's worth being clear about what's still genuinely hard. Attribution from AI search to revenue remains messy. Zero-click behavior means a lot of AI influence on purchase decisions never shows up in analytics at all. And the models themselves change frequently -- a content strategy that works well for Claude's current citation behavior might need adjustment after the next model update.
None of this means tracking is pointless. It means the teams that will be ahead in 12 months are the ones treating AI visibility as an optimization discipline -- not a reporting exercise. They're running experiments, publishing content, measuring what moves, and iterating. The dashboard is the starting point, not the destination.
The gap between teams that monitor and teams that optimize is widening. The tools to close it exist. The question is whether your team is using them.