Introducing Gorilla AI Log

Kyle Clifford
16 March 2026
Read Time: 12 minutes
Article Summary

AI crawlers are hitting your website every single day and your analytics tools can’t see them. Gorilla AI Log reads your server’s raw access logs to identify every AI, search engine and SEO bot visiting your site. Here’s how it works, what each bot actually means, and why server logs are the most accurate data you’ll ever get.

AI crawlers are hitting your website right now. You just can’t see them.

Right now, as you read this, AI systems are crawling your website. Bots working for ChatGPT, Claude, Perplexity, Gemini, and dozens of other services are requesting your pages, reading your content, and feeding it into their models. Some are scraping your content for training data. Others are pulling your pages in real time because someone just asked an AI chatbot a question about your business.

And you almost certainly have no idea it’s happening.

Google Analytics doesn’t show it – it filters out bot traffic by design. Google Search Console gives you a limited subset of Googlebot data, delayed by days. Third-party SEO tools estimate and sample. None of them show you the full picture.

The only place this data actually lives – the complete, unfiltered, real-time record of every single bot that hits your server – is your server’s raw access logs. And virtually nobody is reading them.

That’s why we built Gorilla AI Log.

The Problem: You’re Being Crawled and You Don’t Know It

The volume of AI crawler traffic has exploded. Cloudflare’s 2025 data showed GPTBot traffic increasing by 305% year-on-year. PerplexityBot surged by over 157,000%. Overall AI bot traffic grew roughly 300% during 2025 alone. ByteDance’s Bytespider, OpenAI’s various bots, Anthropic’s ClaudeBot, Meta’s Meta-ExternalAgent – the list grows every quarter.

Meanwhile, AI crawlers now account for a measurable share of all web traffic. Hostinger’s analysis of 66.7 billion bot requests showed OpenAI’s search crawler coverage growing from 4.7% to over 55% of sites in their sample. These aren’t fringe bots hitting a handful of pages – they’re systematic, high-volume crawlers consuming your content at scale.

And here’s the critical gap: your existing analytics tools are blind to all of it. GA4 filters bots out. Search Console only reports on Googlebot (and even then, with significant delay and sampling). Third-party monitoring tools rely on JavaScript tags that most AI bots don’t even execute. You’re flying completely blind on one of the most significant shifts in how your content is being consumed.

What Is Gorilla AI Log?

Gorilla AI Log is a self-hosted, password-protected dashboard that reads your server’s raw access logs and identifies every AI crawler, search engine bot, and SEO tool visiting your site. It’s two files – an HTML dashboard and a PHP API – uploaded directly to your cPanel hosting. No third-party services. No external JavaScript tags. No data leaving your server.

The PHP backend parses your server’s access logs in real time, matching user agent strings against a comprehensive database of known bots. The frontend visualises everything in a clean, interactive dashboard: hit volumes, daily trends, hourly patterns, crawler breakdowns, status codes, top pages, and individual bot behaviour.

Because it works directly with cPanel hosting, you get access to historic log data too – not just what’s happening today, but what’s been happening for weeks or months, depending on your host’s log retention. And it feeds directly into Gorilla Reports, our reporting platform, giving you ongoing crawler intelligence as part of your regular performance reporting.

The setup is straightforward: we place a file on your server that speaks to the platform. It works exclusively with cPanel hosting environments, reading the access logs that your server is already generating. No additional server configuration, no ongoing maintenance, no third-party dependencies.

The Bots: What They Are and What They Mean

Not all bots are created equal. Each serves a fundamentally different purpose, and understanding the distinction is essential if you want to make informed decisions about your content’s visibility in the AI ecosystem.

Gorilla AI Log categorises every detected bot into one of three types – AI, Search, or SEO Tool – and further distinguishes between automated crawlers and user-triggered fetches. That second distinction is particularly important: it’s the difference between a bot scraping your content for training data and a real person asking an AI chatbot about your business.
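To make the two-level categorisation concrete, here is a minimal Python sketch of user-agent matching. It is an illustration only: the tool itself uses a PHP backend, and the `KNOWN_BOTS` list, the `BotInfo` type, and the `classify_user_agent` function are hypothetical names covering just a handful of the bots described in this article.

```python
from typing import NamedTuple, Optional


class BotInfo(NamedTuple):
    name: str
    category: str  # "AI", "Search" or "SEO Tool"
    mode: str      # "crawler" (automated) or "fetch" (user-triggered)


# Illustrative subset only, not the tool's real bot database.
# Order matters: more specific tokens come before tokens they contain
# (for example "Googlebot-Image" before "Googlebot").
KNOWN_BOTS = [
    ("ChatGPT-User", BotInfo("ChatGPT-User", "AI", "fetch")),
    ("Claude-Web", BotInfo("Claude-Web", "AI", "fetch")),
    ("OAI-SearchBot", BotInfo("OAI-SearchBot", "AI", "crawler")),
    ("GPTBot", BotInfo("GPTBot", "AI", "crawler")),
    ("ClaudeBot", BotInfo("ClaudeBot", "AI", "crawler")),
    ("PerplexityBot", BotInfo("PerplexityBot", "AI", "crawler")),
    ("Googlebot-Image", BotInfo("Googlebot-Image", "Search", "crawler")),
    ("Googlebot", BotInfo("Googlebot", "Search", "crawler")),
    ("bingbot", BotInfo("bingbot", "Search", "crawler")),
    ("AhrefsBot", BotInfo("AhrefsBot", "SEO Tool", "crawler")),
    ("SemrushBot", BotInfo("SemrushBot", "SEO Tool", "crawler")),
]


def classify_user_agent(user_agent: str) -> Optional[BotInfo]:
    """Return the first known bot whose token appears in the user agent string."""
    ua = user_agent.lower()
    for token, info in KNOWN_BOTS:
        if token.lower() in ua:
            return info
    return None  # not a known bot (most likely a regular browser)
```

A plain substring match like this is easy to follow, but it only works if specific tokens are checked before the general ones they contain.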

[Image: Gorilla AI Log – All Detected Crawlers table showing AI, Search and SEO Tool bots with hit volumes, crawler vs fetch breakdown, unique pages, and date ranges]

AI Crawlers – Training Bots

These are automated bots that systematically crawl the web to collect content for AI model training. They run on schedules, follow links, and gather data at scale. Seeing these in your logs means your content is being consumed as training material for large language models.

Bot Name | Company | What It Does
GPTBot | OpenAI | Collects content for training GPT models (GPT-4, GPT-5, etc.). Does not index for search results. Blocking this stops your content being used in OpenAI’s model training but won’t affect ChatGPT search visibility.
ClaudeBot | Anthropic | Collects web content for Claude AI model training. Blocking signals your content should be excluded from future Anthropic training datasets.
anthropic-ai | Anthropic | Deprecated predecessor to ClaudeBot – still appears in some older log entries. Same purpose: model training data collection.
Google-Extended | Google | Training crawler for Google’s Gemini AI models. Entirely separate from Googlebot – blocking this has zero impact on your Google Search rankings.
Bytespider | ByteDance | ByteDance’s AI data collector, linked to TikTok and related AI systems. Was historically one of the most aggressive crawlers on the web.
CCBot | Common Crawl | Open dataset crawler. Common Crawl’s dataset is used as foundational training data by many AI companies.
Meta-ExternalAgent | Meta | Meta’s bot for collecting data to train and fine-tune their large language models (Llama, etc.).
FacebookBot | Meta | Meta’s general web crawler, used for content understanding and AI applications.
Amazonbot | Amazon | Amazon’s crawler for AI and search applications including Alexa and Amazon’s product recommendations.
cohere-ai | Cohere | Training crawler for Cohere’s enterprise AI models.
Diffbot | Diffbot | Knowledge graph builder – structures web data into machine-readable formats for AI applications.
Applebot-Extended | Apple | Apple’s AI training crawler for Apple Intelligence features. Separate from regular Applebot (Siri/Spotlight search).
YouBot | You.com | Crawler for You.com’s AI-powered search engine.
ImagesiftBot | Imagesift | Image-focused AI data collection crawler.

AI Crawlers – Search & Retrieval Bots

These bots index your content for AI-powered search features. When someone searches within ChatGPT, Claude, or Perplexity, these are the bots that have pre-indexed your pages so they can appear as cited sources in the results.

Bot Name | Company | What It Does
OAI-SearchBot | OpenAI | Indexes content for ChatGPT’s search features. Allowing this means your pages can appear as cited, linked sources when users search within ChatGPT. Blocking it removes your site from ChatGPT search results.
PerplexityBot | Perplexity | Powers Perplexity’s AI answer engine. Perplexity relies heavily on real-time web data to generate cited answers, making this one of the more active AI search crawlers.
Gemini | Google | Google’s Gemini-specific crawler for AI search and retrieval features.

AI Crawlers – User-Triggered Fetches

This is the category that matters most commercially. These aren’t automated crawlers running on a schedule – these fire when a real person asks an AI chatbot a question and the AI fetches your page in real time to answer it.

If you see ChatGPT-User in your logs hitting a specific page, that means someone typed a question into ChatGPT, and ChatGPT pulled your page to formulate the answer. That’s the strongest signal of AI visibility you can get – your content is being actively referenced in live conversations.

Bot Name | Company | What It Does
ChatGPT-User | OpenAI | Fires when a real user asks ChatGPT a question and it fetches your page live to answer. This is the strongest signal of AI visibility – someone actively asked about your content or business.
Claude-Web | Anthropic | Same concept for Claude – a user asked Claude something and it pulled your page in real time to inform the response. (Being replaced by the newer Claude-User agent.)

The critical takeaway: the difference between GPTBot and ChatGPT-User is the difference between your content being passively scraped for training data and your content being actively served as an answer to a real person’s question. Gorilla AI Log distinguishes between these automatically, so you can see both the volume and the nature of your AI crawler traffic.

Traditional Search Engine Crawlers

The bots you’re already familiar with. Gorilla AI Log tracks these alongside AI crawlers so you get a complete picture of all bot activity on your site in one dashboard.

Bot Name | Company | What It Does
Googlebot | Google | Core search indexing crawler. The primary bot responsible for indexing your pages for Google Search results.
Googlebot-Image | Google | Specifically crawls and indexes images for Google Image Search.
Googlebot-Video | Google | Indexes video content for Google Video and YouTube search results.
Googlebot-News | Google | Indexes content for Google News. Only relevant if your site is a registered news publisher.
Storebot-Google | Google | Product and merchant page indexing for Google Shopping results.
AdsBot-Google | Google Ads | Checks landing page quality for Google Ads campaigns. Important for maintaining ad quality scores.
Mediapartners-Google | Google Ads | AdSense content matching – analyses your pages to serve relevant display ads.
bingbot | Microsoft | Bing search indexing. Also powers Microsoft Copilot’s web search results.
Applebot | Apple | Siri and Spotlight search indexing. Separate from Applebot-Extended (AI training).
YandexBot | Yandex | Russia’s primary search engine crawler.
DuckDuckBot | DuckDuckGo | Privacy-focused search engine crawler.
Baiduspider | Baidu | China’s largest search engine crawler.
PetalBot | Huawei | Huawei’s search engine crawler.
Sogou | Sogou | Chinese search engine crawler.

SEO Tool Crawlers

Third-party SEO platforms that crawl your site for competitive analysis, backlink data, and domain metrics. These aren’t indexing you for search results – they’re building their own databases for their paid tools.

Bot Name | Company | What It Does
SemrushBot | Semrush | Crawls for SEO analytics, keyword data, and competitive research intelligence.
AhrefsBot | Ahrefs | Backlink analysis and SEO data collection. One of the more frequent SEO tool crawlers.
MJ12bot | Majestic | Link intelligence and trust flow metrics.
DotBot | Moz | Domain authority calculation and SEO metrics.

What Gorilla AI Log Shows You

The dashboard is built to give you both the high-level overview and the granular detail, without needing to switch between tools or dig through raw log files manually.

[Image: Gorilla AI Log dashboard charts – Hourly Distribution bar chart, AI vs Search vs Tools doughnut chart, Crawler Leaderboard horizontal bar chart, and HTTP Status Codes breakdown]

Overview Stats

Six stat cards at the top give you the immediate picture: total crawler hits, AI crawler hits (with percentage of total), search crawler hits, SEO tool hits, the number of unique crawlers detected, and the date range of the log data being analysed.

Daily Timeline

A stacked line chart showing crawler visits per day, broken down by crawler group. This is where you spot trends – a sudden spike in GPTBot activity, a new crawler appearing for the first time, or seasonal patterns in how bots interact with your site.
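The aggregation behind a chart like this can be sketched in a few lines of Python. This is a rough illustration rather than the dashboard’s actual code: `daily_timeline` is a hypothetical function, and it assumes each hit has already been reduced to a `(timestamp, group)` pair, where the group is a label such as "AI", "Search" or "SEO Tool".

```python
from collections import Counter, defaultdict
from datetime import datetime


def daily_timeline(hits):
    """Group (timestamp, group) pairs into {date: Counter({group: hits})}."""
    timeline = defaultdict(Counter)
    for timestamp, group in hits:
        timeline[timestamp.date()][group] += 1
    # Sort by date so the result can be plotted left to right.
    return dict(sorted(timeline.items()))
```

Each date maps to one counter per crawler group, which is the shape a stacked chart needs: one series per group, one point per day.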

Hourly Distribution

When are crawlers most active on your site? The bar chart shows the distribution across a 24-hour period in server time. Useful for understanding crawl patterns and identifying if bots are hitting your server during peak traffic hours.
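As a rough sketch of how such a distribution can be computed, the Python below buckets hit timestamps by hour. The function name `hourly_distribution` is hypothetical, and it assumes the timestamps are kept in the server’s own timezone, as they appear in the log.

```python
from collections import Counter
from datetime import datetime


def hourly_distribution(timestamps):
    """Return a 24-item list: number of hits in each hour of the day (0-23)."""
    counts = Counter(ts.hour for ts in timestamps)
    # Include hours with no hits so the chart always has 24 bars.
    return [counts.get(hour, 0) for hour in range(24)]
```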

AI vs Search vs Tools Split

A doughnut chart showing the proportion of crawler types. At a glance, you can see whether AI crawlers are dominating your bot traffic or if traditional search bots still account for the majority.

Crawler Leaderboard

The top crawlers ranked by hit volume. This tells you immediately which bots are consuming the most server resources and bandwidth.

HTTP Status Code Breakdown

A doughnut chart plus visual status bar showing what responses your server is returning to crawlers. Healthy sites should see predominantly 200 (OK) and 301/302 (redirects). A significant proportion of 404s or 5xx errors is a problem that needs addressing.
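One simple way to quantify “a significant proportion” is to group status codes by class and look at the combined 4xx and 5xx share. The Python below is a minimal sketch; `status_breakdown` and `error_share` are hypothetical helper names, and what counts as too high is a judgement call for your site.

```python
from collections import Counter


def status_breakdown(status_codes):
    """Return the share of each status class, e.g. {"2xx": 0.7, "4xx": 0.1}."""
    codes = list(status_codes)
    if not codes:
        return {}
    classes = Counter(f"{code // 100}xx" for code in codes)
    return {cls: count / len(codes) for cls, count in classes.items()}


def error_share(status_codes):
    """Combined share of client (4xx) and server (5xx) errors."""
    shares = status_breakdown(status_codes)
    return shares.get("4xx", 0.0) + shares.get("5xx", 0.0)
```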

Full Crawler Table

Every detected crawler listed with its type, total hits, crawler vs fetch breakdown, unique pages crawled, first/last seen dates, and top status code. Click any row to expand and see the bot breakdown (individual bot variants within a group), every page that crawler visited, the full user agent string, and a drillable status code breakdown showing exactly which URLs returned which status codes.

Top Pages – All Crawlers & AI-Specific

Two side-by-side panels showing the most crawled pages across all bots and the most crawled pages specifically by AI bots. Both are searchable and filterable. The “Hide WP core files” toggle strips out WordPress system files so you can focus on the content URLs that actually matter.

[Image: Gorilla AI Log – Most Crawled Pages and AI Crawler Focus Pages side-by-side panels showing top URLs by hit volume with search filters]

Google Crawl Intelligence: The Real Power

While the AI crawler data is the headline feature, the Google crawl intelligence you get from Gorilla AI Log is arguably just as powerful for day-to-day SEO work.

When Googlebot hits your pages, the access log records the exact URL requested, the exact timestamp, and the exact HTTP status code your server returned. This means you can see things that Search Console either doesn’t show you or shows you days later with significant sampling.

[Image: Gorilla AI Log – Google crawler detail view showing bot breakdown, all crawled pages with hit counts, user agent strings, and status code drilldown with clickable URL lists for 200, 301, 304, and 404 responses]

Track When New Content Gets Crawled

Published a new page or blog post? You can see precisely when Googlebot first discovers and crawls it. Cross-reference that with when it first appears in search results and you have real data on your crawl-to-index pipeline. If new content is taking days or weeks to get crawled, that’s actionable intelligence – you may need to address crawl budget, internal linking, or sitemap configuration.
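The “when was this first crawled?” lookup is straightforward once log lines are parsed. The Python sketch below assumes each parsed entry is a dict with `time`, `url` and `user_agent` keys; that shape and the `first_crawl_time` function are assumptions for illustration, not the tool’s internal format.

```python
from datetime import datetime
from typing import Iterable, Optional


def first_crawl_time(
    entries: Iterable[dict], url: str, bot_token: str = "Googlebot"
) -> Optional[datetime]:
    """Earliest time a bot whose user agent contains bot_token requested url."""
    token = bot_token.lower()
    times = [
        entry["time"]
        for entry in entries
        if entry["url"] == url and token in entry["user_agent"].lower()
    ]
    return min(times) if times else None  # None means not crawled yet
```

Comparing this timestamp with the publish time gives the discovery delay for a page.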

Find Every 404 Google Is Hitting

This is one of the highest-value insights. When Googlebot requests a URL and gets a 404 response, it’s wasting crawl budget on a page that doesn’t exist. Gorilla AI Log shows you every single one of these, with the exact URL and the number of times Google has tried to crawl it. You can drill down into the status code breakdown for Googlebot specifically and see the full list of 404 URLs – then fix them with redirects, restore the content, or clean up the references.
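For readers who want to see the logic behind that list, here is a minimal Python sketch. It assumes parsed log entries are dicts with `url`, `status` and `user_agent` keys; that shape and the `googlebot_404s` function name are hypothetical.

```python
from collections import Counter


def googlebot_404s(entries):
    """Return [(url, hits), ...] for 404 responses served to Googlebot, most hit first."""
    hits = Counter(
        entry["url"]
        for entry in entries
        # A substring match also includes variants such as Googlebot-Image.
        if entry["status"] == 404 and "googlebot" in entry["user_agent"].lower()
    )
    return hits.most_common()
```

Sorting by hit count puts the URLs Google retries most often at the top, which is a sensible order for working through redirects.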

Understand Crawl Frequency

Pages that Google crawls frequently are generally pages Google considers important. Pages that barely get crawled may be thin, poorly linked, or deprioritised. The hit data per page tells you exactly where Google is spending its crawl budget on your site. Compare that against your own priorities – if your most important commercial pages are getting crawled less than your blog archives or category pages, your internal linking strategy needs work.

Spot Crawl Budget Waste

WordPress sites are notorious for generating crawlable URLs that shouldn’t be crawled: parameter URLs, paginated archives, tag pages, author pages, feed URLs, and wp-admin paths. The “Hide WP core files” filter in Gorilla AI Log strips these out of the view, but when you turn it off you can see exactly how much of Google’s crawl budget is being burned on non-content URLs. For large sites, this can be a significant proportion of total crawl activity.
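A rough way to estimate that proportion yourself is to flag the URL types listed above and measure their share of crawled URLs. The Python below is a sketch under assumptions: the patterns are illustrative guesses based on default WordPress URL structures, not the rules the “Hide WP core files” toggle actually applies, and `is_non_content` and `wasted_share` are hypothetical names.

```python
import re
from urllib.parse import urlsplit

# Illustrative patterns for the URL types named above (assumed, not exhaustive).
NON_CONTENT_PATTERNS = [
    re.compile(r"^/wp-admin/"),    # admin paths
    re.compile(r"/feed/?$"),       # feed URLs
    re.compile(r"^/tag/"),         # tag pages
    re.compile(r"^/author/"),      # author pages
    re.compile(r"/page/\d+/?$"),   # paginated archives
]


def is_non_content(url: str) -> bool:
    """True if the URL has a query string or matches a non-content pattern."""
    parts = urlsplit(url)
    if parts.query:  # parameter URLs such as ?replytocom=12
        return True
    return any(pattern.search(parts.path) for pattern in NON_CONTENT_PATTERNS)


def wasted_share(urls) -> float:
    """Fraction of crawled URLs that are non-content URLs."""
    urls = list(urls)
    if not urls:
        return 0.0
    return sum(is_non_content(url) for url in urls) / len(urls)
```

Pass in one URL per crawler hit (not a de-duplicated list) to get the share of crawl activity rather than the share of distinct URLs.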

Monitor All Search Variants

Gorilla AI Log doesn’t just track “Googlebot” as a single entity. It breaks out Googlebot-Image, Googlebot-Video, Googlebot-News, Storebot-Google, and AdsBot-Google as separate crawlers. This means you can see specifically when Google’s image crawler is indexing your visual content, when AdsBot is checking your landing pages, and whether Google News is picking up your articles.

Why Server Logs Are the Most Accurate Data Source

This is the single most important point about Gorilla AI Log: it reads data directly from your server’s access logs. There is no more accurate data source in existence for understanding what’s hitting your website.

Every other analytics and monitoring approach works through a middleman, and every middleman introduces inaccuracy.

Google Analytics (GA4)

GA4 requires JavaScript execution to record a visit. AI crawlers do not execute JavaScript – a finding supported by multiple studies, including Vercel and MERJ’s analysis of over half a billion GPTBot fetches, which found zero evidence of JS execution. GA4 also filters out known bot traffic by default and is subject to ad blockers, consent banners, and sampling at high volumes. Bot traffic is invisible in GA4 by design.

Google Search Console

Search Console only reports on Googlebot activity. It doesn’t show AI crawlers, Bing, SEO tools, or any other bots. Data is delayed by several days. Coverage is sampled – you’re not seeing every crawl event. And it gives you no status code breakdown by URL, no hourly distribution, and no way to see which specific pages are returning errors to Googlebot.

Third-Party Monitoring Tools

SEO platforms and AI crawler monitors typically rely on JavaScript tags, API integrations, or estimated data. Many can’t see bots that don’t render JS. Some rely on aggregated data from their own customer base rather than your actual server data. None of them have access to the definitive record of what actually happened on your server.

Server Access Logs: The Ground Truth

Your server’s access logs record every single HTTP request that reaches your server. No JavaScript required. No sampling. No filtering. No delay. Every bot visit is logged with the IP address, timestamp, exact URL requested, HTTP method, status code returned, response size, and full user agent string.
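To show what those fields look like in practice, here is a small Python sketch that parses one log line. It assumes the Apache “combined” log format, which is common but configurable, so your host’s format may differ. The `COMBINED_LOG` pattern and `parse_line` function are hypothetical names, and the sample line in the usage note is made up.

```python
import re
from datetime import datetime

# Apache "combined" format: ip ident user [time] "request" status size "referer" "user agent"
COMBINED_LOG = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)(?: (?P<protocol>[^"]*))?" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_line(line: str):
    """Parse one combined-format line into a dict, or return None if it doesn't match."""
    match = COMBINED_LOG.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Month abbreviations (%b) are parsed using the current locale; English is assumed.
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry
```

For example, `parse_line('203.0.113.7 - - [16/Mar/2026:09:14:02 +0000] "GET /services/ HTTP/1.1" 200 5123 "-" "Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"')` yields a dict with the IP address, timestamp, method, URL, status code, response size and user agent as separate fields.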

This is the definitive, forensic-grade record of what happened. When a bot hits your server, the log captures it regardless of whether the bot executes JavaScript, regardless of whether your analytics tag fired, regardless of ad blockers or consent banners. It’s the raw, unmanipulated truth.

Gorilla AI Log reads this data directly. It doesn’t estimate, sample, or infer. It parses the actual log entries your server wrote and presents them in a format you can act on. You cannot get more accurate data than this.

Actionable Insights: What You Can Do With This Data

Gorilla AI Log isn’t just a monitoring tool – it gives you data you can act on immediately.

Make Informed robots.txt Decisions

The days of “block all AI crawlers” are over. With Gorilla AI Log, you can see exactly which bots are hitting your site and make granular decisions. You might want to block GPTBot (training) while allowing OAI-SearchBot (search visibility). Or allow ClaudeBot but block Bytespider. The data shows you what’s actually crawling your site, so you can make decisions based on evidence rather than guesswork.
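As an illustration of a granular policy, the Python sketch below embeds an example robots.txt that blocks GPTBot and Bytespider while allowing OAI-SearchBot and everything else, then checks it with the standard library’s `urllib.robotparser`. The rules are an example only, not a recommendation for your site, and `ROBOTS_TXT` and `allowed` are hypothetical names.

```python
from urllib.robotparser import RobotFileParser

# Example policy: block training crawlers, keep search visibility.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(bot: str, url: str = "/") -> bool:
    """True if the example policy lets the named bot fetch the URL."""
    return parser.can_fetch(bot, url)
```

With this policy, `allowed("GPTBot")` is `False` while `allowed("OAI-SearchBot")` and `allowed("ClaudeBot")` are `True`. Keep in mind that robots.txt is a request rather than an enforcement mechanism, so your logs remain the way to confirm whether a bot actually respects it.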

Fix Crawl Errors

Every 404 a search engine or AI bot hits is a wasted opportunity. The status code drilldown shows you exactly which URLs are returning errors, how often, and to which crawlers. Fix these with proper redirects and you recover crawl budget and potential visibility.

Identify AI Visibility Opportunities

If ChatGPT-User or Claude-Web is fetching certain pages, those pages are being actively referenced in AI conversations. Double down on that content. If AI bots are ignoring your most important pages, investigate why – they may not be well-linked, may be JavaScript-rendered (invisible to AI bots), or may not be structured in a way that’s useful for AI retrieval.

Control Bandwidth and Server Load

Aggressive crawlers can consume significant bandwidth. If Bytespider or CCBot are hammering your server with thousands of requests per day on content you don’t want them to have, block them. Gorilla AI Log shows you the exact volume so you can make proportionate decisions.

Track AI Crawler Growth Over Time

The daily timeline chart shows you how crawler activity changes over time. As AI adoption accelerates, you should expect AI crawler traffic to grow. Monitoring this trend helps you plan for server capacity, make proactive robots.txt decisions, and understand how your content’s role in the AI ecosystem is evolving.

Validate Fresh Content Indexing

When you publish new content, use Gorilla AI Log to confirm it’s being picked up. Check when Googlebot first crawls it. Check whether AI crawlers discover it. If new pages aren’t getting crawled within a reasonable timeframe, you’ve identified a technical SEO issue that needs fixing.

Feeding Into Gorilla Reports

Gorilla AI Log doesn’t exist in isolation. It feeds directly into Gorilla Reports, our reporting platform, so crawler intelligence becomes a standard part of your ongoing performance reporting.

This means your monthly reports don’t just show rankings, traffic, and conversions – they show you how search engines and AI systems are interacting with your site at server level. You can track crawler trends over time, monitor for new bots appearing, and ensure that crawl health is maintained as part of your broader SEO strategy.

The connection between the two platforms is seamless: the file on your server speaks directly to Gorilla Reports, and the data flows through automatically. No manual exports, no screenshots, no copy-pasting data between tools.

Setup: Two Files, Five Minutes

Getting Gorilla AI Log running on your site is deliberately simple:

1. We upload two files to your cPanel hosting – an HTML dashboard and a PHP API file.

2. The PHP file is configured with a password and automatically detects your server’s log file locations.

3. That’s it. Visit the dashboard URL, enter the password, and you’re looking at your crawler data.

It works exclusively with cPanel hosting environments, which covers the vast majority of shared and managed hosting setups. The tool reads your existing access logs – no additional logging configuration needed. It’s lightweight enough that it won’t impact server performance, and because it’s entirely self-hosted, your log data never leaves your server.

Because cPanel retains historic access logs, you’re not starting from zero. Depending on your host’s retention policy, you may have weeks or months of historic data available immediately.

The Bottom Line

AI-powered search isn’t coming – it’s here. ChatGPT handles billions of queries. Perplexity, Claude, and Gemini are growing rapidly. AI Overviews now appear in over 13% of Google search results. The way people discover and consume content is fundamentally changing, and the bots powering these systems are already on your server.

The question isn’t whether AI crawlers are hitting your site. They are. The question is whether you can see them, understand what they’re doing, and make informed decisions about how your content participates in the AI ecosystem.

Gorilla AI Log gives you that visibility – from the most accurate data source possible: your own server.

Want to see what’s crawling your website? Get in touch with Gorilla Marketing and we’ll get Gorilla AI Log set up on your site.

Kyle Clifford
Kyle has been in search marketing for over 17 years, specialising in technical and on-page SEO. A father of two and a massive rugby fan, Kyle founded Gorilla Marketing in Manchester in 2015.
