How to Fix Crawl Errors and Improve Your Site’s Indexability

David Galvin
17 March 2025
Read Time: 12 Minutes
Article Summary

Crawl errors prevent Google from accessing and indexing your pages properly, directly impacting search visibility. This guide covers how to find, prioritise, and fix errors using Google Search Console.

Crawl errors happen when a search engine tries to access a page on your site and can’t. Sometimes the page doesn’t exist. Sometimes the server chokes. Sometimes you’ve accidentally told Google not to look at it. Whatever the cause, the result is the same: if Google can’t crawl a page, it can’t index it. And if it can’t index it, that page won’t rank for anything.

The good news is that most crawl errors follow predictable patterns and have straightforward fixes. The bad news is that most guides on this topic treat every error the same – a flat list of problems without any sense of what to panic about and what to ignore. We’ll take a different approach here. This guide groups errors by severity so you can focus on what’s actually blocking your visibility, rather than chasing every warning in Google Search Console.

What Are Crawl Errors?

A crawl error is any issue that prevents Googlebot (or another search engine bot) from successfully fetching a URL on your site. When Googlebot requests a page and gets an error response instead of a clean 200 OK (or a redirect that resolves to one) – or gets blocked before it can even make the request – that’s logged as a crawl error.

Crawlability and indexability are related but distinct. Crawlability is whether a search engine can access a page at all. Indexability is whether it’s allowed to add that page to its index after crawling it. You can have a perfectly crawlable page that’s set to noindex, meaning Google can see it but won’t list it. And you can have a page that should be indexable but isn’t crawlable because something’s blocking access.

Both matter. But crawlability comes first. If Googlebot can’t reach the page, nothing else you’ve done – your content, your keywords, your internal links – makes any difference.

Where to Find Crawl Errors

Google Search Console is the primary tool for identifying crawl errors on your site, and it’s free. Two reports matter most:

The Page Indexing report (under “Indexing” in the left navigation) shows you every URL Google knows about and its current status. You’ll see categories like “Crawled – currently not indexed”, “Not found (404)”, “Blocked by robots.txt”, “Excluded by noindex tag”, and others. Each category includes a count and a list of affected URLs. This is your starting point for understanding what’s happening across the site.

The Crawl Stats report (under Settings > Crawl stats) gives you a higher-level view of how Googlebot is interacting with your site. It shows crawl rate, response times, and the breakdown of response codes. If you’re seeing a spike in 5xx errors or a drop in crawl rate, this is where you’ll spot it.

Beyond Search Console, tools like Screaming Frog let you run your own site crawl to find errors before Google does. Screaming Frog mimics how a search engine crawls your site, flagging broken links, redirect chains, missing pages, and server errors in a single sweep. It’s particularly useful for catching problems after a site migration or a large content update.

For larger sites with complex crawl patterns, log file analysis is worth considering. Server logs show you exactly which URLs Googlebot requested, when, and what response it got. This reveals what Google is actually crawling versus what you think it’s crawling – and the two don’t always match.
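As a rough illustration of log file analysis, here is a minimal Python sketch that pulls Googlebot requests out of an access log and counts the response codes. It assumes your server writes the common "combined" log format, and the sample lines are invented. It also trusts the user-agent string, which can be spoofed, so treat the output as a starting point rather than verified Googlebot traffic.

```python
import re
from collections import Counter

# Assumed format: ip - - [time] "METHOD path protocol" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_hits(lines):
    """Yield (path, status) for each request whose user agent mentions Googlebot."""
    for line in lines:
        match = LOG_LINE.match(line)
        if match and "Googlebot" in match.group("agent"):
            yield match.group("path"), int(match.group("status"))

def summarise(lines):
    """Count Googlebot responses by status and list paths that did not return 200."""
    hits = list(googlebot_hits(lines))
    by_status = Counter(status for _, status in hits)
    problems = sorted({path for path, status in hits if status != 200})
    return by_status, problems

# Invented sample lines for illustration only.
sample = [
    '66.249.66.1 - - [17/Mar/2025:10:00:00 +0000] "GET /products/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [17/Mar/2025:10:00:05 +0000] "GET /old-page HTTP/1.1" 404 312 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.9 - - [17/Mar/2025:10:00:07 +0000] "GET /products/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]
by_status, problems = summarise(sample)
print(dict(by_status), problems)
```

In practice you would pass an open log file to `summarise` instead of the sample list, then compare the paths it reports against the URLs you expect Google to be crawling.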

The Severity Framework: What to Fix First

This is where most guides fall short. Not all crawl errors are equal. A soft 404 on a page that was deleted two years ago isn’t the same as a robots.txt rule blocking your entire product category. You need a way to triage.

We group crawl errors into three tiers:

Tier 1: Blocking Indexing Entirely

These errors prevent pages from appearing in search results at all. Fix them first.

Robots.txt misconfiguration blocking important pages or sections

Noindex tags on pages that should be indexed

5xx server errors on key landing pages

Canonical tags pointing to the wrong URL

“Crawled – currently not indexed” on high-value pages

If a page you need ranking is affected by any of these, that’s your priority. Every day it sits unfixed is a day that page generates zero organic traffic.

Tier 2: Degrading Crawl Efficiency

These don’t block individual pages outright, but they waste Googlebot’s time and attention. On smaller sites, the impact is minimal. On larger sites with thousands of pages, they compound quickly.

Redirect chains and loops (multiple hops between URLs)

Soft 404s (pages that return a 200 status but show error content)

Excessive URL parameters generating duplicate or near-duplicate pages

Orphan pages with no internal links pointing to them

Crawl budget waste on low-value pages

Tier 3: Housekeeping

Worth tidying up, but they’re not hurting your rankings in any meaningful way.

404 errors on old, low-value pages that nobody links to

Blocked resources that don’t affect rendering

Warnings in Search Console that don’t map to actual indexing problems

Start at Tier 1 and work down. It’s tempting to chase a clean Search Console report, but the goal isn’t zero errors – it’s zero errors that matter.
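If you export your issues into a spreadsheet or script, the tiers can be applied mechanically. The sketch below is one way to do that in Python. The issue labels and the `important_urls` set are hypothetical names chosen for this example, not Search Console identifiers, and a real triage would also weigh factors like backlinks and traffic.

```python
# Hypothetical issue labels mapped to the three tiers described above.
TIER = {
    "blocked_by_robots": 1,
    "noindex": 1,
    "server_error_5xx": 1,
    "wrong_canonical": 1,
    "crawled_not_indexed": 1,
    "redirect_chain": 2,
    "soft_404": 2,
    "parameter_duplicates": 2,
    "orphan_page": 2,
    "not_found_404": 3,
    "blocked_resource": 3,
}

def triage(issues, important_urls):
    """Sort (url, issue_type) pairs: lowest tier first, important URLs first within a tier."""
    def sort_key(issue):
        url, issue_type = issue
        tier = TIER.get(issue_type, 3)  # unknown labels are treated as housekeeping
        return (tier, url not in important_urls, url)
    return sorted(issues, key=sort_key)
```

Calling `triage(issues, {"/products/"})` would put a noindex on `/products/` at the top of the list and an old 404 at the bottom, which is the order to work through them.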

404 Errors and Soft 404s

Standard 404 errors happen when a URL returns a “not found” response. The page doesn’t exist, and the server correctly says so. On their own, 404s aren’t a ranking problem. Google expects them. Sites change, pages get deleted, URLs get restructured. A 404 on a page you deliberately removed is working as intended.

Where 404s become a problem:

The deleted page had backlinks. External sites linking to a page that now 404s means you’re losing that link equity. Set up a 301 redirect to the most relevant alternative page.

Internal links still point to the 404. Every internal link to a dead page is a wasted crawl and a dead end for users. Update or remove these links.

The page was high-traffic. If a page that was pulling in organic visits suddenly 404s, you’ve got an immediate traffic problem. Either restore it or redirect it.

Soft 404s are trickier. A soft 404 is when a page returns a 200 OK status code but the content is effectively an error page – “no results found”, an empty product listing, or a generic “this page doesn’t exist” message that wasn’t properly configured. Google identifies these and flags them because they waste crawl resources without providing any value.

To fix soft 404s, either return a proper 404 status code for pages that don’t exist, or add real content to pages that should exist. Search Console’s Page Indexing report lists every URL flagged as a soft 404, so you can audit them in bulk.
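If you want to pre-screen pages yourself, a simple heuristic can catch the obvious cases. The sketch below is an assumption-laden approximation: the phrase list and the 50-word threshold are arbitrary choices for illustration, and Google's own soft 404 detection is not public.

```python
# Illustrative phrases only; extend this with the wording your own templates use.
ERROR_PHRASES = ("no results found", "page not found", "this page doesn't exist")

def looks_like_soft_404(status_code, body_text, min_words=50):
    """Flag 200 responses whose visible text reads like an error page or is nearly empty."""
    if status_code != 200:
        return False  # a real 404 or 410 is not a *soft* 404
    text = body_text.lower().replace("\u2019", "'")  # normalise curly apostrophes
    if any(phrase in text for phrase in ERROR_PHRASES):
        return True
    return len(text.split()) < min_words  # assumed threshold for "thin" pages
```

Feed it the status code and the extracted body text from your own crawl. Anything it flags deserves a manual look before you change the status code or add content.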

5xx Server Errors

A 5xx error means the server failed to handle Googlebot’s request. The most common are 500 (Internal Server Error), 502 (Bad Gateway), and 503 (Service Unavailable). These are Tier 1 problems when they affect important pages.

Occasional 5xx errors happen to every site. Servers hiccup. But if Googlebot consistently gets 5xx responses when trying to crawl your key pages, it will eventually stop trying. Google reduces its crawl rate for sites that return a lot of server errors, and pages that remain inaccessible for long enough get dropped from the index entirely.

Common causes:

Overloaded hosting. Shared hosting or under-provisioned servers buckling under traffic – including bot traffic. If Googlebot’s crawl rate is enough to strain your server, that’s a hosting problem, not a Google problem.

Broken server-side code. PHP errors, database connection failures, or misconfigured server settings. These need dev attention.

Intermittent outages. CDN issues, DNS problems, or maintenance windows that happen to coincide with crawl activity.

Check the Crawl Stats report in Search Console. If you’re seeing a consistent pattern of 5xx responses, your server team needs to investigate. For intermittent issues, your server access logs will show exactly when failures occur and which URLs are affected.
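To separate a consistent pattern from an occasional hiccup, it can help to look at the share of 5xx responses per day. This Python sketch assumes you already have (day, status code) pairs, for example from parsed access logs. The 5% threshold is an arbitrary assumption for the example, not a figure Google publishes.

```python
from collections import defaultdict

def daily_5xx_rate(hits):
    """hits: iterable of (day, status_code). Returns {day: share of responses that were 5xx}."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for day, status in hits:
        totals[day] += 1
        if 500 <= status <= 599:
            errors[day] += 1
    return {day: errors[day] / totals[day] for day in totals}

def days_over_threshold(hits, threshold=0.05):
    """Return the days on which the 5xx share exceeded the (assumed) threshold."""
    rates = daily_5xx_rate(hits)
    return sorted(day for day, rate in rates.items() if rate > threshold)
```

A single day over the threshold may just be a maintenance window. Several in a row is the kind of pattern worth handing to your server team.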

Robots.txt Misconfiguration

Your robots.txt file tells search engines which parts of your site they’re allowed to crawl. Get it wrong and you can accidentally block Googlebot from entire sections of your site. We’ve got a dedicated article on robots.txt that covers the full syntax and best practices, so we’ll keep this brief.

The most common mistakes: blocking your CSS and JavaScript files (which prevents Google from rendering your pages properly), using overly broad disallow rules that catch pages you didn’t intend to block, and forgetting to update robots.txt after a site migration or URL restructure. You can test your robots.txt using the URL Inspection tool in Search Console to check whether specific URLs are blocked.
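For checking a batch of URLs at once, Python's standard library includes a robots.txt parser. The sketch below shows how an overly broad rule catches more than intended. The rules and URLs are made up, and note that `urllib.robotparser` does simple prefix matching and does not support the `*` and `$` wildcards that Google does, so results can differ from Googlebot's for more complex files.

```python
from urllib.robotparser import RobotFileParser

def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt text disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]

# Made-up rules: "/product" was meant to block one page but matches every /products/ URL too.
robots_txt = """\
User-agent: *
Disallow: /admin/
Disallow: /product
"""

urls = [
    "https://www.example.com/admin/login",
    "https://www.example.com/products/blue-widget",
    "https://www.example.com/blog/crawl-errors",
]

print(blocked_urls(robots_txt, urls))
```

Here the admin URL is blocked deliberately, but the product URL is blocked by accident because `Disallow: /product` matches any path that starts with those characters.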

XML Sitemap Issues

Your XML sitemap is a roadmap for search engines – it tells them which pages exist and which ones you consider important. Sitemap problems don’t directly cause crawl errors, but they contribute to indexing issues by sending Googlebot confusing signals. We’ll cover XML sitemaps in depth in a separate article, but the key mistakes to watch for are listing URLs that return 404s or redirects, including noindexed pages, and failing to update the sitemap when you add or remove content.
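Those three mistakes are easy to check for once you have crawl data. The following Python sketch parses a standard sitemap and flags entries that redirect, return an error, or carry a noindex. The `crawl_results` mapping of URL to (status code, noindex flag) is a hypothetical input you would build from your own crawl.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(xml_text):
    """Return every <loc> value from a urlset sitemap."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]

def sitemap_problems(xml_text, crawl_results):
    """crawl_results maps URL -> (status_code, is_noindexed). Returns {url: problem}."""
    problems = {}
    for url in sitemap_urls(xml_text):
        if url not in crawl_results:
            problems[url] = "not crawled"
            continue
        status, noindexed = crawl_results[url]
        if 300 <= status < 400:
            problems[url] = f"redirects ({status})"
        elif status != 200:
            problems[url] = f"returns {status}"
        elif noindexed:
            problems[url] = "noindexed"
    return problems
```

Anything this returns should either be fixed or removed from the sitemap, so the file only lists URLs you actually want indexed.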

Redirect Chains and Loops

A redirect chain happens when URL A redirects to URL B, which redirects to URL C, which finally reaches the destination. Each hop adds latency and dilutes link equity. Google will follow up to about 10 redirects before giving up, but even two or three hops create unnecessary friction.

A redirect loop is worse: URL A redirects to URL B, which redirects back to URL A. Googlebot hits the loop, gives up, and the page doesn’t get indexed.

Both problems typically accumulate over time. A site migration creates redirects. Then another migration redirects the redirects. Then someone changes the URL structure again. Before long, you’ve got chains three or four hops deep.

The fix is straightforward: audit your redirects and flatten every chain so each old URL points directly to the final destination with a single 301 redirect. Screaming Frog makes this easy – run a crawl, filter by redirect chains, and you’ll see exactly where the hops are.
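If you maintain redirects in a map or rules file, the flattening step can be scripted. This is a minimal Python sketch assuming the redirects are available as a dictionary of old URL to next URL. The 10-hop cap mirrors the limit mentioned above, and the function names are our own.

```python
def resolve(url, redirects, max_hops=10):
    """Follow redirects (old URL -> next URL) and return (final_url, hops).

    Returns (None, hops) if the chain loops or exceeds max_hops.
    """
    seen = {url}
    hops = 0
    while url in redirects:
        url = redirects[url]
        hops += 1
        if url in seen or hops > max_hops:
            return None, hops
        seen.add(url)
    return url, hops

def flatten(redirects):
    """Point every source straight at its final destination; report loop members separately."""
    flat, loops = {}, []
    for source in redirects:
        final, _ = resolve(source, redirects)
        if final is None:
            loops.append(source)
        else:
            flat[source] = final
    return flat, loops
```

Given `{"/a": "/b", "/b": "/c"}`, `flatten` returns a map where both `/a` and `/b` point directly at `/c`. Any URLs caught in a loop come back in the second list so you can fix them by hand.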

Canonical Tag Problems

Canonical tags tell Google which version of a page is the “main” one when duplicate or near-duplicate versions exist. When they’re wrong, Google either ignores your preference or indexes the wrong version. We’ve got a full article on canonical tags coming, so here’s the short version: make sure every indexable page has a self-referencing canonical, check that canonicals don’t point to noindexed or 404’d URLs, and watch for CMS plugins that auto-generate canonicals incorrectly.
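The self-referencing check is simple to automate. Here is a hedged Python sketch using only the standard library. It reads the canonical from the HTML `<link>` element, ignores canonicals sent via HTTP headers, and compares URLs as exact strings, so trailing-slash or protocol differences will show up as mismatches.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> in a document."""
    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if "canonical" in rel and attr.get("href"):
            self.canonicals.append(attr["href"])

def canonical_status(page_url, html):
    """Return 'missing', 'multiple', 'self', or 'points to <url>' for a page."""
    finder = CanonicalFinder()
    finder.feed(html)
    if not finder.canonicals:
        return "missing"
    if len(finder.canonicals) > 1:
        return "multiple"
    target = urljoin(page_url, finder.canonicals[0])  # resolve relative hrefs
    return "self" if target == page_url else f"points to {target}"
```

Run it over the HTML from your crawl. Any indexable page that comes back as anything other than "self" is worth a closer look.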

Noindex Tags in the Wrong Places

A noindex meta tag or X-Robots-Tag header tells Google not to index a page. Perfectly useful when applied deliberately to admin pages, staging content, or filtered listing views. Less useful when someone adds it to your money pages by accident.

This happens more often than you’d think. A developer adds noindex to the staging environment (correct), the staging site gets pushed to production (happens), and suddenly your entire site is deindexed. Or a CMS setting gets toggled during a redesign and nobody notices for weeks.

Check for noindex issues in the Page Indexing report under “Excluded by noindex tag.” If any pages there should be indexed, remove the tag and request reindexing through the URL Inspection tool. WordPress sites are particularly prone to this – there’s a “Discourage search engines from indexing this site” checkbox in Settings > Reading that applies a site-wide noindex. One accidental tick and you’re invisible.
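To catch a stray noindex before Google does, you can check both places it can live. The Python sketch below looks at the robots meta tag and the X-Robots-Tag header for a page you have already fetched. It is a simplification: it treats `none` as noindex (Google documents it as equivalent to noindex plus nofollow) and does not handle user-agent prefixes inside the header value.

```python
from html.parser import HTMLParser

BLOCKING = {"noindex", "none"}

class RobotsMetaFinder(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot">."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() in ("robots", "googlebot"):
            content = attr.get("content") or ""
            self.directives += [d.strip().lower() for d in content.split(",")]

def is_noindexed(html, headers=None):
    """True if the meta tag or the X-Robots-Tag header tells Google not to index."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    header_value = ""
    for key, value in (headers or {}).items():
        if key.lower() == "x-robots-tag":
            header_value = value
    header_directives = {d.strip().lower() for d in header_value.split(",")}
    return bool(BLOCKING & set(finder.directives)) or bool(BLOCKING & header_directives)
```

Running this against your key landing pages after every deployment is a cheap way to spot a staging noindex that has leaked into production.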

Internal Linking and Orphan Pages

An orphan page is one that has no internal links pointing to it. From Googlebot’s perspective, if there’s no link to follow, the page might as well not exist. Googlebot discovers pages primarily by following links from other pages it already knows about. A page with no inbound internal links relies entirely on the XML sitemap or direct backlinks for discovery, which isn’t reliable.

Strong internal linking does two things for crawlability. First, it helps Googlebot discover and reach all your important pages. Second, it distributes PageRank (link equity) across your site, which influences how often Google crawls each page and how it prioritises them.

To find orphan pages, run a Screaming Frog crawl and cross-reference the pages it finds (by following links) with the pages in your sitemap or CMS. Any page in your sitemap that Screaming Frog didn’t discover through links is effectively orphaned. Fix it by adding relevant internal links from related content.
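The cross-reference itself is a set difference. This Python sketch assumes you have two lists of URLs, one from your sitemap and one from the link-following crawl. It applies light normalisation (dropping fragments, treating a bare domain as `/`) so trivial differences don't create false orphans.

```python
from urllib.parse import urlsplit, urlunsplit

def normalise(url):
    """Lower-case scheme and host, drop the fragment, and treat an empty path as '/'."""
    parts = urlsplit(url.strip())
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))

def find_orphans(sitemap_urls, linked_urls):
    """Return sitemap URLs that the link-following crawl never reached."""
    in_sitemap = {normalise(url) for url in sitemap_urls}
    reached = {normalise(url) for url in linked_urls}
    return sorted(in_sitemap - reached)
```

Whatever comes back is a list of pages that exist in your sitemap but that no internal link leads to, which is your to-do list for adding links.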

Crawl Budget: Does It Matter for Your Site?

Crawl budget is the number of pages Googlebot will crawl on your site within a given timeframe. It’s determined by two things: crawl rate (how many requests per second your server can handle without degrading user experience) and crawl demand (how much Google wants to crawl your site based on its popularity and freshness).

Here’s the honest answer on crawl budget: if your site has fewer than a few thousand pages, it’s probably not something you need to worry about. Google will get to all of them. Crawl budget becomes a genuine concern for larger sites – e-commerce stores with tens of thousands of product pages, news sites publishing dozens of articles daily, or any site with a significant number of parameterised URLs generating near-infinite URL variations.

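If crawl budget is relevant to you, the priority is reducing waste. Don’t let Googlebot spend its time on faceted navigation pages, session ID URLs, or internal search results. Block these in robots.txt to stop them being crawled, or, where they need to stay crawlable, use canonical tags to consolidate them. Bear in mind that Google has to crawl a page to see its canonical tag, so robots.txt is the more direct way to keep crawling focused on the pages that actually matter.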
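To see where parameters are multiplying, you can group crawled URLs by path and count the distinct query-string variants. This is a rough Python sketch. The threshold of three variants is an arbitrary assumption, and on a real site you would set it much higher.

```python
from collections import defaultdict
from urllib.parse import urlsplit, parse_qsl

def parameter_variants(urls, threshold=3):
    """Return {path: variant_count} for paths with at least `threshold` query-string variants."""
    variants = defaultdict(set)
    for url in urls:
        parts = urlsplit(url)
        if parts.query:
            # Sort parameters so ?a=1&b=2 and ?b=2&a=1 count as the same variant.
            key = tuple(sorted(parse_qsl(parts.query, keep_blank_values=True)))
            variants[parts.path].add(key)
    return {path: len(keys) for path, keys in variants.items() if len(keys) >= threshold}
```

Paths with a high count are the ones generating the most crawlable variations, and usually the first candidates for a robots.txt rule.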

JavaScript Rendering and Crawlability

Modern websites increasingly rely on JavaScript to render content. That’s fine for users with browsers that execute JavaScript instantly, but it creates a two-step process for Googlebot. First, Googlebot fetches the HTML. Then it queues the page for rendering, which means executing the JavaScript to see the final content. That second step takes additional resources and time.

Google has confirmed that it renders JavaScript, but there’s a delay between the initial crawl and the render pass. Content that only exists after JavaScript execution may take longer to get indexed. And if the JavaScript fails – due to errors, blocked resources, or timeouts – Googlebot sees an empty or incomplete page.

For technical SEO purposes, the safest approach is server-side rendering or static generation for content you need indexed quickly. If your site is built on a JavaScript framework (React, Angular, Vue), test how Googlebot sees your pages using the URL Inspection tool’s “View Crawled Page” feature. If critical content is missing from the rendered HTML, that’s a rendering problem that needs solving.
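A quick first check is whether your critical content is present in the raw HTML before any JavaScript runs. The Python sketch below does a plain text search, which is a crude assumption: it won't understand markup or encoded characters, and it only tells you about the unrendered source, not what Googlebot eventually renders.

```python
def missing_from_raw_html(raw_html, critical_phrases):
    """Return the phrases that do not appear in the unrendered HTML source."""
    lowered = raw_html.lower()
    return [phrase for phrase in critical_phrases if phrase.lower() not in lowered]
```

If a product name or headline shows up in this list, that content depends on JavaScript to appear, so it is worth confirming in the URL Inspection tool that Google's rendered version includes it.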

Mobile-First Indexing and Crawl Errors

Google predominantly uses the mobile version of your site for indexing and ranking. That means crawl errors on your mobile site matter more than crawl errors on desktop. If Googlebot can access a page on desktop but not on mobile – because of a different robots.txt, different URL structure, or mobile-specific server issues – the mobile version is the one that counts.

Check your mobile crawl errors separately. If you’re on a responsive site (same URLs, same content, layout adjusts), this is less of a concern. But if you have a separate mobile site (m.example.com) or serve different content based on user agent, make sure the mobile version is fully crawlable and matches the desktop version in terms of content.

“Crawled – Currently Not Indexed”: What It Means

This is one of the most frustrating statuses in Search Console. Google found your page, crawled it successfully, and then decided not to index it. No technical error. No blocking directive. Google just didn’t think the page was worth adding to its index.

This can mean several things:

The content is thin or duplicative. Google didn’t find enough unique value to justify indexing it.

The page has low authority signals. Few or no internal links, no external backlinks, minimal engagement signals.

The site has quality issues overall. If Google perceives broader quality problems with your domain, it may be more selective about which pages it indexes.

Google simply hasn’t got around to it. For newer pages or sites with lower crawl demand, it can take time.

The fix depends on the cause. If the content is thin, improve it. If the page is orphaned, add internal links. If it’s genuinely valuable content that Google is overlooking, building some external links to it or linking to it prominently from high-authority pages on your own site can help. There’s no button to force indexing – you can request it through the URL Inspection tool, but Google makes its own decision.

Putting It All Together

Fixing crawl errors isn’t a one-off task. Sites change, content gets added and removed, plugins get updated, and new errors appear. At a minimum, build a quarterly crawl health check into your routine: a Screaming Frog crawl, a review of the Page Indexing report, and a glance at crawl stats.

Focus your effort where it counts. Use the severity tiers to prioritise: fix anything blocking indexing on important pages first, then work on crawl efficiency, then tidy up the rest. A site with a few old 404s isn’t broken. A site with noindex tags on its product pages is.

If you’re looking at a Search Console full of errors and aren’t sure where to start, that’s exactly what a technical SEO audit covers. At Gorilla Marketing, crawl health and indexability diagnostics are built into every audit we run. Senior strategists translate the data into prioritised, plain-English recommendations your team can act on – not just a spreadsheet of URLs. If your pages aren’t getting indexed and you want to know why, talk to us about an SEO audit.

David Galvin
David has been in search marketing for over 8 years, specialising in technical SEO. He focuses on the technical foundations that impact visibility, including site structure, performance, and tracking. With a solid technical grounding and hands-on experience across Linux, PHP, JavaScript, and CSS, he works to identify and resolve the issues that genuinely hold websites back. If he’s not in front of a laptop, you’ll usually find him hiking up a mountain or visiting his son in Dublin.