What is Crawlability?
Remember, Googlebot travels the web by following links on websites it visits. This is called web crawling.
Crawlability is the ease with which Googlebot can browse and analyze your website. The better your crawlability, the fewer resources Googlebot needs to expend and the deeper it can explore your site.
If your crawlability is bad, Googlebot will need a lot of resources and probably won’t make it all the way through.
Determine Your Website’s Crawl Budget
Googlebot has a crawl budget: a limit it sets on how much of your site it will crawl.
How is this budget determined?
It depends on the quality of your site in Google's eyes. The higher the quality, and the more often you add new content, the more time Googlebot is willing to spend crawling and indexing that content.
The crawl budget is therefore a good indicator of your website's overall quality: the higher the budget, the more Google values your site; the lower it is, the less time Googlebot will spend there.
Start Using the Search Console
To access the crawl stats report in the Search Console, select:
Crawl > Crawl Stats.
Three graphs are displayed:
The first is the number of pages crawled per day.
The second is the number of kilobytes downloaded per day.
The third is the time spent downloading a page (in milliseconds).
Here are some general remarks about these graphs:
All three graphs should be relatively smooth: if you see a peak or a sharp dip, something happened on your website.
The graphs cover the last 90 days, so it's a good idea to record the averages of all three metrics in an Excel file and track them over several periods (a short scripting alternative is sketched after this list).
In general, the graphs are relatively similar. However, you want to focus on the pages crawled per day to measure your crawlability.
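If you prefer a script to a spreadsheet, here is a minimal sketch that computes the averages from daily values you have recorded yourself; the file name and column names are assumptions, not an official Search Console export format.

```python
# Minimal sketch: average the three crawl-stats metrics over a recorded period.
# "crawl_stats.csv" and its column names are assumptions; record the daily
# values yourself, since the Search Console only keeps the last 90 days.
import csv
from statistics import mean

with open("crawl_stats.csv", newline="") as f:
    rows = list(csv.DictReader(f))  # columns: date, pages, kilobytes, download_ms

for column in ("pages", "kilobytes", "download_ms"):
    values = [float(row[column]) for row in rows]
    print(f"Average {column} over {len(values)} days: {mean(values):.1f}")
```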
1. Pages Crawled per Day
This graph shows you how many pages on your site Googlebot visited each day. This figure should remain relatively constant, or even increase over the months as you add content. If you notice a sudden dip, it's likely because Googlebot can’t access your content.
So, make sure your site is always reachable, that a robots.txt file or a robots meta tag isn't blocking your pages, and that there are no server glitches. You can check the robots.txt part programmatically, as shown below.
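Here is a minimal sketch, using Python's standard library, to check whether robots.txt allows Googlebot to fetch a page; the domain and page URL are placeholders. It only reads robots.txt, so a noindex or nofollow robots meta tag in the page's HTML still has to be checked separately.

```python
# Minimal sketch: ask robots.txt whether Googlebot may fetch a given page.
# The domain and page URL below are placeholders.
from urllib.robotparser import RobotFileParser

robots = RobotFileParser()
robots.set_url("https://www.example.com/robots.txt")
robots.read()  # downloads and parses the robots.txt file

page = "https://www.example.com/some-page/"
if robots.can_fetch("Googlebot", page):
    print(f"Googlebot is allowed to crawl {page}")
else:
    print(f"robots.txt blocks Googlebot from {page}")
```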
If your site rarely gets new content or is seldom updated, Googlebot may also crawl it less often.
2. Kilobytes Downloaded per Day
This graph should closely track the previous one: the more pages Googlebot crawls, the more kilobytes it downloads.
It also reflects the size of your website. A site with a lot of content will show more kilobytes downloaded than a smaller one. The more pages and kilobytes Googlebot downloads, the more of your site Google indexes, which is good.
However, keep in mind that a high figure may also mean your pages are too heavy, which Google does not look on favorably. So, compare it against the graph showing the time spent downloading a page; a quick sanity check is sketched below.
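One rough way to combine the first two graphs is to divide kilobytes downloaded by pages crawled, giving the average page weight Googlebot sees each day. This is a back-of-the-envelope check rather than a Search Console feature, and the daily figures below are invented for illustration.

```python
# Rough sanity check: estimate average page weight per day from the two graphs.
# The daily figures below are invented for illustration.
daily_stats = [
    {"pages_crawled": 120, "kb_downloaded": 4800},
    {"pages_crawled": 135, "kb_downloaded": 5200},
    {"pages_crawled": 118, "kb_downloaded": 9700},  # noticeably heavier: worth a look
]

for day in daily_stats:
    avg_kb = day["kb_downloaded"] / day["pages_crawled"]
    print(f"{day['pages_crawled']} pages crawled, {avg_kb:.1f} KB per page on average")
```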
3. Time Spent Downloading a Page
Note that this time is not directly related to your page load speed. The graph shows how long Googlebot takes to complete its HTTP requests to your pages.
The shorter the download time, the less time it takes Googlebot to crawl your site. By keeping an eye on this graph and the previous one, you can detect potential problems. If you notice a drop in the number of kilobytes downloaded but a peak in this graph, it likely points to a server problem: Googlebot is taking longer to download less data.
Example
As you can see in the image above, the number of pages crawled per day is stable, even trending slightly upward.
On the other hand, downloaded kilobytes peaked at the beginning of January 2019, which doesn't show in the number of crawled pages.
Similarly, there is no significant spike in time spent downloading a page.
This peak does not point to a serious problem; otherwise you would see it recur or affect the page download time.
It could be due to a large amount of content being added all at once, but then you would also expect a bigger impact on the number of pages crawled.
The most likely hypothesis is that some overly large content, such as a video, was added to a page, and then quickly deleted.
Fix Common Errors
Don't let errors accumulate in the Search Console, because they will have a direct impact on the crawlability of your site.
The two main types of errors are:
404 errors: Googlebot follows a link, but the page in question no longer exists. This can come from a simple typo in the link, or from changing a page's URL without setting up a redirect.
500 errors: these are server errors. Check that your code doesn't produce them through unhandled exceptions or bugs. A small sketch for spotting both types of error follows this list.
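Here is a minimal sketch, using Python's standard library, that requests a handful of URLs and reports their HTTP status codes so you can spot 404s and 5xx errors. The URL list is a placeholder; in practice you would feed it the pages flagged in the Search Console or taken from your sitemap.

```python
# Minimal sketch: report HTTP status codes for a few URLs to spot 404 and 5xx errors.
# The URLs below are placeholders.
from urllib import request, error

urls_to_check = [
    "https://www.example.com/",
    "https://www.example.com/old-page/",  # e.g. a page whose URL may have changed
]

for url in urls_to_check:
    try:
        with request.urlopen(url, timeout=10) as response:
            print(f"{url} -> {response.status}")  # 200 if reachable (redirects are followed)
    except error.HTTPError as e:
        # 404: broken link to fix or redirect; 5xx: server-side problem to investigate
        print(f"{url} -> {e.code}")
    except error.URLError as e:
        # DNS failures, timeouts, refused connections, etc.
        print(f"{url} -> unreachable ({e.reason})")
```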
Let's Recap!
Crawlability is the ease with which Googlebot crawls your website.
Googlebot has a limited budget to crawl your site, so you need to simplify its job.
You can detect changes in behavior and errors in the Search Console.
Now that you know more about your website’s crawlability, let's see how to check that Googlebot is crawling your entire site by analyzing your server logs.