Google gives some clarification on the crawl limit of 15 MB of HTML code – SEO & Engine News

It is remarkable how a thoroughly innocuous announcement from Google (the 15 MB crawl limit for an HTML file) can cause such a stir and endless, ultimately sterile discussions (given that the average size of an HTML file on the Web is 30 KB, that leaves plenty of margin…). So much so that Google’s teams had to publish a statement to clarify a number of things…

On Monday, we told you about the 15 MB limit (up from 10 MB previously) on the HTML code crawled by Googlebot. Oddly enough, what should have been information of little interest (which site has web pages with more than 15 MB of HTML code??) nevertheless created a kind of mini-tsunami on the Web and in the SEO community, as if this limit had just been discovered (it has existed for many years and was even more restrictive at the time; some SEOs do not read Abundance every morning 😀) and as if it were very restrictive (while, of course, it is not in practice). Google further states: “This threshold is not new; it has been around for many years. We just added it to our documentation because it might be useful to some people when debugging, and because it will rarely change.”

The resulting noise – certainly due to a misunderstanding of the information provided – was loud enough for Google’s teams to publish a post offering some clarification on this 15 MB HTML code size limit, in the form of an explanatory FAQ. Here it is (taken and translated from the original document):

What does this 15MB threshold mean?
This limit only applies to the bytes (content) received during the initial request from Googlebot, not to the resources referenced in the page. For example, when you open https://example.com/puppies.html, your browser first downloads the bytes of the HTML file and, based on those bytes, it can make further requests for external JavaScript, images or any other resource referenced by a URL in the HTML. Googlebot does the same. The 15 MB limit applies to the HTML code only.
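
To make the distinction concrete, here is a minimal Python sketch that measures the bytes of an HTML document and lists the resources it references by URL. The sample markup, the `/style.css`, `/app.js` and `/images/puppy.jpg` paths and the `ResourceCollector` class are all made up for illustration, and this is in no way how Googlebot itself is implemented.

```python
from html.parser import HTMLParser


class ResourceCollector(HTMLParser):
    """Collects URLs of resources that would be fetched with separate requests."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Scripts and images point to their resource through "src".
        if tag in ("script", "img") and attributes.get("src"):
            self.urls.append(attributes["src"])
        # Stylesheets and similar resources use "href" on <link>.
        elif tag == "link" and attributes.get("href"):
            self.urls.append(attributes["href"])


# Hypothetical page: only these bytes count toward the HTML size.
html = (
    '<html><head><link rel="stylesheet" href="/style.css">'
    '<script src="/app.js"></script></head>'
    '<body><img src="/images/puppy.jpg" alt="puppy"></body></html>'
)

collector = ResourceCollector()
collector.feed(html)

html_size = len(html.encode("utf-8"))
print("HTML bytes:", html_size)
print("Fetched separately:", collector.urls)
```

However large the three referenced files are, they do not add to `html_size`; only their URLs do.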

What does this 15MB limit mean for my site?
Most likely nothing. There are very few pages on the internet that are larger than this. You, dear reader, are unlikely to own one, since the median size of an HTML file is about 500 times smaller: 30 kilobytes (kB). However, if you are the owner of an HTML page over 15 MB, you could at least move some inline scripts and CSS dust to external files, please.

What happens to content over 15MB?
Content after the first 15 MB is dropped by Googlebot, and only the first 15 MB are passed on for indexing.
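
As a rough illustration of that cut-off, the sketch below splits a byte string at the limit. It assumes that "15 MB" means 15,000,000 bytes (the post does not say whether decimal or binary megabytes are meant), and `split_at_limit` is a hypothetical helper, not something Google publishes.

```python
# Assumption: 15 MB taken as 15,000,000 bytes; the exact definition is not stated.
LIMIT_BYTES = 15 * 1000 * 1000


def split_at_limit(html_bytes: bytes, limit: int = LIMIT_BYTES):
    """Return (kept, dropped): the prefix within the limit and the remainder."""
    return html_bytes[:limit], html_bytes[limit:]


# Artificial 20 MB page, purely for demonstration.
page = b"<html>" + b"x" * 20_000_000 + b"</html>"
kept, dropped = split_at_limit(page)

print("kept for indexing:", len(kept))
print("dropped:", len(dropped))
```

Note that in this example the closing `</html>` tag lands in the dropped part, which is exactly the kind of content that would never reach indexing.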

What types of content does the 15MB limit apply to?
The 15 MB limit applies to fetches made by Googlebot (Googlebot Smartphone and Googlebot Desktop) when retrieving file types supported by Google Search.

Does this mean Googlebot is not seeing my image or video?
No. Googlebot fetches videos and images that are referenced in the HTML code with a URL (for example, an image tag with the alt text “cute puppy looking very disappointed”) separately, with consecutive fetches.

Do data URIs increase HTML file size?
Yes. The use of data URIs contributes to the size of the HTML file since they are in the HTML file.
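
A small Python sketch shows the effect. It compares an image embedded as a base64 data URI with the same image referenced by URL; the 300 kB payload is a stand-in made of zero bytes, and the `data_uri` helper and the `https://example.com/images/puppy.jpg` address are assumptions for illustration only.

```python
import base64


def data_uri(payload: bytes, mime: str = "image/png") -> str:
    """Build a base64 data URI for the given bytes."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime};base64,{encoded}"


# Stand-in for a 300 kB image file (zero bytes, not a real PNG).
image = bytes(300_000)

inline_tag = f'<img src="{data_uri(image)}" alt="puppy">'
linked_tag = '<img src="https://example.com/images/puppy.jpg" alt="puppy">'

inline_size = len(inline_tag.encode("utf-8"))
linked_size = len(linked_tag.encode("utf-8"))
print("inline tag bytes:", inline_size)
print("linked tag bytes:", linked_size)
```

Base64 encoding also makes the data about a third larger than the original file, so an inlined image weighs more in the HTML than it does on disk.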

How can I check the size of a page?
There are several ways, but the easiest is probably to use your own browser and its developer tools. Load the page as you normally would, then launch the developer tools and switch to the Network tab. Reload the page, and you should see all the requests your browser had to make to render the page. The top request is the one you’re looking for, with the size in bytes of the page in the Size column.

For example, in Chrome’s dev tools it might look like this, with 150 KB in the “Size” column. Source: Google
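
For those who prefer a script to the browser, the same check can be sketched in Python with the standard library. The `fetch_html_size` and `report` functions are hypothetical helpers, and the limit is assumed to be 15,000,000 bytes, since the post does not define "MB" precisely.

```python
import urllib.request

# Assumption: 15 MB taken as 15,000,000 bytes.
LIMIT_BYTES = 15 * 1000 * 1000


def fetch_html_size(url: str) -> int:
    """Download only the document at `url` and return its size in bytes."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return len(response.read())


def report(size_bytes: int, limit: int = LIMIT_BYTES) -> str:
    """Describe a page size relative to the limit."""
    share = size_bytes / limit * 100
    status = "over" if size_bytes > limit else "within"
    return f"{size_bytes:,} bytes, {share:.2f}% of the limit ({status})"


# The 150 KB page from the example above:
print(report(150_000))

# To measure a live page (requires network access):
# print(report(fetch_html_size("https://example.com/puppies.html")))
```

Like the first row of the Network tab, `fetch_html_size` requests only the HTML document and none of the resources it references.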

Well, that way it’s clear and we’ll be able to move on to more interesting and effective things in SEO… 🙂
