
Get Indexed Faster By Using THIS Sitemap Structure

Sitemaps can speed up content discovery and indexing, but they’re no magic fix. Their real value depends on content quality and a well-structured setup that guides Googlebot to your most important pages.


"Get Indexed Faster By Using THIS Sitemap Structure" Transcript

Sitemaps: Essential Or Outdated?

Do sitemaps really help get content indexed faster in Google? Yes. They can help Google discover your content almost instantly after you publish it, provided the content is worth indexing. To be clear, if your content is not good enough to be indexed, sitemaps will not help, no matter how they are formatted or submitted. XML sitemaps are popular among SEOs because of a long history of statements from Google’s spokespeople such as Matt Cutts, John Mueller, and Danny Sullivan. But today, in most cases, there is no reason to use XML sitemaps specifically. On the contrary, in many cases there are reasons not to use them. Sitemaps also play different roles depending on the kind of SEO you are responsible for: for news SEO they are essential, for ecommerce SEO they are important, and for local SEO they are less important. So let us remind ourselves what sitemaps are for.

What Are Sitemaps For?

Sitemaps act as a direct line of communication with search engines like Google, significantly accelerating the process of getting your website’s pages discovered and indexed. By providing a clear and organized map of your content, you eliminate the guesswork for Google’s web crawlers, ensuring they can efficiently find and understand all the pages you want to appear in search results. Think of your website as a new city and Google’s crawlers as tourists. Without a map, the tourists might wander aimlessly, potentially missing important landmarks. A sitemap is that detailed map, guiding them directly to all the must-see locations on your site.

In other words, if a sitemap exists, Google will first crawl those URLs until there is no crawler budget left. Sometimes, after going through all sitemaps, Googlebot has some crawler budget left, so it goes to the homepage and crawls further from there. But often there is none left. Let’s say you launch a new online store with 100,000 products on a new domain. In that case you might start with a crawler budget of around 10,000 URLs, so there is no chance that Googlebot crawls the complete online store at once. Sure, Google will raise the crawler budget with each following visit, first to 30,000, then to 50,000, and so on, but every website has a crawler budget limit. So the mission of an SEO is to make sure that Googlebot crawls the most important URLs and hopefully indexes them.
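To make the arithmetic in this example concrete, here is a minimal sketch that counts how many visits it takes to cover a store once. The figures 10,000, 30,000 and 50,000 are the lesson's illustration, not published Google values, and the function name and the rule that the last budget repeats are assumptions of this sketch.

```python
def visits_needed(total_urls: int, budgets: list[int]) -> int:
    """Count crawl visits until every URL has been fetched once.

    budgets[i] is the assumed number of URLs crawled on visit i.
    Assumption: once the list is exhausted, the last value is reused.
    """
    if total_urls <= 0:
        return 0
    if not budgets or any(b <= 0 for b in budgets):
        raise ValueError("budgets must be a non-empty list of positive numbers")

    remaining = total_urls
    visit = 0
    while remaining > 0:
        # Reuse the final budget for every visit beyond the listed ones.
        budget = budgets[min(visit, len(budgets) - 1)]
        remaining -= budget
        visit += 1
    return visit


if __name__ == "__main__":
    print(visits_needed(100_000, [10_000, 30_000, 50_000]))
```

With the lesson's numbers, three visits cover 90,000 URLs, so the call above returns 4. That is why the order in which URLs are offered matters: the pages listed last wait the longest.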

What are the most important pages of an online store? For most online stores, categories and subcategories are the most important, followed by category filter combination URLs, blog articles, and then product URLs. Least important are legal, contact, and similar pages with no real ranking intention. A big online store might have thousands of categories and subcategories, tens of thousands of category filter combination URLs, hundreds of blog articles, and hundreds of thousands of product URLs. So if Googlebot starts with the product URLs, it will run out of crawler budget before it has even crawled all of them. We need to make sure that we control the crawling order. To do that, we use the main sitemap, also known as the mother of all sitemaps, and link it only from the robots.txt file.
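The setup described above can be sketched as follows. This is a minimal illustration that assumes the main sitemap is a sitemap index file in the sitemaps.org format and that it lists TXT child sitemaps. The domain, every file name, and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Hypothetical child sitemaps, in the priority order given in the lesson:
# categories, filter combinations, blog, products, then legal/contact pages.
CHILD_SITEMAPS = [
    "sitemap-categories.txt",
    "sitemap-category-filters.txt",
    "sitemap-blog.txt",
    "sitemap-products.txt",
    "sitemap-legal.txt",
]


def build_robots_txt(base_url: str, main_sitemap: str = "sitemap-main.xml") -> str:
    """Return a robots.txt body whose only sitemap reference is the main sitemap."""
    return f"User-agent: *\nDisallow:\n\nSitemap: {base_url}/{main_sitemap}\n"


def build_main_sitemap(base_url: str, children: list[str]) -> str:
    """Return a sitemap index that lists the child sitemaps in the given order."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for name in children:
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = f"{base_url}/{name}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_robots_txt("https://shop.example"))
    print(build_main_sitemap("https://shop.example", CHILD_SITEMAPS))
```

The list order is the whole point of the sketch: moving an entry up or down in CHILD_SITEMAPS changes where it appears in the main sitemap.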

Controlling the Crawl: The Sitemap Order Strategy

Before I address why using TXT sitemaps is better, let’s take a look at the structure and the order in which the sitemaps are listed. Googlebot crawls from top to bottom, and that is true both for the sitemaps and for the URLs inside them. By ordering them by the priority they have for you, you make sure the most important ones get crawled. Here we have seven sitemaps with products, the first six with 15,000 product URLs each and the last one with 10,000 URLs. In total, 100,000 URLs. So we know there is no chance to get everything crawled with the first crawl, but we make sure the most important URLs are crawled and hopefully indexed. The next time Googlebot comes, it will first crawl the URLs it has not crawled before. Soon everything will have been crawled, and at that point we should change the sitemap structure.
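Splitting a product list into files of 15,000 URLs can be sketched like this. The helper name and the example URLs are hypothetical, and the list is assumed to be sorted by priority already.

```python
def chunk_urls(urls: list[str], size: int = 15_000) -> list[list[str]]:
    """Split a priority-ordered URL list into consecutive chunks of at most `size`."""
    if size <= 0:
        raise ValueError("size must be positive")
    # Slicing keeps the original order, so the most important URLs
    # end up in the first file.
    return [urls[start:start + size] for start in range(0, len(urls), size)]


if __name__ == "__main__":
    products = [f"https://shop.example/product/{n}" for n in range(100_000)]
    for number, chunk in enumerate(chunk_urls(products), start=1):
        print(f"sitemap-products-{number}.txt: {len(chunk)} URLs")
```

For 100,000 URLs this yields six files of 15,000 URLs and a final file of 10,000.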

We want new products to be discovered as soon as possible. Some even create an out-of-stock sitemap containing the products that are currently out of stock and put it at the end, so that no crawler budget is wasted on them. Products that are out of stock will not rank well and are sometimes not even indexed, so keeping them out of the way is not a bad idea at all.
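One way to produce such a structure is to sort products into three buckets before writing the files. The sketch below is only an illustration: the Product fields, the 14-day window for "new", and the file names are all assumptions, not something the lesson specifies.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Product:
    url: str
    added: date
    in_stock: bool


def plan_product_sitemaps(products, today, new_within_days=14):
    """Return (file name, URLs) pairs: new products first, out of stock last."""
    cutoff = today - timedelta(days=new_within_days)
    new, regular, out_of_stock = [], [], []
    # Walk the products newest first so each bucket is ordered by recency.
    for product in sorted(products, key=lambda p: p.added, reverse=True):
        if not product.in_stock:
            out_of_stock.append(product.url)
        elif product.added >= cutoff:
            new.append(product.url)
        else:
            regular.append(product.url)
    return [
        ("sitemap-products-new.txt", new),
        ("sitemap-products.txt", regular),
        ("sitemap-products-out-of-stock.txt", out_of_stock),
    ]
```

The returned order is the order in which the files would be listed in the main sitemap.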

TXT vs. XML Sitemaps

Okay, now let us address why TXT and not XML sitemaps. But before that, let us remind ourselves that a third type of sitemap also exists, the HTML sitemap. However, that one is rarely used and hard to control, besides the fact that you cannot submit it to Google Search Console. So the main place we should link our main sitemap is the robots.txt file. We should not submit the main sitemap to Google Search Console. Why? Because you will have only one report about how well indexing is working for your website. It is far better to submit each sitemap separately, so you will have a better overview of the indexing process.

Once Google has your sitemap URL, it opens it and, as already described, Google’s crawlers are sent to the URLs from top to bottom. Whether you have an XML sitemap or a TXT sitemap, the same thing happens. The XML file carries additional metadata: changefreq (change frequency), lastmod (last modified), and priority. But these are just noise and do not matter. Google does the same thing anyway, opening the URLs from top to bottom.
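To show how little separates the two formats once the optional metadata is left out, here is a small sketch that renders the same ordered URL list both ways. The function names and URLs are hypothetical. A TXT sitemap is simply one URL per line in a UTF-8 file.

```python
from xml.sax.saxutils import escape


def render_txt_sitemap(urls: list[str]) -> str:
    """One URL per line, in the given order."""
    return "\n".join(urls) + "\n"


def render_xml_sitemap(urls: list[str]) -> str:
    """The same URLs as a urlset, without changefreq, lastmod or priority."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for url in urls:
        # Characters such as & must be escaped inside XML.
        lines.append(f"  <url><loc>{escape(url)}</loc></url>")
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    urls = [
        "https://shop.example/shoes",
        "https://shop.example/shoes?color=red&size=42",
    ]
    print(render_txt_sitemap(urls))
    print(render_xml_sitemap(urls))
```

The escaping step is one reason the XML variant is a bit harder to generate correctly than the TXT variant.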

A few years ago Google tried to take the lastmod metadata in XML sitemaps seriously, but noticed that some websites did not know how to handle it and others misused it. Some websites didn’t update the date, and others updated it even when no change had been made. Change frequency and priority were never important for Google. So today, for ecommerce SEO, having an XML sitemap has no advantage, but also no disadvantage besides being a bit harder to generate. Most online stores take the easy way out and let a plugin generate their sitemaps. Such sitemaps do not have the right order and often include URLs that should be skipped, like all sorts of pagination URLs and other URLs set to noindex.
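A generator you control can apply those exclusions before anything is written. The following is a minimal sketch: the Page fields and the pagination parameter names are assumptions, and the canonical check covers the case, raised in the closing section of this lesson, of URLs whose canonical points elsewhere.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import parse_qs, urlsplit


@dataclass
class Page:
    url: str
    noindex: bool = False
    canonical: Optional[str] = None  # None means no canonical tag was found


def belongs_in_sitemap(page: Page, pagination_params=("page", "p")) -> bool:
    """Return True only for URLs that are meant to be indexed."""
    if page.noindex:
        return False
    if page.canonical and page.canonical != page.url:
        return False
    # Treat any URL carrying an assumed pagination parameter as paginated.
    query = parse_qs(urlsplit(page.url).query)
    return not any(param in query for param in pagination_params)


def filter_sitemap_urls(pages: list[Page]) -> list[str]:
    """Keep the original order and drop everything that should be skipped."""
    return [page.url for page in pages if belongs_in_sitemap(page)]
```

Real stores may paginate through path segments rather than query parameters, in which case the pagination check would need a different rule.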

Why Google Ignores Your Sitemaps

If you want to see in what order Googlebot crawls your website, you need to analyze your log files manually or use a tool like JetOctopus, where you can connect your Google Search Console and server log files and see exactly how Googlebot crawls your website and what does and does not get indexed. With that kind of overview, you can also find out more easily why Google does not index certain URLs and, more importantly, which URLs are crawled at all and why. I recommend investing time in creating smart sitemaps, putting irrelevant URLs in separate sitemaps, and listing those sitemaps at the end. Also order the URLs inside each sitemap: in the categories sitemap, for example, put URLs with important changes, such as categories with many new products or new category filter URLs, at the beginning.
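For the manual route, a first pass over the server logs can be sketched as below. It assumes the common "combined" access log format used by Apache and Nginx and identifies Googlebot by user-agent string only, which can be spoofed, so treat the result as a rough view. The function name is hypothetical.

```python
import re

# Assumed layout: combined log format
# ip - - [time] "METHOD path protocol" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_crawl_order(log_lines):
    """Return (time, path, status) for Googlebot requests, in log order."""
    hits = []
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match and "Googlebot" in match.group("agent"):
            hits.append(
                (match.group("time"), match.group("path"), int(match.group("status")))
            )
    return hits
```

Comparing the returned paths with the order of URLs in your sitemaps shows whether the intended crawl order is actually being followed.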

Changes like these, such as switching to TXT sitemaps, look minor, but they all signal to Google that you care about important changes and want to make Googlebot’s job easier. When that is the case, Google will most likely use your sitemaps to crawl more often. Conversely, if your sitemaps don’t make sense, for example because they list URLs that are set to noindex or have a canonical pointing to another URL, Google will start to ignore your sitemaps more and more. You will then lose control over the crawl, and Google will have to crawl starting from your homepage, which results in multiple problems, especially with indexing. So if your indexing is slow, first check whether your content is good enough and then whether your sitemaps are well organized.