
WordPress sites with more than 1,000 URLs typically use a sitemap index rather than a single sitemap file. The index references multiple sub-sitemaps, each of which can contain up to 50,000 URLs. Because a single sitemap file cannot exceed that limit, the index structure is technically required above 50,000 URLs and operationally helpful at moderate counts.
The sitemap index pattern is standard, but the implementation details matter for crawl efficiency. Sites that get the index right use crawl budget effectively; sites that don't waste it on sub-sitemaps with stale lastmod values or unimportant URLs.
A sitemap index is an XML file that lists other sitemap files. The structure:
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/post-sitemap.xml</loc>
    <lastmod>2026-05-08</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/page-sitemap.xml</loc>
    <lastmod>2026-04-15</lastmod>
  </sitemap>
  ...
</sitemapindex>
Each sub-sitemap is itself a standard sitemap file containing URL entries. The structure is hierarchical: index → sub-sitemaps → URLs.
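To make the hierarchy concrete, here is a minimal Python sketch that reads a sitemap index and lists each sub-sitemap with its lastmod. The function name `parse_sitemap_index` and the `SAMPLE_INDEX` string are illustrative assumptions; the sample mirrors the structure shown above rather than any real site.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemap protocol; every element lives inside it.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap_index(xml_text):
    """Return a list of (loc, lastmod) tuples from a sitemap index document."""
    root = ET.fromstring(xml_text)
    entries = []
    for node in root.findall("sm:sitemap", NS):
        loc = node.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = node.findtext("sm:lastmod", default=None, namespaces=NS)
        # lastmod is optional in the protocol, so it may be missing.
        entries.append((loc, lastmod.strip() if lastmod else None))
    return entries


# Hypothetical sample data for illustration only.
SAMPLE_INDEX = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/post-sitemap.xml</loc>
    <lastmod>2026-05-08</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/page-sitemap.xml</loc>
    <lastmod>2026-04-15</lastmod>
  </sitemap>
</sitemapindex>"""

for loc, lastmod in parse_sitemap_index(SAMPLE_INDEX):
    print(loc, lastmod)
```

Each `loc` returned here points at a sub-sitemap, which could be fetched and parsed the same way by looking for `url` elements instead of `sitemap` elements.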
Yoast SEO and Rank Math both generate sitemap indexes automatically. The typical structure has separate sub-sitemaps for: posts (often paginated if there are many), pages, categories, tags, custom post types, authors (if author archives are public), and any other content types registered.
The plugins update the index dynamically as content changes. New posts trigger updates to the relevant sub-sitemap and to its lastmod entry in the index.
The lastmod field in the sitemap index tells search engines when each sub-sitemap was last updated. A correctly-maintained lastmod helps Google prioritize which sub-sitemaps to re-crawl.
The common failure: the lastmod for the post sitemap shows yesterday, but the only change behind it was something minor, such as a tag edit, and no post was added or meaningfully revised. The lastmod is technically correct, but the post sub-sitemap contains nothing new, so Google spends crawl budget checking for new posts that aren't there.
The discipline: lastmod should reflect meaningful content changes, not just any database write. Plugin updates that don't change post content shouldn't update lastmod. Author bio changes shouldn't update lastmod on the post sitemap.
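One way to picture this discipline is to fingerprint only the fields that count as content and move lastmod only when the fingerprint changes. The sketch below is an illustration under that assumption, not how Yoast SEO or Rank Math actually decide; the function names and the choice of title plus body as "meaningful content" are hypothetical.

```python
import hashlib
from datetime import date


def content_fingerprint(title, body):
    """Hash only the fields treated as meaningful content (an assumption)."""
    normalized = f"{title.strip()}\n{body.strip()}"
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def next_lastmod(stored_fingerprint, stored_lastmod, title, body, today):
    """Return (fingerprint, lastmod); lastmod moves only if the content changed."""
    fingerprint = content_fingerprint(title, body)
    if fingerprint == stored_fingerprint:
        # Same content: a re-save or metadata write must not bump lastmod.
        return fingerprint, stored_lastmod
    return fingerprint, today.isoformat()


# First save sets lastmod; an identical re-save a week later keeps it.
fp, lastmod = next_lastmod(None, None, "Hello", "First draft.", date(2026, 5, 1))
fp, lastmod = next_lastmod(fp, lastmod, "Hello", "First draft.", date(2026, 5, 8))
print(lastmod)  # 2026-05-01
```

Author bio edits or plugin updates never reach the fingerprint in this model, so they cannot change the date.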
Most SEO plugins handle lastmod conservatively (updating only on real content changes) but some don't. Verify by checking the lastmod values over time; they should be stable except when content actually changes.
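Checking lastmod values over time can be as simple as comparing two snapshots of the index. The following sketch assumes you have already collected each snapshot as a dictionary mapping sub-sitemap URL to lastmod string; the function name and the sample values are hypothetical.

```python
def changed_lastmods(previous, current):
    """Compare two {sitemap_url: lastmod} snapshots and return entries that moved.

    The result maps each changed URL to a (before, after) pair. A URL that is
    new in the current snapshot has None as its "before" value.
    """
    changes = {}
    for url, lastmod in current.items():
        before = previous.get(url)
        if before != lastmod:
            changes[url] = (before, lastmod)
    return changes


# Hypothetical snapshots taken a day apart.
monday = {
    "https://example.com/post-sitemap.xml": "2026-05-08",
    "https://example.com/page-sitemap.xml": "2026-04-15",
}
tuesday = {
    "https://example.com/post-sitemap.xml": "2026-05-09",
    "https://example.com/page-sitemap.xml": "2026-04-15",
}

print(changed_lastmods(monday, tuesday))
```

If a sub-sitemap shows up in the result on days when nothing in it was published or revised, the plugin is probably updating lastmod too eagerly.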
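The sub-sitemaps should include only the URLs that you want Google to crawl and index: the canonical URLs of content you consider important.
Exclude archives and pages you don't want indexed, such as author archives, date archives, and thin legacy pages.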
SEO plugins offer settings to control inclusion. The defaults are usually reasonable; verify they match what you want.
Submit the sitemap index URL (not the individual sub-sitemaps) in Google Search Console. The submission tells Google where to start. Google discovers sub-sitemaps automatically through the index.
Search Console's Sitemaps report shows when each sitemap was last read and how many URLs were discovered in it, and it links to the indexing status of those URLs. The report is the primary diagnostic for sitemap-related issues.
The submission isn't strictly required (Google can discover the sitemap from robots.txt) but it gives you the Search Console report which is useful for monitoring.
robots.txt should reference the sitemap location:
Sitemap: https://example.com/sitemap.xml
The reference helps Bing and other search engines where you may not have submitted the sitemap through a webmaster console. It also gives Google's crawler a fallback when it has no stored sitemap location for the site.
WordPress with Yoast or Rank Math sets this automatically. Verify by visiting yoursite.com/robots.txt and checking that the sitemap line is present.
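The manual check can also be scripted. This sketch extracts every Sitemap line from the text of a robots.txt file; fetching the file is left out so the example stays self-contained, and the function name and sample text are assumptions for illustration.

```python
def sitemap_lines(robots_txt):
    """Return the URLs declared on Sitemap: lines in a robots.txt body."""
    urls = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        field, sep, value = line.partition(":")
        # The field name is matched case-insensitively.
        if sep and field.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


# Hypothetical robots.txt content.
SAMPLE_ROBOTS = """User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://example.com/sitemap.xml
"""

print(sitemap_lines(SAMPLE_ROBOTS))
```

An empty list means the reference is missing and should be added, either through the SEO plugin or by editing robots.txt directly.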
After initial setup, verify the sitemap is working correctly:
1. Visit the sitemap index URL directly. The XML should load and parse correctly.
2. Click through to each sub-sitemap. Verify the URLs are correct and that important content is included.
3. Spot-check 5-10 URLs from the sitemap. They should be the canonical URLs (with HTTPS, with the correct www/non-www pattern, without tracking parameters).
4. Check Search Console's Sitemap report. The URLs reported as discovered should approximately match the URLs in the sitemap.
5. Check Search Console's Page indexing report (formerly the Coverage report). The indexed URLs should approximately match the sitemap URLs minus excluded ones.
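The spot-check in step 3 lends itself to a small script. The sketch below flags URLs that are not HTTPS, that use an unexpected host, or that carry common tracking parameters. The list of tracking parameters is an assumption and deliberately short; extend it to match what your site actually sees.

```python
from urllib.parse import parse_qsl, urlsplit

# Assumed tracking parameters; not an exhaustive list.
TRACKING_PREFIXES = ("utm_",)
TRACKING_KEYS = {"gclid", "fbclid"}


def canonical_problems(url, expected_host):
    """Return a list of reasons the URL does not look canonical (empty if fine)."""
    problems = []
    parts = urlsplit(url)
    if parts.scheme != "https":
        problems.append("not https")
    if parts.hostname != expected_host:
        problems.append("unexpected host")
    for key, _ in parse_qsl(parts.query, keep_blank_values=True):
        name = key.lower()
        if name.startswith(TRACKING_PREFIXES) or name in TRACKING_KEYS:
            problems.append(f"tracking parameter: {key}")
    return problems


# Hypothetical URLs pulled from a sub-sitemap.
for candidate in [
    "https://example.com/post/",
    "http://www.example.com/post/?utm_source=newsletter",
]:
    print(candidate, canonical_problems(candidate, "example.com"))
```

Pass the host you treat as canonical (with or without www) as `expected_host`; any URL that returns a non-empty list deserves a closer look.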
Sub-sitemaps that include excluded URLs. The plugin's defaults don't always match your exclusion preferences. Configure SEO plugin settings to exclude author archives, date archives, and other categories you don't want indexed.
Sub-sitemaps that exclude URLs you want indexed. Sometimes custom post types are excluded by default and need to be enabled in the SEO plugin settings.
Unreliable lastmod values. Some plugins update lastmod whenever any post is touched, even when the content didn't change. The pattern dilutes the signal.
Massive sub-sitemaps that approach the 50,000 URL limit, for example 40,000+ URLs in one file. Splitting into smaller sub-sitemaps (post-sitemap-1.xml, post-sitemap-2.xml) is required at the limit and helpful before reaching it.
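SEO plugins handle the splitting for you, but the idea is easy to sketch: chunk the URL list and number the resulting files. The function name and the default base name below are illustrative assumptions that follow the post-sitemap-1.xml naming used above.

```python
def split_into_sitemaps(urls, base_name="post-sitemap", max_urls=50000):
    """Group URLs into numbered sub-sitemap files of at most max_urls each.

    Returns a dict mapping file name to the list of URLs it should contain.
    """
    if max_urls < 1:
        raise ValueError("max_urls must be at least 1")
    chunks = {}
    for start in range(0, len(urls), max_urls):
        number = start // max_urls + 1
        chunks[f"{base_name}-{number}.xml"] = urls[start:start + max_urls]
    return chunks


# Hypothetical URL list, split with a tiny limit so the result is easy to read.
sample_urls = [f"https://example.com/post-{i}/" for i in range(1, 6)]
for name, group in split_into_sitemaps(sample_urls, max_urls=2).items():
    print(name, len(group))
```

Choosing a `max_urls` well below 50,000 keeps each file small and leaves headroom as the site grows.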
The sitemap is a hint to search engines about which URLs you consider important. A sitemap that contains 50,000 URLs but only 2,000 are actually important wastes signal. A sitemap that contains 2,000 carefully-chosen URLs is stronger.
The discipline: prune the sitemap as the site evolves. Old URLs that no longer have value (out-of-date articles you don't intend to update, thin pages that exist for legacy reasons) can be excluded from the sitemap.
The exclusion doesn't deindex the URLs; it just removes them from the "I consider these important" list. URLs not in the sitemap might still be crawled and indexed, just with less prioritization.
The strategic sitemap is curated rather than complete. The signal is stronger when it represents the site's best content rather than every URL that exists.
For sites with significant content volume, the sitemap is one of the more useful signals you can send to search engines. The default behavior of WordPress SEO plugins is usually adequate but worth verifying.
Fifteen minutes of attention pays off across the site's lifetime. A sitemap that's set up correctly continues to send the right signals; a sitemap that's misconfigured continues to send wrong signals indefinitely.
The patterns that produce results: regular verification, lastmod discipline, exclusion of low-value URLs, submission in Search Console, monitoring of the Search Console reports. The work is small per period; the cumulative effect is significant.