WordPress XML Sitemaps: What Actually Belongs Inside

By The RevealTheme Team · Updated May 27, 2026 · 3 min read

Ask ten WordPress site owners what their XML sitemap contains and most will shrug. The file gets generated automatically, submitted to Google Search Console once, and never opened again. That is a mistake, because the sitemap is not a passive technical artifact. It is an editorial statement: a list of the URLs you are formally nominating for indexing. Padding it with junk dilutes that statement. This article is about deciding, URL type by URL type, what earns a place inside.

First, know which sitemap you are actually shipping

Since WordPress 5.5, core ships its own sitemap at /wp-sitemap.xml with no plugin required. It is an index file that links out to paginated sub-sitemaps, capped at 2,000 URLs each. By default, core includes a surprisingly broad set: posts, pages, every public custom post type, categories, tags, every public custom taxonomy, and author (user) archives. As of WordPress 6.5, core also emits lastmod dates, a field deliberately left out in 5.5 and added once Google and Bing made clear that they use the signal when it is accurate. Core intentionally does not emit priority or changefreq, because search engines ignore both.
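
To make the index-plus-sub-sitemaps structure concrete, here is a minimal Python sketch that reads a sitemap document and reports whether it is an index or a plain URL list. The function name parse_sitemap and the sample XML are illustrative assumptions; the sub-sitemap filenames follow core's naming pattern, but your site's list will differ.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text: str) -> tuple[str, list[str]]:
    """Return ("index" or "urlset", [loc, ...]) for a sitemap document."""
    root = ET.fromstring(xml_text)
    # ElementTree reports tags as "{namespace}name"; keep only the name.
    kind = root.tag.rsplit("}", 1)[-1]
    if kind == "sitemapindex":
        locs = root.findall("sm:sitemap/sm:loc", SITEMAP_NS)
        return "index", [el.text.strip() for el in locs if el.text]
    if kind == "urlset":
        locs = root.findall("sm:url/sm:loc", SITEMAP_NS)
        return "urlset", [el.text.strip() for el in locs if el.text]
    raise ValueError(f"not a sitemap document: <{kind}>")


# Hypothetical index, shaped like what core serves at /wp-sitemap.xml.
SAMPLE_INDEX = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/wp-sitemap-posts-post-1.xml</loc></sitemap>
  <sitemap><loc>https://example.com/wp-sitemap-taxonomies-category-1.xml</loc></sitemap>
  <sitemap><loc>https://example.com/wp-sitemap-users-1.xml</loc></sitemap>
</sitemapindex>"""

if __name__ == "__main__":
    kind, locs = parse_sitemap(SAMPLE_INDEX)
    print(kind, len(locs))
    for loc in locs:
        print(" ", loc)
```

Spotting a users sub-sitemap in that list is often the first hint that author archives are being nominated for indexing.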

The catch: most serious sites run an SEO plugin, and Yoast SEO, Rank Math, All in One SEO, and SEOPress each generate their own sitemap (usually /sitemap_index.xml or /sitemap.xml) and disable the core one. So before you tune anything, open both URLs in a browser and see which is live. If you have a plugin active, the plugin's sitemap is your source of truth, and /wp-sitemap.xml should either redirect to the plugin's sitemap or return a 404. Running two competing sitemaps is the single most common misconfiguration I see, and it sends search engines mixed signals about which URL inventory to trust.
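
The browser check can also be scripted. The sketch below is one possible approach, not a definitive tool: it assumes the plugin sitemap lives at /sitemap_index.xml (some plugins use /sitemap.xml instead), and the helper names fetch_status and diagnose are made up for this example. Network failures other than HTTP error codes are not handled.

```python
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 10.0) -> tuple[int, str]:
    """Return (HTTP status, final URL after redirects) for a URL."""
    req = urllib.request.Request(url, headers={"User-Agent": "sitemap-check/0.1"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.geturl()
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses arrive as exceptions; report the code.
        return err.code, url


def diagnose(site: str, fetch=fetch_status) -> str:
    """Report which sitemap a site is actually serving."""
    base = site.rstrip("/")
    core_status, core_final = fetch(f"{base}/wp-sitemap.xml")
    # Assumption: the plugin sitemap is at /sitemap_index.xml.
    plugin_status, _ = fetch(f"{base}/sitemap_index.xml")

    # Core only counts as live if it answered without being redirected away.
    core_live = core_status == 200 and core_final.endswith("/wp-sitemap.xml")
    plugin_live = plugin_status == 200

    if core_live and plugin_live:
        return "conflict: both sitemaps are live"
    if plugin_live:
        return "ok: plugin sitemap is the source of truth"
    if core_live:
        return "ok: core sitemap is the source of truth"
    return "problem: no sitemap found"


if __name__ == "__main__":
    print(diagnose("https://example.com"))
```

Passing the fetch function as a parameter keeps the decision logic testable without touching the network.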

The decision rule that settles every case

You do not need a memorized list of allowed URL types. You need one rule, applied consistently:

A URL belongs in your sitemap if, and only if, it is the canonical, indexable version of a page you would be genuinely glad to rank in Google.

That single sentence carries three constraints. Canonical means the URL must point to itself with rel="canonical"; never list a URL that canonicalizes elsewhere. Indexable means it must not carry a noindex directive. Listing a noindex page in your sitemap is a self-contradiction: you are telling Google "crawl this, it matters" and "do not index this" in the same breath. Google honors the noindex, Search Console flags the entry as "Submitted URL marked 'noindex'", and every such entry makes your sitemap a less reliable signal. Glad to rank is the editorial filter. If you would be embarrassed to have a URL appear as a search result for a real query, it does not belong, regardless of whether a plugin includes it by default.
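
The rule is mechanical enough to express as code. The following is a minimal sketch, assuming you already have a page's HTML in hand; HeadSignals and belongs_in_sitemap are hypothetical names, a missing canonical tag is treated as self-canonical (an assumption you may want to tighten), and noindex sent via an X-Robots-Tag HTTP header is not checked.

```python
from html.parser import HTMLParser
from typing import Optional


class HeadSignals(HTMLParser):
    """Collect rel=canonical and meta robots values from an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None
        self.robots: list[str] = []

    def handle_starttag(self, tag, attrs):
        a = {name: (value or "") for name, value in attrs}
        if tag == "link" and "canonical" in a.get("rel", "").lower().split():
            self.canonical = a.get("href") or None
        elif tag == "meta" and a.get("name", "").lower() == "robots":
            content = a.get("content", "")
            self.robots += [token.strip().lower() for token in content.split(",")]


def belongs_in_sitemap(url: str, html: str, glad_to_rank: bool) -> bool:
    """Apply the three constraints: canonical, indexable, glad to rank."""
    signals = HeadSignals()
    signals.feed(html)
    # Assumption: no canonical tag means the page is treated as self-canonical.
    is_canonical = (
        signals.canonical is None
        or signals.canonical.rstrip("/") == url.rstrip("/")
    )
    is_indexable = "noindex" not in signals.robots
    # "Glad to rank" is an editorial call, so the caller supplies it.
    return is_canonical and is_indexable and glad_to_rank


if __name__ == "__main__":
    page = (
        '<head><link rel="canonical" href="https://example.com/guide/">'
        '<meta name="robots" content="index, follow"></head>'
    )
    print(belongs_in_sitemap("https://example.com/guide/", page, glad_to_rank=True))
```

The first two checks can be automated across a whole sitemap; the third stays a human decision, which is the point of the rule.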

Running the contested URL types through the rule

Posts and pages: almost always yes

Published posts and static pages (homepage, about, contact, services, product, pillar guides) are the core of any sitemap. The only posts to exclude are ones you have deliberately set to noindex, such as thin announcement stubs or thank-you pages behind a form. Drafts, private posts, and trashed content are never included by any competent generator.

Category and tag archives: it depends, and tags usually fail

This is where defaults and good judgment diverge. A taxonomy archive earns inclusion only when the archive page is something a searcher would want to land on. A category like "Email Marketing" with a written intro paragraph, a curated description, and a coherent set of posts can be a legitimate ranking target; include it. A tag archive that exists because someone once typed #tuesday into the tag box is a thin, auto-generated list with no editorial value. The practical move on most blogs is to noindex tag archives and let them fall out of the sitemap automatically. In Yoast, this is the "Show tags in search results" toggle, found under Settings > Categories & tags > Tags in current versions and under Search Appearance > Taxonomies > Tags in older ones; Rank Math exposes the same toggle under Titles & Meta. Because a noindexed URL fails the indexable constraint, the sitemap exclusion follows for free, which is exactly how the two settings should be wired together.

Author archives: usually no

Core includes author archives by default, and on a single-author blog they are pure duplication of your post feed. Unless you run a genuine multi-author publication where individual writers have their own followings and bylines worth surfacing, set author archives to noindex and drop them from the sitemap. WooCommerce and membership sites should be especially careful here, since user-archive URLs can inadvertently expose account-style pages.

Attachment pages: never

WordPress historically created a standalone page for every uploaded image (/?attachment_id=4523 or a pretty permalink). These pages contain a single image and no context, and they are a classic source of low-quality indexed URLs. Modern Yoast and Rank Math can redirect attachment URLs away from these pages (Yoast to the media file itself, Rank Math to the parent post); verify the setting is on and confirm attachment URLs are absent from your sitemap.

Paginated archives, search results, and filter URLs: never

Archive pagination (/page/2/, /category/news/page/3/) does not belong in the sitemap. Its contents shift every time you publish, so any lastmod you attach is immediately stale, and Google discovers your deep posts through the post sitemap and internal links anyway. Internal search-result pages (?s=keyword) and faceted filter URLs (?color=blue&size=large) should be excluded as a category; they are effectively infinite, near-duplicate, and a crawl-budget sink.
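
A quick way to audit an existing sitemap for these "never" cases is to run every listed URL through a small classifier. This is a rough sketch, not a complete rule set: exclusion_reason is a hypothetical helper, and it assumes the site uses pretty permalinks, so any remaining query string is treated as a filter URL (a site on plain ?p=123 permalinks would need a different rule).

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Matches archive pagination such as /page/2/ or /category/news/page/3/.
PAGINATION = re.compile(r"/page/\d+/?$")


def exclusion_reason(url: str) -> Optional[str]:
    """Return why a URL should be left out of the sitemap, or None if it passes."""
    parts = urlsplit(url)
    query = parse_qs(parts.query, keep_blank_values=True)
    if "attachment_id" in query:
        return "attachment page"
    if "s" in query:
        return "internal search result"
    if PAGINATION.search(parts.path):
        return "paginated archive"
    if query:
        # Assumption: pretty permalinks, so leftover parameters are filters.
        return "query-string (filter) URL"
    return None


if __name__ == "__main__":
    for candidate in [
        "https://example.com/category/news/page/3/",
        "https://example.com/?s=keyword",
        "https://example.com/shop/?color=blue&size=large",
        "https://example.com/email-marketing-guide/",
    ]:
        print(candidate, "->", exclusion_reason(candidate))
```

Any URL from your live sitemap that comes back with a reason is worth tracing to the plugin setting that let it in.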

Specialized sitemaps: image, video, and news

"What belongs inside" has a second answer for sites with rich media. An image sitemap (image entries attached to the page that contains them, not as standalone URLs) helps Google Images surface photography, product shots, and infographics; Yoast and Rank Math add image references to the relevant page entries automatically. A video sitemap uses dedicated <video:> tags with a thumbnail, title, and duration, and is worth configuring if self-hosted or embedded video is central to your content. A Google News sitemap is a different beast entirely: it lists only articles from the last 48 hours and is intended for news publishers whose content is eligible for Google News. Do not bolt one on speculatively. The rule still holds in each case; you are just describing canonical, indexable assets in the format Google expects for that asset type.

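To illustrate what "image entries attached to the page that contains them" looks like in practice, here is a minimal sketch that builds a URL set with Google's image extension using only the Python standard library. The function names and example URLs are assumptions for illustration; on a real site your SEO plugin generates this markup for you.

```python
import xml.etree.ElementTree as ET

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMG = "http://www.google.com/schemas/sitemap-image/1.1"

# Serialize sitemap tags without a prefix and image tags as image:*.
ET.register_namespace("", SM)
ET.register_namespace("image", IMG)


def url_entry(page_url: str, image_urls: list[str]) -> ET.Element:
    """Build one <url> entry with its images nested inside it."""
    url = ET.Element(f"{{{SM}}}url")
    ET.SubElement(url, f"{{{SM}}}loc").text = page_url
    for src in image_urls:
        image = ET.SubElement(url, f"{{{IMG}}}image")
        ET.SubElement(image, f"{{{IMG}}}loc").text = src
    return url


def build_urlset(pages: dict[str, list[str]]) -> str:
    """Serialize {page URL: [image URLs]} as a sitemap <urlset> string."""
    urlset = ET.Element(f"{{{SM}}}urlset")
    for page_url, images in pages.items():
        urlset.append(url_entry(page_url, images))
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_urlset({
        "https://example.com/guide/": ["https://example.com/uploads/chart.png"],
    }))
```

Note that the image URL never appears as its own <url> entry; it only exists as a child of the page that displays it.
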
The operational layer most people skip

The sitemaps.org protocol, which Google follows, caps a single sitemap file at 50,000 URLs or 50 MB uncompressed. Past that you need a sitemap index splitting content into sub-files, which every WordPress generator handles automatically (core at 2,000 per file, plugins typically configurable). You rarely need to touch this, but you should know it exists when a large store hits the ceiling.
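
The arithmetic behind the split is simple enough to sketch. The constants below come from the limits just described; the function names and the 120,000-URL store are hypothetical.

```python
import math

PROTOCOL_MAX_URLS = 50_000  # per-file ceiling in the sitemaps.org protocol
CORE_PAGE_SIZE = 2_000      # per-file cap used by WordPress core


def sub_sitemaps_needed(total_urls: int, per_file: int = CORE_PAGE_SIZE) -> int:
    """Number of sub-sitemap files needed for a given URL count."""
    if not 1 <= per_file <= PROTOCOL_MAX_URLS:
        raise ValueError("per_file must be between 1 and 50,000")
    return math.ceil(total_urls / per_file)


def chunk_urls(urls: list[str], per_file: int = CORE_PAGE_SIZE) -> list[list[str]]:
    """Split a URL list into consecutive sub-sitemap-sized chunks."""
    return [urls[i:i + per_file] for i in range(0, len(urls), per_file)]


if __name__ == "__main__":
    # A hypothetical store with 120,000 product URLs.
    print(sub_sitemaps_needed(120_000))                   # core-sized files
    print(sub_sitemaps_needed(120_000, per_file=50_000))  # protocol maximum
```

At core's page size that store needs 60 sub-sitemaps; at the protocol maximum it would need 3. The index file simply lists each one.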

Two finishing checks make the sitemap actually useful rather than merely present. First, add a Sitemap: directive pointing to your index URL in robots.txt, and submit the same URL in Google Search Console under Sitemaps. Second, use the GSC Pages report as your audit instrument. If a large share of your "Crawled - currently not indexed" or "Discovered - currently not indexed" URLs are tag or author archives, your sitemap is actively steering Google toward pages it does not value, and the fixes above are the remedy. A clean sitemap will not magically lift rankings, but a noisy one quietly erodes the trust that makes every other SEO signal worth more.
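
The robots.txt check is easy to verify with Python's standard library, which has understood the Sitemap: directive since version 3.8. In this sketch the robots.txt content and the declared_sitemaps helper are illustrative assumptions; paste in your own file to test it.

```python
from urllib.robotparser import RobotFileParser


def declared_sitemaps(robots_txt: str) -> list[str]:
    """Return every URL declared with a Sitemap: directive in robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no directive is present.
    return parser.site_maps() or []


# Hypothetical robots.txt for a site running an SEO plugin.
SAMPLE_ROBOTS = """User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://example.com/sitemap_index.xml
"""

if __name__ == "__main__":
    print(declared_sitemaps(SAMPLE_ROBOTS))
```

An empty list means crawlers get no sitemap hint from robots.txt; more than one entry is worth checking against the competing-sitemaps problem described earlier.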