
WordPress's database character encoding matters more than most site operators realize. Sites still running on MySQL's older utf8 character set (an alias for utf8mb3, which stores at most 3 bytes per character) lose data when users post emoji, certain Asian characters, or other characters that need 4 bytes in UTF-8. The data isn't always visibly broken; sometimes it's silently truncated.
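To see where that 3-byte limit bites, the short Python sketch below prints the UTF-8 byte length of a few sample characters. Python is used purely for illustration; nothing here is part of WordPress. Anything that comes out at 4 bytes is something MySQL's utf8 can't store.

```python
# Illustrative only: print the UTF-8 byte length of a few sample characters.
# MySQL's legacy utf8 (utf8mb3) holds at most 3 bytes per character,
# so anything reported as 4 bytes needs utf8mb4.
SAMPLES = {
    "Latin letter": "A",
    "accented letter": "\u00e9",        # é
    "common Han character": "\u4e2d",   # 中
    "emoji": "\U0001F44D",              # thumbs-up emoji
    "CJK Extension B character": "\U00020000",
}


def utf8_length(ch: str) -> int:
    """Number of bytes the character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


if __name__ == "__main__":
    for label, ch in SAMPLES.items():
        size = utf8_length(ch)
        verdict = "needs utf8mb4" if size > 3 else "fits in utf8"
        print(f"{label}: U+{ord(ch):04X}, {size} byte(s), {verdict}")
```

Only the emoji and the Extension B character report 4 bytes.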
The fix is migration to utf8mb4 (true 4-byte UTF-8). WordPress has supported utf8mb4 since 4.2 and uses it by default for new installations. Sites that started on older WordPress versions or that were migrated from other systems might still be on the older encoding.
Run this query against your WordPress database (if $table_prefix in wp-config.php is something other than wp_, substitute your prefix):
SHOW TABLE STATUS WHERE Name LIKE 'wp_%';
The Collation column shows each table's default collation, and the collation's name tells you the character set. Tables with a collation starting with "utf8_" (like utf8_general_ci), or with "utf8mb3_" on MySQL 8.0.30 and later, are on the older 3-byte encoding. Tables with a collation starting with "utf8mb4_" (like utf8mb4_unicode_520_ci) are on the newer 4-byte encoding.
For a fully utf8mb4 site, all WordPress tables should have utf8mb4_ collation. Mixed collations across tables indicate partial migration.
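Reading the Collation column is mechanical enough to script. Below is a minimal Python sketch. It assumes you have already pulled the Name and Collation values out of the SHOW TABLE STATUS output as pairs. The function name and the three status labels are illustrative, not anything WordPress or MySQL defines.

```python
from typing import Iterable, List, Optional, Tuple


def classify_tables(
    rows: Iterable[Tuple[str, Optional[str]]],
) -> Tuple[str, List[str]]:
    """Summarize (table_name, collation) pairs, e.g. from SHOW TABLE STATUS.

    Returns (status, tables_to_convert), where status is:
      "utf8mb4": every table is already utf8mb4
      "legacy":  no table is utf8mb4 yet
      "mixed":   some are, some aren't (a partial migration)
    """
    checked = 0
    to_convert: List[str] = []
    for name, collation in rows:
        if collation is None:
            continue  # views report no collation; there is nothing to convert
        checked += 1
        if not collation.startswith("utf8mb4_"):
            to_convert.append(name)
    if not to_convert:
        status = "utf8mb4"
    elif len(to_convert) == checked:
        status = "legacy"
    else:
        status = "mixed"
    return status, to_convert


if __name__ == "__main__":
    rows = [("wp_posts", "utf8mb4_unicode_520_ci"), ("wp_links", "utf8_general_ci")]
    print(classify_tables(rows))  # ('mixed', ['wp_links'])
```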
Emoji characters get truncated or replaced with question marks. A comment with "Great post! 👍" gets stored as "Great post! " (the emoji bytes get dropped).
Asian characters that require 4 bytes are dropped the same way. Most everyday Chinese, Japanese, and Korean characters fit in 3 bytes, but the CJK Unified Ideographs extensions outside the Basic Multilingual Plane (Extension B onward, starting at U+20000) need 4.
Other characters above U+FFFF, such as those in the Mathematical Alphanumeric Symbols and Musical Symbols blocks, are affected the same way.
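All three cases come down to one rule: characters above U+FFFF, outside Unicode's Basic Multilingual Plane, take 4 bytes in UTF-8. The Python sketch below finds such characters in a string and shows the text with them removed, reproducing the "Great post! " example. It illustrates the effect only; it is not how MySQL or WordPress actually handle the data. The function names are hypothetical.

```python
def four_byte_chars(text: str) -> list:
    """Characters needing 4 bytes in UTF-8, i.e. code points above U+FFFF."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


def drop_four_byte_chars(text: str) -> str:
    """Illustration only: the text as it looks once 4-byte characters are dropped."""
    return "".join(ch for ch in text if ord(ch) <= 0xFFFF)


if __name__ == "__main__":
    comment = "Great post! \U0001F44D"
    # Print code points rather than the emoji itself so any terminal can show it.
    print([f"U+{ord(ch):X}" for ch in four_byte_chars(comment)])  # ['U+1F44D']
    print(repr(drop_four_byte_chars(comment)))                    # 'Great post! '
```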
The breakage is silent. The query succeeds; the data is just incomplete. Site owners often don't realize anything is wrong until a user reports that their comment got truncated.
The migration converts existing tables to utf8mb4. The process is straightforward, but it is not something to start without a backup.
Step 1: back up the database. This is non-negotiable: the migration is irreversible without a backup.
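Step 2: verify that the database server supports utf8mb4. MySQL 5.5.3+ and MariaDB 5.5+ both support it. If you're on something older, upgrade the database server first. The utf8mb4_unicode_520_ci collation used below requires MySQL 5.6 or later; on a server without it, use utf8mb4_unicode_ci instead.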
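If you'd like to script this check, here is a rough Python sketch. It assumes you feed it the string returned by SELECT VERSION() (for example via wp db query "SELECT VERSION();"). The minimums are the ones given above. The parsing is deliberately simple, and the function name is hypothetical.

```python
import re


def supports_utf8mb4(version: str) -> bool:
    """Rough check of a SELECT VERSION() string against the minimums above.

    MySQL needs 5.5.3 or later; MariaDB (whose version strings contain
    "MariaDB", e.g. "10.6.12-MariaDB") needs 5.5 or later.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    found = tuple(int(part) for part in match.groups())
    minimum = (5, 5, 0) if "mariadb" in version.lower() else (5, 5, 3)
    return found >= minimum


if __name__ == "__main__":
    for v in ("5.5.2", "5.7.44-log", "10.6.12-MariaDB"):
        print(v, supports_utf8mb4(v))
```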
Step 3: convert each table to utf8mb4. The SQL for each table:
ALTER TABLE wp_posts CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci;
Repeat for each WordPress table. WP-CLI can run each statement from the shell:
wp db query "ALTER TABLE wp_posts CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci;"
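Or generate the statements for every table in the database (core and plugin tables alike) with a single query:
wp db query "SET SESSION group_concat_max_len = 1000000; SELECT GROUP_CONCAT('ALTER TABLE ', table_name, ' CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci;' SEPARATOR ' ') FROM information_schema.tables WHERE table_schema = DATABASE() AND table_type = 'BASE TABLE';" --skip-column-names
The output is the SQL to execute; copy it and run it via wp db query or phpMyAdmin. The SET SESSION statement matters. group_concat_max_len defaults to 1024 bytes, which is shorter than the combined statements for a standard install, and MySQL truncates longer results with nothing more than a warning.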
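If you'd rather build and review the statement list before running anything, the Python sketch below produces it from (table, collation) pairs such as those shown by SHOW TABLE STATUS. Unlike a single GROUP_CONCAT string, it emits one statement per table, quotes names with backticks, and only includes tables that are still on utf8. The function name is hypothetical. The default collation is an assumption: pass utf8mb4_unicode_ci on servers that lack the 520 variant.

```python
from typing import Iterable, List, Optional, Tuple

# Assumption: the collation recommended in this guide.
DEFAULT_COLLATION = "utf8mb4_unicode_520_ci"

# Collation prefixes MySQL uses for the legacy 3-byte utf8 character set.
LEGACY_UTF8_PREFIXES = ("utf8_", "utf8mb3_")


def conversion_statements(
    rows: Iterable[Tuple[str, Optional[str]]],
    collation: str = DEFAULT_COLLATION,
) -> List[str]:
    """One ALTER TABLE ... CONVERT TO statement per table still on utf8.

    Tables already on utf8mb4, views (collation None), and tables in
    other character sets are skipped.
    """
    statements = []
    for name, current in rows:
        if current is None or not current.startswith(LEGACY_UTF8_PREFIXES):
            continue
        quoted = "`" + name.replace("`", "``") + "`"  # escape as a MySQL identifier
        statements.append(
            f"ALTER TABLE {quoted} CONVERT TO CHARACTER SET utf8mb4 COLLATE {collation};"
        )
    return statements


if __name__ == "__main__":
    sample = [
        ("wp_posts", "utf8_general_ci"),
        ("wp_options", "utf8mb4_unicode_520_ci"),
        ("wp_myplugin_log", "utf8mb3_general_ci"),
    ]
    print("\n".join(conversion_statements(sample)))
```

Tables in other character sets (latin1, for example) are deliberately left out; those deserve a closer look before any conversion.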
Step 4: update wp-config.php so WordPress connects with utf8mb4 and creates new tables in it:
define('DB_CHARSET', 'utf8mb4');
define('DB_COLLATE', '');
WordPress detects the available collation and uses utf8mb4_unicode_520_ci when available. The empty DB_COLLATE lets WordPress make the right choice.
Step 5: verify the migration. Re-run the SHOW TABLE STATUS query and confirm all tables show utf8mb4_ collation.
Plugins create their own tables, and those tables might still be on utf8 even after WordPress core tables are converted. The same ALTER TABLE syntax applies; convert each plugin table individually.
The list of plugin tables varies by site. After converting the wp_ tables, run SHOW TABLE STATUS without the LIKE filter to catch any other tables still on utf8, and convert those too.
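To tell plugin tables apart from core ones in a long table list, you could compare names against WordPress's core tables. The Python sketch below does this. It assumes a single-site install and a known table prefix. It doesn't account for multisite, which adds network tables and per-site ones such as wp_2_posts. The helper name and the sample table names are hypothetical.

```python
from typing import Iterable, List

# Core tables of a single-site WordPress install, without the prefix.
# Assumption: multisite network tables (wp_blogs, wp_site, ...) and
# per-site tables such as wp_2_posts are not covered here.
CORE_TABLES = frozenset({
    "commentmeta", "comments", "links", "options", "postmeta", "posts",
    "term_relationships", "term_taxonomy", "termmeta", "terms",
    "usermeta", "users",
})


def non_core_tables(table_names: Iterable[str], prefix: str = "wp_") -> List[str]:
    """Tables that are not single-site core tables, sorted by name.

    These most likely belong to plugins (or to another application that
    shares the database) and need converting on their own.
    """
    return sorted(
        name for name in table_names
        if not (name.startswith(prefix) and name[len(prefix):] in CORE_TABLES)
    )


if __name__ == "__main__":
    found = ["wp_posts", "wp_options", "wp_myplugin_log", "wp_users"]
    print(non_core_tables(found))  # ['wp_myplugin_log']
```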
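utf8mb4 uses up to 4 bytes per character; utf8 uses up to 3. For the same text, the stored bytes are identical under both. UTF-8 encodes 1- to 3-byte characters the same way in either character set, and only 4-byte characters, which utf8 can't hold at all, take more. Those are rare in normal content, so the storage difference is usually small. What does grow is the worst case MySQL plans for: a VARCHAR(255) column can need up to 1020 bytes under utf8mb4 versus 765 under utf8.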
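Index storage: utf8mb4 indexes can need more space than utf8 indexes. Without large index prefixes, InnoDB caps each indexed column prefix at 767 bytes, and a utf8mb4 VARCHAR(255) can need up to 1020. The fix is innodb_large_prefix=1, which raises the cap to 3072 bytes for tables using the DYNAMIC or COMPRESSED row format. It is on by default from MySQL 5.7.7, and MySQL 8.0 removed the setting because large prefixes are always enabled.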
For sites on older MySQL where the index size is a problem, the workaround is to shorten the indexed columns or to use a hash column. Most modern hosting handles this correctly without intervention.
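The arithmetic behind the index limit fits in a few lines. The Python sketch below divides an index key limit in bytes by the worst-case bytes per character. It uses InnoDB's 767-byte limit (without large prefixes) and its 3072-byte limit (with them). At 767 bytes, utf8 covers a full VARCHAR(255) but utf8mb4 covers only 191 characters. That is why WordPress core's own schema indexes some long columns on their first 191 characters.

```python
# InnoDB index key prefix limits, in bytes.
INNODB_SMALL_PREFIX = 767   # without large prefixes (e.g. COMPACT row format)
INNODB_LARGE_PREFIX = 3072  # with large prefixes (DYNAMIC/COMPRESSED row format)


def max_indexed_chars(limit_bytes: int, max_bytes_per_char: int) -> int:
    """Longest column prefix, in characters, that fits in an index key.

    MySQL budgets the worst case: 3 bytes per character for utf8,
    4 for utf8mb4.
    """
    return limit_bytes // max_bytes_per_char


if __name__ == "__main__":
    print(max_indexed_chars(INNODB_SMALL_PREFIX, 3))  # 255: a full VARCHAR(255) in utf8
    print(max_indexed_chars(INNODB_SMALL_PREFIX, 4))  # 191: utf8mb4 falls short of 255
    print(max_indexed_chars(INNODB_LARGE_PREFIX, 4))  # 768: room to spare
```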
After conversion, test that 4-byte characters now save correctly. The simplest test: post a comment with an emoji, then view the comment. The emoji should display correctly. Open the database and verify the emoji is stored as the emoji bytes (rather than being replaced or dropped).
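To make the database check exact, compare bytes rather than eyeballing the text. MySQL's HEX() function returns a stored value as hexadecimal, for example SELECT HEX(comment_content) FROM wp_comments ORDER BY comment_ID DESC LIMIT 1. The Python sketch below computes what that output should be for your test string. The helper name is hypothetical.

```python
def expected_hex(text: str) -> str:
    """Uppercase hex of the text's UTF-8 bytes, the form MySQL's HEX()
    prints for a value stored in a utf8mb4 column."""
    return text.encode("utf-8").hex().upper()


if __name__ == "__main__":
    print(expected_hex("\U0001F44D"))  # F09F918D (the thumbs-up emoji)
    print(expected_hex("Great post! \U0001F44D"))
```

If the emoji's F09F918D is missing from the end of the stored value, or a 3F (a question mark) sits in its place, the character was dropped or replaced.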
For sites with significant non-English content: test with characters known to require 4 bytes. The CJK Unified Ideographs Extension B range (U+20000–U+2A6DF) is good for this.
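To build such a test string without hunting for the characters, a short Python sketch can assemble a few code points from that range and confirm each one really takes 4 bytes. The particular code points are an arbitrary choice, and the function name is hypothetical.

```python
# A few code points from CJK Unified Ideographs Extension B
# (U+20000 to U+2A6DF); the selection is arbitrary.
SAMPLE_CODE_POINTS = (0x20000, 0x20001, 0x20002, 0x20BB7)


def ext_b_test_string(code_points=SAMPLE_CODE_POINTS) -> str:
    """Join the code points into a test string, checking each is 4 bytes in UTF-8."""
    text = "".join(chr(cp) for cp in code_points)
    for ch in text:
        if len(ch.encode("utf-8")) != 4:
            raise ValueError(f"U+{ord(ch):X} is not a 4-byte character")
    return text


if __name__ == "__main__":
    sample = ext_b_test_string()
    # Printed as code points to stay terminal-safe; paste `sample` itself
    # (e.g. from an interactive session) into a post or comment.
    print(" ".join(f"U+{ord(ch):X}" for ch in sample))
    print(len(sample.encode("utf-8")), "bytes")
```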
Sites that haven't migrated yet should do it. The reasons:
1. Future content is more likely to include 4-byte characters. Emoji usage is increasing, and users writing in Asian languages may use characters that need 4 bytes.
2. The migration becomes harder with more accumulated content. A small site migrates in minutes; a large site migrates in hours.
3. The silent data loss continues until migration. Every comment with truncated emoji is a small UX failure.
4. Migration is one-time work. Once done, the issue is permanently resolved.
The investment is 15-60 minutes of focused work depending on site size. The payoff is correct handling of all Unicode going forward.
Quality hosting providers (Kinsta, WP Engine, modern SiteGround) have migrated their WordPress sites to utf8mb4 already. The migration was done as part of normal infrastructure maintenance.
Verify by checking the table collations. If they're already utf8mb4, no work needed. If they're still utf8, the host hasn't migrated and you should do it yourself.
Budget hosts and shared hosts often haven't migrated. The infrastructure investment isn't visible to them and they don't prioritize it.
The utf8mb4 migration is one of those infrastructure tasks that doesn't break anything visible until it does. Sites that haven't migrated might run for years without noticing the issue, until a user posts a comment with an emoji and it disappears.
The migration is mechanical and low-risk. The benefit is correct Unicode handling permanently. The investment is small.
For sites that operate at any scale, this should be a standard infrastructure check rather than an after-the-fact fix when a user reports a problem.
The discipline that prevents recurrence: when migrating WordPress to new hosting, verify the new database is utf8mb4 from the start. When restoring from old backups, verify the restored tables get the right encoding. The check takes one minute; the prevention is worth it.
© 2026 RevealTheme. All rights reserved.