Title: Character encoding hell
Last modified: August 18, 2016

---

# Character encoding hell

 *  [Husky](https://wordpress.org/support/users/husky/)
 * (@husky)
 * [19 years, 5 months ago](https://wordpress.org/support/topic/character-encoding-hell/)
 * Hello everyone.
 * I’ve been having lots of problems setting up a new WordPress installation.
   The problem is that I want to switch to UTF-8 for character encoding. Unfortunately,
   most of my MySQL database is latin-1 encoded, so I needed to convert my existing
   posts.
 * I found a GNU utility called iconv which can convert between encodings. Unfortunately,
   only then did I find out that some of my posts are encoded in latin-1
   and others are already encoded in UTF-8. This might be because
   I switched from another web host that probably used a different default encoding.
 * I’m scared at the thought of having to go manually through all 300 of my
   posts to fix the weird characters, so please, if anyone
   has any suggestions for solving this problem, it would make me very happy.
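
Instead of running iconv over the whole dump, one option is to decide post by post. The minimal Python sketch below assumes each post is available as raw bytes (for example, pulled out of the exported SQL file) and relies on a heuristic: text that decodes cleanly as UTF-8 is probably UTF-8, and everything else is treated as latin-1. The function names are hypothetical.

```python
def to_text(raw: bytes) -> str:
    """Decode one post's bytes, guessing between UTF-8 and latin-1.

    Heuristic (an assumption, not a guarantee): bytes that are valid UTF-8
    are treated as UTF-8; anything else falls back to latin-1, which can
    decode every possible byte sequence.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")


def normalize_posts(posts: list[bytes]) -> list[bytes]:
    """Re-encode every post as UTF-8, whatever encoding it started in."""
    return [to_text(p).encode("utf-8") for p in posts]


if __name__ == "__main__":
    utf8_post = "café".encode("utf-8")      # b'caf\xc3\xa9'
    latin1_post = "café".encode("latin-1")  # b'caf\xe9'
    print(normalize_posts([utf8_post, latin1_post]))
    # prints [b'caf\xc3\xa9', b'caf\xc3\xa9']
```

The heuristic can misfire on latin-1 text that happens to form valid UTF-8 (the two characters "Ã©" are exactly the UTF-8 bytes for "é"), but that is rare in ordinary prose, so spot-checking a few posts afterwards is still wise.
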

Viewing 12 replies - 1 through 12 (of 12 total)

 *  [moshu](https://wordpress.org/support/users/moshu/)
 * (@moshu)
 * [19 years, 5 months ago](https://wordpress.org/support/topic/character-encoding-hell/#post-496294)
 * Just a thought: if you convert it, would the process affect the posts that are already
   UTF-8 encoded? I’d try it anyway, of course after making a backup copy
   in case anything goes wrong.
 *  Thread Starter [Husky](https://wordpress.org/support/users/husky/)
 * (@husky)
 * [19 years, 5 months ago](https://wordpress.org/support/topic/character-encoding-hell/#post-496313)
 * Yeah, I tried that. The problem is that converting the posts that are already UTF-8
   encodes them a second time (double encoding): every accented character is encoded
   again and turns into garbage in the process.
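
To see why a second pass produces garbage, and why only the accented characters suffer, here is a minimal Python sketch. It assumes the accidental conversion was latin-1 to UTF-8, as with iconv; the helper names are hypothetical.

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Treat every byte as a latin-1 character and re-encode it as UTF-8,
    which is roughly what `iconv -f LATIN1 -t UTF-8` does to a file."""
    return raw.decode("latin-1").encode("utf-8")


def undo_double_encoding(raw: bytes) -> bytes:
    """Reverse exactly one accidental latin-1 -> UTF-8 pass."""
    return raw.decode("utf-8").encode("latin-1")


if __name__ == "__main__":
    already_utf8 = "café".encode("utf-8")     # b'caf\xc3\xa9'
    doubled = latin1_to_utf8(already_utf8)
    print(doubled)                             # b'caf\xc3\x83\xc2\xa9'
    print(doubled.decode("utf-8"))             # cafÃ© (the "garbage")
    print(undo_double_encoding(doubled) == already_utf8)  # True
```

ASCII bytes map to themselves in both encodings, which is why plain English text survives the extra pass untouched. Note that applying `undo_double_encoding` to text that was never double-encoded silently turns it into latin-1 bytes, so it should only be run on the posts that actually show the garbage.
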
 *  [moshu](https://wordpress.org/support/users/moshu/)
 * (@moshu)
 * [19 years, 5 months ago](https://wordpress.org/support/topic/character-encoding-hell/#post-496316)
 * Oh, I see.
   How come you didn’t notice the encoding mismatch during your previous
   host migration? If something had gone wrong, all your posts should have shown a lot
   of garbage characters…
 *  [Samuel B](https://wordpress.org/support/users/samboll/)
 * (@samboll)
 * [19 years, 5 months ago](https://wordpress.org/support/topic/character-encoding-hell/#post-496320)
 * I’m out of my depth here and admit it. However, I don’t… get it.
   My database is `latin-1` and my blog is `UTF-8`, and everything works fine.
   Isn’t that how it should be?
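
One common explanation for a `latin-1` database behind a `UTF-8` blog working fine is that the UTF-8 bytes simply pass through untouched. The sketch below models that in Python under a simplifying assumption: both the column and the connection are latin-1, so MySQL converts nothing. (MySQL documents its `latin1` as cp1252-based; Python's `latin-1` is used here because it maps all 256 byte values.) The function names are hypothetical.

```python
def store_in_latin1_column(utf8_bytes: bytes) -> str:
    # The database believes the incoming bytes are latin-1, so each byte
    # becomes one latin-1 "character"; nothing is lost.
    return utf8_bytes.decode("latin-1")


def read_from_latin1_column(stored: str) -> bytes:
    # On the way out, the same latin-1 mapping turns the characters back
    # into exactly the original bytes, which the browser reads as UTF-8.
    return stored.encode("latin-1")


if __name__ == "__main__":
    page_bytes = "naïve café".encode("utf-8")
    stored = store_in_latin1_column(page_bytes)
    print(stored)  # prints naÃ¯ve cafÃ© (how it looks inside the database)
    print(read_from_latin1_column(stored) == page_bytes)  # True
```

The catch is that the database itself sees the wrong characters, so sorting, searching, and any later conversion (such as an export and re-import with a different charset) can go wrong.
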
 *  Thread Starter [Husky](https://wordpress.org/support/users/husky/)
 * (@husky)
 * [19 years, 5 months ago](https://wordpress.org/support/topic/character-encoding-hell/#post-496324)
 * Moshu, I’m a bit puzzled by that as well. It was only after I exported the SQL
   file and stepped through it in an editor that I noticed the differences. Some
   posts are encoded in UTF-8 (showing weird characters), while others are in latin-1
   (where the accented characters look OK).
 * Whenever I test it on a local installation of WP and set my character encoding
   to UTF-8, the UTF-8 posts appear correct and the latin-1 encoded ones do
   not, and vice versa. It could also have something to do with differences between
   how I set up MySQL locally and how things are set up at my host.
 * And samboll, I guess that works fine for most of us (it did for me too), but
   it seems weird to me that the database is latin-1 and the character encoding
   is UTF-8. I don’t know, maybe I still don’t know enough about character encodings
   🙂
 *  [moshu](https://wordpress.org/support/users/moshu/)
 * (@moshu)
 * [19 years, 5 months ago](https://wordpress.org/support/topic/character-encoding-hell/#post-496326)
 * Well, in MySQL there are two things related to the character set: the charset
   and the connection collation (quite often they get mixed up…). Furthermore,
   I have 4 DBs with charset utf-8 (the host’s standard setup); the entry page of
   phpMyAdmin says charset utf-8, and the collation varies, but it is mostly
   “latin1_swedish_ci”. It works with all kinds of accented Latin characters and even
   non-Latin alphabets.
 *  Thread Starter [Husky](https://wordpress.org/support/users/husky/)
 * (@husky)
 * [19 years, 5 months ago](https://wordpress.org/support/topic/character-encoding-hell/#post-496373)
 * Ok. I’m getting a bit confused now 🙂 What is the difference between a ‘collation’
   and a ‘charset’ in MySQL? And which of the two relates to the character encoding
   in the HTML file?
 *  [moshu](https://wordpress.org/support/users/moshu/)
 * (@moshu)
 * [19 years, 5 months ago](https://wordpress.org/support/topic/character-encoding-hell/#post-496377)
 * This page explains it better than I ever could…
    [http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html](http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html)
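
The distinction in that page can be summarized roughly: the character set decides which bytes represent the characters, while the collation decides how stored strings are compared and sorted. The encoding declared for the HTML page has to match the bytes coming out of the database, so it is a charset question, not a collation question. A minimal Python analogy follows (not MySQL code; `charset_bytes` and `ci_equal` are hypothetical names):

```python
def charset_bytes(text: str) -> dict[str, bytes]:
    """Character set: which bytes represent the same characters."""
    return {name: text.encode(name) for name in ("latin-1", "utf-8")}


def ci_equal(a: str, b: str) -> bool:
    """Collation, roughly: how strings are compared. This mimics only the
    case-insensitive part of a `_ci` collation via str.casefold()."""
    return a.casefold() == b.casefold()


if __name__ == "__main__":
    print(charset_bytes("Résumé"))
    # prints {'latin-1': b'R\xe9sum\xe9', 'utf-8': b'R\xc3\xa9sum\xc3\xa9'}
    print(ci_equal("Résumé", "RÉSUMÉ"))  # True
```
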
 *  [Samuel B](https://wordpress.org/support/users/samboll/)
 * (@samboll)
 * [19 years, 5 months ago](https://wordpress.org/support/topic/character-encoding-hell/#post-496382)
 * Thanks for that link, moshu. Now I have a huge headache.
 *  Thread Starter [Husky](https://wordpress.org/support/users/husky/)
 * (@husky)
 * [19 years, 5 months ago](https://wordpress.org/support/topic/character-encoding-hell/#post-496400)
 * Well, I think I fixed the bug. Here’s what I did:
 * 1) Change MySQL’s default character set to utf8 and its default collation to utf8_general_ci.
 * 2) Export the whole wp_posts table and remove all references to latin1 (so that
   the db automatically uses the new default, utf8).
 * 3) Re-import the whole thing (make sure to DROP the table ‘wp_posts’ first).
 * This left two problems: the ë (e with an umlaut) still displayed as ?, so
   I did a search-and-replace across all ë characters, converting them to `&euml;`.
   Furthermore, the text editor I was using (Programmer’s Notepad) had some problems
   with UTF-8 too, so I used MadEdit instead to do the search and replace. Everything
   seems to be working fine now. Thanks for your input!
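
Step 2 and the ë workaround can be sketched in Python. This is a rough illustration, not what Husky actually ran (that was done by hand in a text editor): the regular expressions assume the usual `DEFAULT CHARSET=latin1`, `CHARACTER SET latin1` and `COLLATE latin1_...` forms found in dumps, and both function names are hypothetical.

```python
import re
from html.entities import codepoint2name

# Hypothetical patterns for the latin1 declarations a dump typically
# contains; real dumps may use other forms.
LATIN1_PATTERNS = [
    r"\s+(?:DEFAULT\s+)?(?:CHARSET|CHARACTER SET)\s*=?\s*latin1\b",
    r"\s+COLLATE\s*=?\s*latin1_\w+",
]


def strip_latin1_references(sql: str) -> str:
    """Drop latin1 declarations so re-imported tables get the server default."""
    for pattern in LATIN1_PATTERNS:
        sql = re.sub(pattern, "", sql, flags=re.IGNORECASE)
    return sql


def to_named_entities(text: str) -> str:
    """The ë -> &euml; workaround: replace non-ASCII characters that have an
    HTML named entity; ASCII (including &, < and >) is left alone."""
    out = []
    for ch in text:
        name = codepoint2name.get(ord(ch))
        out.append(f"&{name};" if ord(ch) > 127 and name else ch)
    return "".join(out)


if __name__ == "__main__":
    print(strip_latin1_references(") ENGINE=MyISAM DEFAULT CHARSET=latin1;"))
    # prints ) ENGINE=MyISAM;
    print(to_named_entities("België"))  # prints Belgi&euml;
```

Entities sidestep the encoding question for display, but they also make the stored text harder to search, so fixing the underlying bytes remains the cleaner long-term route.
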
 *  [vkaryl](https://wordpress.org/support/users/vkaryl/)
 * (@vkaryl)
 * [19 years, 5 months ago](https://wordpress.org/support/topic/character-encoding-hell/#post-496402)
 * Sam, I *think* that if one is generally only posting in English, then latin-1
   in the db and UTF-8 in the blog isn’t a problem.
 * What I want to know is why these later versions of MySQL have done this with
   the collation. Somewhere back, maybe 8 months to a year ago, the collation was
   utf-8 in the db if the blog was set to UTF-8. Then it changed.
 * I’m as likely to see latin1_swedish_ci as the db collation now as anything else.
   VERY weird, but it works, so I guess I shouldn’t complain.
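
vkaryl's hunch matches how the two encodings are defined: the 128 ASCII characters have the same single-byte encoding in latin-1 and UTF-8, so purely ASCII text is byte-for-byte identical either way. A small Python check (`same_bytes_in_both` is a hypothetical helper) shows where that stops being true:

```python
def same_bytes_in_both(text: str) -> bool:
    """True when latin-1 and UTF-8 encode `text` to identical bytes."""
    try:
        return text.encode("latin-1") == text.encode("utf-8")
    except UnicodeEncodeError:  # e.g. "€" has no latin-1 code point at all
        return False


if __name__ == "__main__":
    print(same_bytes_in_both("Plain English post"))  # True
    print(same_bytes_in_both("café"))                 # False
    print(same_bytes_in_both("it\u2019s"))            # False (curly apostrophe)
```

"Mostly English" text can still contain non-ASCII punctuation such as curly apostrophes, or accented loanwords, and that is where the mismatch starts to matter.
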
 *  [drmike](https://wordpress.org/support/users/drmike/)
 * (@drmike)
 * [19 years, 1 month ago](https://wordpress.org/support/topic/character-encoding-hell/#post-496513)
 * My turn to raise this issue: I’m seeing entries in my error logs where folks (well,
   spammers) leave comments in UTF-8 and get kicked out with errors complaining
   that the charsets don’t match up.
 * Should I be concerned? Granted, it’s spammers having these issues, but I would still
   hate to see actual folks getting hit by this error.
 * Thanks,
   -drmike


The topic ‘Character encoding hell’ is closed to new replies.

## Tags

 * [character](https://wordpress.org/support/topic-tag/character/)
 * [encoding](https://wordpress.org/support/topic-tag/encoding/)
 * [utf-8](https://wordpress.org/support/topic-tag/utf-8/)

 * In: [Fixing WordPress](https://wordpress.org/support/forum/how-to-and-troubleshooting/)
 * 12 replies
 * 5 participants
 * Last reply from: [drmike](https://wordpress.org/support/users/drmike/)
 * Last activity: [19 years, 1 month ago](https://wordpress.org/support/topic/character-encoding-hell/#post-496513)
 * Status: not resolved

