MySQL and UTF-8 hell

I was trying to dump some UTF-8 data from a MySQL database and import it on another machine. Getting it right took me almost a day.

It was hard to figure out which piece was screwing up. Was it MySQL, PHP, php-mysql, the server, the client, the dump program, or something else?

Even after solving the problem, I still don’t totally understand what went wrong. I'm blogging it here in the hope that someone will come across it someday and find it useful.

The exporting machine had:

mysql> show variables like '%char%';
+--------------------------+----------------------------------+
| Variable_name            | Value                            |
+--------------------------+----------------------------------+
| character_set_client     | latin1                           |
| character_set_connection | latin1                           |
| character_set_database   | utf8                             |
| character_set_results    | latin1                           |
| character_set_server     | latin1                           |
| character_set_system     | utf8                             |
| character_sets_dir       | /usr/local/share/mysql/charsets/ |
+--------------------------+----------------------------------+

The importing machine had:

mysql> show variables like '%char%';
+--------------------------+----------------------------------+
| Variable_name            | Value                            |
+--------------------------+----------------------------------+
| character_set_client     | utf8                             |
| character_set_connection | utf8                             |
| character_set_database   | utf8                             |
| character_set_filesystem | binary                           |
| character_set_results    | utf8                             |
| character_set_server     | utf8                             |
| character_set_system     | utf8                             |
| character_sets_dir       | /usr/local/share/mysql/charsets/ |
+--------------------------+----------------------------------+
8 rows in set (0.00 sec)

The exporting machine was running WordPress and displayed UTF-8 data just fine, even though the client connection defaulted to latin1. I suspect that MySQL was doing some kind of weird conversion along the way.

The obvious approach,

mysqldump --default-character-set=utf8 ... > dump.sql
mysql --default-character-set=utf8 < dump.sql

didn’t work at all. What worked was:

mysqldump --default-character-set=latin1 ... > dump.sql
sed 's/latin1/utf8/g' dump.sql > dump1.sql
mysql --default-character-set=utf8 < dump1.sql
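
One caveat with the sed step: it rewrites every occurrence of "latin1" in the file, including any that happen to appear inside the dumped data. A more cautious variant, sketched here in Python, only touches "latin1" when it follows a charset keyword. The keyword list and the names relabel_line and relabel_dump are assumptions made for this illustration, not an exhaustive description of what mysqldump emits, and data that itself contains one of these keyword phrases would still be rewritten.

```python
import re

# Contexts in which a dump file typically names a character set.
# Assumed list for illustration; extend it if your dump uses other forms.
CHARSET_CONTEXT = re.compile(
    r"(SET NAMES |CHARSET=|CHARACTER SET |COLLATE[ =])latin1"
)


def relabel_line(line: str) -> str:
    """Rewrite latin1 to utf8 only where it follows a charset keyword."""
    return CHARSET_CONTEXT.sub(r"\1utf8", line)


def relabel_dump(src_path: str, dst_path: str) -> None:
    """Copy a dump file, relabelling charset declarations line by line."""
    # Python's latin-1 codec maps every byte to exactly one code point,
    # so the data bytes pass through unchanged whatever they encode.
    with open(src_path, encoding="latin-1", newline="") as src, \
         open(dst_path, "w", encoding="latin-1", newline="") as dst:
        for line in src:
            dst.write(relabel_line(line))
```

Calling relabel_dump("dump.sql", "dump1.sql") would then stand in for the sed line above.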

I hope someone can explain what was going on here.
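
One possible explanation, offered as a guess rather than a confirmed diagnosis, is double encoding. If the application sent UTF-8 bytes over a connection declared as latin1, the server would have read each byte as a separate latin1 character and converted it to utf8 again for storage in the utf8 database. Dumping with latin1 would undo that second conversion and leave plain UTF-8 bytes in the file, which would then only need relabelling. The Python sketch below simulates that round trip. The function names are made up for this illustration, and Python's cp1252 codec stands in for MySQL's latin1, which is close to cp1252 but not identical.

```python
def store_over_latin1_connection(text: str) -> bytes:
    """Simulate a client sending UTF-8 over a connection declared latin1."""
    wire_bytes = text.encode("utf-8")        # what the application sends
    misread = wire_bytes.decode("cp1252")    # server reads the bytes as latin1
    return misread.encode("utf-8")           # and converts to utf8 for storage


def dump(stored: bytes, charset: str) -> bytes:
    """Simulate mysqldump --default-character-set=<charset> for utf8 data."""
    text = stored.decode("utf-8")
    return text.encode("cp1252" if charset == "latin1" else "utf-8")


stored = store_over_latin1_connection("café")

# A utf8 dump carries the double-encoded form into the file.
print(dump(stored, "utf8").decode("utf-8"))     # cafÃ©

# A latin1 dump reverses the extra conversion, leaving real UTF-8 bytes.
print(dump(stored, "latin1").decode("utf-8"))   # café
```

If this is what happened, it would also explain why WordPress displayed the data correctly on the old machine: reading back over the same latin1 connection reverses the conversion in the same way.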
