I was trying to dump some UTF-8 data from a MySQL database and import it on another machine. Getting it right took me almost a day.
It was hard to figure out which piece was at fault. Was it MySQL, PHP, php-mysql, the server, the client, the dump program, or something else?
Having solved the problem, I still don’t totally understand what went wrong, but I am blogging it here in the hope that someone will find it useful someday.
The exporting machine had:
mysql> show variables like '%char%';
+--------------------------+----------------------------------+
| Variable_name            | Value                            |
+--------------------------+----------------------------------+
| character_set_client     | latin1                           |
| character_set_connection | latin1                           |
| character_set_database   | utf8                             |
| character_set_results    | latin1                           |
| character_set_server     | latin1                           |
| character_set_system     | utf8                             |
| character_sets_dir       | /usr/local/share/mysql/charsets/ |
+--------------------------+----------------------------------+
The importing machine had:
mysql> show variables like '%char%';
+--------------------------+----------------------------------+
| Variable_name            | Value                            |
+--------------------------+----------------------------------+
| character_set_client     | utf8                             |
| character_set_connection | utf8                             |
| character_set_database   | utf8                             |
| character_set_filesystem | binary                           |
| character_set_results    | utf8                             |
| character_set_server     | utf8                             |
| character_set_system     | utf8                             |
| character_sets_dir       | /usr/local/share/mysql/charsets/ |
+--------------------------+----------------------------------+
8 rows in set (0.00 sec)
The exporting machine was running WordPress and displayed UTF-8 data just fine, even though the client character set defaulted to latin1. I suspect that MySQL was doing some kind of weird escaping.
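One plausible guess, which I have not verified against this particular setup, is that the application was sending real UTF-8 bytes over a connection that MySQL believed to be latin1. The Python sketch below only simulates that byte handling; it does not talk to MySQL. The sample string is made up, and Python's latin-1 codec stands in for MySQL's latin1 (which is closer to cp1252, a difference that does not matter for this sample).

```python
# Text as the application produced it: real UTF-8 bytes.
original = "café ü"
utf8_bytes = original.encode("utf-8")

# Over a latin1 connection, the server treats every byte as one latin1 character.
as_seen_by_server = utf8_bytes.decode("latin-1")

# Reading that data back over a utf8 connection encodes each of those
# characters as UTF-8 again, so every non-ASCII character is encoded twice.
read_as_utf8 = as_seen_by_server.encode("utf-8")

# Reading it back over a latin1 connection reverses the first step and
# returns the original bytes, which happen to be valid UTF-8.
read_as_latin1 = as_seen_by_server.encode("latin-1")

print(read_as_utf8.decode("utf-8"))    # garbled, double-encoded text
print(read_as_latin1.decode("utf-8"))  # the original text
```

If this guess is right, it would explain why a client that defaults to latin1 can still display the data correctly: the same mislabeling is applied on the way in and on the way out, so the bytes survive the round trip.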
mysqldump --default-character-set=utf8 ... > dump.sql
mysql --default-character-set=utf8 < dump.sql
didn’t work at all. What worked was:
mysqldump --default-character-set=latin1 ... > dump.sql
sed 's/latin1/utf8/g' dump.sql > dump1.sql
mysql --default-character-set=utf8 < dump1.sql
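One caveat with the sed step: it replaces every occurrence of latin1 in the dump, including any that appear inside the row data itself. A more cautious variant, sketched here in Python, leaves INSERT lines untouched and rewrites only the remaining lines. It assumes that mysqldump wrote each INSERT statement on a single line; the function names and the `posts` table in the sample are hypothetical.

```python
def retag_charset(dump_lines):
    """Yield dump lines with latin1 replaced by utf8, skipping row data."""
    for line in dump_lines:
        if line.lstrip().upper().startswith(b"INSERT "):
            # Row data: pass the bytes through unchanged.
            yield line
        else:
            # Charset declarations, CREATE TABLE options, and so on.
            yield line.replace(b"latin1", b"utf8")


def retag_file(src, dst):
    """Apply retag_charset to a dump file, working on raw bytes throughout."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fout.writelines(retag_charset(fin))


# Small in-memory sample instead of a real dump file.
sample = [
    b"/*!40101 SET NAMES latin1 */;\n",
    b") ENGINE=MyISAM DEFAULT CHARSET=latin1;\n",
    b"INSERT INTO `posts` VALUES (1,'notes on latin1 vs utf8');\n",
]
converted = list(retag_charset(sample))
print(b"".join(converted).decode("utf-8"))
```

Calling `retag_file("dump.sql", "dump1.sql")` would take the place of the sed command above. The files are opened in binary mode so that Python never tries to reinterpret the bytes in the dump.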
I hope someone can explain what was going on here.