Archive for Technology

MySQL and UTF-8 hell

I was trying to dump some UTF-8 data from a MySQL database and import it on another machine. Getting it right took me almost a day.

It was hard to figure out who was screwing up. Was it mysql, php, php-mysql, server or the client, dumping program or something else?

Even after solving the problem, I still don’t totally understand what went wrong. I’m blogging it here in the hope that someone will come across it someday and find it useful.

Machine which was exporting had:

mysql> show variables like '%char%';
+--------------------------+----------------------------------+
| Variable_name            | Value                            |
+--------------------------+----------------------------------+
| character_set_client     | latin1                           |
| character_set_connection | latin1                           |
| character_set_database   | utf8                             |
| character_set_results    | latin1                           |
| character_set_server     | latin1                           |
| character_set_system     | utf8                             |
| character_sets_dir       | /usr/local/share/mysql/charsets/ |
+--------------------------+----------------------------------+

Machine which was importing had:

mysql> show variables like '%char%';
+--------------------------+----------------------------------+
| Variable_name            | Value                            |
+--------------------------+----------------------------------+
| character_set_client     | utf8                             |
| character_set_connection | utf8                             |
| character_set_database   | utf8                             |
| character_set_filesystem | binary                           |
| character_set_results    | utf8                             |
| character_set_server     | utf8                             |
| character_set_system     | utf8                             |
| character_sets_dir       | /usr/local/share/mysql/charsets/ |
+--------------------------+----------------------------------+
8 rows in set (0.00 sec)

The exporting machine had WordPress running and was displaying UTF-8 data just fine, although the client was defaulting to latin1. I suspect that MySQL was doing some type of weird conversion between the two character sets.

mysqldump --default-character-set=utf8 ... > dump.sql
mysql --default-character-set=utf8 < dump.sql

didn’t work at all. What worked was:

mysqldump --default-character-set=latin1 ... > dump.sql
sed 's/latin1/utf8/g' dump.sql > dump1.sql
mysql --default-character-set=utf8 < dump1.sql

Hope someone can explain what was going on here.
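
A likely explanation, offered as an assumption rather than a confirmed diagnosis: WordPress sent UTF-8 bytes over a connection that the server believed was latin1, so the server converted each byte as if it were a latin1 character before storing it. Dumping with utf8 then returns that doubly encoded form, while dumping with latin1 reverses the conversion and yields the original UTF-8 bytes, merely labelled latin1 (hence the sed step). The Python sketch below simulates this; the function names are hypothetical, and Python's "latin-1" codec only approximates MySQL's latin1, which is closer to cp1252.

```python
def store_via_latin1_connection(text: str) -> bytes:
    """Simulate an app sending UTF-8 bytes over a connection declared as latin1."""
    raw = text.encode("utf-8")        # bytes the application actually sends
    misread = raw.decode("latin-1")   # server assumes each byte is one latin1 char
    return misread.encode("utf-8")    # server converts to utf8 for storage


def dump(stored: bytes, charset: str) -> bytes:
    """Simulate mysqldump --default-character-set=<charset> on utf8 storage."""
    return stored.decode("utf-8").encode(charset)


original = "café"
stored = store_via_latin1_connection(original)

dump_utf8 = dump(stored, "utf-8")      # still doubly encoded
dump_latin1 = dump(stored, "latin-1")  # conversion undone: real UTF-8 bytes

print(dump_utf8.decode("utf-8"))    # cafÃ©
print(dump_latin1.decode("utf-8"))  # café
```

If this is what happened, the latin1 dump is already valid UTF-8, and replacing latin1 with utf8 in the dump file only corrects the labels so that the importing server reads the bytes as they really are.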

Comments

IIT Salaries

The headline in today’s Economic Times screams that some IIT guy got a $100k offer, and quotes another guy as saying “IIMs are for second rate IITians”. I can’t find a better example of the vulgarity I was alluding to in my earlier posts, or of the extent to which Indian media stoops to cater to the starry-eyed kids who want a decent life and are ready to pay Rs 10 to read this kind of crap.

Somewhere down on line 43 you’ll find that the average salary is only around Rs 600k ($13k). I tend to think that the guys who are trained in computer science are doing a big disservice to their profession and education by throwing away their skills and doing unrelated work for some oil drilling company or a stock broker.

Economic realities of today’s India can’t be ignored and one can’t be a hermit and work on a theoretical computer science problem at Rs 4000 per month. But a better compromise is possible and I think the media needs to highlight better role models - people who take good care of themselves, create job opportunities for others as well and do something for the advancement of technology.

I’m sure most reasonable people will see that it’s the advancement of technology that holds the key to true economic success - not low cost IT outsourcing or financial engineering.

Comments (4)

Bill Gates mocks MIT’s $100 laptop project

While mocking MIT’s $100 laptop project, Bill Gates said:

“Hardware is a small part of the cost” of providing computing capabilities, he said, adding that the big costs come from network connectivity, applications and support.

Someone at MS seems to be feeding him misinformation about BRIC. Here are some stats from India:

Cost item                   Rs       USD
Laptop                      80000    2000
PC                          20000    500
Broadband per month         250      6
Support per visit           100      2
“Commodity OS”              100      2

Whether the commodity OS is a Linux/BSD or pirated Windows depends on the seller and the buyer.

Comments

Server market share by OS

According to IDC, sales of Windows systems accounted for 36.9% of all server revenue in the quarter, versus 31.7% for Unix and 11.5% for Linux.

However, Linux is growing faster than Windows, both in revenue (34.3% vs 17.7% growth) and in unit shipments (20.5% vs 15.3% growth).

More information inferred from the IDC press release:

x86 revenue = 6.3 billion (50.4%)

From Q2 2005 numbers:

x86 revenue = 5.7 billion
x86 volume = 1.5 million servers
average price = $3800 (too high for dual cpu boxen?)
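
The arithmetic behind these figures can be checked with a few lines of Python. This is only a sketch: it assumes the revenue figures are in US dollars, and the total server revenue it prints is implied by the 50.4% share rather than quoted from the press release. The variable names are mine.

```python
# Q2 2005 x86 figures quoted above (revenue assumed to be in USD)
q2_x86_revenue = 5.7e9
q2_x86_units = 1.5e6

# Average selling price per x86 server
average_price = q2_x86_revenue / q2_x86_units

# Current quarter: x86 revenue and its share of all server revenue
x86_revenue = 6.3e9
x86_share = 0.504

# Total server revenue implied by those two numbers (not quoted by IDC here)
implied_total = x86_revenue / x86_share

print(f"average x86 server price: ${average_price:,.0f}")
print(f"implied total server revenue: ${implied_total / 1e9:.1f} billion")
```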

Comments (1)

ICANN owns the internet?

Another glaring example of useless diplomatic speak that the UN is often accused of.

To some extent, all this rhetoric is useless. If the other countries don’t like US control over ICANN, they should just set up their own and ignore ICANN.

Comments (1)

Scott McNealy’s eco-friendly challenge

Someone should tell Mr McNealy about the Mac Mini, which consumes 12W idle and 20W max, while doing work. And I can get work done while the network is down.

Comments

Alternative DNS

Politics and the Internet don’t mix very well.

“Given the Internet’s importance to the world’s economy, it is essential that the underlying domain name system of the Internet remain stable and secure,” the letter said. “As such, the United States should take no action that would have the potential to adversely impact the effective and efficient operation of the domain name system. Therefore, the United States should maintain its historic role in authorizing changes or modifications to the authoritative root zone file.”

These politicians have little clue that the Chinese government can legislate that all ISP DNS servers in China use the UN root servers instead of the American ones. The CNET correspondent seems to be more interested in lecturing on the democratic credentials (or the lack thereof) of Tunisia and Cuba than in addressing the issue.

Comments

Another XNU/MacOSX article

What Is Darwin (and How It Powers Mac OS X) seems to be another article about MacOSX which didn’t offer any new insights into why Apple continues to use it instead of FreeBSD, despite all the performance criticisms levelled against it recently by Anandtech.

Also, both Apple and Sun seem to have very little understanding of how to compete against Linux (hint: a single-CD distro which people can recompile from scratch, including the kernel).

Comments

“Hot” open source software

If you’re wondering what OS/distro the “pros” use, you might want to look at one of the more successful internet businesses.

I ran into this while doing a yum update today.

Comments

Sun x64 and Specfp

Sun has been getting bold about the language in their ads and attacking the competition, but why pick SPECfp?

Comments
