ASCII, DBCS, Unicode, UTF-8 ...

by Volker Weber

Joel Spolsky in "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)":

So I have an announcement to make: if you are a programmer working in 2003 and you don't know the basics of characters, character sets, encodings, and Unicode, and I catch you, I'm going to punish you by making you peel onions for 6 months in a submarine. I swear I will.

Must read!

Comments

My hubby (who we call Captain Unicode) wrote up a step-by-step guide for Unicode (focusing on Japanese) in VC++. He includes the guide along with source code examples...
http://www.mattandjess.net/c++/index.html

Jess Stratton, 2003-10-11

Well, if programmers who didn't know this now do, that is probably a good thing. But knowing about character sets (i.e., Unicode, etc.) and character set encodings (UTF-8, UTF-16, etc.) provides only half the picture, especially for Asian languages. This is because the same Unicode code point is sometimes drawn as a different pictogram (glyph) in each of these languages. That's why all relatively recent IETF and W3C protocols and content encodings insist that along with a character set specifier, you also include a language specifier.

Nick, 2003-10-11
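
A minimal Python sketch of the distinction Nick draws, added here only as an illustration: one code point, two encodings of it, and a language tag carried next to the text. The choice of U+76F4 as the sample character and the helper name tag_text are assumptions made for this example, not something from the comment or from Joel's article. The lang attribute is the standard HTML one.

```python
# One Han character, identified by its Unicode code point U+76F4.
# It is commonly cited as a character drawn differently in Japanese
# and Chinese typography.
ch = "\u76f4"

code_point = ord(ch)                   # character-set view: an abstract number
utf8_bytes = ch.encode("utf-8")        # one encoding of that number
utf16_bytes = ch.encode("utf-16-be")   # another encoding of the same number

print(f"U+{code_point:04X}")   # U+76F4
print(utf8_bytes.hex())        # e79bb4
print(utf16_bytes.hex())       # 76f4

# Both byte sequences decode to the same code point. Neither one says
# which glyph variant a renderer should pick.
assert utf8_bytes.decode("utf-8") == utf16_bytes.decode("utf-16-be") == ch


def tag_text(text: str, lang: str) -> str:
    """Hypothetical helper: wrap text in an HTML span with a language tag."""
    return f'<span lang="{lang}">{text}</span>'


# Same code point, different language hints for the renderer.
japanese = tag_text(ch, "ja")
chinese = tag_text(ch, "zh-Hans")
print(japanese)
print(chinese)
```

The encoding tells a program how to get from bytes back to the code point. The language tag is extra information a renderer can use to choose a font, which is the "other half of the picture" in the comment above.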

Also read here.

Volker Weber, 2003-10-13
