ASCII, DBCS, Unicode, UTF-8 ...
by Volker Weber
Joel Spolsky in "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)":
So I have an announcement to make: if you are a programmer working in 2003 and you don't know the basics of characters, character sets, encodings, and Unicode, and I catch you, I'm going to punish you by making you peel onions for 6 months in a submarine. I swear I will.
Must read!
Comments
My hubby (who we call Captain Unicode) wrote up a step-by-step guide for Unicode (focusing on Japanese) in VC++. He includes the guide along with source code examples...
http://www.mattandjess.net/c++/index.html
Well, if programmers who didn't know this now do, that is probably a good thing. But knowing about character sets (i.e., Unicode, etc.) and character set encodings (UTF-8, UTF-16, etc.) provides only half the picture, especially for Asian languages. This is because in each of these languages the same Unicode code point sometimes gives rise to different pictograms (glyphs). That's why all relatively recent IETF and W3C protocols and content encodings insist that along with a character set specifier, you also include a language specifier.
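To make the distinction between a character set and its encodings concrete, here is a minimal Python sketch. The helper name `describe` and the sample character U+76F4 are my own choices for illustration, not anything from the article. It shows one code point turning into different byte sequences under UTF-8 and UTF-16, and none of those bytes says anything about the language.

```python
def describe(ch: str) -> dict:
    """Return the code point and two encodings of a single character.

    Hypothetical helper for illustration only.
    """
    return {
        # The abstract number assigned by the character set (Unicode).
        "code_point": f"U+{ord(ch):04X}",
        # The same code point serialized by two different encodings.
        "utf8": ch.encode("utf-8").hex(),
        "utf16_be": ch.encode("utf-16-be").hex(),
    }


# U+76F4 is a CJK ideograph shared by Chinese and Japanese text.
info = describe("\u76f4")
print(info)
```

The encoded bytes are identical whether the text is Chinese or Japanese, so a renderer that wants to pick a suitable glyph needs the language from somewhere else, for example the `lang` attribute in HTML.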
Also read here.