Code Yarns 👨‍💻

Windows: Multi-Byte

📅 2010-Jan-13 ⬩ ✍️ Ashwin Nanjappa ⬩ 🏷️ multi-byte, unicode, wide character, windows

You run into Multi-Byte a lot when developing on Windows. For example, Visual Studio 2008 offers two character sets for a project: Multi-Byte and Unicode. Notice that it does not list English or ASCII or some old, comfortable 8-bit character set. To avoid any confusion: Multi-Byte is not the same as the Wide Character types and functions. Those use wchar_t and std::wstring, and their functions and objects have a w in their name, for example wprintf() and std::wcout.

On Windows, Multi-Byte is the old character set. It is not Unicode, which is the new (and recommended) character set. Multi-Byte code looks like old C code written to deal with English characters and strings: it uses the old C char types (char and char *), narrow string literals ("Hello World") and std::string. It differs only in behavior: when a Multi-Byte application runs on Windows configured with a non-English locale, its chars are interpreted and displayed according to that locale's code page. For example, two (or more) chars of a string can combine to display a single glyph of the foreign language; in the Japanese Shift-JIS code page, a lead byte followed by a trail byte forms one character. Hence the name Multi-Byte for this character set and for its code, libraries and applications.


© 2023 Ashwin Nanjappa • All writing under CC BY-SA license