Quirks in RPG Maker: Unicode and having fun with codepages

This is the second blog entry from our “Quirks in RPG Maker” series. In the previous article we introduced the non-Unicode application problem. Now let’s explain how this problem has been addressed.

The problem: Non-Unicode games

As explained in the previous article, Unicode is a standard for writing characters and texts using an universal encoding. Before existing an universal standard, some languages were using their own character encoding, sharing often the same bytes representing different glyphs. Games created with a non-Unicode editor contain non-Unicode texts. These games won’t be readable properly under operating systems with a different encoding than the original, unless some conversion between encodings is applied.

Encoding conversion issues

Modern operating systems are Unicode. This helps to convert non-Unicode encodings to Unicode, having enough room to fit any previous encoding to the new universal. However there are multiple issues when conversion standard was being designed.

Non-Unicode Windows Japanese Encoding is Shift JIS. This encoding was similar in the first 127 characters with ASCII (7-bit part), except the backslash (\) showing a yen symbol as a replacement. Originally, the command trigger for show messages is a ¥ followed by a letter, e.g. ¥c[1], in western is \c[1]. This was breaking message codes before. In fact, Japanese Windows shows a Yen sign (¥) as a directory separator even in modern Unicode Windows. In Korean Windows shows a Won sign (₩) instead. The wave dash from Shift JIS was incorrectly mapped to the Unicode full-width tilde, generating issues not only when showing the sign, the problem could affect games with filenames using this character.

The first approach: libiconv and Windows API

libiconv is a library to convert common legacy encodings to Unicode, supporting most non-Unicode Windows encodings. In order to play any non-Unicode game we needed to set the encoding manually before playing the game. There is an EasyRPG specific parameter in RPG_RT.ini to set the encoding by hand.

However, libiconv and Windows API were not supporting a way to detect the encoding. There were issues with yen, won and full-width tilde conversions, having issues when playing Japanese and Korean games.

Games don’t say which encoding are using

There are RPG Maker 2000/2003 games written in many encodings, but there is no a clear way to determine which encoding are using, as they can share the same bytes for different languages.

The solution: ICU

ICU (International Components for Unicode) is a library which brings very good support, including proper encoding conversion for Yen, Won and full-width tilde and an heuristic system for character encoding detection. With this feature the encoding can be autodetected and works fine for most games.

Broken game translations

There are games however, specifically game translations which are mistranslated, having mixed encodings and making the encoding autodetection fail. This is the case for some broken Yume Nikki translations. If you have file not found errors, you probably are using a broken translation. You can try to force manually the encoding in RPG_RT.ini for these particular cases, they may work better.

The ICU encoding heuristics detection gets some text data from the game. Currently uses some texts from the game database terms. Not all are passed because some default database translations from some editors are untranslated from Japanese and ICU would detect these games as Shift JIS when they probably are written using Windows codepage 1252 (western).