It looks like Oracle has finally released some documentation for the options they've been using in SPECjbb2005 submissions. The doc is here; it appears to have been posted on Christmas Eve.
As I guessed, they're using a byte[] array instead of a char[] array for Strings wherever they can.
Presumably this complicates the code path, because every time the JVM touches a String it now has to check which representation it's dealing with. The space savings are probably worth it, at least in some applications.
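To make that extra check concrete, here's a minimal sketch of the dual-representation idea, assuming an ASCII byte[] form with a char[] fallback. The class and method names are made up for illustration; the doc doesn't describe the actual String internals:

```java
// Illustrative only, NOT the real java.lang.String: each string carries
// one of two backing arrays, and every accessor has to check which.
final class TwoFormString {
    private final byte[] compact;  // used when every char fits in one byte
    private final char[] full;     // used otherwise

    private TwoFormString(byte[] compact, char[] full) {
        this.compact = compact;
        this.full = full;
    }

    static TwoFormString of(String s) {
        // Decide the representation once, at construction time.
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 127) {  // non-ASCII: fall back to char[]
                return new TwoFormString(null, s.toCharArray());
            }
        }
        byte[] b = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            b[i] = (byte) s.charAt(i);
        }
        return new TwoFormString(b, null);
    }

    char charAt(int index) {
        // This branch is the per-access cost the post is guessing at.
        return compact != null ? (char) (compact[index] & 0xFF) : full[index];
    }

    int length() {
        return compact != null ? compact.length : full.length;
    }
}
```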
Why isn't it on by default? Two possibilities:
- The penalty is too high in many applications. In my opinion, this would make it a bit of a benchmark special.
- The option isn't quite ready for prime time yet, but they plan to turn it on by default later.
Is this option "fair" to non-Western-European applications? I'd argue that it probably isn't unfair. A lot of String objects aren't involved in the user interface at all. In many applications, such as Eclipse, Strings are used extensively as internal identifiers for things like plug ins, extension points, user interface elements, etc. Even if your application presents a non-ASCII user interface there's a good chance that it still has a lot of ASCII strings under the surface. It might not benefit as much from this option, but it would probably still benefit.
(Of course, that assumes there's no penalty for using non-ASCII Strings beyond the extra space. If the option is implemented in an all-or-nothing fashion, e.g. if it stops using byte[] arrays for everything the first time it encounters a non-ASCII String, then non-ASCII applications wouldn't benefit at all.)
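To spell out the difference between those two designs, here's a hypothetical sketch, again with made-up names; nothing in the doc says which policy the JVM implements:

```java
// Contrasting the two fallback policies from the paragraph above.
final class CompactionPolicy {
    private static volatile boolean stillCompacting = true;

    private static boolean isAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 127) return false;
        }
        return true;
    }

    // Per-string: each string is judged on its own, so a non-ASCII
    // application still compacts all of its ASCII identifiers.
    static boolean compactPerString(String s) {
        return isAscii(s);
    }

    // All-or-nothing: the first non-ASCII string flips a global switch,
    // and nothing created afterwards is compacted.
    static boolean compactAllOrNothing(String s) {
        if (stillCompacting && !isAscii(s)) {
            stillCompacting = false;
        }
        return stillCompacting;
    }
}
```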