Like I guessed, they're using a byte array instead of a char array for Strings wherever they can.
Presumably this makes the code path more complicated, because every time the JVM deals with a String it now needs to check what kind it is. The space savings are probably worth it, at least in some applications.
Why isn't it on by default? Two possibilities:
The penalty is too high in many applications. In my opinion, this would make it a bit of a benchmark special.
The option isn't quite ready for prime time yet, but they plan to turn it on by default later.
Is this option "fair" to non-Western-European applications? I'd argue that it probably isn't unfair. A lot of String objects aren't involved in the user interface at all. In many applications, such as Eclipse, Strings are used extensively as internal identifiers for things like plug ins, extension points, user interface elements, etc. Even if your application presents a non-ASCII user interface there's a good chance that it still has a lot of ASCII strings under the surface. It might not benefit as much from this option, but it would probably still benefit.
(Of course that assumes that there's no penalty for using non-ASCII Strings beyond the extra space. If the option is implemented in an all-or-nothing fashion, e.g. if it stops using byte arrays the first time it encounters a non-ASCII String, then non-ASCII applications wouldn't benefit at all.)
Sometimes I get a bug that I just can't figure out. If the problem is reproducible with a good test case it's usually fairly easy to narrow the problem down pretty quickly. But what do you do if your product crashed once on a customer's server, and hasn't failed again?
Well, you start with the logs and diagnostic files. Usually we can figure it out from tracepoints and a core file. But sometimes this doesn't work.
In cases like this I don't like to throw in the towel without doing something. It feels like defeat (probably because it is). Instead, I always try to figure out what additional information could have helped me solve the problem.
How did we get to the point of failure? If I can identify two or three paths to the failure point and can't infer which one was taken I'll add some tracepoints to those paths. Or maybe I can add assertions on those paths to detect the error a bit earlier.
Since the problem isn't reproducible they won't help me now, but they might help me in the future. If the problem does occur again (and 'not reproducible' really just means 'very rare') hopefully these diagnostics will get me one step closer to the actual problem. And if it didn't hit any of my new tracepoints or assertions when it reoccurs, that's useful (and potentially maddening) too.
Of course I still might not be able to figure out what's happening. Then I add another round of tracepoints and assertions. Each failure gets me one step closer to the solution.