As I guessed, they're using a byte[] instead of a char[] for Strings wherever they can.
Presumably this makes the code path more complicated, because every time the JVM deals with a String it now needs to check what kind it is. The space savings are probably worth it, at least in some applications.
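To make the "check what kind it is" cost concrete, here is a minimal sketch of the idea in plain Java. It is not the JVM's actual implementation, which I haven't seen; the class name CompactStringSketch and its methods are made up for illustration, and I'm assuming "can be compressed" means "every char is 7-bit ASCII".

```java
// Hypothetical illustration only; NOT how the JVM implements the option.
final class CompactStringSketch {
    private final byte[] narrow; // set when every char is 7-bit ASCII (assumption)
    private final char[] wide;   // set otherwise

    CompactStringSketch(String s) {
        boolean ascii = true;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0x7F) {
                ascii = false;
                break;
            }
        }
        if (ascii) {
            narrow = new byte[s.length()];
            for (int i = 0; i < s.length(); i++) {
                narrow[i] = (byte) s.charAt(i);
            }
            wide = null;
        } else {
            narrow = null;
            wide = s.toCharArray();
        }
    }

    boolean isCompressed() {
        return narrow != null;
    }

    int length() {
        return narrow != null ? narrow.length : wide.length;
    }

    // Every access pays for this branch; that is the extra code path.
    char charAt(int index) {
        return narrow != null ? (char) narrow[index] : wide[index];
    }

    // Size of the character data only, ignoring object and array headers.
    int payloadBytes() {
        return narrow != null ? narrow.length : wide.length * 2;
    }

    public static void main(String[] args) {
        CompactStringSketch a = new CompactStringSketch("hello");
        CompactStringSketch b = new CompactStringSketch("h\u00e9llo");
        System.out.println(a.isCompressed() + " " + a.payloadBytes());
        System.out.println(b.isCompressed() + " " + b.payloadBytes());
    }
}
```

The ASCII case stores half the character data, and every charAt() call has to branch on the representation first.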
Why isn't it on by default? Two possibilities:
- The penalty is too high in many applications. In my opinion, this would make it a bit of a benchmark special.
- The option isn't quite ready for prime time yet, but they plan to turn it on by default later.
(Of course, all of this assumes that there's no penalty for using non-ASCII Strings beyond the extra space. If the option is implemented in an all-or-nothing fashion, e.g. if it stops using byte[] arrays the first time it encounters a non-ASCII String, then non-ASCII applications wouldn't benefit at all.)
I did some very preliminary testing for my purposes. I parsed 10 kB XML files with Xalan (i.e. the default parser) on JDK build 1.6.0_23-b05 (64-bit Windows) and generated a text-only PDF from each of them with iText. All data was in memory, all strings were pure ASCII, and each loop ran at least 1000 transformations.
The tests consistently show a speed penalty of about 10% when -XX:+UseCompressedStrings is used. No other JVM modifiers, BTW.
I would imagine that when caching beats any I/O many times over in speed, it might well be a good deal. It does consume measurably less heap: in my case, roughly 30% less.
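For anyone who wants to repeat this kind of comparison, a rough harness could look like the sketch below. It is not the Xalan/iText test described above: the workload is a stand-in that just builds ASCII-only strings, and the class name FlagComparisonSketch and the method measure are invented for the example. The heap figure is only approximate, since System.gc() is a hint.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical harness; the workload is a stand-in, not the original test.
final class FlagComparisonSketch {
    /** Elapsed time and approximate used heap for one run. */
    static final class Result {
        final long elapsedNanos;
        final long usedHeapBytes;
        final int retained;

        Result(long elapsedNanos, long usedHeapBytes, int retained) {
            this.elapsedNanos = elapsedNanos;
            this.usedHeapBytes = usedHeapBytes;
            this.retained = retained;
        }
    }

    static Result measure(int iterations) {
        List<String> retained = new ArrayList<String>(iterations);
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            // Build and keep ASCII-only strings so they stay on the heap.
            retained.add("<item id=\"" + i + "\">value-" + i + "</item>");
        }
        long elapsed = System.nanoTime() - start;

        Runtime rt = Runtime.getRuntime();
        System.gc(); // only a hint, so the number below is approximate
        long used = rt.totalMemory() - rt.freeMemory();
        return new Result(elapsed, used, retained.size());
    }

    public static void main(String[] args) {
        measure(1000); // warm-up pass
        Result r = measure(100000);
        System.out.println("elapsed ms: " + r.elapsedNanos / 1000000);
        System.out.println("used heap bytes (approx.): " + r.usedHeapBytes);
    }
}
```

Compile it, then run it once plainly and once with -XX:+UseCompressedStrings on a JVM that accepts the flag, and compare the two sets of numbers over several runs.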
Interesting. I would have assumed that in most cases this compression occurs naturally when serializing Objects (with String members) as UTF-8, so that there'd be little benefit to tweaking the in-memory representation, except for Strings that are seldom used (which could be the case for class definitions, method names etc). Then again, access via String.charAt() should be relatively fast, so maybe the overhead is not all that drastic.
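The size difference between the two forms is easy to check. A small sketch (the class and method names are mine, and it counts character data only, not object headers): a char is 2 bytes in memory, while UTF-8 uses 1 byte per ASCII character.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper comparing character-data sizes only.
final class Utf8SizeSketch {
    // In-memory char[] payload: 2 bytes per char.
    static int utf16Bytes(String s) {
        return s.length() * 2;
    }

    // Serialized UTF-8 payload.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String ascii = "abc";
        String accented = "\u00e9"; // e with acute accent
        System.out.println(utf16Bytes(ascii) + " vs " + utf8Bytes(ascii));
        System.out.println(utf16Bytes(accented) + " vs " + utf8Bytes(accented));
    }
}
```

StandardCharsets needs Java 7 or later; on JDK 6, s.getBytes("UTF-8") does the same job but throws a checked exception.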
Presumably the repeated -XX: is a typo (as in -XX:-XX:+UseCompressedStrings)?