The Virtual Machinist: 2011

Tuesday, September 20, 2011

IBM Java 7 is now available

IBM Java 7 was officially released yesterday, September 19. You can download it from IBM DeveloperWorks.

There are a lot of exciting new features, including a number of GC improvements.

The balanced GC policy, which I've mentioned before, is included in all 64-bit Java 7 JDKs. You can enable it with -Xgcpolicy:balanced.
The soft-realtime garbage collector is included for evaluation on Linux and AIX. It can be enabled with -Xgcpolicy:metronome.
The verbose GC format has been completely overhauled. It now provides more information and the XML format has been redesigned to make machine interpretation of the data simpler, allowing both IBM and customers to write tools to process and analyse the data.
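
If you want to confirm which collectors a VM is actually running with after passing one of these options, a small sketch like the following may help. It uses only the standard java.lang.management API. The class name GcPolicyCheck and the helper collectorNames() are made up for illustration, and the names each VM reports for its collectors are VM-specific, so no particular output is assumed.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; not part of any IBM tooling.
class GcPolicyCheck {

    // Collects the names the running VM reports for its collectors.
    // The names are VM-specific, so none are assumed here.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // On an IBM VM, launch with e.g.: java -Xgcpolicy:balanced GcPolicyCheck
        System.out.println("vm=" + System.getProperty("java.vm.name"));
        System.out.println("collectors=" + collectorNames());
    }
}
```

Running it once with and once without the -Xgcpolicy option is a quick way to see whether the option changed anything.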

Friday, September 2, 2011

"Don't do what Donny Don't does"

Thanks to Evan Hughes for pointing out this paper: Conditional statements, looping constructs, and program comprehension: an experimental study.

Not surprisingly, negative conditions are more difficult to understand than positive conditions.

I always try to write conditions to be positive. Sometimes, I'll even include an empty 'if' block so that I can put code in the 'else' block instead of using a negative condition:

   if (isInRange(value)) {

      // expected case; do nothing

   } else {

      throw new OutOfRangeException(value);

   }
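
For anyone who wants to run the snippet above, here is a minimal self-contained sketch. The isInRange predicate, its 0..100 bounds, and the OutOfRangeException type are stand-ins invented for illustration; only the shape of the 'if' statement comes from the example.

```java
class RangeCheck {

    // Hypothetical exception type, named after the one in the snippet above.
    static class OutOfRangeException extends RuntimeException {
        OutOfRangeException(int value) {
            super("value out of range: " + value);
        }
    }

    // Hypothetical predicate; the bounds 0..100 are an arbitrary choice.
    static boolean isInRange(int value) {
        return value >= 0 && value <= 100;
    }

    // Positive condition with an intentionally empty 'if' block.
    static int checked(int value) {
        if (isInRange(value)) {
            // expected case; do nothing
        } else {
            throw new OutOfRangeException(value);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(checked(42));
        try {
            checked(101);
        } catch (OutOfRangeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```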

I haven't read the full paper, so maybe the researchers answered my next question: is the problem exacerbated by the syntax for negative conditions used in C-like languages? I find that the '!' operator uses very little horizontal space, making it less noticeable than other unary operators such as '~' or '*'.

e.g. would this statement:

   if (!isInRange(value)) { ...

be more obvious if it were written like this?

   if (not isInRange(value)) { ...

Thursday, August 4, 2011

Balanced garbage collection

We've just published an article on the IBM DeveloperWorks website describing the new garbage collection technology available in IBM Java 6 2.6 and IBM Java 7. This is a project I've been working on for several years and we're pretty excited that customers can now try it out for themselves.

You can read the article for yourself.

Friday, July 15, 2011

Reachability follow up

We've been having quite an interesting series of conversations at work about this reachability problem. Today, one of my co-workers, Dan Heidinga, pointed out that Microsoft's CLR has the same issue. For the CLR, Microsoft has provided a special static method called GC.KeepAlive(Object). It acts as a hint to the virtual machine and JIT to extend an object's lifetime, but is otherwise a no-op. There's a good article about the problem on an old MSDN blog here. Note that the author considers and rejects the option of automatically extending the lifetime of all function arguments to the end of their functions, on the basis that it impacts codegen and therefore performance.

Sunday, July 10, 2011

A subtle issue of reachability

In the last few weeks I've run into two similar and very subtle problems in Java code. In some ways, these seem to illustrate an oversight in the design of the Java language and/or virtual machine. The problem has to do with objects being collected earlier than expected.

In one case a finalizer was run and in the other a PhantomReference was cleared. I'll describe an example based on finalization. It's easy enough to see how this could also apply to reference objects.

Consider a class like this:

public class Foo {

  private byte[] array = new byte[] { 1, 2, 3 };

  public void finalize() { 

    array[0] = array[1] = 

      array[2] = 0; 

  }

  public byte[] getData() { return array.clone(); }

}

This class is able to return a copy of its data array. (This is a common pattern since it prevents the caller from modifying the master copy of the array.) When instances of this class are garbage collected, their finalizer wipes out the data in the array, overwriting it with zeros. (Let's ignore why it does this; it's just an example!) So far so good.

What result will we get if we invoke getData() on an instance of Foo?

  Foo f = new Foo();

  byte[] array = f.getData();

  System.out.println("array={" + 

    array[0] + ", " + 

    array[1] + ", " + 

    array[2] + "}");

Intuitively, we expect this to print "array={1, 2, 3}". And it usually does. But is it legitimate for it to print "array={0, 0, 0}" (or even "array={1, 2, 0}")? If it did, that would mean that the object was finalized while we were still using it, wouldn't it?

Actually, that can happen. It happens quite often in the IBM Java VM, and seems to happen occasionally in Oracle HotSpot, too, but less frequently.
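
Here is a rough harness for trying to observe this yourself. It is only a sketch: the names EarlyFinalizeDemo and countWipedCopies are invented for illustration, the iteration count and the spacing of the System.gc() hints are arbitrary, and it assumes a VM on which finalization is still available and enabled. Whether it ever reports a wiped copy depends on the VM, the JIT and timing, so a count of zero proves nothing.

```java
@SuppressWarnings({"deprecation", "removal"})
class EarlyFinalizeDemo {

    // Same shape as the Foo class above.
    static class Foo {
        private byte[] array = new byte[] { 1, 2, 3 };

        @Override
        public void finalize() {
            array[0] = array[1] = array[2] = 0;
        }

        public byte[] getData() { return array.clone(); }
    }

    // Returns how many of 'iterations' calls saw a partly or fully zeroed copy.
    static int countWipedCopies(int iterations) {
        int wiped = 0;
        for (int i = 0; i < iterations; i++) {
            byte[] copy = new Foo().getData();
            if (copy[0] != 1 || copy[1] != 2 || copy[2] != 3) {
                wiped++;
            }
            if (i % 10_000 == 0) {
                System.gc(); // only a hint; the VM may ignore it
            }
        }
        return wiped;
    }

    public static void main(String[] args) {
        System.out.println("wiped copies: " + countWipedCopies(1_000_000));
    }
}
```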

The Java VM is permitted to collect (or finalize) an object when it is no longer reachable. But how could the object become unreachable when we're running the getData() function? It's easier to understand if you imagine the function in-lined in the caller, and broken up into individual statements:

  Foo f = new Foo();

  byte[] masterArray = f.array; // ignore that array

                                // is private

  // what if a garbage collection happens here?

  // e.g. System.gc(); System.runFinalization(); 

  byte[] copyArray = masterArray.clone();

  System.out.println("array={" + 

    copyArray[0] + ", " + 

    copyArray[1] + ", " + 

    copyArray[2] + "}");

Here we can see that if the garbage collector interrupts the program at just the right (wrong?) time, the finalize() function might run before we clone the array. Even though we don't explicitly assign null to f, a clever VM can analyze the program and determine that f is never used again. It can reclaim the memory for that object and, in this case, finalize it, before the clone() function runs. In most cases this is exactly what you want the VM to do: garbage collect objects as early as possible to recover as much memory as possible.

OK, but is that really the same thing? Surely, the receiver of a function is kept alive until the function returns, right? In-lining the function isn't quite the same!

Actually, neither the Java language specification nor the Java Virtual Machine specification says anything about that. In the VM, the receiver of a function (i.e. this) isn't very special at all. It's just the first argument of a virtual function. Although the language doesn't allow it (keep in mind that the Java language and the Java VM have separate specifications), you can overwrite the receiver just as you can a local variable if you're writing bytecodes directly without the aid of javac:

byte[] getData() {

  byte[] masterArray = this.array;

  this = null; // not legal in Java language,

               // but is legal in class files!

  return masterArray.clone();

}

javac won't compile this, but the JVM's class file verifier won't report any problems in this function.

So, what's the right way to write your Java code so that your objects won't be finalized or collected earlier than expected? Unfortunately, I don't know the answer. You could add an extra reference to the receiver, like this:

byte[] getData() {

  byte[] result = array.clone();

  this.array = this.array; 

  return result;

}

But that's a hack, not a real solution, and is unlikely to work reliably. The VM can easily determine that the dummy "this.array = this.array" statement has no effect and can be removed, leaving us exactly where we started.

Perhaps Java needs a new keyword like this:

byte[] getData() {

  keep_alive(this) {

    return array.clone();

  }

}

However, I doubt that something like that would be used correctly very often.

Unfortunately, the best advice is probably to avoid finalization whenever possible.
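
For code that can't avoid finalization, Java 9 and later offer something close to the keep_alive idea as a library call: java.lang.ref.Reference.reachabilityFence(Object). The sketch below assumes Java 9 or newer; the class name FencedFoo is made up for illustration, and the try/finally shape follows the idiom suggested in that method's documentation. Treat it as one possible arrangement, not a general cure.

```java
import java.lang.ref.Reference;

@SuppressWarnings({"deprecation", "removal"})
class FencedFoo {

    private byte[] array = new byte[] { 1, 2, 3 };

    @Override
    public void finalize() {
        array[0] = array[1] = array[2] = 0;
    }

    public byte[] getData() {
        try {
            return array.clone();
        } finally {
            // Keeps 'this' strongly reachable until the clone has completed,
            // so the finalizer cannot run before the copy is made.
            Reference.reachabilityFence(this);
        }
    }

    public static void main(String[] args) {
        byte[] copy = new FencedFoo().getData();
        System.out.println("array={" + copy[0] + ", " + copy[1] + ", " + copy[2] + "}");
    }
}
```

Every method that touches state the finalizer also touches would need the same fence, which is easy to forget. That is another reason to prefer not finalizing at all.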

Saturday, March 12, 2011

Proportionality of incremental changes

A few weeks ago I was watching a presentation on Project Lambda. The proposed syntax is generally quite nice. It's very simple, and avoids a lot of the noise which pollutes typical Java programs using inner classes. But one of the syntax examples didn't sit right with me, and it took me a few days to figure out exactly why.

The particular example isn't important, and I don't think it's currently in the public proposal. But what is important, I realized, is that code should be written to facilitate incremental changes, and this interfered with that.

What do I mean?

Here's an example: one of the rules in our coding guidelines is that control blocks must always use curly brackets. That is, don't write code like this:

if (condition)

  doSomething();

else

  doSomethingElse();

We require that this code be written like this, instead:

if (condition) {

  doSomething();

} else {

  doSomethingElse();

}

Here's why I don't like the syntax without the curly brackets (at least one of the reasons): it makes it harder to modify the code.

If you want to add an extra line inside the else block, you need to add the line and convert the single statement block into a compound block:

if (condition)

  doSomething();

else {

  fprintf(stderr, "Debugging doSomethingElse\n");

  doSomethingElse();

}

To add one line of code I need to modify three lines! This is a disproportionate amount of work considering the actual change.

In summary, code should be written in a way which encourages incremental changes. If small logical changes require large textual changes, then there's something wrong, either with the tools or the technique.

Thursday, March 10, 2011

Overloading and varargs

I learned something important today: don't overload a variadic function.

It's very tempting to have functions like this in your C++ class:

#include <cstdarg>  // va_list, va_start, va_end

class Foo {

public:

  void write(const char* format, ...);

  void write(const char* format, va_list args);

};

The variadic function would be implemented like this:

void Foo::write(const char* format, ...) {

  va_list args;

  va_start(args, format);

  write(format, args);

  va_end(args);

}

It looks nice and clean -- a perfect application of overloading.

However, there's a subtle problem! When it encounters an overloaded call, the C++ compiler picks the overload whose parameters best match the arguments, and a match through the ellipsis is always ranked last.

Calls like this are fine:

foo->write("%s\n", "Hello");

foo->write("...world");

But what if you have an argument which looks like a va_list? For example, if va_list is a pointer on your platform, what does this line do?

foo->write("How many chickens? Answer: %d\n", 0);

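In C++, 0 isn't just a number. It's also a null pointer constant, so it converts implicitly to any pointer type. Because that conversion is a better match than passing the argument through the ellipsis, the compiler will decide that you're actually calling the non-variadic function and will use NULL for the va_list argument. Then your program will crash when you try to read an int argument from a NULL va_list.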

Lesson learned. Don't overload functions if one of the functions is variadic. Give the va_list version its own name instead, the way the C library pairs printf with vprintf.

Now I have to go back and fix some code I wrote yesterday...
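
Java's varargs have a related trap, although the details differ: the compiler only considers variable-arity methods after it has failed to find a fixed-arity match, so a fixed-arity overload can quietly win. The sketch below is a hypothetical illustration in Java, not a translation of the C++ class above; the class name VarargsOverload and the returned strings are invented to make the chosen overload visible.

```java
class VarargsOverload {

    // Variable-arity overload.
    static String write(String format, Object... args) {
        return "varargs";
    }

    // Fixed-arity overload that happens to accept a single String argument.
    static String write(String format, String arg) {
        return "fixed";
    }

    public static void main(String[] args) {
        // An int cannot be passed as a String, so only the varargs overload applies.
        System.out.println(write("%d chickens", 0));     // prints "varargs"
        // A String argument matches the fixed-arity overload, which is tried first.
        System.out.println(write("%s world", "Hello"));  // prints "fixed"
    }
}
```

Two calls that look like the same kind of formatting request end up in different methods, purely because of the argument's static type.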
