Sunday, July 10, 2011

A subtle issue of reachability

In the last few weeks I've run into two similar and very subtle problems in Java code. In some ways, these seem to illustrate an oversight in the design of the Java language and/or virtual machine. The problem has to do with objects being collected earlier than expected.

In one case a finalizer was run and in the other a PhantomReference was cleared. I'll describe an example based on finalization. It's easy enough to see how this could also apply to reference objects.

Consider a class like this:

public class Foo {
  private byte[] array = new byte[] { 1, 2, 3 };

  // wipe the master copy when the object is finalized
  public void finalize() {
    array[0] = array[1] =
      array[2] = 0;
  }

  public byte[] getData() { return array.clone(); }
}

This class is able to return a copy of its data array. (This is a common pattern since it prevents the caller from modifying the master copy of the array.) When instances of this class are garbage collected, they wipe out the data in the array, overwriting it with zeros. (Let's ignore why it does this; it's just an example!) So far so good.

What result will we get if we invoke getData() on an instance of Foo?

  Foo f = new Foo();
  byte[] array = f.getData();
  System.out.println("array={" + 
    array[0] + ", " + 
    array[1] + ", " + 
    array[2] + "}");

Intuitively, we expect this to print "array={1, 2, 3}". And it usually does. But is it legitimate for it to print "array={0, 0, 0}" (or even "array={1, 2, 0}")? If it did, that would mean that the object was finalized while we were still using it, wouldn't it?

Actually, that can happen. It happens quite often in the IBM Java VM, and seems to happen in Oracle HotSpot too, though less frequently.
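A rough way to look for this on your own VM is a stress loop like the sketch below. The names EarlyFinalizationProbe, Wiped and countWipedCopies are made up for this illustration, and the iteration count and the occasional System.gc() hint are arbitrary assumptions. Whether it ever reports a wiped copy depends on the VM, the JIT compiler and timing, so a count of zero proves nothing.

```java
// Hypothetical stress harness; all names here are invented for this sketch.
class EarlyFinalizationProbe {

    // Same shape as Foo, renamed so it does not clash with it.
    static class Wiped {
        private final byte[] array = new byte[] { 1, 2, 3 };

        // finalize() is deprecated in newer JDKs; the annotation only silences the warning.
        @SuppressWarnings({"deprecation", "removal"})
        @Override
        protected void finalize() {
            array[0] = array[1] = array[2] = 0;
        }

        byte[] getData() { return array.clone(); }
    }

    // Returns how many of the iterations saw a copy other than {1, 2, 3}.
    static int countWipedCopies(int iterations) {
        int wiped = 0;
        for (int i = 0; i < iterations; i++) {
            byte[] copy = new Wiped().getData();
            if (copy[0] != 1 || copy[1] != 2 || copy[2] != 3) {
                wiped++;
            }
            if (i % 10000 == 0) {
                System.gc(); // only a hint; the VM may ignore it
            }
        }
        return wiped;
    }

    public static void main(String[] args) {
        int iterations = 1000000;
        int wiped = countWipedCopies(iterations);
        System.out.println(wiped + " of " + iterations + " copies were wiped early");
    }
}
```

Any non-zero count means an instance was finalized before its own getData() call had finished copying the array.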

The Java VM is permitted to collect (or finalize) an object when it is no longer reachable. But how could the object become unreachable when we're running the getData() function? It's easier to understand if you imagine the function in-lined in the caller, and broken up into individual statements:

  Foo f = new Foo();
  byte[] masterArray = f.array; // ignore that array
                                // is private
  // what if a garbage collection happens here?
  // e.g. System.gc(); System.runFinalization();
  byte[] copyArray = masterArray.clone();
  System.out.println("array={" + 
    copyArray[0] + ", " + 
    copyArray[1] + ", " + 
    copyArray[2] + "}");

Here we can see that if the garbage collector interrupts the program at just the right (wrong?) time, the finalize() function might run before we clone the array. Even though we don't explicitly assign null to f, a clever VM can analyze the program and determine that f is never used again. It can reclaim the memory for that object and, in this case, finalize it, before the clone() function runs. In most cases this is exactly what you want the VM to do: garbage collect objects as early as possible to recover as much memory as possible.

Ok, but is that really the same thing? Surely, the receiver of a function is kept alive until the function returns, right? In-lining the function isn't quite the same!

Actually, neither the Java language specification nor the Java Virtual Machine specification says anything about that. In the VM, the receiver of a function (i.e. this) isn't very special at all. It's just the first argument of a virtual function. Although the language doesn't allow it (keep in mind that the Java language and the Java VM have separate specifications), you can overwrite the receiver just as you can a local variable if you're writing bytecodes directly without the aid of javac:

byte[] getData() {
  byte[] masterArray = this.array;
  this = null; // not legal in Java language,
               // but is legal in class files!
  return masterArray.clone();
}

javac won't compile this, but the JVM's class file verifier won't report any problems in this function.

So, what's the right way to write your Java code so that your objects won't be finalized or collected earlier than expected? Unfortunately, I don't know the answer. You could add an extra reference to the receiver, like this:

byte[] getData() {
  byte[] result = array.clone();
  this.array = this.array;
  return result;
}

But that's a hack, not a real solution, and is unlikely to work reliably. The VM can easily determine that the dummy "this.array = this.array" statement has no effect and can be removed, leaving us exactly where we started.

Perhaps Java needs a new keyword like this:

byte[] getData() {
  keep_alive(this) {
    return array.clone();
  }
}

However, I doubt that something like that would be used correctly very often.
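For readers on Java 9 or later, the standard library has a method with much the same intent as the hypothetical keep_alive: java.lang.ref.Reference.reachabilityFence(Object). The sketch below applies it to the example class, following the try/finally shape suggested in that method's documentation. The class name FencedFoo is made up for this illustration, and this is only one possible way to use the fence, not a general cure for finalizer problems.

```java
import java.lang.ref.Reference;

// Hypothetical variant of Foo that uses a reachability fence (Java 9+).
class FencedFoo {
    private final byte[] array = new byte[] { 1, 2, 3 };

    // finalize() is deprecated in newer JDKs; the annotation only silences the warning.
    @SuppressWarnings({"deprecation", "removal"})
    @Override
    protected void finalize() {
        array[0] = array[1] = array[2] = 0;
    }

    public byte[] getData() {
        try {
            return array.clone();
        } finally {
            // Keeps `this` strongly reachable until the clone has completed,
            // so the finalizer cannot wipe the array in the middle of the copy.
            Reference.reachabilityFence(this);
        }
    }
}
```

Every method that touches state the finalizer also touches would need the same fence, which is exactly the kind of discipline that is easy to get wrong.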

Unfortunately, the best advice is probably to avoid finalization whenever possible.
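One way to avoid finalization is to make the cleanup explicit and let try-with-resources (Java 7 and later) call it. The following is a minimal sketch under that assumption. The names ClosableFoo, ClosableFooDemo and readOnce are invented for this example, and it ignores thread safety.

```java
// Hypothetical variant of Foo that wipes its data in close() instead of finalize().
class ClosableFoo implements AutoCloseable {
    private final byte[] array = new byte[] { 1, 2, 3 };
    private boolean closed = false;

    public byte[] getData() {
        if (closed) {
            throw new IllegalStateException("already closed");
        }
        return array.clone();
    }

    @Override
    public void close() {
        // Explicit, deterministic replacement for the finalizer.
        array[0] = array[1] = array[2] = 0;
        closed = true;
    }
}

class ClosableFooDemo {
    // The copy is taken before close() runs, so wiping the master copy cannot affect it.
    static byte[] readOnce() {
        try (ClosableFoo f = new ClosableFoo()) {
            return f.getData();
        }
    }

    public static void main(String[] args) {
        byte[] copy = readOnce();
        System.out.println("array={" + copy[0] + ", " + copy[1] + ", " + copy[2] + "}");
    }
}
```

Because the variable f is used by the implicit close() at the end of the try block, the object stays reachable for the whole block, and the wipe happens at a point the programmer chose rather than one the garbage collector chose.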


  1. Yes, finalize() is a dangerous method to rely upon. But some developers yearn for a destructor in Java. My approach is to think of a constructor as an initializer, and then implement my own life-cycle methods, such as startup() and shutdown() that do the necessary construction and destruction of an object. Of course, these methods are not magic and must be called explicitly. And likewise, I never intend to implement a finalize() method since it's way more rope than I need.

  2. Simon: Your approach is sound. The difficulty is making sure that the shutdown() function is called in all paths, including errors. Java has some good tools to help with this, like try-finally and the new try-with-resources support in Java 7. (I think this Project Coin feature made it into Java 7).

  3. Of course, Peter, I approach this from the point of view of OSGi, which adds component modularity to Java. Every module has a lifecycle that supports calling methods such as startup() and shutdown(). The big challenge is getting developers to understand how to design and implement Java modules, and it's even harder to get them to understand why loosely coupled and highly cohesive modules are a good thing. The new try-with-resources support in Java 7 sounds interesting... I must investigate. Thanks.

  4. I'd argue that the VM should be treating the object as referenced until any instance method of that object returns - sure it's not in the current spec (so a VM developer can shrug off any blame by pointing at the spec), but it's the most intuitive behaviour as far as the programmer is concerned and certainly beats polluting the language with extra specialized keywords like keep_alive().

  5. asvitkine: That might help in some cases, but can also just obscure the problem.

    What if you use a tool which in-lines methods directly in bytecode? javac used to do this in early versions, and there are a number of optimizers and obfuscators which do the same thing. Even if the VM treated the receiver (or all arguments) specially, in-lining a method using one of these tools could still change the reachability of objects.

    What if you're using invokedynamic (new in Java 7)? The "receiver" of a method might not be the first argument, or the method could be multidispatch polymorphic, in which case it could have more than one argument with a claim to be the receiver!