Saturday, March 12, 2011

Proportionality of incremental changes

A few weeks ago I was watching a presentation on Project Lambda. The proposed syntax is generally quite nice. It's very simple, and avoids a lot of the noise which pollutes typical Java programs using inner classes. But one of the syntax examples didn't sit right with me, and it took me a few days to figure out exactly why.

The particular example isn't important, and I don't think it's currently in the public proposal. But what is important, I realized, is that code should be written to facilitate incremental changes, and this particular syntax worked against that.

What do I mean?

Here's an example: one of the rules in our coding guidelines is that control blocks must always use curly brackets. That is, don't write code like this:

if (condition)
  doSomething();
else
  doSomethingElse();

We require that this code be written like this, instead:


if (condition) {
  doSomething();
} else {
  doSomethingElse();
}

Here's why I don't like the syntax without the curly brackets (or at least one of the reasons): it makes the code harder to modify.

If you want to add an extra line inside the else block, you need to add the line and convert the single statement block into a compound block:


if (condition)
  doSomething();
else {
  fprintf(stderr, "Debugging doSomethingElse\n");
  doSomethingElse();
}

To add one line of code, you need to modify three lines! That's a disproportionate amount of work for such a small change.
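Compare that with the braced version our guideline requires. There, the same change really is just one new line:

if (condition) {
  doSomething();
} else {
  fprintf(stderr, "Debugging doSomethingElse\n");
  doSomethingElse();
}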

In summary, code should be written in a way which encourages incremental changes. If small logical changes require large textual changes, then there's something wrong, either with the tools or the technique.

Thursday, March 10, 2011

Overloading and varargs

I learned something important today: don't overload a variadic function.

It's very tempting to have functions like this in your C++ class:

#include <cstdarg>  // for va_list

class Foo {
public:
  void write(const char* format, ...);
  void write(const char* format, va_list args);
};

The variadic function would be implemented like this:

void Foo::write(const char* format, ...) {
  va_list args;
  va_start(args, format);
  write(format, args);
  va_end(args);
}

It looks nice and clean -- a perfect application of overloading.

However, there's a subtle problem. When the C++ compiler resolves an overloaded call, matching arguments against a variadic parameter list (...) is its last resort: any overload it can reach through ordinary conversions wins over the variadic one.

Calls like this are fine:

foo->write("%s\n", "Hello");
foo->write("...world");

But what if you have an argument which looks like a va_list? For example, if va_list is a pointer on your platform, what does this line do?

foo->write("How many chickens? Answer: %d\n", 0);

In C++, 0 isn't just a number; it's also a null pointer constant. The compiler decides that you're actually calling the va_list overload, converting the 0 to a null va_list. Then your program crashes when it tries to read an int argument from that null va_list.

Lesson learned: don't overload a function when one of the overloads is variadic. Give the va_list flavour a name of its own instead, the way the C library pairs printf with vprintf.

Now I have to go back and fix some code I wrote yesterday...

Friday, December 31, 2010

Die Klasse Namen

I just noticed that the IBM JDK class libraries include these klasses:
  • com.ibm.security.util.DerInputStream and
  • com.ibm.security.util.DerValue.

Huck huck.


Ach! Der InputStream ist ein ... NuisanceStream! Der Value, Der. No one who speaks German could be an evil man! 

Happy New Year!

Tuesday, December 28, 2010

-XX:+UseCompressedStrings explained

It looks like Oracle has finally released some documentation for those options they've been using in SPECjbb2005 submissions. The doc is here, and it looks like it appeared on Christmas Eve.

Like I guessed, they're using a byte[] array instead of a char[] array for Strings wherever they can.

Presumably this makes the code path more complicated, because every time the JVM deals with a String it now needs to check what kind it is. The space savings are probably worth it, at least in some applications.
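Purely as an illustration of the kind of check I mean, here's a toy sketch. It is not the real implementation (that lives inside java.lang.String and the JVM), and every name in it is made up:

// Toy sketch only: a dual-representation string that stores ASCII-only
// content in a byte[] and falls back to a char[] for everything else.
public final class ToyCompressedString {
    private final byte[] ascii;   // used when every character is ASCII
    private final char[] chars;   // fallback representation

    public ToyCompressedString(String s) {
        boolean allAscii = true;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0x7F) {
                allAscii = false;
                break;
            }
        }
        if (allAscii) {
            byte[] bytes = new byte[s.length()];
            for (int i = 0; i < bytes.length; i++) {
                bytes[i] = (byte) s.charAt(i);
            }
            this.ascii = bytes;
            this.chars = null;
        } else {
            this.ascii = null;
            this.chars = s.toCharArray();
        }
    }

    public char charAt(int index) {
        // The extra complication: every access has to check which
        // representation this particular string is using.
        return (ascii != null) ? (char) ascii[index] : chars[index];
    }
}

The branch in charAt is the kind of check the JVM would be paying for on every String operation; the byte[] is where the space saving comes from.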

Why isn't it on by default? Two possibilities:
  1. The penalty is too high in many applications. In my opinion, this would make it a bit of a benchmark special.
  2. The option isn't quite ready for prime time yet, but they plan to turn it on by default later.

Is this option "fair" to non-Western-European applications? I'd argue that it probably isn't unfair. A lot of String objects aren't involved in the user interface at all. In many applications, such as Eclipse, Strings are used extensively as internal identifiers for things like plug-ins, extension points, and user interface elements. Even if your application presents a non-ASCII user interface, there's a good chance that it still has a lot of ASCII strings under the surface. It might not benefit as much from this option, but it would probably still benefit.

(Of course that assumes that there's no penalty for using non-ASCII Strings beyond the extra space. If the option is implemented in an all-or-nothing fashion, e.g. if it stops using byte[] arrays the first time it encounters a non-ASCII String, then non-ASCII applications wouldn't benefit at all.)

Friday, December 24, 2010

Working backwards

Sometimes I get a bug that I just can't figure out. If the problem is reproducible with a good test case, it's usually easy to narrow it down quickly. But what do you do if your product crashed once on a customer's server and hasn't failed again?

Well, you start with the logs and diagnostic files. Usually the tracepoints and a core file are enough to figure it out, but sometimes they aren't.

In cases like this I don't like to throw in the towel without doing something. It feels like defeat (probably because it is). Instead, I always try to figure out what additional information could have helped me solve the problem.

How did we get to the point of failure? If I can identify two or three paths to the failure point but can't infer which one was taken, I'll add some tracepoints to those paths. Or maybe I can add assertions on those paths to detect the error a bit earlier, along the lines of the sketch below.
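Every name in this sketch is made up; it just shows the shape of what I mean:

public class Dispatcher {
    // Hypothetical: two paths reach process(), and the one crash we've seen
    // doesn't tell us which path was taken.

    void handleFromNetwork(byte[] payload) {
        // Tracepoint: if it fails again, the log will show this path ran.
        System.err.println("trace: handleFromNetwork, length=" + payload.length);
        process(payload);
    }

    void handleFromCache(byte[] payload) {
        // Assertion: catch the suspected bad state earlier, closer to its source.
        assert payload != null && payload.length > 0 : "cache returned an empty payload";
        process(payload);
    }

    private void process(byte[] payload) {
        // ... the code that failed once on the customer's server ...
    }
}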

Since the problem isn't reproducible they won't help me now, but they might help me in the future. If the problem does occur again (and 'not reproducible' really just means 'very rare'), hopefully these diagnostics will get me one step closer to the actual problem. And if the failure doesn't hit any of my new tracepoints or assertions when it recurs, that's useful (and potentially maddening) too.

Of course I still might not be able to figure out what's happening. Then I add another round of tracepoints and assertions. Each failure gets me one step closer to the solution.

Saturday, November 6, 2010

Ambiguous Java packages and inner classes

Yesterday, when I should have been paying attention to my breathing during yoga, I was instead thinking about inner classes. Specifically I was wondering how Java resolves ambiguities between package names and inner class names.

In Java, an inner class is named using its parent's name, a dot, and its own name. For example, Foo.Bar is an inner class named "Bar" in the parent class "Foo". But this is also how Java names classes in packages: Bar could just as easily be a top-level class in a package named Foo.

Note that there's no ambiguity once the code has been compiled. The class file format uses / as a package separator and $ as an inner class separator, so Foo$Bar is an inner class, and Foo/Bar is a top level class in package Foo. The ambiguity is only present in the compiler, where "." serves both purposes.

I wrote a simple test case: a Foo.java file with a public static inner class Bar, and a Foo/Bar.java file with a top-level public class Bar (both are sketched below, after the test class). Then I wrote a test class like this:

public class Test {
        public static void main(String[] args) {
                System.out.println(new Foo.Bar());
        }
}
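The two conflicting definitions of Bar were as trivial as you'd expect; roughly this:

// Foo.java, in the default package
public class Foo {
        public static class Bar {
        }
}

// Foo/Bar.java: a top-level class Bar in package Foo
package Foo;

public class Bar {
}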

What happens when you compile everything and run Test?

peter$ java Test
Foo$Bar@c53dce

Notice the '$' in the name? The inner class won, so it looks like javac resolves the ambiguity in favour of the inner class.

But what if we really want to mess with the compiler? What happens if you try to compile this file?

public class java {
        public static class lang {
                public static class Object {
                }
        }
}

Yes, the file is called java.java. None of these names are reserved in Java, so this ought to be a perfectly legitimate class named java. It has an inner class, java.lang, with its own inner class, java.lang.Object (not to be confused with java.lang.Object, the superclass of each of these classes).

javac doesn't seem to like this class very much, at least not the version installed on my Mac:

tmp peter$ javac java.java
An exception has occurred in the compiler (1.6.0_17). Please file a bug at the Java Developer Connection (http://java.sun.com/webapps/bugreport)  after checking the Bug Parade for duplicates. Include your program and the following diagnostic in your report.  Thank you.
java.lang.NullPointerException
    at com.sun.tools.javac.comp.Flow.visitIdent(Flow.java:1214)
    at com.sun.tools.javac.tree.JCTree$JCIdent.accept(JCTree.java:1547)
    at com.sun.tools.javac.tree.TreeScanner.scan(TreeScanner.java:35)
    ...

I don't think that this is a security hole, since the ambiguity is only present in the compiler. It's possible that there's a similar ambiguity in reflection, but I'm not sure. That might be somewhat higher risk.

Update: Eclipse compiles this java.java file with no problem. Looks like it's just an obscure bug in javac.

Further update: I'm not the first person to discover this: http://www.bodden.de/tag/name-resolution/