tag:blogger.com,1999:blog-14684792228025169392024-03-24T19:31:43.947-04:00The Virtual MachinistPeter Burkahttp://www.blogger.com/profile/15290253671442020919noreply@blogger.comBlogger33125tag:blogger.com,1999:blog-1468479222802516939.post-14938153713247873372014-12-16T10:46:00.001-05:002014-12-16T10:46:16.746-05:00Flickr, beards and feature recognitionSo it looks like Flickr is applying image recognition beyond face recognition. A couple of months ago they demoed their "<a href="http://www.digitaltrends.com/photography/flickrs-simple-park-bird-tool-actually-demo-complex-image-recognition/">Park or Bird?</a>" feature recognition system, but I hadn't realized they'd put anything into production until today.<br /><br />Yesterday, I uploaded <a href="https://www.flickr.com/photos/pburka/">some old photos</a> from mid 2000 that I'd taken with my first digital camera, and had recently recovered (thanks Dana). This morning I noticed that someone had found the photo below by searching for "Beard". That seemed odd, since I hadn't tagged any of the old photos. The only way that Flickr could know that Gac has a beard is by looking at the photo itself.<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://flic.kr/p/pu1Zdt"><img border="0" height="213" src="https://farm9.staticflickr.com/8642/15413260023_5a0393c41b_z_d.jpg" width="320" /></a></div>
<div>
<br /><br />Something feels a little bit creepy about this, but on the other hand it's also pretty cool.<br /><br />I would like to see Flickr expose this more explicitly, and give me an opportunity to edit these automatically added tags.<br /><br /></div>
Peter Burkahttp://www.blogger.com/profile/12855319784439726546noreply@blogger.com0tag:blogger.com,1999:blog-1468479222802516939.post-1584784579048944112014-11-16T20:30:00.003-05:002014-11-16T20:30:54.038-05:00Picasa<br />
<div style="-webkit-text-stroke-width: 0px; background-color: white; color: #141823; font-family: Helvetica, Arial, 'lucida grande', tahoma, verdana, arial, sans-serif; font-size: 11.8181819915771px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 13.9636354446411px; orphans: auto; text-align: left; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px;">
</div>
<div style="-webkit-text-stroke-width: 0px; background-color: white; color: #141823; font-family: Helvetica, Arial, 'lucida grande', tahoma, verdana, arial, sans-serif; font-size: 11.8181819915771px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 13.9636354446411px; orphans: auto; text-align: left; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px;">
<form action="https://www.facebook.com/ajax/ufi/modify.php" class="live_747460945302869_316526391751760 commentable_item autoexpand_mode" data-ft="{"tn":"]"}" data-live="{"seq":0}" id="u_m2_4" method="post" rel="async" style="margin: 0px; padding: 0px;">
<div class="_5pcp _5vsi" style="color: #9197a3; margin-top: 10px;">
<div>
<span><span data-reactid=".12o"></span></span></div>
</div>
</form>
</div>
<br />
<div class="_5pbx userContent" data-ft="{"tn":"K"}" style="-webkit-text-stroke-width: 0px; background-color: white; color: #141823; font-family: Helvetica, Arial, 'lucida grande', tahoma, verdana, arial, sans-serif; font-size: 14px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 1.38; orphans: auto; overflow: hidden; text-align: left; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px;">
<div style="margin: 0px 0px 6px;">
I learned something about Picasa today: when you edit a photo, it stores all the modifications as extra fields in the JPEG file and doesn't modify the displayed image. Until you export the image from Picasa, other tools only see the original image.</div>
<div style="margin: 6px 0px;">
I have a bunch of photos from before 2006 that I salvaged from an old Thinkpad, and which I'd touched up with Picasa, and I want to upload them to Flickr. But because the touch-ups are all in metadata, none of the images reflect this.</div>
<div style="display: inline; margin: 6px 0px 0px;">
I downloaded the latest version of Picasa for OS X, imported all of these photos, and amazingly it seems to have correctly applied the changes (which were made with a much older version of the tool). They're not pixel-for-pixel identical -- I presume that some of the enhancement algorithms have changed -- but they're damn close.</div>
</div>
Peter Burkahttp://www.blogger.com/profile/12855319784439726546noreply@blogger.com0tag:blogger.com,1999:blog-1468479222802516939.post-3983844874723341782014-08-22T07:51:00.000-04:002014-08-22T14:08:29.904-04:00The case of the missing CitibikesThe example of my friend and colleague Ben, with his amazing <a href="http://iquantny.tumblr.com/">I Quant NY</a> blog, has motivated me to try my hand at some open data hacking. Ben's written several posts where he analyzes Citibike bike share data. Citibike has made all of their trip data through the end of May 2014 <a href="http://www.citibikenyc.com/system-data">available for free download</a>. I'm a huge fan of New York's bike share program and of their open data policy.<br />
<br />
Ben has analyzed <a href="http://iquantny.tumblr.com/post/94698834784/interborough-less-rapid-transit-analyzing-citi-bike">trips</a> and <a href="http://iquantny.tumblr.com/post/94508898134/in-search-of-the-safest-citi-bike-stations-using-open">stations</a>, but I have a different question: how many of New York's shared bikes have been stolen or lost?<br />
<br />
The <a href="http://nypost.com/2014/07/17/brooklyn-becoming-a-dumping-ground-for-stolen-citi-bikes/">New York Post reports</a> that bikes are routinely stolen from Manhattan stations and ridden to underserved parts of Brooklyn and Queens. I'm not too concerned about these bikes: they're recovered quickly, and Citibike may wish to treat the bikes' eventual destinations as a kind of <a href="https://en.wikipedia.org/wiki/Desire_path">desire line</a>. Clearly Crown Heights residents can't wait for the program to expand to their neighborhood.<br />
<br />
In July, the <a href="http://nymag.com/daily/intelligencer/2014/07/how-not-to-paint-a-stolen-citi-bike.html">109th precinct proudly reported</a> that their detectives had detected a 68 year old man riding a Citibike which had been liberated and repainted. The suspected thief was detained and his ride confiscated. That's what I'm looking for!<br />
<blockquote class="twitter-tweet" lang="en">
What happens when you steal a <a href="https://twitter.com/CitibikeNYC">@CitibikeNYC</a>, do a bad job repainting it & ride it around the <a href="https://twitter.com/hashtag/109pct?src=hash">#109pct</a> ? You get Caught! <a href="http://t.co/qLxcOChtKF">pic.twitter.com/qLxcOChtKF</a><br />
— NYPD 109th Precinct (@NYPD109Pct) <a href="https://twitter.com/NYPD109Pct/statuses/492461887208964096">July 25, 2014</a></blockquote>
<br />
So I got myself the trip data, put together a quick-and-dirty python script, and identified the first and last trip for each bike in the system. I presume that if a bike is stolen or destroyed it will disappear from the trip data, so we can guess that if a bike hasn't been ridden in some time, it's likely gone AWOL.<br />
<br />
Note that the <span style="font-family: Courier New, Courier, monospace;">bikeid</span> field in the data doesn't appear to match the number stenciled on the bike's frame. It could correspond to the electronic identifier (probably an RFID tag) which the stations use to identify bikes. If that's the case, missing trip data could simply indicate that the electronics were damaged and replaced.<br />
<br />
There are 6943 unique numbers in the trip data. This is roughly consistent with a <a href="http://www.nytimes.com/2013/05/28/nyregion/bike-share-program-opens-in-new-york-city-after-long-delay.html?pagewanted=all&_r=0">New York Times story</a>, published when the program launched, reporting 6,000 bikes in the system.<br />
<br />
If we sort the bikes by their final trip, we can quickly get an estimate of losses.<br />
<br />
<table border="1">
<tbody>
<tr><th><span style="font-family: Arial, Helvetica, sans-serif;">Month</span></th><th><span style="font-family: Arial, Helvetica, sans-serif;">Final rides</span></th><th><span style="font-family: Arial, Helvetica, sans-serif;">Month</span></th><th><span style="font-family: Arial, Helvetica, sans-serif;">Final rides</span></th></tr>
<tr><td><span style="font-family: Arial, Helvetica, sans-serif;">2013-07</span></td><td><span style="font-family: Arial, Helvetica, sans-serif;">18</span></td><td><span style="font-family: Arial, Helvetica, sans-serif;">2014-01</span></td><td><span style="font-family: Arial, Helvetica, sans-serif;">26</span></td></tr>
<tr><td><span style="font-family: Arial, Helvetica, sans-serif;">2013-08</span></td><td><span style="font-family: Arial, Helvetica, sans-serif;">36</span></td><td><span style="font-family: Arial, Helvetica, sans-serif;">2014-02</span></td><td><span style="font-family: Arial, Helvetica, sans-serif;">60</span></td></tr>
<tr><td><span style="font-family: Arial, Helvetica, sans-serif;">2013-09</span></td><td><span style="font-family: Arial, Helvetica, sans-serif;">17</span></td><td><span style="font-family: Arial, Helvetica, sans-serif;">2014-03</span></td><td><span style="font-family: Arial, Helvetica, sans-serif;">68</span></td></tr>
<tr><td><span style="font-family: Arial, Helvetica, sans-serif;">2013-10</span></td><td><span style="font-family: Arial, Helvetica, sans-serif;">31</span></td><td><span style="font-family: Arial, Helvetica, sans-serif;">2014-04</span></td><td><span style="font-family: Arial, Helvetica, sans-serif;">428</span></td></tr>
<tr><td><span style="font-family: Arial, Helvetica, sans-serif;">2013-11</span></td><td><span style="font-family: Arial, Helvetica, sans-serif;">41</span></td><td><span style="font-family: Arial, Helvetica, sans-serif;">2014-05</span></td><td><span style="font-family: Arial, Helvetica, sans-serif;">6186</span></td></tr>
<tr><td><span style="font-family: Arial, Helvetica, sans-serif;">2013-12</span></td><td><span style="font-family: Arial, Helvetica, sans-serif;">32</span></td><td></td><td><br /></td></tr>
</tbody></table>
<div>
The vast majority of the bikes showed activity in May 2014, meaning that they weren't stolen or lost. Before April, each month saw between 17 and 68 final rides, averaging 36.5 each month.</div>
<div>
<br /></div>
<div>
At first glance, April appears to have been a disastrous month for Citibike thefts. But a more likely explanation could be that those bikes have been removed for maintenance. If Citibike keeps 300-400 bikes in their warehouse for routine tuneups, and if it takes two months for the bikes to rotate back out into service, it could easily explain most of the 428 bikes which were ridden in April, but idle in May. We would expect most of them to return to service in June. </div>
<div>
<br /></div>
<div>
February and March also saw higher than average losses. Perhaps bike thieves are more active in those months, but this may be better explained by the unusually snowy winter. Plows, for example, may have taken a toll on the fleet.</div>
<div>
<br /></div>
<div>
Citibike can, at least in theory, bill a rider $1200 for failing to return a bike. If they collected this fee for each bike which went missing before April, they'd have raised nearly $400,000. However I've yet to hear of any rider receiving such a bill.</div>
<div>
<br /></div>
<div>
Assuming that these final trips do represent theft and loss, approximately half of 1% of Citibikes are lost each month, or about 6% every year. That's far better than the reputed 80% of Paris's Vélib' bikes which were stolen in that system's first year!
<br />
<br />
<i>Update: the original version of the table showed 116 trips which ended in June. This is because there were a handful of trips which started on May 31 but finished after midnight, and were thus credited to the next month. To make it less confusing, I've merged these final rides into the data for May.</i></div>
Peter Burkahttp://www.blogger.com/profile/12855319784439726546noreply@blogger.com4tag:blogger.com,1999:blog-1468479222802516939.post-29651616843801150192013-06-03T18:43:00.000-04:002013-06-03T18:43:44.918-04:00Variadic macros and trailing commasOne of the great new (ok, 14-year old) features of C99 is the variadic macro. This feature finally provides a portable way to write a logging macro, for example. Here's a simple example:<br />
<br />
<span style="font-family: Courier New, Courier, monospace;">#define LOGIFY(format, ...) \</span><br />
<span style="font-family: Courier New, Courier, monospace;"> fprintf(stderr, "LOG: " format "\n", \</span><br />
<span style="font-family: Courier New, Courier, monospace;"> __VA_ARGS__)</span><br />
<span style="font-family: 'Courier New', Courier, monospace;">...</span><br />
<span style="font-family: 'Courier New', Courier, monospace;">LOGIFY("Found %d widgets.", widgetCount);</span><br />
<span style="font-family: 'Courier New', Courier, monospace;">...</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">However, there's a subtle problem here. What if we call LOGIFY with no variadic arguments?</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">LOGIFY("Finished.");</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: inherit;">This expands to:</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">fprintf(stderr, "LOG: Finished.\n",</span><span style="font-family: 'Courier New', Courier, monospace;">);</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">And that doesn't compile, because there's a trailing comma.</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">GCC provides <a href="http://gcc.gnu.org/onlinedocs/cpp/Variadic-Macros.html" target="_blank">an extension</a> to the language which works around this problem. For that compiler, you can use token pasting to make the comma magically disappear:</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: 'Courier New', Courier, monospace;">#define LOGIFY(format, ...) \</span><br />
<span style="font-family: Courier New, Courier, monospace;"> fprintf(stderr, "LOG: " format "\n", \</span><br />
<span style="font-family: Courier New, Courier, monospace;"> <b>##</b>__VA_ARGS__)</span><br />
<div>
<span style="font-family: Courier New, Courier, monospace;"><br /></span></div>
<div>
<span style="font-family: inherit;">However this is non-portable.</span></div>
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;"><a href="http://stackoverflow.com/questions/3576396/variadic-macros-with-0-arguments-in-c99" target="_blank">StackOverflow</a> suggests that you can make the format part of the variadic part of the macro:</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: 'Courier New', Courier, monospace;">#define LOGIFY(...) \</span><br />
<div>
<span style="font-family: Courier New, Courier, monospace;"> fprintf(stderr, "LOG: " </span><span style="font-family: 'Courier New', Courier, monospace;">__VA_ARGS__)</span><br />
<span style="font-family: 'Courier New', Courier, monospace;"><br /></span>
<span style="font-family: inherit;">However this doesn't work for my example because I want to be able to past a new line onto the end of the format string. Since the format string isn't isolated from the arguments I can't append anything to it.</span><br />
<span style="font-family: inherit;"><br /></span>
An ugly work-around is to simply pass in an extra, unused argument if you have no real arguments:<br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">LOGIFY("Finished.", 0);</span><br />
<div>
<span style="font-family: Courier New, Courier, monospace;"><br /></span></div>
</div>
<div>
<span style="font-family: inherit;">It's always harmless to pass extra arguments to a variadic function, and this does compile, but it doesn't seem very elegant.</span></div>
<div>
<span style="font-family: inherit;"><br /></span></div>
<div>
<span style="font-family: inherit;">If the ANSI C committee ever revisits this, or if I design my own language, they (or I) could resolve this problem by permitting trailing commas in function calls (and function declarations). C already permits trailing commas in array initializers (as do Java and other C-like languages):</span></div>
<div>
<span style="font-family: inherit;"><br /></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">int fibs[] = {1,1,2,3,5,8,13,};</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"><br /></span></div>
<div>
<span style="font-family: inherit;">This is convenient for a number of reasons, including conditional inclusion (#ifdef) and source code generation. It can also be a good habit to always include a trailing comma, especially when declaring arrays of strings. Without a trailing comma, there's a risk that a developer might add a new string to the end of the list without a comma. Since CPP does string concatenation, the results may be unexpected!</span></div>
<div>
<span style="font-family: inherit;"><br /></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">const char * messages[] = {</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> "No error", // 0; added in 2002</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> "First legacy error", // </span><span style="font-family: 'Courier New', Courier, monospace;">added in</span><span style="font-family: Courier New, Courier, monospace;"> 2002</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> "Other legacy error" // </span><span style="font-family: 'Courier New', Courier, monospace;">added in </span><span style="font-family: Courier New, Courier, monospace;">2002</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> "Brand new error" // </span><span style="font-family: 'Courier New', Courier, monospace;">new in </span><span style="font-family: Courier New, Courier, monospace;">2013</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">};</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: inherit;">The same problem can occur with numbers if sign tokens (+/-) are used.</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span></div>
<div>
<span style="font-family: inherit;">Why not support trailing commas in functions, too? It would be more consistent, and there are a number of not-immediately-obvious advantages. In addition to this variadic macro case, consider the case where one or more of the arguments is conditionally included:</span></div>
<div>
<span style="font-family: inherit;"><br /></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">void</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">doWork(</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> int howMuchWork,</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> WorkType whatKindOfWork,</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">#ifdef THREAD_SAFE</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> bool useLocks</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">#endif</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">)</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">...</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"><br /></span></div>
<div>
<span style="font-family: inherit;">This doesn't compile if THREAD_SAFE isn't defined because there's an unused comma after the second argument. If C permitted trailing commas here we'd avoid awkward constructs like moving the comma into the conditional code (and even that wouldn't work if each of the arguments were conditional!).</span></div>
<div>
<span style="font-family: inherit;"><br /></span></div>
<div>
<span style="font-family: inherit;">Although it's not consistent with the normal <a href="http://en.wikipedia.org/wiki/Serial_comma" target="_blank">usage of commas in prose</a> the trailing comma is surprisingly useful in programming languages, and ought to be supported more widely.</span></div>
<br />Peter Burkahttp://www.blogger.com/profile/12855319784439726546noreply@blogger.com2tag:blogger.com,1999:blog-1468479222802516939.post-61747502109624707742013-03-25T07:58:00.000-04:002013-03-25T07:58:11.927-04:00Square Root: My solution<a href="http://corner.squareup.com/2013/03/puzzle-square-root.html" target="_blank">Wouter Coekaerts's square root puzzle</a> presents a difficult challenge. We need to figure out the number he's thinking of without seeing the number. The only question we can ask is: "is it this number?" (In fact, it's a bit worse than that: when we ask that question, he just writes the answer on the console if we're right.) There are a few possible toeholds here where we could get started. I'll discuss some I rejected before I get to my actual solution.<br />
<br />
Wouter gives us the source code for the <span style="font-family: "Courier New",Courier,monospace;">answer()</span> method, so we know <i>how</i> he's testing the answer. We also know how he generated the answer: a BigInteger of approximately 10000 random bits generated from a SecureRandom.<br />
<br />
The easiest approach would be to just look at the answer. However the answer (actually the square of the answer) is stored in a private field. We can't get at it without using reflection, JNI, sun.misc.Unsafe, or something else of that nature. The challenge doesn't permit those types of attacks. What about accessing it as an inner class? Inner classes can access private fields of their enclosing classes, so could we trick the JVM into thinking our code is an inner class of square.Square? You can't do this, because inner classes are just a fiction created by the compiler. It might look like you can access private fields of your enclosing class, but the compiler actually adds hidden accessors for those fields which it calls under the covers. This approach won't work.<br />
<br />
What if we could influence the answer? Can we control what SecureRandom does? We can do this by providing a different source of entropy from the command line or by modifying /dev/random. However I'm not sure that these can be done with a security manager in place or without root access, and I think that they violate the "no security vulnerabilities" spirit of the challenge.<br />
<br />
One promising approach relies on the lack of a final modifier on BigInteger. What if we don't pass in a BigInteger for the answer, but pass in our own subclass of BigInteger? For example, could we override <span style="font-family: "Courier New",Courier,monospace;">BigInteger.equals()</span> to always return true? This looked like a promising line of attack, and I think it might actually work in some implementations. Unfortunately, the way Wouter wrote <span style="font-family: "Courier New",Courier,monospace;">answer()</span> he doesn't directly invoke any methods on the candidate object we control and <a href="http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7-b147/java/math/BigInteger.java#BigInteger.divide%28java.math.BigInteger%29" target="_blank">OpenJDK's implementation of <span style="font-family: "Courier New",Courier,monospace;">BigInteger.divide()</span></a> doesn't call any methods on the divisor object either, so there are no opportunities there (however <a href="http://developer.classpath.org/doc/java/math/BigInteger-source.html#line.948" target="_blank">GNU Classpath's <span style="font-family: "Courier New",Courier,monospace;">Biginteger.divide()</span></a> does call methods on the divisor). BigInteger probably should have been final, but unfortunately this oversight doesn't seem to be sufficient to solve this puzzle. (If <span style="font-family: "Courier New",Courier,monospace;">answer()</span> had been implemented as <span style="font-family: "Courier New",Courier,monospace;">root.equals(n.divide(root))</span> instead of <span style="font-family: "Courier New",Courier,monospace;">n.divide(root).equals(root)</span>this solution would have worked well, because our own <span style="font-family: "Courier New",Courier,monospace;">equals()</span> would be invoked.)<br />
<br />
If we can't subclass BigInteger, can we just replace it? We can write our own implementation of BigInteger which always returns true for equals, but we'd need to replace the built-in one. This would require modifying the bootstrap classpath with <span style="font-family: "Courier New",Courier,monospace;">-Xbootclasspath/p:</span>, which I think violates the rules, or at least the spirit, of the challenge.<br />
<br />
What about a simple brute force approach? I was actually tempted to submit this. We know the upper bound of the answer (2<sup>10000</sup>), so a simple for loop ought to suffice. Assuming we can test one answer per nanosecond (highly optimistic!), <a href="http://www.wolframalpha.com/input/?i=%282^10000%29+*+1+second+%2F+1000000" target="_blank">wolframalpha estimates</a> that it would take 6.322×10<sup>2996</sup> years, or 4.6×10<sup>2986</sup> times the age of the universe. The solution is provably correct, if somewhat impractical.<br />
<br />
When I was looking at the implementation of <span style="font-family: "Courier New",Courier,monospace;">BigInteger.divide()</span> to evaluate the feasibility of the subclassing attack, I noticed that there are a few special cases at the top of the method. Consider a/b. If a = b, the result of integral division will always be 1. If a < b, the result of integral division is always 0. If a > b, the code is much more complicated. This suggested that a timing attack might be feasible: we should be able to determine if our candidate is less than Wouter's recorded number by how long it takes for the <span style="font-family: "Courier New",Courier,monospace;">answer()</span> function to complete.<br />
<br />
Note that this attack is dependent on a number of things. We know how Wouter is testing the answer: <span style="font-family: "Courier New",Courier,monospace;">n.divide(root).equals(root)</span>. If he'd done the more efficient <span style="font-family: "Courier New",Courier,monospace;">n.equals(root.times(root))</span>, instead, this attack wouldn't work (or would be much more difficult). It also relies on the JDK not doing unnecessary work for division when the dividend is less than the divisor. In Java 6 the JDK still calculates the remainder for this case (and discards it) making the attack more difficult but still feasible. In Java 7 the timing difference is more pronounced.<br />
<br />
Timing the function is also quite challenging. For one thing, JVMs use JIT compilers which may compile code multiple times, with increasing levels of optimization, as they discover what's important. Therefore it's important to give the JIT plenty of opportunity to compile the code before we start trying to time it. I did most of my initial testing using -Xint, which disables the JIT, to eliminate this variable until I was confident that the solution worked. Garbage collection can also interfere with the timing, but it can only increase the time. If we only look at minimum times, and do enough measurements, it shouldn't matter. Other programs running on the machine can interfere with our timing, as can power management features. I tried running the solution on my MacBook Pro at first, but the timing was all over the place. In the end I used an isolated Linux x86-64 Xeon X5690 (Westmere) machine and Hotspot Java 1.7.0_09_b05 to test the solution. It was much more reliable.<br />
<br />
Once the JIT is warmed up so we can get reliable timing, the next step is to calculate a base line. I do this by generating my own, random, 20000 bit number (the square of a 10000 bit number is a 20000 bit number) and testing how long it takes to divide that by number slightly lower than it. I repeat this a million times and record the minimum time it took to run my own <span style="font-family: "Courier New",Courier,monospace;">answer()</span> function, which is identical to Wouter's. Now that we have the baseline for a 'slow' divide.<br />
<br />
Someone with a better statistical background that me could probably come up with a robust way of distinguishing between 'fast' and 'slow' division. I just used a heuristic and ran the test many times. If the divide time is >= 75% of the fastest baseline time, I consider it slow. If it's <= 50% of the fastest baseline time, I consider it fast. If it's somewhere in between I run the measurement again until I get a conclusive result. I short circuit for fast times, since I assume that these must be fast due to the number being tested, but not for slow times, since they could be slow for any number of the reasons discussed above.<br />
<br />
Then I start to identify bits. I start with the highest possible bit and work my way down. Whenever I discover that the divide is fast with a particular bit set, and slow with the same bit clear, I know that we've identified the next bit in n. When the first stage of my program finishes (it takes about 80 minutes), it should have identified the highest number for which <span style="font-family: "Courier New",Courier,monospace;">answer()</span> is 'slow'. In other words, we know n-1.<br />
<br />
Obviously determining n from n-1 is trivial, but Wouter doesn't actually want us to find n: he wants the square root of n. Unfortunately, there's no built-in library function to do this. I could have implemented <a href="http://en.wikipedia.org/wiki/Newton%27s_method" target="_blank">Newton's method</a> to solve the square root, but fortunately someone else already did this for me: <a href="http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/math/BigIntegerMath.html#sqrt%28java.math.BigInteger,%20java.math.RoundingMode%29" target="_blank">BigIntegerMath in Google's Guava</a> has a very efficient implementation that handles a lot of corner cases I probably wouldn't have bothered with. Wouter wanted the solution in a single class, and I didn't want to rely on libraries not included with the base JDK, so I copied the relevant code from Guava. Since this is an Apache licensed project, that's conveniently legal.<br />
<br />
My program runs in about 80 minutes on Java 7 on an Intel Xeon Linux server. The actual time depends on the ratio of 0 bits to 1 bits in n. 0 bits can be identified much faster than 1 bits. It may or may not work on other machines without tweaking some of the timing parameters. There's no way to tell for sure from within the program if it got the right answer, since Wouter just prints something to the console, but I do check to see if it's feasible. Most integers have irrational square roots, but we know that n is a <a href="http://en.wikipedia.org/wiki/Square_number" target="_blank">perfect square</a>. Therefore I test that the number I identified is also a perfect square. If it's not, I repeat the whole process again. A more sophisticated solution might track a confidence level for each bit and try remeasuring those, first, but it's always found the solution on the first attempt.<br />
<br />
You can find the <a href="https://gist.github.com/pburka/bd2b0584294e3b6b2d8f" target="_blank">source code at github</a>.<br />
<br />
<i>Thanks to my colleagues at Two Sigma who provided useful insights into this problem. Andrew Berman and </i><i><i>Yaron Gvili </i>suggested attacking SecureRandom, either directly or via /dev/random and /dev/urandom. Yaron also discussed the brute force approach. Isaac Dooley steered me away from mean and towards min and also explored ClassLoader based solutions. Trammell Hudson encouraged me to continue with the timing attack even when initial results were frustratingly ambiguous. Another colleague who shall remain anonymous suggested I prove that P=NP, and then implement a polynomial-time solution.</i><br />
<br />Peter Burkahttp://www.blogger.com/profile/12855319784439726546noreply@blogger.com0tag:blogger.com,1999:blog-1468479222802516939.post-28173476917343110582013-03-20T21:37:00.003-04:002013-03-20T21:37:47.430-04:00Java Puzzle: Square RootI'm working on a solution to Wouter Coekaerts's <a href="http://corner.squareup.com/2013/03/puzzle-square-root.html">square root puzzle</a>. It's a challenging problem and will test your knowledge of Java and JVMs. I'll post my solution next week, but here's a hint about the direction I'm taking: Wouter's answer() method isn't the most efficient way to test for the correct solution.Peter Burkahttp://www.blogger.com/profile/12855319784439726546noreply@blogger.com0tag:blogger.com,1999:blog-1468479222802516939.post-13731583072763699962012-10-30T15:23:00.000-04:002015-03-22T18:58:17.907-04:00Kobo surgeryA few weeks ago my <a href="http://www.kobobooks.com/wifi">Kobo Wifi e-reader</a> suffered an unfortunate accident. I got caught in a thunderstorm and the pocket of my messenger bag filled with water. (Good: my Brooks Barbican bag is waterproof. Bad: water can't get out once it's in.) It was a few hours before I noticed that my Kobo had been partly submerged for an extended period. The poor thing just displayed a plaintive "Please Charge Your eReader" message. Pressing the power button caused some lights to flash, but no activity on the screen. It wasn't completely dead, but it wasn't working either.<br />
<br />
Coincidentally, the next day my friend Roo tweeted that he'd dropped his Kobo and broken the screen. I suspected that between the two of us we probably had enough parts to build a working Kobo, so I arranged to collect his e-reader so I could try to rebuild one.<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://farm9.staticflickr.com/8326/8139108543_6a4452201e_o_d.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="242" src="https://farm9.staticflickr.com/8326/8139108543_6a4452201e_o_d.jpg" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td class="tr-caption" style="font-size: 12.8000001907349px;">Dead board (left) dead screen (right)</td><td class="tr-caption" style="font-size: 12.8000001907349px;"></td></tr>
</tbody></table>
</td></tr>
</tbody></table>
The first step was opening the cases. The Kobo cases just snap together, so you can pry them apart fairly easily. Be careful as you risk cracking the case if you bend it too much. I damaged the white case a bit but was able to get the black one off without any problem. The board is attached to the rear half of the case with four Phillips screws which are easily removed.<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://farm9.staticflickr.com/8045/8139108907_acff1aa02e_o_d.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="240" src="https://farm9.staticflickr.com/8045/8139108907_acff1aa02e_o_d.jpg" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-size: 12.8000001907349px;">Naked Kobos</span></td></tr>
</tbody></table>
<br />
The Kobo is fairly simple inside. The e-ink screen is mounted on the circuit board and connected to it with a <a href="http://en.wikipedia.org/wiki/Flexible_flat_cable">flexible flat cable</a> which wraps over the right-hand side of the board to a plug on the reverse. There's a lithium-polymer battery in the lower left hand corner, and you can see the 5 buttons of the rubber navigation pad in the lower right corner.<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://farm9.staticflickr.com/8194/8139140452_d43c603f1f_o_d.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="240" src="https://farm9.staticflickr.com/8194/8139140452_d43c603f1f_o_d.jpg" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td class="tr-caption" style="font-size: 12.8000001907349px;">Removing the screen.</td></tr>
</tbody></table>
</td></tr>
</tbody></table>
It took me a while to figure out how the screen was attached. I was worried that it might be glued to to board. Fortunately, it's quite easy to remove once you know that it's only attached with four strips of double sided tape.<br />
<br />
First, unplug the flexible connector, but be careful! The plug
has a plastic clamp to hold the connector in place. These break easily
and without the clamp you'll get a poor connection. You need to disconnect this so that you can have unobstructed access to the edge of the screen.<br />
<br />
I found that a thin utility blade slipped easily between the screen and the board. Just run this around all four edges and you'll loosen the tape. The tape will re-adhere pretty quickly, but it's quite easy to pry the screen off with your fingers once you've cut through the tape.<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://farm9.staticflickr.com/8335/8139109683_55787d36e8_o_d.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="240" src="https://farm9.staticflickr.com/8335/8139109683_55787d36e8_o_d.jpg" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td class="tr-caption" style="font-size: 12.8000001907349px;">Screen detached. You can see where the four pieces of tape were.</td></tr>
</tbody></table>
</td></tr>
</tbody></table>
Once both screens were off it was a simple matter to swap them. The boards also have Micro-SD cards which are used to store your books. (This is in addition to the SD expansion slot at the top of the card. This means that if you want to expand your Kobo's capacity you could probably easily replace the 2GB Micro-SD card with a larger one.) I swapped the Micro-SD cards, too. <br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://farm9.staticflickr.com/8326/8139109985_144d9e1cb3_o_d.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="320" src="https://farm9.staticflickr.com/8326/8139109985_144d9e1cb3_o_d.jpg" width="240" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td class="tr-caption" style="font-size: 12.8000001907349px;">It's alive!</td></tr>
</tbody></table>
</td></tr>
</tbody></table>
While it was open, my friend <a href="http://www.nycresistor.com/author/thudson/">Trammell</a> pointed out that there were some connectors near the navigation pad which looked suspiciously like a serial port. (They were marked Tx and Rx, which was a bit of a give-away!) We connected it to a terminal emulator and were able to watch the Linux kernel boot as the Kobo powered on. It seemed a shame to hide that in the case, so before I put the case back on Trammell helped me make a few small modifications. We added a small hole to the case and soldered a connector onto the serial port.<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://farm9.staticflickr.com/8184/8139319045_b1b8444601_o_d.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="240" src="https://farm9.staticflickr.com/8184/8139319045_b1b8444601_o_d.jpg" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td class="tr-caption" style="font-size: 12.8000001907349px;">Serial connector. From top to bottom: V, Tx, Rx</td></tr>
</tbody></table>
</td></tr>
</tbody></table>
I haven't played much with the serial port yet, but I expect it might expose some interesting opportunities.<br />
<br />
After swapping the Micro-SD cards the new Kobo showed all of my books in my library, but would only let me read some of them. This suggests that the DRM scheme is tied to some serial number on the device, and also that not all books are protected by DRM. I did a factory reset on the Kobo, connected it to my laptop and resynced all of my books. This made all the books readable again.<br />
<br />Peter Burkahttp://www.blogger.com/profile/12855319784439726546noreply@blogger.com25tag:blogger.com,1999:blog-1468479222802516939.post-65929543944459232072012-09-23T14:55:00.001-04:002012-09-23T15:03:56.877-04:00A Modest Proposal<i>For Minimizing the Obstruction of Bike Lanes in New York, and for Making the Department of Police Beneficial to the Publick.</i><br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="http://4.bp.blogspot.com/-5ZYTOD9u-lo/UF9RnTTg59I/AAAAAAAAABk/BCJrikr7X6E/s1600/NYPD_at_work.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img alt="NYPD at work" border="0" height="240" src="http://4.bp.blogspot.com/-5ZYTOD9u-lo/UF9RnTTg59I/AAAAAAAAABk/BCJrikr7X6E/s320/NYPD_at_work.jpg" title="" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">NYPD tow truck parked at 6th Ave and 32nd St</td></tr>
</tbody></table>
This morning at about 10:30 I encountered this oversized New York Police Department tow truck straddling the bike lane on 6th Avenue at 32nd Street. While I was watching, one of the officers returned to the truck with coffees, but, even caffeinated, they remained firmly in place, ignoring the cyclists (photographed) who had to detour into motor traffic to pass them. Particularly irksome is that they've pulled half way into the bike lane, but they're still obstructing a full lane of motor traffic. Instead of just blocking one lane, they're blocking two. (Also note that one of the cyclists photographed is going the wrong way, a.k.a. <a href="http://bikesnobnyc.blogspot.com/2009/05/smoked-salmon-lock-your-bike-dont-lox.html">salmoning</a>)<br />
<br />
Taxis, town cars, delivery trucks, private cars and cops regularly block New York's otherwise excellent network of bike lanes (as, apparently, do <a href="http://www.brooklynpaper.com/stories/35/38/dtg_bikesvsvendors_2012_09_21_bk.html">fruit and vegetable carts</a>). Many of the lanes are segregated, making it more challenging for drivers to block them, but many others, like the one pictured, are just painted and we rely on drivers' respect for the law and fellow road users to keep them clear. Like in <a href="http://www.theglobeandmail.com/globe-drive/car-life/why-is-it-so-difficult-for-motorists-to-steer-clear-of-bike-lanes/article4551259/">other cities</a>, this is a bit of wishful thinking, and given the poor example set by law enforcement it's not surprising that other drivers treat these lanes as short-term parking.<br />
<br />
On Friday, over beers with some cow-orkers, I formulated a proposal which I think might be a pragmatic compromise to improve access to unobstructed bike lanes. First, we must recognize that we're not going to change the behavior of the police. So, instead of just complaining and building up resentment (as I've done above), let's work with them.<br />
<br />
I propose that all unsegregated bike lanes in New York City be redesignated as police parking lanes. Let's change the bylaws so that these lanes are reserved for use by law enforcement, with an exception allowing cyclists to use them when they're not required for urgent police business (like the morning coffee run). We'll paint NYPD logos in the lanes alongside the cycling icons.<br />
<br />
My theory is that the NYPD will be much more aggressive about ticketing motor vehicles obstructing police parking lanes. Those are their lanes! Other drivers aren't going to mess with the NYPD's parking lanes.<br />
<br />
For cyclists, this Faustian bargain could significantly improve access to the approximately 1% of New York City's paved road surface currently designated as bike lanes. Of course we'd still have to swerve around parked cop cars, but at least the number of SUVs, FedEx trucks and taxis in the lanes would be lower.Peter Burkahttp://www.blogger.com/profile/12855319784439726546noreply@blogger.com2tag:blogger.com,1999:blog-1468479222802516939.post-25345463829266320372012-05-07T21:58:00.002-04:002012-05-07T22:00:20.714-04:00In praise of idlenessC has some odd baggage in its APIs, but one API which I think ought to be emulated more often is <span style="font-family: "Courier New",Courier,monospace;">free()</span>. Specifically, <span style="font-family: "Courier New",Courier,monospace;">free(NULL)</span>.<br />
<br />
Many programmers don't realize that you can pass a NULL pointer to <span style="font-family: "Courier New",Courier,monospace;">free()</span> and that this has no effect. This isn't undefined behaviour. It's specified to work like that: <a href="http://pubs.opengroup.org/onlinepubs/009695399/functions/free.html">If <i>ptr</i> is a null pointer, no action shall occur.</a><br />
<br />
This makes cleanup code simpler. Instead of<br />
<span style="font-family: "Courier New",Courier,monospace;"> if (ptr != NULL) free(ptr);</span><br />
you can just use<br />
<span style="font-family: "Courier New",Courier,monospace;"> free(ptr);</span><br />
<br />
This has a number of advantages.<br />
<br />
First of all, it reduces the amount of code you need to write for mundane housekeeping tasks.<br />
<br />
Secondly, it encourages good habits (freeing memory when you're done with it) by not punishing programmers for using it. If <span style="font-family: "Courier New",Courier,monospace;">free()</span> crashed when called with a NULL pointer (or worse, corrupted memory) that would discourage programmers from using it. While it doesn't really reward you for writing good code, at least it doesn't kick you in the shin.<br />
<br />
Finally, it's consistent with an unfortunate misfeature of its companion, <span style="font-family: "Courier New",Courier,monospace;">malloc()</span>. <a href="http://pubs.opengroup.org/onlinepubs/009695399/functions/malloc.html"><span style="font-family: "Courier New",Courier,monospace;">malloc(0)</span> is permitted to return NULL</a>. At least you can always pass the result of <span style="font-family: "Courier New",Courier,monospace;">malloc()</span> to <span style="font-family: "Courier New",Courier,monospace;">free()</span>.<br />
<br />
Unfortunately, many other common APIs don't follow <span style="font-family: "Courier New",Courier,monospace;">free()</span>'s good example. <span style="font-family: "Courier New",Courier,monospace;">close(-1)</span> and <span style="font-family: "Courier New",Courier,monospace;">pthread_mutex_destroy(NULL)</span> both invoke undefined behaviour, for example. (And don't get me started about zero as a legal file descriptor!)<br />
<br />
Whenever I'm designing my own APIs which include destructor-style functions, I always try to make sure that they quietly ignore NULL. It just makes life simpler for users.Peter Burkahttp://www.blogger.com/profile/12855319784439726546noreply@blogger.com3tag:blogger.com,1999:blog-1468479222802516939.post-41283951254818760072012-04-10T23:08:00.000-04:002012-04-10T23:16:50.689-04:00Frickin' lasers!My apartment in New York is great. It's close to Central Park, close to the subway and close to Momofuku Milk Bar. But it's pretty small. Especially the kitchen. My kitchen drawers are only about six inches wide. Finding a cutlery organizer which fits in the drawer is pretty well impossible. So what to do? Build one!<br />
<br />
Fortunately, one of my colleagues has access to a laser cutter at <a href="http://www.nycresistor.com/">NYC Resistor</a>. He helped me build a completely custom cutlery organizer which fits both my drawer and my cutlery perfectly.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://4.bp.blogspot.com/-gr7aoodYPNQ/T4TzYpziUhI/AAAAAAAAAAM/Nqo9VaIyTAo/s1600/IMG_4706.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="http://4.bp.blogspot.com/-gr7aoodYPNQ/T4TzYpziUhI/AAAAAAAAAAM/Nqo9VaIyTAo/s320/IMG_4706.jpg" width="240" /></a></div>
Step 1: build a prototype from cardboard to check the dimensions and functionality.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://1.bp.blogspot.com/-LANa0VowypM/T4T0fHDZAwI/AAAAAAAAAA0/fTAY_4ZvxZg/s1600/cutlery_3mm_with_bitmaps.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="163" src="http://1.bp.blogspot.com/-LANa0VowypM/T4T0fHDZAwI/AAAAAAAAAA0/fTAY_4ZvxZg/s320/cutlery_3mm_with_bitmaps.png" width="320" /></a></div>
<br />
Step 2: Using InkScape, design all of the pieces (with raster images which are etched with the laser at low power).<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://4.bp.blogspot.com/-fVvvCms5qJM/T4Tz3xpKDcI/AAAAAAAAAAk/tD4aLxfPNXw/s1600/IMG_4708.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="240" src="http://4.bp.blogspot.com/-fVvvCms5qJM/T4Tz3xpKDcI/AAAAAAAAAAk/tD4aLxfPNXw/s320/IMG_4708.JPG" width="320" /></a></div>
Step 3: Bring on the laser!<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://1.bp.blogspot.com/-IYsMZ56OMGE/T4Tz78cBbSI/AAAAAAAAAAs/GNsdPluDHwA/s1600/IMG_4713.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="http://1.bp.blogspot.com/-IYsMZ56OMGE/T4Tz78cBbSI/AAAAAAAAAAs/GNsdPluDHwA/s320/IMG_4713.JPG" width="240" /></a></div>
Step 4: Prepare all the pieces<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://4.bp.blogspot.com/-SUEiX9PeTRA/T4TzuK1G7TI/AAAAAAAAAAU/zykkonvH-GY/s1600/IMG_4714.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="http://4.bp.blogspot.com/-SUEiX9PeTRA/T4TzuK1G7TI/AAAAAAAAAAU/zykkonvH-GY/s320/IMG_4714.JPG" width="240" /></a></div>
<br />
Step 5: Put it all together!<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://2.bp.blogspot.com/-vwxkf1Z3D5Q/T4Tzx4vpjTI/AAAAAAAAAAc/F4HAmzuksJo/s1600/IMG_4717.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="http://2.bp.blogspot.com/-vwxkf1Z3D5Q/T4Tzx4vpjTI/AAAAAAAAAAc/F4HAmzuksJo/s320/IMG_4717.JPG" width="240" /></a></div>
<br />
Step 6: A perfect fit!<br />
<br />
Next I'm going to build a small tray to sit on top of this one, giving me a bit of additional storage space for little utensils.<br />
<br />Peter Burkahttp://www.blogger.com/profile/12855319784439726546noreply@blogger.com0tag:blogger.com,1999:blog-1468479222802516939.post-32399027725999689452012-03-25T16:07:00.000-04:002012-03-25T16:07:46.282-04:00C99 designated initializersOne of the very nice features added in C99 is the designated initializer. This allows you to write code like the following to initialize a structure:<br />
<br />
<div style="font-family: "Courier New",Courier,monospace;"> div_t d = { .quot=3, .rem=2 };</div><br />
In C89 there was no way to reliably initialize a structure like this. The specification says that the <span style="font-family: "Courier New",Courier,monospace;">quot</span> and <span style="font-family: "Courier New",Courier,monospace;">rem</span> members may be in any order, so if you write:<br />
<br />
<div style="font-family: "Courier New",Courier,monospace;"> div_t d = { 3, 2 };</div><br />
you can't be sure which member will be 3 and which will be 2.<br />
<br />
In general I'm enamored with designated initializers. But I've run into an unfortunate case where the specification is somewhat ambiguous.<br />
<br />
Paragraph §6.7.8.19 of the C99 standard (draft version available for free <a href="http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf">here</a>) has this to say about the ordering of designated initializers:<br />
<br />
<ol start="18" style="list-style-type: none;"><li><span style="font-family: 'Times'; font-size: 12pt;">The initialization shall occur in initializer list order, each initializer provided for a particular subobject overriding any previously listed initializer for the same subobject;<sup>130</sup> all subobjects that are not initialized explicitly shall be initialized implicitly the same as objects that have static storage duration. </span><br />
</li>
</ol>(Footnote 130 reads "<i>Any initializer for the subobject which is overridden and so not used to initialize that subobject might not be evaluated at all.</i>")<br />
<br />
The spec says that "initialization shall occur in initializer list order". To me, this suggests that one initializer can safely rely on the result of a previous initializer, e.g.:<br />
<br />
<span style="font-family: "Courier New",Courier,monospace;"> div_t d = { .quot=42, .rem=d.quot };</span><br />
<br />
So the following program ought to only invoke well defined behaviour, right?<br />
<br />
<span style="font-family: "Courier New",Courier,monospace;">#include <stdio.h></span><br />
<span style="font-family: "Courier New",Courier,monospace;">#include <stdlib.h></span><br />
<br />
<span style="font-family: "Courier New",Courier,monospace;">int main(void) {</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> div_t d1 = { .quot=1, .rem=2, };</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> printf("d1: quot=%i, rem=%i\n", d1.quot, d1.rem);</span><br />
<br />
<span style="font-family: "Courier New",Courier,monospace;"> div_t d2 = { .quot=1, .rem=d2.quot };</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> printf("d2: quot=%i, rem=%i\n", d2.quot, d2.rem);</span><br />
<br />
<span style="font-family: "Courier New",Courier,monospace;"> div_t d3 = { .rem=2, .quot=d3.rem };</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> printf("d3: quot=%i, rem=%i\n", d3.quot, d3.rem);</span><br />
<br />
<span style="font-family: "Courier New",Courier,monospace;"> return 0;</span><br />
<span style="font-family: "Courier New",Courier,monospace;">}</span><br />
<br />
Let's compile it and see what happens:<br />
<ol start="18" style="list-style-type: none;"><li><span style="font-family: "Courier New",Courier,monospace;">$ gcc -c99 -O3 -Wall -Wextra foo.c</span><br />
<span style="font-family: "Courier New",Courier,monospace;">$ ./a.out</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">d1: quot=1, rem=2</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">d2: quot=1, rem=1</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">d3: quot=2, rem=2</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"></span><br />
</li>
</ol>Excellent! That exactly what I would expect to happen. The initializers are run in order, allowing the second designated initializer to depend on the result of the first.<br />
<br />
Now here's where things start to get weird:<br />
<ol start="18" style="list-style-type: none;"><li><span style="font-family: "Courier New",Courier,monospace;">$ gcc -c99 -O0 -Wall -Wextra foo.c</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> $ ./a.out</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> d1: quot=1, rem=2</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> d2: quot=1, rem=0</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> d3: quot=0, rem=2</span></li>
</ol><br />
If we turn off optimization (-O0) the results suddenly change! Even worse, I've compiled with maximum warnings (-Wall -Wextra) and GCC doesn't even issue a warning about using an uninitialized variable!<br />
<br />
How can we reconcile this behaviour with the specification? I think that we need to take paragraph §6.7.8.23 into account, as well:<br />
<br />
<ol start="18" style="list-style-type: none;"><li><span style="font-family: 'Times'; font-size: 12pt;">The order in which any side effects occur among the initialization list expressions is unspecified.<sup>131</sup></span></li>
</ol><br />
(Footnote 131 reads "<i>In particular, the evaluation order need not be the same as the order of subobject initialization.</i>")<br />
<br />
This suggests that evaluation and initialization are two separate steps: an implementation may evaluate all of the initialization expressions (in any order), record the results in temporary storage, and then apply them all in order. If you inspect the generated assembly code this does seem to match what happens in GCC at -O0.<br />
<br />
So why have paragraph 19 at all? I don't see how you could write a C program which can observe the initialization order, except for the special case of one designated initializer overriding another. If that is its only purpose the specification could certainly be more explicit about it. (It's also possible that the specification has actually been clarified in this respect; I don't have access to the final version.)<br />
<br />
I've tried this test case with a handful of different compilers and get similar results. However I would be curious to hear about results with other C99 compilers.Peter Burkahttp://www.blogger.com/profile/15290253671442020919noreply@blogger.com4tag:blogger.com,1999:blog-1468479222802516939.post-56838695436745771132012-03-25T12:13:00.000-04:002012-03-25T12:13:04.244-04:00UpdateIt's been a while since I updated this blog. I've been a bit busy, but new articles will start appearing shortly. Since my last post, I have left IBM Canada and joined Two Sigma Investments in New York. I'm no longer developing virtual machines, but the name and scope of the blog will remain the same. I'm still doing low-level software development and I'm learning a lot about areas I haven't investigated in depth before, and from my new colleagues.Peter Burkahttp://www.blogger.com/profile/15290253671442020919noreply@blogger.com0tag:blogger.com,1999:blog-1468479222802516939.post-54789015218900622882011-09-20T11:43:00.000-04:002011-09-20T11:43:53.965-04:00IBM Java 7 is now availableIBM Java 7 was officially released yesterday, September 19. You can download it from <a href="https://www.ibm.com/developerworks/java/jdk/">IBM DeveloperWorks</a>.<br />
<br />
There are a lot of exciting new features including a number of GC improvements.<br />
<br />
<ul><li>The balanced GC policy, which I've mentioned before, is included in all 64-bit Java 7 JDKs. You can enable it with <span style="font-family: "Courier New",Courier,monospace;">-Xgcpolicy:balanced</span>.</li>
<li>The soft-realtime garbage collector is included for evaluation on Linux and AIX. It can be enabled with <span style="font-family: "Courier New",Courier,monospace;">-Xgcpolicy:metronome</span>. </li>
<li>The verbose GC format has been completely overhauled. It now provides more information and the XML format has been redesigned to make machine interpretation of the data simpler, allowing both IBM and customers to write tools to process and analyse the data.</li>
</ul>Peter Burkahttp://www.blogger.com/profile/15290253671442020919noreply@blogger.com0tag:blogger.com,1999:blog-1468479222802516939.post-1120621188979225802011-09-02T17:12:00.000-04:002011-09-02T17:12:56.832-04:00"Don't do what Donny Don't does"Thank to <a href="http://twitter.com/#%21/evan_c_hughes">Evan Hughes</a> for pointing out this paper: <span id="goog_107718949"></span><span id="goog_107718950"></span><a href="http://www.sciencedirect.com/science/article/pii/S002073738880052X">Conditional statements, looping constructs, and program comprehension: an experimental study.</a><br />
<br />
Not surprisingly, negative conditions are more difficult to understand than positive conditions.<br />
<br />
I always try to write conditions to be positive. Sometimes, I'll even include an empty 'if' block so that I can put code in the 'else' block instead of using a negative condition:<br />
<br />
<div style="font-family: "Courier New",Courier,monospace;"> if (isInRange(value)) {</div><div style="font-family: "Courier New",Courier,monospace;"> // expected case; do nothing</div><div style="font-family: "Courier New",Courier,monospace;"> } else {</div><div style="font-family: "Courier New",Courier,monospace;"> throw new OutOfRangeException(value);</div><div style="font-family: "Courier New",Courier,monospace;"> }</div><br />
I haven't read the full paper, so maybe the researchers answered my next question: is the problem exacerbated by the syntax for negative conditions used in C-like languages? I find that the '!' operator uses very little horizontal space, making it less noticeable than other unary operators such as '~' or '*'.<br />
<br />
e.g. would this statement:<br />
<br />
<div style="font-family: "Courier New",Courier,monospace;"> if (!isInRange(value)) { ...</div><br />
be more obvious if it were written like this?<br />
<br />
<div style="font-family: "Courier New",Courier,monospace;"> if (not isInRange(value)) { ...</div>Peter Burkahttp://www.blogger.com/profile/15290253671442020919noreply@blogger.com3tag:blogger.com,1999:blog-1468479222802516939.post-30459666390422188892011-08-04T10:21:00.000-04:002011-08-04T10:21:27.905-04:00Balanced garbage collectionWe've just published an article on the IBM DeveloperWorks website describing the new garbage collection technology available in IBM Java 6 2.6 and IBM Java 7. This is a project I've been working on for several years and we're pretty excited that customers can now try it out for themselves.<br />
<br />
You can read <a href="http://www.ibm.com/developerworks/websphere/techjournal/1108_sciampacone/1108_sciampacone.html">the article</a> for yourself.Peter Burkahttp://www.blogger.com/profile/15290253671442020919noreply@blogger.com0tag:blogger.com,1999:blog-1468479222802516939.post-73246609132654976542011-07-15T21:59:00.000-04:002011-07-15T21:59:19.771-04:00Reachability follow up<span style="font-family: inherit;">We've been having quite an interesting series of conversations at work about this reachability problem. Today, one of my co-workers, Dan Heidinga, pointed out that Microsoft's CLR has the same issue. For CLR Microsoft has provided a special static method called GC.KeepAlive(Object). This acts as a hint to the virtual machine and JIT to extend an object's lifetime, but is otherwise a no-op. There's a good article about the problem on an old MSDN blog <a href="http://blogs.msdn.com/b/cbrumme/archive/2003/04/19/51365.aspx">here</a>. Note that the author considers and rejects the option of automatically extending the lifetime of all function arguments to the end of their functions on the basis that it impacts codegen and therefore performance.</span>Peter Burkahttp://www.blogger.com/profile/15290253671442020919noreply@blogger.com0tag:blogger.com,1999:blog-1468479222802516939.post-75494145127533819792011-07-10T14:45:00.000-04:002011-07-10T14:45:24.155-04:00A subtle issue of reachabilityIn the last few weeks I've run into two similar and very subtle problems in Java code. In some ways, these seem to illustrate an oversight in the design the Java language and/or virtual machine. The problem has to do with objects being collected earlier than expected.<br />
<br />
In one case a finalizer was run and in the other a PhantomReference was cleared. I'll describe an example based on finalization. It's easy enough to see how this could also apply to reference objects.<br />
<br />
Consider a class like this:<br />
<br />
<div style="font-family: "Courier New",Courier,monospace;">public class Foo {</div><div style="font-family: "Courier New",Courier,monospace;"> private byte[] array = new byte[] { 1, 2, 3 };</div><div style="font-family: "Courier New",Courier,monospace;"><br />
</div><div style="font-family: "Courier New",Courier,monospace;"> public void finalize() { </div><div style="font-family: "Courier New",Courier,monospace;"> array[0] = array[1] = </div><div style="font-family: "Courier New",Courier,monospace;"> array[2] = 0; </div><div style="font-family: "Courier New",Courier,monospace;"> }</div><div style="font-family: "Courier New",Courier,monospace;"><br />
</div><div style="font-family: "Courier New",Courier,monospace;"> public byte[] getData() { return array.clone(); }</div><span style="font-family: "Courier New",Courier,monospace;">}</span><br />
<br />
<div style="font-family: inherit;">This class is able to return a copy of its data array. (This is a common pattern since it prevents the caller from modifying the master copy of the array). When instances of this class are garbage collected they wipe out the data in the array, overwriting it with zeros. (Let's ignore why it does this; it's just an example!) So far so good.</div><div style="font-family: inherit;"><br />
</div><div style="font-family: inherit;">What result will we get if we invoke getData() on an instance of Foo?</div><br />
<div style="font-family: "Courier New",Courier,monospace;"> Foo f = new Foo();</div><div style="font-family: "Courier New",Courier,monospace;"> byte[] array = f.getData();</div><div style="font-family: "Courier New",Courier,monospace;"> System.out.println("array={" + </div><div style="font-family: "Courier New",Courier,monospace;"> array[0] + ", " + </div><div style="font-family: "Courier New",Courier,monospace;"> array[1] + ", " + </div><div style="font-family: "Courier New",Courier,monospace;"> array[2] + "}");</div><br />
<div style="font-family: inherit;">Intuitively, we expect this to print "array={1, 2, 3}". And it usually does. But is it legitimate for it to print "array={0, 0, 0}" (or even "array={1, 2, 0}")? If it did, that would mean that the object was finalized while we were still using it, wouldn't it?</div><div style="font-family: inherit;"><br />
</div><div style="font-family: inherit;">Actually, that can happen. It happens quite often in the IBM Java VM, and seems to happen occasionally in Oracle HotSpot, too, but less frequently.</div><div style="font-family: inherit;"><br />
</div><div style="font-family: inherit;">The Java VM is permitted to collect (or finalize) an object when it is no longer reachable. But how could the object become unreachable when we're running the <span style="font-family: "Courier New",Courier,monospace;">getData()</span> function? It's easier to understand if you imagine the function in-lined in the caller, and broken up into individual statements:</div><div style="font-family: inherit;"><br />
</div><div style="font-family: "Courier New",Courier,monospace;"> Foo f = new Foo();</div><div style="font-family: "Courier New",Courier,monospace;"> byte[] masterArray = f.array; // ignore that array<br />
// is private</div><div style="font-family: "Courier New",Courier,monospace;"> // what if a garbage collection happens here?<br />
// e.g. System.gc(); System.runFinalization(); </div><div style="font-family: "Courier New",Courier,monospace;"> byte[] copyArray = masterArray.clone();</div><div style="font-family: "Courier New",Courier,monospace;"> System.out.println("array={" + </div><div style="font-family: "Courier New",Courier,monospace;"> copyArray[0] + ", " + </div><div style="font-family: "Courier New",Courier,monospace;"> copyArray[1] + ", " + </div><div style="font-family: "Courier New",Courier,monospace;"> copyArray[2] + "}");</div><div style="font-family: "Courier New",Courier,monospace;"><br />
</div><div style="font-family: inherit;">Here we can see that if the garbage collector interrupts the program at just the right (wrong?) time, the <span style="font-family: "Courier New",Courier,monospace;">finalize()</span> function might run <i>before</i> we clone the array. Even though we don't explicitly assign null to f, a clever VM can analyze the program and determine that f is never used again. It can reclaim the memory for that object and, in this case, finalize it, before the <span style="font-family: "Courier New",Courier,monospace;">clone()</span> function runs. In most cases this is exactly what you want the VM to do: garbage collect objects as early as possible to recover as much memory as possible.<br />
<br />
Ok, but is that really the same thing? Surely, the receiver of a function is kept alive until the function returns, right? In-lining the function isn't quite the same!</div><div style="font-family: inherit;"><span style="font-family: inherit;"><br />
</span></div><div style="font-family: inherit;"><span style="font-family: inherit;">Actually, neither the Java language specification nor the Java Virtual Machine specification say anything about that. In the VM, the receiver of a function (i.e. <span style="font-family: "Courier New",Courier,monospace;">this</span>) isn't very special at all. It's just the first argument of a virtual function. Although the language doesn't allow it (keep in mind that the Java language and the Java VM have separate specifications) you can overwrite the receiver just as you can a local variable if you're writing bytecodes directly without the aid of javac:</span></div><div style="font-family: inherit;"><span style="font-family: inherit;"><br />
</span></div><div style="font-family: "Courier New",Courier,monospace;">byte[] getData() {</div><div style="font-family: "Courier New",Courier,monospace;"> byte[] masterArray = this.array;</div><div style="font-family: "Courier New",Courier,monospace;"> this = null; // not legal in Java language,<br />
// but is legal in class files!</div><div style="font-family: "Courier New",Courier,monospace;"> return masterArray.clone();</div><div style="font-family: "Courier New",Courier,monospace;">}</div><div style="font-family: "Courier New",Courier,monospace;"><span style="font-family: inherit;"><br />
</span></div><div style="font-family: inherit;"><span style="font-family: inherit;">javac won't compile this, but the JVM's class file verifier won't report any problems in this function.</span></div><div style="font-family: "Courier New",Courier,monospace;"><span style="font-family: inherit;"><br />
</span></div><div style="font-family: inherit;"><span style="font-family: inherit;">So, what's the right way to write your Java code so that your objects won't be finalized or collected earlier than expected? Unfortunately, I don't know the answer. You could add an extra reference to the receiver, like this:</span></div><div style="font-family: "Courier New",Courier,monospace;"><span style="font-family: inherit;"><br />
</span></div><div style="font-family: "Courier New",Courier,monospace;">byte[] getData() {</div><div style="font-family: "Courier New",Courier,monospace;"> byte[] result = array.clone();</div><div style="font-family: "Courier New",Courier,monospace;"> this.array = this.array; </div><div style="font-family: "Courier New",Courier,monospace;"> return result;</div><div style="font-family: "Courier New",Courier,monospace;">}</div><div style="font-family: "Courier New",Courier,monospace;"><br />
</div><div style="font-family: inherit;"><span style="font-family: inherit;">But that's a hack, not a real solution, and is unlikely to work reliably. The VM can easily determine that the dummy "</span><span style="font-family: "Courier New",Courier,monospace;">this.array = this.array</span><span style="font-family: inherit;"><span style="font-family: inherit;">" statement has no effect and can be removed, leaving us exactly where we started.</span></span><br />
</div><div style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;">Perhaps Java needs a new keyword like this:</span></span></div><div style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"> </span></span></div><div style="font-family: "Courier New",Courier,monospace;"><br />
byte[] getData() {</div><div style="font-family: "Courier New",Courier,monospace;"> keep_alive(this) {</div><div style="font-family: "Courier New",Courier,monospace;"> return array.clone();</div><div style="font-family: "Courier New",Courier,monospace;"> }</div><div style="font-family: "Courier New",Courier,monospace;">}</div><div style="font-family: "Courier New",Courier,monospace;"><br />
</div><div style="font-family: inherit;"><span style="font-family: inherit;">However I doubt that something like that would be used correctly very often.</span></div><div style="font-family: inherit;"><br />
</div><div style="font-family: inherit;"><span style="font-family: inherit;">Unfortunately, the best advice is probably to avoid finalization whenever possible. </span></div>Peter Burkahttp://www.blogger.com/profile/15290253671442020919noreply@blogger.com5tag:blogger.com,1999:blog-1468479222802516939.post-75228101222158006272011-03-12T19:21:00.002-05:002011-03-12T19:21:00.048-05:00Proportionality of incremental changesA few weeks ago I was watching a presentation on <a href="http://openjdk.java.net/projects/lambda/">Project Lambda</a>. The proposed syntax is generally quite nice. It's very simple, and avoids a lot of the noise which pollutes typical Java programs using inner classes. But one of the syntax examples didn't sit right with me, and it took me a few days to figure out exactly why.<br />
<br />
The particular example isn't important, and I don't think it's currently in the public proposal. But what is important, I realized, is that code should be written to facilitate incremental changes, and this interfered with that.<br />
<br />
What do I mean?<br />
<br />
Here's an example: one of the rules in our coding guidelines is that control blocks must always use curly brackets. That is, don't write code like this:<br />
<div style="font-family: "Courier New",Courier,monospace;"><br />
</div><div style="font-family: "Courier New",Courier,monospace;">if (condition)</div><div style="font-family: "Courier New",Courier,monospace;"> doSomething();</div><div style="font-family: "Courier New",Courier,monospace;">else</div><div style="font-family: "Courier New",Courier,monospace;"> doSomethingElse();</div><br />
We require that this code be written like this, instead:<br />
<br />
<br />
<div style="font-family: "Courier New",Courier,monospace;">if (condition) {</div><div style="font-family: "Courier New",Courier,monospace;"> doSomething();</div><div style="font-family: "Courier New",Courier,monospace;">} else {</div><div style="font-family: "Courier New",Courier,monospace;"> doSomethingElse();</div><div style="font-family: "Courier New",Courier,monospace;">}</div><br />
Here's why I don't like the syntax without the curly brackets (at least one of the reasons): it makes it harder to modify the code.<br />
<br />
If you want to add an extra line inside the else block, you need to add the line and convert the single statement block into a compound block:<br />
<br />
<br />
<div style="font-family: "Courier New",Courier,monospace;">if (condition)</div><div style="font-family: "Courier New",Courier,monospace;"> doSomething();</div><div style="font-family: "Courier New",Courier,monospace;">else <b>{</b></div><div style="font-family: "Courier New",Courier,monospace;"><b> fprintf(stderr, "Debugging doSomethingElse\n");</b></div><div style="font-family: "Courier New",Courier,monospace;"> doSomethingElse();</div><div style="font-family: "Courier New",Courier,monospace;"><b>}</b></div><br />
To add one line of code I need to modify three lines! This is a disproportionate amount of work considering the actual change.<br />
<br />
In summary, code should be written in a way which encourages incremental changes. If small logical changes require large textual changes, then there's something wrong, either with the tools or the technique.Peter Burkahttp://www.blogger.com/profile/15290253671442020919noreply@blogger.com3tag:blogger.com,1999:blog-1468479222802516939.post-548413137700874252011-03-10T17:35:00.001-05:002012-03-25T16:18:52.406-04:00Overloading and varargsI learned something important today: don't overload a <a href="http://en.wikipedia.org/wiki/Variadic_function">variadic function</a>.<br />
<br />
It's very tempting to have functions like this in your C++ class:<br />
<br />
<div style="font-family: "Courier New",Courier,monospace;">class Foo {</div><div style="font-family: "Courier New",Courier,monospace;">public:</div><div style="font-family: "Courier New",Courier,monospace;"> void write(const char* format, ...);</div><div style="font-family: "Courier New",Courier,monospace;"> void write(const chat* format, va_list args);</div><div style="font-family: "Courier New",Courier,monospace;">}</div><br />
The variadic function would be implemented like this:<br />
<br />
<div style="font-family: "Courier New",Courier,monospace;">void Foo::write(const char* format, ...) {</div><div style="font-family: "Courier New",Courier,monospace;"> va_list args;</div><div style="font-family: "Courier New",Courier,monospace;"> va_start(args, format);</div><div style="font-family: "Courier New",Courier,monospace;"> write(format, args);</div><div style="font-family: "Courier New",Courier,monospace;"> va_end(args);</div><div style="font-family: "Courier New",Courier,monospace;">}</div><br />
It looks nice and clean -- a perfect application of overloading.<br />
<br />
However there's a subtle problem! The C++ compiler will always try to resolve the method with the most specific signature when it encounters an overloaded call.<br />
<br />
Calls like this are fine:<br />
<br />
<div style="font-family: "Courier New",Courier,monospace;">foo->write("%s\n", "Hello");</div><div style="font-family: "Courier New",Courier,monospace;">foo->write("...world");</div><br />
But what if you have an argument which looks like a va_list? For example, if va_list is a pointer on your platform, what does this line do?<br />
<br />
<div style="font-family: "Courier New",Courier,monospace;">foo->write("How many chickens? Answer: %d\n", 0);</div><br />
In C++ 0 isn't just a number. It's also the NULL pointer. The compiler will decide that you're actually calling the non-variadic function and will use NULL for the va_list argument. Then your program will crash when you try to read an int argument from a NULL va_list.<br />
<br />
Lesson learned. Don't overload functions if one of the functions is variadic.<br />
<br />
Now I have to go back and fix some code I wrote yesterday...Peter Burkahttp://www.blogger.com/profile/15290253671442020919noreply@blogger.com0tag:blogger.com,1999:blog-1468479222802516939.post-11679661436285062972011-02-16T19:15:00.000-05:002011-02-16T19:15:33.231-05:00Groan<div class="separator" style="clear: both; text-align: center;"><a href="http://www.google.ca/search?q=anagram"><img border="0" src="http://3.bp.blogspot.com/-Hb55VaSLKvw/TVxoPbDGEtI/AAAAAAAAABA/h-Qh7MHJrEI/s1600/Screen+shot+2011-02-16+at+7.12.07+PM.png" /></a></div>Peter Burkahttp://www.blogger.com/profile/15290253671442020919noreply@blogger.com0tag:blogger.com,1999:blog-1468479222802516939.post-23169312686146464042010-12-31T18:55:00.002-05:002010-12-31T20:01:33.241-05:00Die Klasse NamenI just noticed that the IBM JDK class libraries include these klasses:<br />
<ul><li>com.ibm.security.util.DerInputStream and</li>
<li> com.ibm.security.util.DerValue.</li>
</ul><br />
<span class="short_text" id="result_box" lang="de"><span class="hps" title="Click for alternate translations">Huck huck.</span></span><br />
<span class="short_text" id="result_box" lang="de"><span class="hps" title="Click for alternate translations"></span></span><br />
<br />
<span class="short_text" id="result_box" lang="de"><span class="hps" title="Click for alternate translations"><a href="http://en.wikipedia.org/wiki/Adolf_Hitler_in_popular_culture#Television">Ach! Der InputStream ist ein ... NuisanceStream</a>!</span></span><span class="short_text" id="result_box" lang="de"><span class="hps" title="Click for alternate translations"> Der Value, Der. </span></span><a href="http://www.imdb.com/title/tt0701080/quotes">No one who speaks German could be an evil man!</a><span class="short_text" id="result_box" lang="de"><span class="hps" title="Click for alternate translations"> </span></span><br />
<br />
<span class="short_text" id="result_box" lang="de"><span class="hps" title="Click for alternate translations">Happy New Year!</span></span>Peter Burkahttp://www.blogger.com/profile/15290253671442020919noreply@blogger.com1tag:blogger.com,1999:blog-1468479222802516939.post-4539259466341763162010-12-28T11:38:00.000-05:002010-12-28T11:38:01.878-05:00-XX:+UseCompressedStrings explainedIt looks like Oracle has finally released some documentation for <a href="http://thevirtualmachinist.blogspot.com/2010/09/xxusecompressedstrings.html">those options they've been using in SPECjbb2005 submissions</a>. The doc is <a href="http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html">here </a>and it looks like it appeared on Christmas eve.<br />
<br />
Like I guessed, they're using a byte[] array instead of a char[] array for Strings wherever they can.<br />
<br />
Presumably this makes the code path more complicated, because every time the JVM deals with a String it now needs to check what kind it is. The space savings are probably worth it, at least in some applications.<br />
<br />
Why isn't it on by default? Two possibilities:<br />
<ol><li>The penalty is too high in many applications. In my opinion, this would make it a bit of a benchmark special.</li>
<li>The option isn't quite ready for prime time yet, but they plan to turn it on by default later.</li>
</ol>Is this option "fair" to non-Western-European applications? I'd argue that it probably isn't unfair. A lot of String objects aren't involved in the user interface at all. In many applications, such as Eclipse, Strings are used extensively as internal identifiers for things like plug ins, extension points, user interface elements, etc. Even if your application presents a non-ASCII user interface there's a good chance that it still has a lot of ASCII strings under the surface. It might not benefit as much from this option, but it would probably still benefit.<br />
<br />
(Of course that assumes that there's no penalty for using non-ASCII Strings beyond the extra space. If the option is implemented in an all-or-nothing fashion, e.g. if it stops using byte[] arrays the first time it encounters a non-ASCII String, then non-ASCII applications wouldn't benefit at all.)Peter Burkahttp://www.blogger.com/profile/15290253671442020919noreply@blogger.com3tag:blogger.com,1999:blog-1468479222802516939.post-57561164945241628992010-12-24T13:59:00.001-05:002010-12-24T14:15:47.314-05:00Working backwardsSometimes I get a bug that I just can't figure out. If the problem is reproducible with a good test case it's usually fairly easy to narrow the problem down pretty quickly. But what do you do if your product crashed once on a customer's server, and hasn't failed again?<br />
<br />
Well, you start with the logs and diagnostic files. Usually we can figure it out from tracepoints and a core file. But sometimes this doesn't work.<br />
<br />
In cases like this I don't like to throw in the towel without doing something. It feels like defeat (probably because it is). Instead, I always try to figure out what additional information could have helped me solve the problem.<br />
<br />
How did we get to the point of failure? If I can identify two or three paths to the failure point and can't infer which one was taken I'll add some tracepoints to those paths. Or maybe I can add assertions on those paths to detect the error a bit earlier.<br />
<br />
Since the problem isn't reproducible they won't help me now, but they might help me in the future. If the problem does occur again (and 'not reproducible' really just means 'very rare') hopefully these diagnostics will get me one step closer to the actual problem. And if it didn't hit any of my new tracepoints or assertions when it reoccurs, that's useful (and potentially maddening) too.<br />
<br />
Of course I still might not be able to figure out what's happening. Then I add another round of tracepoints and assertions. Each failure gets me one step closer to the solution.Peter Burkahttp://www.blogger.com/profile/15290253671442020919noreply@blogger.com0tag:blogger.com,1999:blog-1468479222802516939.post-3386345868060893522010-11-06T16:44:00.002-04:002010-11-06T17:06:03.409-04:00Amiguous Java packages and innner classesYesterday, when I should have been paying attention to my breathing during yoga, I was instead thinking about inner classes. Specifically I was wondering how Java resolves ambiguities between package names and inner class names.<br />
<br />
In Java, an inner class is specified using its parent's name, a dot, and its name. for example, Foo.Bar is an inner class named "Bar" in the parent class "Foo". But this is also how Java names packages. Bar could just as easily be a first class class in package Foo.<br />
<br />
Note that there's no ambiguity once the code has been compiled. The class file format uses / as a package separator and $ as an inner class separator, so Foo$Bar is an inner class, and Foo/Bar is a top level class in package Foo. The ambiguity is only present in the compiler, where "." serves both purposes.<br />
<br />
I wrote a simple test case. I created a simple Foo.java file with a public static inner class Bar, and a Foo/Bar.java file with a public class Bar. Then I wrote a test class like this:<br />
<br />
<span style="font-family: "Courier New",Courier,monospace;">public class Test {</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> public static void main(String[] args) {</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> System.out.println(new Foo.Bar());</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> }</span><br />
<span style="font-family: "Courier New",Courier,monospace;">}</span><br />
<br />
What happens when you compile and run this?<br />
<br />
<span style="font-family: "Courier New",Courier,monospace;">peter$ java Test</span><br />
<span style="font-family: "Courier New",Courier,monospace;">Foo$Bar@c53dce</span><br />
<br />
Notice the '$' in the name? The inner class won, so it looks like javac resolves the ambiguity in favour of the inner class.<br />
<br />
But what if we really want to mess with the compiler? What happens if you try to compile this file?<br />
<br />
<span style="font-family: "Courier New",Courier,monospace;">public class java {</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> public static class lang {</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> public static class Object {</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> }</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> }</span><br />
<span style="font-family: "Courier New",Courier,monospace;">}</span><br />
<br />
Yes, the file is called java.java. None of these names are reserved in Java, so this ought to be a perfectly legitimate class named java. It has an inner class, java.lang, with its own inner class, java.lang.Object (not to be confused with java.lang.Object, the superclass of each of these classes).<br />
<br />
javac doesn't seem to like this class very much, at least not the version installed on my Mac:<br />
<br />
<span style="font-family: "Courier New",Courier,monospace;">tmp peter$ javac java.java</span><br />
<span style="font-family: "Courier New",Courier,monospace;">An exception has occurred in the compiler (1.6.0_17). Please file a bug at the Java Developer Connection (http://java.sun.com/webapps/bugreport) after checking the Bug Parade for duplicates. Include your program and the following diagnostic in your report. Thank you.</span><br />
<span style="font-family: "Courier New",Courier,monospace;">java.lang.NullPointerException</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> at com.sun.tools.javac.comp.Flow.visitIdent(Flow.java:1214)</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> at com.sun.tools.javac.tree.JCTree$JCIdent.accept(JCTree.java:1547)</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> at com.sun.tools.javac.tree.TreeScanner.scan(TreeScanner.java:35)</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> ...</span><br />
<br />
I don't think that this is a security hole, since the ambiguity is only present in the compiler. It's possible that there's a similar ambiguity in reflection, but I'm not sure. That might be somewhat higher risk.<br />
<br />
<b>Update:</b> Eclipse compiles this java.java file with no problem. Looks like it's just an obscure bug in javac.<br />
<br />
<b>Further update:</b> I'm not the first person to discover this: <a href="http://www.bodden.de/tag/name-resolution/">http://www.bodden.de/tag/name-resolution/ </a>Peter Burkahttp://www.blogger.com/profile/15290253671442020919noreply@blogger.com0tag:blogger.com,1999:blog-1468479222802516939.post-62691625164009251272010-10-16T10:53:00.001-04:002010-10-16T11:00:59.047-04:00RIP Benoit Mandelbrot<a href="http://www.nytimes.com/2010/10/17/us/17mandelbrot.html">Benoit Mandelbrot, the father of fractals, is dead at 85.</a><br />
<br />
When I was in high school I was fascinated by the <a href="http://en.wikipedia.org/wiki/Mandelbrot_set">Mandelbrot set</a>. I wrote a program which rendered it (quite slowly) on my 286. It let you draw a line anywhere across the set and play it as musical tones. (Well, as musical as a PC's speaker could be.)Peter Burkahttp://www.blogger.com/profile/15290253671442020919noreply@blogger.com1