Tuesday, December 16, 2014

Flickr, beards and feature recognition

So it looks like Flickr is applying image recognition beyond face recognition. A couple of months ago they demoed their "Park or Bird?" feature recognition system, but I hadn't realized they'd put anything into production until today.

Yesterday, I uploaded some old photos from mid 2000 that I'd taken with my first digital camera, and had recently recovered (thanks Dana). This morning I noticed that someone had found the photo below by searching for "Beard". That seemed odd, since I hadn't tagged any of the old photos. The only way that Flickr could know that Gac has a beard is by looking at the photo itself.

Something feels a little bit creepy about this, but on the other hand it's also pretty cool.

I would like to see Flickr expose this more explicitly, and give me an opportunity to edit these automatically added tags.

Sunday, November 16, 2014


I learned something about Picasa today: when you edit a photo, it stores all the modifications as extra fields in the JPEG file and doesn't modify the displayed image. Until you export the image from Picasa, other tools only see the original image.
I have a bunch of photos from before 2006 that I salvaged from an old Thinkpad, and which I'd touched up with Picasa, and I want to upload them to Flickr. But because the touch-ups are all in metadata, none of the images reflect this.
I downloaded the latest version of Picasa for OS X, imported all of these photos, and amazingly it seems to have correctly applied the changes (which were made with a much older version of the tool). They're not pixel-for-pixel identical -- I presume that some of the enhancement algorithms have changed -- but they're damn close.

Friday, August 22, 2014

The case of the missing Citibikes

The example of my friend and colleague Ben, with his amazing I Quant NY blog, has motivated me to try my hand at some open data hacking. Ben's written several posts where he analyzes Citibike bike share data. Citibike has made all of their trip data through the end of May 2014 available for free download. I'm a huge fan of New York's bike share program and of their open data policy.

Ben has analyzed trips and stations, but I have a different question: how many of New York's shared bikes have been stolen or lost?

The New York Post reports that bikes are routinely stolen from Manhattan stations and ridden to underserved parts of Brooklyn and Queens. I'm not too concerned about these bikes: they're recovered quickly, and Citibike may wish to treat the bikes' eventual destinations as a kind of desire line. Clearly Crown Heights residents can't wait for the program to expand to their neighborhood.

In July, the 109th precinct proudly reported that their detectives had detected a 68-year-old man riding a Citibike which had been liberated and repainted. The suspected thief was detained and his ride confiscated. That's what I'm looking for!

So I got myself the trip data, put together a quick-and-dirty Python script, and identified the first and last trip for each bike in the system. I presume that if a bike is stolen or destroyed it will disappear from the trip data, so we can guess that if a bike hasn't been ridden in some time, it's likely gone AWOL.
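A minimal sketch of that kind of script follows. It is not the original script. The column names `starttime` and `stoptime` and the timestamp format are assumptions (only `bikeid` is named in this post), and the sample rows are made up for illustration, so check them against the real CSV header before using it.

```python
import csv
import io
from datetime import datetime

# ASSUMPTION: timestamp format and the starttime/stoptime column names.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def first_and_last_trips(rows):
    """Map each bikeid to (earliest start time, latest stop time)."""
    spans = {}
    for row in rows:
        bike = row["bikeid"]
        start = datetime.strptime(row["starttime"], TIME_FORMAT)
        stop = datetime.strptime(row["stoptime"], TIME_FORMAT)
        if bike not in spans:
            spans[bike] = (start, stop)
        else:
            first, last = spans[bike]
            spans[bike] = (min(first, start), max(last, stop))
    return spans


# Made-up sample rows, for illustration only.
SAMPLE = """bikeid,starttime,stoptime
16000,2013-07-01 08:00:00,2013-07-01 08:20:00
16000,2014-05-30 18:05:00,2014-05-30 18:30:00
16001,2013-08-15 09:00:00,2013-08-15 09:10:00
16001,2014-02-03 07:45:00,2014-02-03 08:00:00
"""

spans = first_and_last_trips(csv.DictReader(io.StringIO(SAMPLE)))
for bike, (first, last) in sorted(spans.items()):
    print(bike, first.date(), last.date())
```

For the real data, the same function could be fed a `csv.DictReader` opened on each downloaded file in turn, merging the results per bike.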

Note that the bikeid field in the data doesn't appear to match the number stenciled on the bike's frame. It could correspond to the electronic identifier (probably an RFID tag) which the stations use to identify bikes. If that's the case, missing trip data could simply indicate that the electronics were damaged and replaced.

There are 6943 unique numbers in the trip data. This is roughly consistent with a New York Times story, published when the program launched, reporting 6,000 bikes in the system.

If we sort the bikes by their final trip, we can quickly get an estimate of losses.

[Table: Month, Final rides]
The vast majority of the bikes showed activity in May 2014, meaning that they weren't stolen or lost. Before April, each month saw between 17 and 68 final rides, averaging 36.5 each month.
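The monthly tally described above can be sketched as a simple count of bikes grouped by the month of their last recorded trip. The function name and the sample timestamps below are hypothetical, chosen only to illustrate the grouping. They are not figures from the Citibike data.

```python
from collections import Counter
from datetime import datetime


def final_rides_by_month(last_trip_by_bike):
    """Count bikes by the (year, month) of their last recorded trip."""
    return Counter((t.year, t.month) for t in last_trip_by_bike.values())


# Made-up last-trip timestamps, for illustration only.
last_trips = {
    "16000": datetime(2014, 5, 30, 18, 30),
    "16001": datetime(2014, 2, 3, 8, 0),
    "16002": datetime(2014, 5, 12, 9, 15),
    "16003": datetime(2014, 4, 28, 17, 40),
}

counts = final_rides_by_month(last_trips)
for (year, month), n in sorted(counts.items()):
    print(f"{year}-{month:02d}: {n}")
```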

At first glance, April appears to have been a disastrous month for Citibike thefts. But a more likely explanation could be that those bikes have been removed for maintenance. If Citibike keeps 300-400 bikes in their warehouse for routine tuneups, and if it takes two months for the bikes to rotate back out into service, it could easily explain most of the 428 bikes which were ridden in April, but idle in May. We would expect most of them to return to service in June. 

February and March also saw higher than average losses. Perhaps bike thieves are more active in those months, but this may be better explained by the unusually snowy winter. Plows, for example, may have taken a toll on the fleet.

Citibike can, at least in theory, bill a rider $1200 for failing to return a bike. If they collected this fee for each bike which went missing before April, they'd have raised nearly $400,000. However, I've yet to hear of any rider receiving such a bill.

Assuming that these final trips do represent theft and loss, approximately half of 1% of Citibikes are lost each month, or about 6% every year. That's far better than the reputed 80% of Paris's Vélib' bikes which were stolen in that system's first year!
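The arithmetic behind these estimates can be checked with a few lines. The bike count, monthly average, and fee come from this post. The count of nine months before April is an assumption made only for this sketch, since the post does not state how many months the average covers. The annual figure is a plain multiplication by twelve, with no compounding.

```python
TOTAL_BIKES = 6943                # unique bike ids in the trip data
AVG_FINAL_RIDES_PER_MONTH = 36.5  # average for the months before April
LOST_BIKE_FEE = 1200              # dollars, the fee quoted above
MONTHS_BEFORE_APRIL = 9           # ASSUMPTION: not stated in the post

monthly_rate = AVG_FINAL_RIDES_PER_MONTH / TOTAL_BIKES
annual_rate = monthly_rate * 12   # simple annualization, no compounding
potential_fees = AVG_FINAL_RIDES_PER_MONTH * MONTHS_BEFORE_APRIL * LOST_BIKE_FEE

print(f"monthly loss rate: {monthly_rate:.2%}")
print(f"annual loss rate:  {annual_rate:.1%}")
print(f"potential fees:    ${potential_fees:,.0f}")
```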

Update: the original version of the table showed 116 trips which ended in June. This is because there were a handful of trips which started on May 31 but finished after midnight, and were thus credited to the next month. To make it less confusing, I've merged these final rides into the data for May.
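One way to do that kind of merge is to credit a trip to the month it started in whenever it ends past the end of the data. This is only a sketch of the idea, not necessarily how the table was corrected. The function name and the cutoff constant are hypothetical, and the timestamps are made up.

```python
from datetime import datetime

# The data runs through the end of May 2014, so anything ending
# on or after this cutoff is a spillover from the last day.
CUTOFF = datetime(2014, 6, 1)


def credited_month(start, stop, cutoff=CUTOFF):
    """Return the (year, month) a trip is credited to.

    Trips normally count toward the month they ended in. A trip that
    ends on or after the cutoff counts toward its start month instead.
    """
    when = start if stop >= cutoff else stop
    return (when.year, when.month)


# A made-up trip that starts late on May 31 and ends after midnight.
late = credited_month(datetime(2014, 5, 31, 23, 50), datetime(2014, 6, 1, 0, 15))
# A made-up trip that ends well inside the data.
normal = credited_month(datetime(2014, 4, 30, 23, 50), datetime(2014, 5, 1, 0, 5))
print(late, normal)
```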