Dec 182010

Most of the Shuttle Tracking project is managed via an open-source repository on Github, providing a great platform for others to check out the project and see that code we’re using to track shuttles here at RPI.  The one caveat is that not all of the code can be released under an open source license.  Shuttle Tracking interfaces with an external data provider responsible for the in-vehicle modules and their API isn’t public.  We also have a lot of config options specific to RPI that wouldn’t make sense in a public sense like references to the CAS config, our hoptoad instance, and our Google analytics config.

To help manage these RPI-specific things I commit them to my  local ‘RPI’ branch.  This branch doesn’t get pushed to Github (because the world can’t see some of the “secrets”) but it provides version control over things and lets me easily test out the changes in my development copy.  We also use Capistrano for our deployment, it makes it very easy for me to push new code to production and (more importantly) rollback code when things are broken.  The problem with Capistrano, or my understanding of it, is that it doesn’t easily pull code from a less-than-public branches.

So, I wanted to get my RPI-specific changes to the production server, which can only pull from the public ‘master’ branch on Github.  To do this, I added some code to my Capistrano config/deploy.rb file to pull the RPI changes as well.  The following code generates a patch file, sends the patch to the production server, and applies the patch with the RPI changes.

desc "Apply RPI-specific patch"
task :apply_patch, :roles => :app do
  patch_contents = `git diff --no-prefix master..RPI`
  put(patch_contents, "#{release_path}/patch", :via => :scp)
  run "cd #{release_path} && patch -N -p0 < #{release_path}/patch"

You’ll need to add something in here to call this function, like:

after "deploy:update_code", "deploy:apply_patch"

Presto, now your production code will carry over changes committed to a local branch. To make sure things don’t get crazy with conflicts, I make a point of checking out the RPI branch and merging master in with it before deploying it. This gives me the opportunity to resolve any conflicts that might come up during the patch process before they actually happen. We can also, pretty easily, see what makes each release (running cap deploy or alike) specific to RPI by looking at the patch file in each release folder on the server.

Dec 112010

I’ve been beta testing the new Shuttle Tracking system for the past 2 weeks and, after discovering the awesome Request-log-analyzer tool I started to crunch some numbers on the request for new shuttle positions.  Every 4 seconds the page calls /vehicles/current.js (translating to VehiclesController#current.JS) to ask for the latest shuttle locations.  It is important we answer this query as fast as possible, a slow response here can queue up incoming requests very quickly.  The client JavaScript isn’t very smart right now, so requests keep coming every 4 seconds until you leave the page which can bring the server to a screeching halt if we don’t answer (been there, seen that).

Looking at the current production site the average response time is 16ms, with 8ms of database work and 7ms of rendering time.  I ran numbers on the beta and saw the same query was averaging around 63ms, with the split 26ms database and .17ms rendering (No clue where the missing milliseconds are).  I was very very sad to see things were going close to 4x slower, I thought Rails 3 was suppose to make my world better!

Turns out it can, you just have to work a little bit harder.  What I almost forget to mention was that the current Rails-2 production system uses a much smaller dataset, the table with all the shuttle positions is archived and wiped clean every night, so at worst (like 11pm) the queries are hitting a few thousand rows.  On the other hand, my research into route identification and arrival prediction requires a historical dataset so I didn’t build any support into the new Rails 3 code to throw that data aside.  Maybe my code wasn’t so bad after all, but it was still measurable slower.

I switched the database over to my development server which runs orders of magnitude slower than the production box (all production / beta code is running on the same dedicated shuttle tracking production server).  I started by taking a look at the database queries my code was generating and none of them seemed too outrageous.  The first query finds all the shuttles that have the enabled flag true, SELECT vehicles.* FROM vehicles WHERE vehicles.enabled = true, and was only taking 1ms, nothing significant at all.  The real slow guy is the query, executed one for each shuttle, to grab the latest position SELECT “updates”.* FROM “updates” WHERE (“updates”.vehicle_id = 1) ORDER BY timestamp DESC LIMIT 1.  On the development box, running this query for just one shuttle (like it looks previously) was taking 1100ms, multiply that by 8 shuttles and you have >8 seconds of dedicated thinking time.  With the update interval of 4 seconds, the development server would probably implode as a result!

I considered rewriting the code to try and generate a different sql query.  We actually don’t want to know the latest position, we want to know the latest position if that position is recent (e.g has a timestamp within the last N minutes).  To achieve that I’d probably have to write a lamda scope, generating a query like SELECT “updates”.* FROM “updates” WHERE “updates”.vehicle_id = 1 AND “updates”.timestamp > recent_time_here) ORDER BY timestamp DESC LIMIT 1 which isn’t really that intimidating, but I don’t know if it would solve the real problem.  Database indexes, besides requiring less typing on my end, seemed like the better way to speed the query up.  (Lamda scopes are still intimidating most days)

I figure there are 2 parameters that the database cares about when it’s running the latest position query from above: vehicle_id and timestamp.  To figure out the best indexes to add I set out and tested my options, running each index independently, together, and them combined (in both orders).

The first row in represents the indexes added to the table, the vehicle_id + timestamp represents having two independent indexes (combining the first two test) and the comma-separated index represented a combined key.

The data showed, pretty clearly, that the combined key on [vehicle_id, timestamp] was the best index to add to the table. The results came in faster than any other index and (as a nice bonus) the index size wasn’t as large some of those that placed emphasis on the timestamp over the vehicle_id. Given the SQL query being executed, this makes sense. The query first needs to scope what vehicle to look for and then perform the timestamp operation.

I committed code to add the indexes to the updates table and updated the beta appropriately. I posted a new link on Twitter asking people to help out load / stress test the server and it was re-posted on Facebook a bit. I wanted to quickly generate enough data to compare with the previous beta run and the production log to see if the indexes signifnicantly helped everyone’s experience or it just a fluke on my development service.

Below you’ll find the numbers, after expanding out some of request-log-analyzer results, that show how much faster the indexes actually made things.

At first glance I wasn’t super thrilled that the new code, with indexes, was only 4ms faster than the existing code… but I guess another way to frame that is a 25% improvement which is fairly substantial and that same change (closer to 22%) was carried over the upper limit of the 95 percentile range of requests.

I do find myself wishing request-log-analyzer could run it’s computations on the millisecond level, perhaps I’ll look into that change if I’m feeling extra adventurous sometime soon.

While I look forward to having an expanded dataset in the production system for cool things like route identification and estimated arrival times, until those features are public you can look forward to saving around 4ms every time the shuttles move (or don’t move) on your display!

Dec 032010

Over the past 6 months or so I’ve been spearheading the re-write of RPI’s Shuttle Tracking system into something less RPI-specific to make it useful to other organizations.  Part of this has been small semantic changes like removing RPI-specific words, location references (like the hard coded map center) and CAS-based authentication, but on a much larger level the application was restructured to do a lot more.

Both old and new systems store the same data (vehicles, vehicle positions, routes, and stops along the routes) but you no longer have to directly manipulate the database to hide a stop from the map and you don’t have to understand how to build a KML file to change the route around anymore.  Additionally, the new system feel much less “hacky” if that makes any sense, things are where they should be (for the most part) and there’s actually some back end pages worth showing off; we’ll be able to iterate and release new features much faster.

I am always impressed when an interface get’s polished, but I’m rarely the one to do it (Thanks Reilly!)… what I can take credit for is the switch to Ruby on Rails 3.  Flagship Geo was a primary driver behind this, Rails 3 was necessary to pull in all those resources like the route and stop editors, but Rails 3 should also provide some performance enhancements.

The server has also been upgraded to include Ruby 1.9.2 via RVM because I think that makes it harder to break things.  When the site goes into production we’ll be serving using Passenger 3 to, in theory, speed up our web server end of the pipe.

As for the timeline of this release, the current system is staging in beta at RPI for performance testing / feedback.  After I’m satisfied the new one is performing at least as well as the old one it will be switched into production.  In the meantime, you can follow development on github: