Dec 18, 2010

Most of the Shuttle Tracking project is managed via an open-source repository on Github, providing a great platform for others to check out the project and see the code we’re using to track shuttles here at RPI.  The one caveat is that not all of the code can be released under an open source license.  Shuttle Tracking interfaces with an external data provider responsible for the in-vehicle modules, and their API isn’t public.  We also have a lot of config options specific to RPI that wouldn’t make sense in a public repository, like references to the CAS config, our hoptoad instance, and our Google Analytics config.

To help manage these RPI-specific things I commit them to my local ‘RPI’ branch.  This branch doesn’t get pushed to Github (because the world can’t see some of the “secrets”) but it provides version control over things and lets me easily test out the changes in my development copy.  We also use Capistrano for our deployment; it makes it very easy for me to push new code to production and (more importantly) roll back code when things are broken.  The problem with Capistrano, or my understanding of it, is that it doesn’t easily pull code from less-than-public branches.

So, I wanted to get my RPI-specific changes to the production server, which can only pull from the public ‘master’ branch on Github.  To do this, I added some code to my Capistrano config/deploy.rb file to pull the RPI changes as well.  The following code generates a patch file, sends the patch to the production server, and applies the patch with the RPI changes.

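namespace :deploy do
  desc "Apply RPI-specific patch"
  task :apply_patch, :roles => :app do
    # Diff the public master branch against the local RPI branch
    patch_contents = `git diff --no-prefix master..RPI`
    # Upload the patch into the new release and apply it there
    put(patch_contents, "#{release_path}/patch", :via => :scp)
    run "cd #{release_path} && patch -N -p0 < #{release_path}/patch"
  end
end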

You’ll also need to add a hook in here to call this task, like:

after "deploy:update_code", "deploy:apply_patch"

Presto, now your production code will carry over changes committed to a local branch. To make sure things don’t get crazy with conflicts, I make a point of checking out the RPI branch and merging master into it before deploying. This gives me the opportunity to resolve any conflicts that might come up during the patch process before they actually happen. We can also, pretty easily, see what makes each release (running cap deploy or the like) specific to RPI by looking at the patch file in each release folder on the server.

Dec 11, 2010

I’ve been beta testing the new Shuttle Tracking system for the past 2 weeks and, after discovering the awesome Request-log-analyzer tool, I started to crunch some numbers on the requests for new shuttle positions.  Every 4 seconds the page calls /vehicles/current.js (translating to VehiclesController#current.JS) to ask for the latest shuttle locations.  It is important we answer this query as fast as possible; a slow response here can queue up incoming requests very quickly.  The client JavaScript isn’t very smart right now, so requests keep coming every 4 seconds until you leave the page, which can bring the server to a screeching halt if we don’t answer (been there, seen that).

Looking at the current production site the average response time is 16ms, with 8ms of database work and 7ms of rendering time.  I ran numbers on the beta and saw the same query was averaging around 63ms, with the split 26ms database and .17ms rendering (No clue where the missing milliseconds are).  I was very very sad to see things were going close to 4x slower; I thought Rails 3 was supposed to make my world better!

Turns out it can, you just have to work a little bit harder.  What I almost forgot to mention was that the current Rails 2 production system uses a much smaller dataset: the table with all the shuttle positions is archived and wiped clean every night, so at worst (like 11pm) the queries are hitting a few thousand rows.  On the other hand, my research into route identification and arrival prediction requires a historical dataset, so I didn’t build any support into the new Rails 3 code to throw that data aside.  Maybe my code wasn’t so bad after all, but it was still measurably slower.

I switched the database over to my development server, which runs orders of magnitude slower than the production box (all production / beta code is running on the same dedicated shuttle tracking production server).  I started by taking a look at the database queries my code was generating and none of them seemed too outrageous.  The first query finds all the shuttles that have the enabled flag true, SELECT vehicles.* FROM vehicles WHERE vehicles.enabled = true, and was only taking 1ms, nothing significant at all.  The real slow guy is the query, executed once for each shuttle, to grab the latest position: SELECT "updates".* FROM "updates" WHERE ("updates".vehicle_id = 1) ORDER BY timestamp DESC LIMIT 1.  On the development box, running this query for just one shuttle (exactly as shown above) was taking 1100ms; multiply that by 8 shuttles and you have >8 seconds of dedicated thinking time.  With the update interval of 4 seconds, the development server would probably implode as a result!

I considered rewriting the code to try and generate a different SQL query.  We actually don’t want to know the latest position, we want to know the latest position if that position is recent (e.g., has a timestamp within the last N minutes).  To achieve that I’d probably have to write a lambda scope, generating a query like SELECT "updates".* FROM "updates" WHERE ("updates".vehicle_id = 1 AND "updates".timestamp > recent_time_here) ORDER BY timestamp DESC LIMIT 1, which isn’t really that intimidating, but I don’t know if it would solve the real problem.  Database indexes, besides requiring less typing on my end, seemed like the better way to speed the query up.  (Lambda scopes are still intimidating most days.)
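
The “latest position, but only if it is recent” rule can be sketched in plain Ruby, away from the database.  This is a minimal illustration and not the application’s code: the Update struct, the latest_recent_update helper, and the 5-minute window are all made up for the example.

```ruby
# Hypothetical stand-in for a row in the updates table.
Update = Struct.new(:vehicle_id, :timestamp)

# Returns the newest update for the vehicle, or nil when even the newest
# one is older than `window` seconds (assumed default: 5 minutes).
def latest_recent_update(updates, vehicle_id, now, window = 5 * 60)
  latest = updates.select { |u| u.vehicle_id == vehicle_id }
                  .max_by(&:timestamp)
  return nil if latest.nil? || latest.timestamp <= now - window
  latest
end

now = Time.at(1_000_000)
updates = [
  Update.new(1, now - 30),    # fresh position for vehicle 1
  Update.new(1, now - 900),   # older position for vehicle 1
  Update.new(2, now - 3600)   # vehicle 2 has not reported in an hour
]

latest_recent_update(updates, 1, now)   # the 30-second-old update
latest_recent_update(updates, 2, now)   # nil, the newest row is too old
```

In SQL terms the select/max_by pair plays the role of ORDER BY timestamp DESC LIMIT 1, and the window check is the extra timestamp condition.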

I figure there are 2 parameters that the database cares about when it’s running the latest position query from above: vehicle_id and timestamp.  To figure out the best indexes to add I set out and tested my options, running each index independently, together, and then combined (in both orders).

The first row in the results represents the indexes added to the table; vehicle_id + timestamp represents having two independent indexes (combining the first two tests), and a comma-separated index represents a combined key.

The data showed, pretty clearly, that the combined key on [vehicle_id, timestamp] was the best index to add to the table. The results came in faster than any other index and (as a nice bonus) the index size wasn’t as large as some of those that placed emphasis on the timestamp over the vehicle_id. Given the SQL query being executed, this makes sense. The query first needs to scope what vehicle to look for and then perform the timestamp operation.
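
One way to picture why the combined [vehicle_id, timestamp] key suits this query: entries in such an index are kept in sorted order, so all the rows for one vehicle sit next to each other, already ordered by time, and the newest one is simply the last entry of that run.  The sketch below models that with a sorted Ruby array and a binary search.  It is an analogy under that assumption, not a description of how the database actually stores its index, and latest_timestamp_for is a hypothetical helper.

```ruby
# Model of a combined index: [vehicle_id, timestamp] pairs kept sorted.
INDEX = [
  [1, 100], [1, 104], [1, 108],
  [2, 101], [2, 105],
  [3, 102]
].sort.freeze

# Binary-search for the first entry past the vehicle's run, then step
# back one slot to land on that vehicle's newest timestamp.
def latest_timestamp_for(index, vehicle_id)
  past = index.bsearch_index { |id, _ts| id > vehicle_id } || index.length
  return nil if past.zero?
  id, ts = index[past - 1]
  id == vehicle_id ? ts : nil
end

latest_timestamp_for(INDEX, 2)   # 105
```

With the key order flipped to [timestamp, vehicle_id], the vehicles would be interleaved by time and the same lookup would have to walk backwards until it happened to hit the right vehicle.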

I committed code to add the indexes to the updates table and updated the beta appropriately. I posted a new link on Twitter asking people to help load / stress test the server and it was re-posted on Facebook a bit. I wanted to quickly generate enough data to compare with the previous beta run and the production log to see if the indexes significantly helped everyone’s experience or if it was just a fluke on my development server.

Below you’ll find the numbers, after expanding out some of request-log-analyzer results, that show how much faster the indexes actually made things.

At first glance I wasn’t super thrilled that the new code, with indexes, was only 4ms faster than the existing code… but I guess another way to frame that is a 25% improvement, which is fairly substantial, and that same change (closer to 22%) carried over to the upper limit of the 95th percentile range of requests.
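
For anyone checking the math, the 25% figure falls out of the averages quoted earlier.  A tiny sketch, assuming the 16ms production average from above and treating “4ms faster” as 12ms (a derived number, not one read off the report); percent_improvement is just a helper name invented here.

```ruby
# Percentage by which `after_ms` improves on `before_ms`.
def percent_improvement(before_ms, after_ms)
  (before_ms - after_ms) * 100.0 / before_ms
end

production_ms = 16                  # average quoted for the production site
indexed_ms    = production_ms - 4   # "4ms faster", so 12ms (derived)

puts percent_improvement(production_ms, indexed_ms)   # prints 25.0
```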

I do find myself wishing request-log-analyzer could run its computations on the millisecond level; perhaps I’ll look into that change if I’m feeling extra adventurous sometime soon.

While I look forward to having an expanded dataset in the production system for cool things like route identification and estimated arrival times, until those features are public you can look forward to saving around 4ms every time the shuttles move (or don’t move) on your display!

Dec 03, 2010

Over the past 6 months or so I’ve been spearheading the re-write of RPI’s Shuttle Tracking system into something less RPI-specific to make it useful to other organizations.  Part of this has been small semantic changes like removing RPI-specific words, location references (like the hard coded map center) and CAS-based authentication, but on a much larger level the application was restructured to do a lot more.

Both old and new systems store the same data (vehicles, vehicle positions, routes, and stops along the routes) but you no longer have to directly manipulate the database to hide a stop from the map and you don’t have to understand how to build a KML file to change the route around anymore.  Additionally, the new system feels much less “hacky” if that makes any sense: things are where they should be (for the most part) and there are actually some back end pages worth showing off; we’ll be able to iterate and release new features much faster.

I am always impressed when an interface gets polished, but I’m rarely the one to do it (Thanks Reilly!)… what I can take credit for is the switch to Ruby on Rails 3.  Flagship Geo was a primary driver behind this: Rails 3 was necessary to pull in all those resources like the route and stop editors, but Rails 3 should also provide some performance enhancements.

The server has also been upgraded to include Ruby 1.9.2 via RVM because I think that makes it harder to break things.  When the site goes into production we’ll be serving using Passenger 3 to, in theory, speed up our web server end of the pipe.

As for the timeline of this release, the current system is staging in beta at RPI for performance testing / feedback.  After I’m satisfied the new one is performing at least as well as the old one it will be switched into production.  In the meantime, you can follow development on github:

Nov 16, 2010

Whew, I think this is finally working.

Over the past few years I made the mistake of creating unique blogs for each project I had been working on.  It seems like a great way to segment things, create friendly URLs, and not have to deal with old WordPress installations, but I decided that it wasn’t a very sustainable plan.  On the server end of things I ended up with 4-5 different WordPress installs, each with a separate database, apache config, etc.: one huge mess that makes moving servers much slower than I’d like it to be.  When a project “finished” (aka I got busy with something else) I would end up with this dusty blog sitting out there on the internet somewhere.

My new plan is to use one blog for all my projects, using categories to separate posts into their respective projects.  Luckily, WordPress lets you generate RSS feeds based on categories so I don’t have to do any magic to keep separate RSS feeds working.  Also, having just one blog to maintain should help me keep 1 thing up and running better than 4-5 different things up and running.

Before this post I imported all my writings from Flagship Geo and Bonsai Video, two open source projects I worked on during the Summer 10 and Summer 09 respectively.  Over the next few days/weeks I’ll be adding other projects and notes that didn’t import so easily, so stay tuned for some updates.  Ideally I’ll be posting >1 entry per week but don’t hold me to it.

So, here’s to giving this a try.

Aug 16, 2010

Prof. Moorthy was unable to attend the RCOS meeting last Friday (2010-08-13) so I’d like to provide a few updates on the projects and groups that presented during the meeting.

Matthew O’Brien started the meeting off with some updates on Project Community Connected.  Matt has finished the main application for his community bulletin board, and is in the process of cleaning up some of the style elements and fixing a few bugs.  He’s hoping to test out the program sometime this fall and was going to be reaching out to a few communities over the next few weeks.

Anthony Loven and Brittany Jason have been developing IntuiTask, a lightweight agenda and calendaring application for the Android platform.  They’ve completed the basic application, are busy at work polishing the user interface, and are going to be testing the application on different hardware devices over the next few weeks.  During the Fall semester (or maybe the Spring) they plan to add GPS support to their program so you can be notified of tasks you can complete around you, like picking up prescriptions in a pharmacy when you approach one.  Anthony mentioned they would be holding off on releasing their application on the Android Marketplace until they could finish some more testing to avoid any bad reviews an early beta edition might receive.

Sean Austin, Diana Mazzola and Griffin Milsap presented their final updates on droidViz.  They have been very busy refactoring their application to serve as a visualization framework which can be used by other applications.  They’ve named this new tool ‘LucidEngine’ and were able to show off a neat demo application they built using it.

Jacob Katz presented via Skype on his chess program, OpenGambit.  Over the past few weeks he has been fixing a few board-evaluation bugs and is going to be starting development work on the GUI over the next week or two.

Finally, Graylin Kim presented on all of the open source work he has been doing this summer.  He has been working with the New York State Senate on a bunch of projects to help open up their various data feeds.  Graylin presented a few demos, showing off the Senbook and demonstrating all the work he has done aggregating the Senate’s various data sources.

Jul 20, 2010

I spent the better part of a week or so adding support for polylines into Flagship Geo.  The hard part wasn’t the data structure development, it was the ability to easily edit the lines in a way that made sense.

Google Maps works as a great User Interface control for maps, but it’s just a control.  You have to tell it what to do when someone drags a marker around or clicks on the map; there is no automatic way to tell it that the person is trying to edit the line.  All the objects people can interact with need event handlers written for them, which reminds me a lot of regular desktop application development.  I usually take for granted the fact that a tag like <a href="#"></a> will automatically know what to do when clicked.  Tools like jQuery make it easy to build on those actions, sending that a href into an AJAX library or something.  Building the polyline editing interface was much more like building a desktop application in C, Python, or the like, where you have to bind everything yourself.

Looking back on the past week, the hardest part was the ability to delete a point from your line.  This kind of reminded me of my days in Data Structures or Algorithms at RPI when you’re manipulating nodes in a tree or a list (a polyline is just an array of latitudes & longitudes) except unlike C, JavaScript doesn’t have any notion of pointers.

I got stuck trying to have my markers (the icons you drag to manipulate the map) both update the visual line you see AND update the hidden HTML fields that store the latitude and longitude for the point.  I found that replacing the event listeners with code that contained the updated line point wasn’t working correctly, so I ended up building my own pointer-esque data structure that contains id numbers to match up markers, polyline points, and html points.  While I try and minimize my use of global arrays, I couldn’t find any decent way around JavaScript’s (and Google Maps event listener) limitations.
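
The id-based bookkeeping described above can be sketched roughly as follows.  The real thing was JavaScript against the Google Maps API; this version is plain Ruby, to match the other code on this blog, and every name in it (PointRegistry, add, move, remove, the string stand-ins for markers and form fields) is hypothetical.  It only shows the idea: each point gets a stable id, and the marker, the polyline vertex, and the hidden form field are all looked up through that id rather than through array positions that shift when a point is deleted.

```ruby
# Ties together the three things that describe one point on the line.
class PointRegistry
  Entry = Struct.new(:marker, :latlng, :field_name)

  def initialize
    @entries = {}   # id => Entry; Ruby hashes keep insertion order
    @next_id = 0
  end

  # Registers a point and returns the id that event handlers hold on to.
  def add(lat, lng)
    id = @next_id
    @next_id += 1
    @entries[id] = Entry.new("marker-#{id}", [lat, lng], "points[#{id}]")
    id
  end

  # Called from a marker's drag handler: only the id is needed.
  def move(id, lat, lng)
    @entries.fetch(id).latlng = [lat, lng]
  end

  # Deleting by id leaves every other id (and its handler) valid.
  def remove(id)
    @entries.delete(id)
  end

  # The polyline is rebuilt from whatever points remain, in order.
  def path
    @entries.values.map(&:latlng)
  end
end

line = PointRegistry.new
a = line.add(42.73, -73.68)
b = line.add(42.74, -73.67)
c = line.add(42.75, -73.66)
line.remove(b)                 # drop the middle point
line.move(c, 42.76, -73.65)    # c's id still works after the delete
p line.path
```

The point of the indirection is that a handler created for point c never needs to know that b existed or was removed.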

You can take a look at the commit where most of this code was sent along here:

Jul 11, 2010

One of my biggest complaints about Ruby on Rails relates to the environment needed to develop an application.  Last Wednesday I took off for a few days of holiday up in Vermont.  Internet access is limited to my mobile phone while I’m in Vermont, so I have to take development as “offline” as possible while I’m up there.  Doing this with my Rails applications is a mess.

When I was doing PHP development, it was pretty straightforward for me to set up my laptop as a mobile development environment.  I could download XAMPP, or another all-in-one Windows-AMP stack, to a USB drive and copy all the source code I needed.  I guess I would also need to export a copy of a MySQL database as well, but that’s a pretty quick step.  After running the XAMPP installer, I could drop all my source code into the folder and presto! things would be working good as new.

Ruby on Rails doesn’t afford me those kinds of luxuries.  My laptop runs Windows 7, so I can’t just `gem install rails` and call it a day.  I’d never set up a local development environment on Windows (I always use a Linux server), so my plan was to download a few different packages and see what I could make work.  Since I was in a rush to depart, I downloaded as much as I could under the premise that I would be able to install things later.  Hopefully I would have enough pieces.

I found two common approaches out there to quickly set up a development environment on Windows.  The first involved virtual machines, where you would run your own Linux server within Windows.  I tried downloading 2 (or maybe 3) different server images, but none of them would boot correctly in VirtualBox.  My second plan was to install a XAMPP-like environment but with Ruby on Rails instead of PHP.  I was able to get this to install, but it seemed the stack I downloaded included some strange version of Ruby that wasn’t compatible with my application.

Nevertheless, I realized that regardless of what I did while “offline”, I wouldn’t have been able to get my applications up and running without internet access to install the different gems and plugins I require for my applications.  I had probably downloaded enough pieces to patch together a framework/Rails-stack, but that wouldn’t include the externally referenced modules in my code.  Gems and plugins (and by plugins I mean git submodules) are great, but you need to plan for them first when you’re going offline.

In a deployed, production-level application, gems and plugins might be frozen or bundled with your application, but that doesn’t fit well with the development of an application.  It would be really handy if the “download” button on Github automagically included all the git submodules you would need for an application, and I’m optimistic bundler will help solve some of the gem-deployment issues Rails applications face.

Now that I’m back “online,” I’m going to stick with my trusty development servers.  They might not be easy to carry around with me, but at least I don’t have to worry about keeping their configuration up to date or taking them offline.

Jun 28, 2010

Yikes, last week I was too busy working/in-meetings to squeeze in a blog post, so I’ll post a quick one to start off this week.

Since I’ve taken the plunge into developing a test suite for all of my new Rails applications (most notably Concerto 2), I’ve struggled with the testing workflow.  It’s very natural for me to write a piece of code, refresh my browser, verify the results, and repeat the cycle until I get it right.  The more official testing strategy requires me to write a test + the actual code and make sure the two of them mesh before I even bother to refresh my browser.

Starting this process was a bit hard for me, probably because my test-writing skills weren’t so hot.  Many times my `rake test` results were failing not because there was an actual bug in my application code, but because there was a bug in my testing code.  Unlike the application code, which can be viewed in a browser, there really isn’t any way to know if your test is running as expected, so you really have to keep things as simple and straightforward as possible.

If you’re going to be developing in Ruby on Rails and using tests for your application, I’ve found a few tools that help out a lot:

  • autotest – Autotest is a tool that automatically runs the relevant portion of your test suite as you save files.  Essentially, when you save the file foo.rb autotest will automatically run the tests it thinks are related to foo.rb and nothing more.  In rails land, this saves a lot of time waiting for tests to run that are unaffected by a small or incremental change.  You can install some cool plugins to make the output red or green or use growl notifiers if you want… I just use the red/green plugin to spice up my terminals.
  • CI Joe – CI Joe is a really simple continuous integration server.  A continuous integration server (automatic testing server) doesn’t make a whole lot of sense for single developer projects, but as more people are contributing and committing code it helps to have an automated system that is making sure the latest commit doesn’t break any tests.  I set this up for Concerto 2 development so everyone on the development team can quickly see if the HEAD is broken or not without having to ssh somewhere, git pull, and rake test on their own.
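
To make the “tests it thinks are related to foo.rb” idea a little more concrete, here is a rough sketch of that kind of file-to-test mapping.  It is a simplified guess at the concept and not autotest’s actual rules; related_tests is a made-up helper, and the test/unit and test/functional paths assume the standard Rails test layout of this era.

```ruby
# Guess which test files relate to a changed file (simplified on purpose).
def related_tests(changed_path)
  case changed_path
  when %r{\Aapp/models/(.+)\.rb\z}
    ["test/unit/#{$1}_test.rb"]
  when %r{\Aapp/controllers/(.+)\.rb\z}
    ["test/functional/#{$1}_test.rb"]
  when %r{\Atest/.+_test\.rb\z}
    [changed_path]   # a test file changed: rerun just that file
  else
    []               # unknown file: nothing to narrow down to
  end
end

related_tests("app/models/vehicle.rb")   # ["test/unit/vehicle_test.rb"]
```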

My one cautionary word about testing: don’t dive in too deeply.  It makes sense to develop really well formed test cases for critical portions of your application or code that is constantly being changed, but I’m not sure there is value in developing 12 tests to validate every possible form input and exact error message wording… at some point your time is better spent developing the application instead of supporting tests.

Jun 16, 2010

Typically when I go to generate some XML-style documents in Ruby on Rails I manually code the XML syntax and manually escape and substitute the strings where I want them. This technique is pretty sloppy to toss into a view, and relies on your ability to generate well-formed XML off the top of your head. (It’s usually the escaping that becomes an issue.)

Thinking back, this is probably the quick and easy technique I picked up from my PHP development projects.  I could hand-code and debug some XML faster than I could find a suitable library, install it, and figure out how to work it.  Sometimes it’s just easier to do things the hard way.

As a rule of thumb, I think the hard way in PHP never translates well into Ruby on Rails.

Yesterday I was struggling to cleanly generate XML in Rails 3 because of the new default sanitization.   Using Builder was easy enough to generate the right XML structure for my KML file, but getting it to output was a challenge.  All of my < characters were getting replaced with &lt; and such, and the raw helper (<%= raw foo %>) wasn’t cooperating.
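
To see what that mangling looks like, here is a small stand-alone sketch.  It uses CGI.escapeHTML from Ruby’s standard library as a rough stand-in for the escaping a Rails 3 ERB view applies to strings by default; that substitution is an assumption for illustration, not Rails’ actual code path, and escape_like_a_view is a made-up name.

```ruby
require "cgi"

# Treats finished markup as if it were untrusted text, roughly what
# happens when an XML string is dropped into an auto-escaping view.
def escape_like_a_view(markup)
  CGI.escapeHTML(markup)
end

kml_fragment = "<Placemark><name>Student Union</name></Placemark>"
puts escape_like_a_view(kml_fragment)
# prints &lt;Placemark&gt;&lt;name&gt;Student Union&lt;/name&gt;&lt;/Placemark&gt;
```

Every tag survives as literal text, which is fine for an HTML page but is no longer a KML document.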

I ended up discovering that I could rename my file from show.kml.erb to show.kml.builder, and I wouldn’t have to mess around with any escaping or erb syntax at all. You can check out the code I used in this commit.  It might be just me, but I always struggle to find the appropriate documentation for these little nuances in Rails.  There is tons of code showing how to use Builder to build XML documents, but not one of them mentioned what to name your file.

This technique definitely took me longer than a quick pass through manually plugging in the string would have, but it’s a lot cleaner.  I don’t have to worry about escaping or generating valid XML, and if performance is an issue I can install a new XML builder to speed things up.

Jun 09, 2010

I just bumped Flagship Geo to use Rails 3 Beta 4 (commit); luckily everything seems to still be working when I run my test suite.   Installing the new version of Rails is pretty easy, you just have to run `gem install rails --pre`.  You might want to sudo if you keep your gems system wide.

I use Passenger to serve most of my Rails apps on my development server and I had to change around the config a bit to get things working with Beta 4.  Specifically, I had to switch Passenger to treat my application like a Rack application.  I don’t know exactly what this means from an application-architecture standpoint, but I believe it’s related to the initialization procedures used to boot the application.  To make the switch, I had to edit my public/.htaccess file.  You might need to edit your apache virtual host config if that is where you store your Passenger config info.  The switch is pretty easy, switch every instance of “Rails” to “Rack”:

RackEnv development
RackBaseURI /geo

Also, bundler has been getting on my nerves.  When I upgraded rails, bundler was updated to 0.9.26 which is very confused about its gem locations.  From the command line, everything works great.  I can rake test, ./script/rails console, and do all of that great stuff but when I load my application up in the browser half of my gems can’t be found.  I needed to sudo gem install them to manually make them available system-wide, doing just `bundle install` would install them locally which was good enough for CLI work but didn’t cut it for Passenger.  I believe this might be fixed in bundler 0.10… due out ASAP?

Otherwise, everything is working great.  I look forward to the Rails 3 RC upgrade later in the week… hopefully the upgrade won’t be much harder.