Background Processing Changes

I spent most of my time this week trying to make things go faster without upgrading the old server everything is running on. One of the bigger slowdowns I noticed was BackgrounDRb, which was not only taking up a hefty chunk of memory while running, but also querying the database non-stop looking for new jobs in the queue.

I ended up replacing BackgrounDRb with Workling/Starling. While I lost the persistent job queue (something I'll look into handling internally), the performance gains I've noticed are pretty significant. This change has been committed and pushed to GitHub, and I was looking forward to saying you can check out how I did it here, but it looks like the change was too significant for GitHub to track. You can always download a local copy of the code and see what I changed if you're interested.

It boiled down to installing Workling, Starling, and some dependencies (like daemons). Converting the actual background scripts was pretty straightforward. I had to change the top of the classes to reference Workling instead of BackgrounDRb and strip out references to the persistent job queue. In the controllers/models that call the conversion I adjusted my function usage slightly (removed the hash wrapper I had on my arguments) and presto, things were working.
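
To give a rough idea of the shape of the change, here is a minimal sketch of a converted worker and its call site. The class and method names are illustrative rather than copied from my code, and I'm going from memory on Workling's asynch_ prefix:

#The worker class subclasses Workling::Base instead of BackgrounDRb::MetaWorker,
#and the persistent job queue bookkeeping goes away.
class ConversionWorker < Workling::Base
  def convert(options = {})
    conversion = Conversion.find(options[:conversion_id])
    #run FFMPEG, update the conversion status, etc.
  end
end

#The call site drops the :args hash wrapper that BackgrounDRb expected:
ConversionWorker.asynch_convert(:conversion_id => @conversion.id)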

On another note, this week the Concerto team launched the Concerto public site at http://www.concerto-signage.com/. It's a pretty great looking site! I also spent some time working on the new version of Shuttle Tracking at RPI, which will be released under an open source license this year. I've slowly been re-writing the user portion of the application in Ruby on Rails and trying to optimize the code as much as I can for maximum performance. The JavaScript has also been completely reworked so we can use the faster Google Maps V3 API once it adds functionality for polylines, or something else that will let us draw the shuttle routes.

BackgrounDRb Slowdown

I finished transitioning thumbnail generation to a background process, using the same approach I use for video conversion, since a thumbnail is really just a special kind of conversion. You can check out the commit here if you're interested.
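
Since it follows the same pattern as video conversion (see the Conversion Job Pool post below), the hand-off is roughly a one-liner like the following; the worker and method names here are approximations, not the exact code from the commit:

#Enqueue a thumbnail job on its own BackgrounDRb worker (hypothetical names)
MiddleMan.worker(:thumbnail_worker).enq_queue_thumbnail(:args => {:conversion_id => @conversion.id}, :job_key => @conversion.id)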

Adding another BackgrounDRb worker was easy to do, but I'm finding that it's wreaking havoc on my server. The box I'm developing bonsai video on is a pretty lightweight box (it might be my lightest, now that I think about it). I chose to do this to force myself to make sure my code would work well on slower boxes. So far I haven't had any major issues. FFMPEG seems to convert videos fairly quickly, and up until today everything seemed under control. Combined, the 3 BackgrounDRb workers are using somewhere between 25-40% of my system's memory, as reported by top. I'm not sure why the processes are using so much when they are sitting there waiting for work, and I doubt the SQL overhead of polling for new jobs (I use the persistent job queue) is that overwhelming.

This has made regular site navigation terribly slow, but at least pages don’t time out like they previously did when generating thumbnails. I think I have 3 options to fix things:

  1. Refactor and combine the workers, so both videos and thumbnails are handled by the same worker. I suspect this might save me 10-15% on memory usage, but it's not as clean as I'd like it to be (see the rough sketch after this list).
  2. Replace BackgrounDRb with something else that may be a bit faster or more efficient. It looks like other people have experienced the same problems I've had, maybe just on a smaller scale since they have more powerful servers. I would rather avoid this approach, since it means more development time reworking things I've already done.
  3. Buy more RAM or upgrade the server. This one is tough for me, as I’d like this to be able to run on a fairly low-end machine, but I’ve found that Ruby on Rails has higher system requirements than something like PHP would. Additionally, this costs money.
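
For the first option, here is a rough sketch of what a combined worker might look like. This isn't code from the repository; the structure mirrors the video worker from the Conversion Job Pool post below, and the thumbnail? check stands in for some hypothetical flag on my Conversion model:

#One BackgrounDRb worker handling both videos and thumbnails (illustrative sketch)
class ConversionWorker < BackgrounDRb::MetaWorker
  set_worker_name :conversion_worker

  #Queue up either kind of job
  def queue_convert(args)
    conversion = Conversion.find(args[:conversion_id])
    conversion.update_attributes({:status => "queued"})
    thread_pool.defer(:convert, args)
  end

  #Run the conversion, branching on the kind of job
  def convert(args = nil)
    conversion = Conversion.find(args[:conversion_id])
    if conversion.thumbnail?
      #grab a single frame with FFMPEG for the thumbnail
    else
      #run the full FFMPEG video conversion
    end
    persistent_job.finish!
  end
end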

For now, I’m just going to turn off the background workers so I can do some interface work.

Conversion Job Pool

In my previous post I talked about some of the conceptual stuff I use to convert videos to various formats. Yesterday I finished up the implementation that allows multiple video conversions to occur at the same time, so let me walk you through some of the more technical details to the best of my understanding.

BackgrounDRb does all the hard work, and can handle the idea of a job pool. What happens is this: there is one worker that is started by default. Instead of using this worker to handle the video conversion, I use it as the queue manager. My conversion controller calls something like this to send a new conversion request to the queue:

MiddleMan.worker(:video_worker).enq_queue_convert(:args => {:conversion_id => @conversion.id}, :job_key => @conversion.id)

In my video worker, I have two methods: one for the queue_convert calls, and one that actually handles the conversion.

#Queue up the video for conversion
def queue_convert(args)
  conversion = Conversion.find(args[:conversion_id])
  conversion.update_attributes({:status => "queued"})
  thread_pool.defer(:convert, args)
end

#Run the conversion of the video
def convert(args = nil)
  logger.info("Calling convert method for conversion #{args[:conversion_id]}.")
  conversion = Conversion.find(args[:conversion_id])
  #do stuff
  #Mark this job as complete in the job queue
  persistent_job.finish!
end

My understanding is that the thread_pool.defer method puts the job into a persistent queue, such that if everything crashes before the job starts it can still recover. If the maximum number of workers hasn't been reached, a new one is spawned to handle the request. For my uses, it would be nice if I could write a bit more code to choose when a new worker spawns. Free memory and processor usage are much more important than the total number of processes when it comes to video conversion. Time permitting, I might dive into BackgrounDRb and see how easy it would be to change that around a bit. In the convert method, the only important line is at the end, persistent_job.finish!, which marks the job as complete. I suspect this only frees the worker to look for another job or shut down; during my tests, a job that crashed halfway through its run (let's say FFMPEG seg faulted in the middle of my convert code) was not automatically retried when BackgrounDRb was restarted.
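
Until I get a chance to poke at BackgrounDRb's internals, the closest approximation I can think of is gating the defer call at the application level. A rough sketch, where free_memory_mb is a hypothetical helper (Linux only, reading /proc/meminfo) and the 256 MB threshold is an arbitrary number for illustration:

#Hypothetical helper: read MemFree out of /proc/meminfo (Linux only)
def free_memory_mb
  File.read("/proc/meminfo")[/MemFree:\s+(\d+) kB/, 1].to_i / 1024
end

#Only hand a conversion to the thread pool when there is enough headroom;
#otherwise leave it marked as waiting so it can be resubmitted later.
def queue_convert(args)
  conversion = Conversion.find(args[:conversion_id])
  if free_memory_mb > 256
    conversion.update_attributes({:status => "queued"})
    thread_pool.defer(:convert, args)
  else
    conversion.update_attributes({:status => "waiting"})
  end
end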

Since converting videos can take a really long time, I added a Conversion model to track the status of a video while it is being processed by the video worker. I think I could have implemented something with BackgrounDRb's result cache, but it struck me as not the most persistent way to track details about a conversion.
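
The model is basically a little status record; a simplified sketch of the kind of migration behind it (the real schema has more to it than this):

create_table :conversions do |t|
  t.integer :video_id     #the video being converted
  t.string  :status       #e.g. "queued", "converting", "complete"
  t.string  :format       #target output format
  t.timestamps
end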

If you’re looking for the actual code that *works* for me, you can check it out on GitHub: http://github.com/bamnet/bonsai-video/tree/master