Another Helping of Abstraction, Please

Rails 3.1 is soon to be released, and with it come two new abstraction libraries – CoffeeScript and S(ass)CSS. These libraries are used to generate Javascript code and CSS definitions, respectively. While abstraction libraries are nothing new to Rails, the inclusion of two more got me thinking about the direction that the Rails stack is heading.

CoffeeScript’s goal seems to be to make Javascript as Ruby-ish as possible. It describes Javascript’s curly braces and semicolons as embarrassing.

SCSS aims to address some of the repetitive elements of CSS through the use of variables, nesting, and mixins. This feels more acceptable to me than CoffeeScript, but my first encounter left me burned.

A few other abstraction libraries of relevance: Haml aims to generate HTML without the use of HTML tags. Additionally, the aim of Squeel (MetaWhere 2.0) is to remove all SQL from your projects.

So what am I bitching about? Abstraction is a good thing, right? I see two categories of abstraction. The first is the “good” kind, which allows you to be completely ignorant of the underpinnings. For example, Ruby converting down into machine code.

The “bad” kind of abstraction is the substitution of a language with a DSL. This creates a lot of issues, starting with development and debugging. In the case of CoffeeScript and SASS, you have to compile the DSL files into Javascript and CSS files. I feel like this compile step is a step back from what we gain by working with dynamic languages like Ruby and Javascript to begin with.

Development in these libraries also requires that you understand both the DSL of the library and the language of the generated code. This additional skill requirement adds time to a project, and raises the entry bar for new developers. It’s difficult to be proficient at a language and at a DSL that generates that language at the same time. A Ruby developer told me yesterday that he was surprised at how rusty his knowledge of SQL had gotten. It’s shocking to me that a web developer would struggle with SQL, but I think it’s an accurate sentiment with which many Rails developers would agree.

Another casualty of abstraction is performance. Not only is the generated code less optimized than code written by hand, it is also being run through more system calls to get there. You can either compile up front (CoffeeScript, SASS), or you can incur this penalty on-the-fly (Haml, Squeel).

While none of the libraries are a requirement of web development, when working on a team that uses these technologies you are expected to produce consistent code. Even though these libraries let you execute “native” code, doing so is discouraged because of the availability of the DSL. The syntax for embedding native code (if it’s even allowed) is often cumbersome, and loses editor functionality such as syntax highlighting and parsing.

Since when did Ruby on Rails web developers stop working with SQL, CSS, HTML, and Javascript? I am beginning to feel like the Ruby camp is becoming the far left extremists of the web development world. The web is built on these core technologies, and the benefits of abstracting them don’t seem to outweigh the costs.

You Found Me!

Sorry for any confusion to the few who read my slice of the web. My old DNS name, simpson.mine.nu provided to me through dyndns.org expired leaving me stranded. Looking back through my emails it seems that I had 5 days to reply to continue my account and I failed to do so. Instead of just being a simple fix of creating a new account, they have moved my domain name to a premium service. Instead of forking over my cash, I have decided to stop being lazy and buy a real domain name. So for all who have made it this far, welcome to my new home. The bathrooms are two doors down on the right.

Year of the eBook Readers

No doubt that like me, many of you are getting or giving an eBook reader of some kind for this holiday season. It seems to be a perfect convergence of technology, price point, services and availability, and consumer demand. Initially I was not very interested in eBook readers because I saw them as less about the experience, and more about creating a platform for companies to sell content through (a la iTunes). Coming from that perspective, I was impressed to discover the analog conventions present in the digital framework of eBooks. In particular, the Barnes and Noble Nook allows me to do three things that surprised me:

  1. Lending program: I can take materials that I have on my device and lend them to my friends. This is set up to mimic lending an actual resource, although with a few more restrictions (14 day limit, one time, etc). During this time, I am not allowed to read the material on my device (policy over technology). It would be great to see this feature become cross-compatible with other platforms. The Nook app is free on most mobile devices, so sharing should be straightforward.
  2. Integration with public library systems: A big reason I resisted an eReader is that I don’t often purchase my books. I am an active patron at the library (why not? – I pay taxes for something!). This was an analog system that allowed me access to resources free of charge. It turns out that the Old Colony Library Network of Massachusetts allows you to check out a wide range of materials in digital formats. What is even better is I can do this online! There is a slight complexity with integrating with Adobe’s DRM solution for providing this functionality, but it’s mostly transparent to me.
  3. Previewing books at the Barnes and Noble: Another analog system is that I could go into a bookstore, and sit down and read a few chapters of the book to see if I liked it before I bought it. The eReader allows this as well. I can go to any Barnes and Noble, connect to their wi-fi and browse for up to an hour a day for free. My eReader even gives me cafe coupons for food and drinks while I browse!

These digital solutions are meant to mirror our existing analog system. This is a smart move by the people driving the policies of these devices because it addresses the limitations people see with digital formats. These solutions aren’t perfect, but they are a breath of fresh air in the typical DRM rhetoric. What are your experiences with reading in a digital format? Has anyone coupled their device with desktop syncing software such as Calibre?

Taking the Magic Out of ActiveRecord – Speed up that Query!

Rails, and other programming frameworks that include an Object Relational Mapper (ORM), make it really easy to not worry about your database. That is, until your application slows way down and you are left scratching your head trying to figure out why. Recently, I have been tasked with optimizing our database code in order to speed up the controller views.

The truth about databases is that you can’t beat them at their own game. Doing so would be like making an FPS on the consoles and releasing it the same day as Modern Warfare 2. A database stores data in a manner that can be easily and efficiently queried. ActiveRecord makes it easy to write code without worrying about the database underneath, but this can create an issue. If you never know what the database is doing, how can you know if you are doing it efficiently or not? I am of the firm belief (after witnessing Oracle chomp through 1000 line SQL queries like they weren’t there) that if a database can do something that your code can do as well, it is probably best to defer to the database. Pick the best tool for the job.

“If all you have is a hammer, everything looks like a nail” – Abraham Maslow

Let’s look at some examples of where some database magic can improve our lives. Take the following code base:

# == Schema Information
#
# Table name: assignments
#
#  id         :integer         not null, primary key
#  post_id    :integer
#  keyword_id :integer
#  created_at :datetime
#  updated_at :datetime
#

class Assignment < ActiveRecord::Base
  belongs_to :post
  belongs_to :keyword
end

# == Schema Information
#
# Table name: keywords
#
#  id         :integer         not null, primary key
#  name       :string(255)
#  created_at :datetime
#  updated_at :datetime
#

class Keyword < ActiveRecord::Base
  has_many :assignments
end

# == Schema Information
#
# Table name: posts
#
#  id         :integer         not null, primary key
#  title      :string(255)
#  created_at :datetime
#  updated_at :datetime
#

class Post < ActiveRecord::Base
  has_many :assignments
end

I have created a “Post” object to represent a blog post, and I can assign keywords to each post. To do this, I have another model, “Keyword”, which is associated to posts through an “Assignment” model. The schema information shows the database structure.

Now, recently I have come across some code that aimed to collect all the keywords of a post with some special options. These keywords were to be found, their names listed, and sorted without case-sensitivity. Finally, any duplicate keys would be excluded. A programmer’s mind might gravitate towards a solution such as:

Post.find(1).assignments.map {|x| x.keyword.name }.sort {|a,b| a.upcase <=> b.upcase}.uniq

Let’s walk through this bit of code before I discuss better alternatives. To start with, we find our Post with an id of “1”, so that we can look at just its keywords. Next we iterate through the post’s assignments, mapping each one to its keyword’s name, and then run a sort on the returned array. The sort block compares uppercased copies of the strings so that the sort is case-insensitive. Finally, “uniq” is run to exclude any duplicate keywords. While this is a working solution, it doesn’t take any advantage of the power and flexibility of what a database can do. A few issues:

  1. This code generates multiple SQL select statements. (N+1)
  2. Sorting can be done through “ORDER” in SQL
  3. Unique records can be generated through DISTINCT

Generating multiple select statements like this is an example of the “N+1” problem. One query (the “1”) is run to load the assignments for the post. Once those are known, “N” more queries are run, one for each keyword reference. In total, you have “N + 1” queries executed. If you have 5 keywords, the result will be unoptimized, but largely unnoticed. You will have 6 select statements:

  Assignment Load (6.3ms)   SELECT * FROM "assignments" WHERE ("assignments".post_id = 1) 
  Keyword Load (0.8ms)   SELECT * FROM "keywords" WHERE ("keywords"."id" = 1) 
  Keyword Load (0.7ms)   SELECT * FROM "keywords" WHERE ("keywords"."id" = 2) 
  Keyword Load (0.7ms)   SELECT * FROM "keywords" WHERE ("keywords"."id" = 3) 
  Keyword Load (0.7ms)   SELECT * FROM "keywords" WHERE ("keywords"."id" = 4) 
  Keyword Load (0.8ms)   SELECT * FROM "keywords" WHERE ("keywords"."id" = 5)

What happens if you have 500 keywords? 5,000,000 keywords? Your performance bottleneck will quickly shift to the database as multiple queries have to be generated, executed, and returned for each page request.
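The query counts above can be reproduced without a real database. What follows is only a rough sketch in plain Ruby: `FakeDb` and its methods are made-up stand-ins (not ActiveRecord) that bump a counter every time they would have issued a SELECT, and the keyword names are invented sample data.

```ruby
# A toy stand-in for a database connection that counts "queries".
# FakeDb and its methods are hypothetical; this is not ActiveRecord.
class FakeDb
  attr_reader :query_count

  def initialize(assignments, keywords)
    @assignments = assignments # array of { :post_id, :keyword_id } hashes
    @keywords    = keywords    # hash of keyword id => keyword name
    @query_count = 0
  end

  # Stands in for: SELECT * FROM assignments WHERE post_id = ?
  def assignments_for(post_id)
    @query_count += 1
    @assignments.select { |a| a[:post_id] == post_id }
  end

  # Stands in for: SELECT * FROM keywords WHERE id = ?
  def keyword_name(id)
    @query_count += 1
    @keywords[id]
  end

  # Stands in for a single joined SELECT across both tables.
  def keyword_names_joined(post_id)
    @query_count += 1
    rows = @assignments.select { |a| a[:post_id] == post_id }
    rows.map { |a| @keywords[a[:keyword_id]] }
  end
end

# Five made-up keywords, all assigned to post 1.
def build_db
  keywords    = { 1 => "ruby", 2 => "rails", 3 => "sql", 4 => "orm", 5 => "sphinx" }
  assignments = keywords.keys.map { |id| { :post_id => 1, :keyword_id => id } }
  FakeDb.new(assignments, keywords)
end

# N+1 style: one query for the assignments, then one per keyword.
lazy_db    = build_db
lazy_names = lazy_db.assignments_for(1).map { |a| lazy_db.keyword_name(a[:keyword_id]) }

# Joined style: one query returns everything.
joined_db    = build_db
joined_names = joined_db.keyword_names_joined(1)

puts "lazy:   #{lazy_db.query_count} queries"
puts "joined: #{joined_db.query_count} queries"
```

Both approaches return the same names; only the number of round trips differs, and that number grows with every keyword in the lazy version.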

What is the big deal about sorting in your application instead of the database? In order for Rails (ActiveRecord specifically) to “sort” these items, the items first have to be returned from the database by a generated select statement. They come back as elements in an array, and then the array is sorted by comparing pairs of elements with the “<=>” operator. Further, this is done in Ruby – a dynamic language, which is much slower at this kind of work than the database (which is most likely written in C). Simply, the database is made for this kind of work, and outsourcing it to Ruby is just not efficient.

Finally, why not make records unique with Ruby’s “uniq”? Again, it has to do with meddling in database territory. It has the problems inherent to the sorting problem above, but with an additional problem. Let’s say that you return 500 records – and 499 of them are duplicates. Before the “uniq” method is run, Ruby is holding in memory references to the attributes of 500 ActiveRecord instances. When you call “uniq”, it can drop them. Why go through this memory spike and plummet, when the database can be instructed to just return one record? Memory is a valuable resource in a server environment – be frugal with its consumption.
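To see what the Ruby side of that one-liner is actually doing, here is a small sketch on plain strings instead of ActiveRecord objects. The keyword names are made up for illustration.

```ruby
# Sample data is made up for illustration.
names = ["guinness", "Pabst", "guinness", "Amstel"]

# A plain sort compares character codes, so capitalized names
# sort ahead of every lowercase one.
plain = names.sort

# The comparison block from the one-liner: compare upcased copies instead.
case_insensitive = names.sort { |a, b| a.upcase <=> b.upcase }

# uniq only drops the duplicates after the whole array has been built.
result = case_insensitive.uniq

p plain            # ["Amstel", "Pabst", "guinness", "guinness"]
p case_insensitive # ["Amstel", "guinness", "guinness", "Pabst"]
p result           # ["Amstel", "guinness", "Pabst"]
```

Every element, duplicates included, has to exist in Ruby’s memory before “uniq” gets a chance to discard anything.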

So, let’s address these issues by refactoring our code. Starting with the multiple SQL statements, we can combine these into one statement by joining multiple tables. I would encourage you to run “script/dbconsole” and get familiar with SQL. Let me demonstrate what we will be building the SQL way, before we implement this the “Rails” way.

SELECT a.* 
FROM   keywords a, 
       assignments b 
WHERE  a.id = b.keyword_id 
       AND b.post_id = 1; 

Another variant of this is to join the tables together using a “join” syntax. An inner join is the same type of join that we have replicated in the WHERE clause here. We can write the same SQL as follows:

SELECT a.* 
FROM   keywords a 
       INNER JOIN assignments b 
         ON a.id = b.keyword_id 
WHERE  (( b.post_id = 1 )) 

We can specify this join using ActiveRecord’s “find_by_sql” if we wanted to specify the association by hand. However this case is trivial enough that ActiveRecord can build this for us using “has_many :through”. I can add the following to my models:

# == Schema Information
#
# Table name: keywords
#
#  id         :integer         not null, primary key
#  name       :string(255)
#  created_at :datetime
#  updated_at :datetime
#

class Keyword < ActiveRecord::Base
  has_many :assignments
  has_many :posts, :through => :assignments
end

# == Schema Information
#
# Table name: posts
#
#  id         :integer         not null, primary key
#  title      :string(255)
#  created_at :datetime
#  updated_at :datetime
#

class Post < ActiveRecord::Base
  has_many :assignments
  has_many :keywords, :through => :assignments
end

Now, I can gather all the keywords for a post by executing the following:

Post.find(1).keywords

Next, let’s address the sorting issue by specifying an ORDER clause. We can tackle another related problem at the same time we do the sort. Recall that we want to sort in a case-insensitive fashion. If I just call order, then “Pabst” would beat “guinness” simply because of the capitalization (and we all know Guinness beer is better). The easy solution is to call “UPPER” to make the casing the same when the comparison is made. This actually saves even more computation on the Rails side by not having to do string conversions with our array sort. In SQL, we could append the following to our SELECT statement:

SELECT a.* 
FROM   keywords a 
       INNER JOIN assignments b 
         ON a.id = b.keyword_id 
WHERE  (( b.post_id = 1 )) 
ORDER BY UPPER(a.name)

The “Rails” way would be to include this on the association as follows. (Notice that Rails table alias names are probably not as reliable as listing out the name of the table itself. In this case, I have included “keywords.name”.)

# app/models/post.rb
...
has_many :keywords, :through => :assignments, :order => 'UPPER(keywords.name)'
...

Finally, let’s address the unique problem. If I have duplicate keywords, I can return only the unique keywords by using the SQL DISTINCT modifier. In SQL, this would look like:

SELECT DISTINCT(a.name) 
FROM   keywords a 
       INNER JOIN assignments b 
         ON a.id = b.keyword_id 
WHERE  (( b.post_id = 1 )) 
ORDER BY UPPER(a.name)

In Rails, we can specify modifications to our SELECT clause by passing the :select key to ActiveRecord’s “find” and “all” methods. This has another benefit, depending on the application. For each column in each record, ActiveRecord has to store information in memory. By choosing only the specific columns that we want returned in the SELECT clause, we can reduce the memory footprint. This could look something like this:

Post.find(1).keywords.all(:select => 'DISTINCT(keywords.name)')
# 0.16 seconds to complete w/ 200 keywords - that's 3 times faster!

So in summary, we have eliminated the extra SQL select statements and the computationally expensive sorting and unique method calls, and have managed to do all this without any fancy tricks. A sharp developer may point out that embedding SQL functions is bad form because it isn’t database agnostic. The truth is most databases conform to a base set of ANSI SQL standards, and DISTINCT and UPPER are accepted almost across the board.

A little database magic can make a crawling Rails action become snappy again. I am a firm believer that Rails, like any framework, should not be a reason to be uncaring about your SQL. Databases are the heart of most applications, and usually one of the first bottlenecks for performance. Good SQL can be married with good Ruby code, and happiness will ensue. I hope this post was informative for Rails folks that want to get started with SQL optimization.

Ubuntu 10.04 – Very Refined

A lot has changed with Linux since I last visited Ubuntu. I had an old crufty version of Ubuntu 8.10 sitting on my hard drive that I hadn’t booted into in quite some time. Realizing that April was a release month for Ubuntu, I decided to go get the latest and greatest.

There was a time when the software that I used on Linux was very exclusive to Linux. It took a lot of hunting down of programs to find what the best ones were for what I was doing, since the names were all unrecognizable. That no longer seems to be the case. Google Chrome has an official Linux client that runs quite well. Bookmark syncing to your Google account provides an easy way to import your information. Dropbox has a Linux client that integrates with the Nautilus file manager.

Rhythmbox integrates in with Last.fm, Magnatune, and the new music store Ubuntu One. Empathy integrates in with Facebook chat, Google Talk, AIM, IRC, and many others. Gwibber integrates in with Facebook, Twitter, Flickr, Digg, and others. All of these integrate in with Ubuntu’s new Indicator Applet.

The new theme is nice, and the nVidia drivers are stable as always. The new theme does away with the Brown, and moves to a darker theme which I prefer. Compiz is running “discreetly”, providing effects that enhance the user experience without overwhelming it. The gravy on the cake is the new Ubuntu Software Center, which takes all of the “apt-cache” and “apt-get” out of the equation. The interface is revamped from the old “Synaptic package manager” and provides some nice touches such as “Featured Applications”, category views, and a seamless search, select and install experience.

If you are doing Rails development on Windows, do yourself a favor and revisit this classic to see how much improvement there has been to the Ubuntu experience.

Have You Had Your Daily Dose of Editors?

The Boss decided to purchase a license to RubyMine for me to use, and the rest of the office to evaluate. I wanted to share my experiences, since there doesn’t seem to be a lot of real-world experience on developing Ruby on Rails in a corporate setting using RubyMine. Also, some of my new (and past) coworkers might be curiously looking over their screens with TextMate to see what else is out there.

First, a bit about the Ruby on Rails culture. It is very Mac OS X oriented, and the editor of choice is TextMate. I really try and stay away from tools that only run on one operating system, and TextMate falls into that category. Ruby is a very terse, dynamic and simple language. Rails developers will tell you that you don’t need an IDE to do Rails work. While this is true, I find that using nothing more than a text editor is like using a screwdriver instead of a power tool. If you are a good developer, and you understand Ruby, a good editor will only make you more productive. RubyMine isn’t meant to be relied on the way IDEs are for statically typed languages such as C# and Java. It makes a best effort to provide its features without getting in your way when it fails.

RubyMine offers full support on Windows, Mac and Linux. RubyMine also strives very hard to make the Windows version as strong as the *nix versions. It does this by including an IRB console, and commands to run many rake tasks, and Rails generators. While these tools are a very good solution on Windows, people with the ability to run a native terminal will probably find the offerings lacking in comparison. This review will skip these Windows-audience features, since I don’t feel it represents the majority.

Auto-Completion

RubyMine does a very good job at trying to autocomplete your code. It will look inside class definitions, and can find methods, attributes, and associations. If you are using gems that extend classes, such as ActiveRecord, RubyMine will do a fairly robust job at reading these methods from the gem files once they are attached to the project. “Attached” just means that RubyMine is inspecting these gems. It was not able to locate gems provided via Bundler, but this is supposed to be coming. Also, the auto-complete can be slow at times and freeze the editor from further input.

Inline Documentation

When you place the caret over a method or class, RubyMine will fetch the documentation for that method and show it in the editor. It doesn’t always locate the documentation, however, in cases where the method is defined in a gem that is unattached.

Command+Click Following

You can click class names, and method names to jump straight to the definition. Also useful is clicking on associations, and named scopes. You can also jump to route definitions, and partials.

Cucumber Integration

There is auto-complete provided for your Cucumber tests. Also nice is the Command + mouse over action, which displays the definition of a scenario step. Steps can be Command + clicked to follow to where the step is defined. Also, if your step does not match a definition, you will be notified in the editor.

Safe Refactoring

Refactoring in this sense is renaming a variable, or a filename. The nice part about RubyMine is the ability to optionally search your project for usages of the current variable, or filename and update those references, or just notify you about them.

Spelling

Not a big selling point, however many editors don’t offer strong spelling support. It checks your comments, and your variable names, but stays out of the way of your code.

Find By Filename / Class

You can pull up a dialog that will allow you to type a filename and it will return all matches regardless of directory level. Filenames can be regular expressions, and can include paths, and even line numbers. RubyMine will find them, and in the case of the line number, it will open the file and jump to that location. Searching by a class name is very similar.

Copy Buffer

Only having a clipboard with one item in it can be frustrating at times. Using the copy buffer feature, I can copy multiple sections of a file, then paste them individually later.

Code Formatting

RubyMine allows for manual formatting, or formatting on paste. You can also auto-format a complete document with a keystroke, based on your auto-format settings. It even works on HTML/ERB, HAML, Javascript, and CSS.

RubyMine isn’t a perfect tool however, and there are things about it that are less than ideal. Specifically, the footprint of RubyMine can be quite large. This seems to be a sin it shares with many of its Java IDE brothers. After watching it creep (unnecessarily) up to 400+ MB, I decided to do something about it. The solution turned out to be very simple. On OS X, look for the file “Info.plist” in the /Applications/RubyMine 2.0.2.app/Contents/ directory. On Linux, edit the rubymine/bin/rubymine.vmoptions file. Change the value for Xmx to be 128m. This is the memory cap under which RubyMine will run. Runs like a charm now, and for days too.

Other annoyances include the default editor settings. Changing to soft tabs was more confusing than it should have been. Allowing “virtual space” after the end of a line leads to a lot of accidental whitespace. The right gutter line isn’t helpful for Rails development. The font face was terrible. I had to customize the default theme to make it use the Apple default font. And finally, I don’t like the “Project” oriented state. I would rather open from within a directory in the terminal and work from there. I also don’t care for it generating a work folder within my Rails project – it’s just one more thing I have to pay attention to when using version control.

All in all, this is certainly one of the best editors I have seen yet for Ruby and Rails work, and I am sure I haven’t even scratched the surface of what this editor is capable of doing. It beats Netbeans 6.x and RadRails. It will be interesting to see how Aptana Studio 3 turns out, as the Aptana folks seem to really be putting some love into it. These editors felt like Ruby support was tacked onto what was intended to be a Java editor. At the other end of the editor spectrum are hundreds of weak text-editors. I wanted something in-between. RubyMine has a clear focus, and all of its options center around Ruby and Rails work. So, if you are using TextMate as your first and only Ruby on Rails editor, give yourself some perspective and try out RubyMine’s free trial.

Darkness Warshed Over the Dude

We moved into our new office this week. It is really nice inside – very professional. I got to construct a cabinet from IKEA and I thought my head was going to explode. There were points in the instruction booklet where instead of describing something, there would be an arrow and the noise some action would make like “snap!”. I have never seen modular furniture that was so – modular. We are finally setup at the new place, and the overhead lighting is making everyone hate the stupid high-gloss iMac screens. Someone made the comment that it was “like being stabbed repeatedly in the eyes”, and another “could count the hairs on his face”. Now that we have more space, I will be getting a second monitor which I know will be much less frustrating. Along with our move is the mandatory male interior decorator that is stylishly unshaven, wearing a sweater under a brown corduroy jacket, and designer jeans. He was walking around the room talking about how “powerful” the color scheme felt. At least we got some new chairs out of it.

I finally got to the Registry of Motor Vehicles this week after dropping Kristin off at the airport to go back to Atlanta. Of course her being gone threw me out of my barely comfortable routine. The RMV made me surrender my Georgia license without giving me a Massachusetts license. Go figure. So with Kristin gone, the beer gone, and me without a license for the last week I think I went a little crazy. Now that Kristin is back, I asked her to do a beer run with my sad “help me” eyes. No license yet.

I figured I could go park the car at the train station this week, and ride in as usual. Kristin usually drops me off in front, so it shouldn’t have been that different. I got there around 9am the first day I tried it and the damn lot was full. I thought I was going to be clever and drive a block and park in another parking lot and walk back to the station. Apparently everyone tries that because the signs all said “T station parking illegal. Vehicle will be towed at owner’s expense”. The last thing I needed was a pending accident investigation, no license, and a towed car. I ended up just driving in the whole way which I have come to increasingly dislike. The rest of the week worked out better. I just got there earlier and the lot had spaces. The office is right across the street from the train station, so without the bus, it takes about 45 minutes to get to work from the house.

Work has been a mixed bag. Some days I really feel independent enough to feel like I am churning out good, usable code. Other days it is a lot of waiting, or some silly thing that I get hung up on for hours. We moved our project through my first “iteration”. Development on this project is done through “User stories”. A user story (either real, or fictitious) outlines what is a problem, and what needs to be changed to resolve it. A developer can read the user story, make the revisions to the code, and then write a “feature” that demonstrates that it is resolved. The cool thing about features is that it is all natural language thanks to Cucumber. For example,

  Scenario: Search by title
    Given that I am a common user
    And I have a topic with a "title" of "test topic"
    And I am on the Topic Search page
    And I search for "test"
    Then I should see "Results"
    And I should see the topic "test topic"

You can see that it is very easy to follow, and it allows our less technical people to read the features for User Stories without worrying about understanding Ruby tests. Also, making a change, then seeing your entire test suite pass is a very reassuring feeling. Testing is so important here that we actually write the feature first, and then change the code to make it pass.
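For the curious, the trick behind those natural-language steps is pattern matching: each line of a scenario is matched against a regular expression, and the quoted parts are handed to a block of Ruby. The following is only a toy sketch of that idea in plain Ruby, not Cucumber’s actual API. `STEPS`, `step`, `run_step`, `TOPICS`, and `PAGE` are all made-up names, and the leading Given/And/Then keyword is assumed to be stripped already.

```ruby
# A toy step registry. STEPS, step, and run_step are hypothetical names;
# this is not Cucumber's API, only the general idea behind it.
STEPS  = []
TOPICS = ["test topic", "another subject"] # made-up sample data
PAGE   = []                                # stands in for the rendered page

# Register a pattern together with the block to run when it matches.
def step(pattern, &body)
  STEPS << [pattern, body]
end

# Find the first pattern that matches the text and call its block
# with whatever the pattern captured.
def run_step(text)
  STEPS.each do |pattern, body|
    match = pattern.match(text)
    return body.call(*match.captures) if match
  end
  raise ArgumentError, "no step definition matches: #{text}"
end

step(/^I search for "([^"]*)"$/) do |term|
  hits = TOPICS.select { |title| title.include?(term) }
  PAGE.replace(["Results"] + hits)
end

step(/^I should see "([^"]*)"$/) do |text|
  raise "expected to see #{text}" unless PAGE.include?(text)
  true
end

run_step('I search for "test"')
run_step('I should see "Results"')
```

A step with no matching pattern raises an error in this sketch, which is roughly the situation an editor or test runner reports as an undefined step.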

It has been quite a bit to adjust to. I haven’t really had the opportunity to do Ruby development full time, and I am unfamiliar with Git, and OS X. I have already gone through the O’Reilly book on Git with my time on the train, and I am a few chapters into the Ruby 1.8 Pickaxe book. Textmate has been decent, and I could get used to it, however it definitely lacks some of the things Rubymine has. I do like the speed and lightness of Textmate. Case in point is when switching a branch in Git, it applies a bunch of deltas to files in your path, and Textmate just detects that files have changed and reloads them in your tabs. I anticipate that Rubymine would shit bricks and try to re-index your project. Rubymine has some impressive features for speeding up Cucumber development, and showing code coverage though, so I may use it yet again. My co-workers have all been very supportive of me using the tools that I am most comfortable with.

I am getting adjusted to life here, but I miss everyone very much. My weekends are just a lot of downtime, with Kristin at work, and me trying to occupy my time. I’ll try and get out of my funk today with a car wash, and a haircut. Hope everyone is having fun down in the Atl!

3 Days Down, 40 Years to Go

Yesterday at 5:00pm marked the end of my first week at Beacon Interactive Systems. My coworkers are all really nice, and there is a surprising geographic mix between them. Some folks have lived in Massachusetts their whole lives, while others come from Maryland, and Michigan. The cultural differences between “down South” and here are pretty minimal, unless you just feel like having a good laugh. There have been two big adjustments however: Snow is really not a big deal up here – people hardly notice it outside. The second is restaurants don’t have sweet tea. You would have to drink sweet tea to understand why this is a big deal.

In general:

  • The job is much less stressful. Even during crunch times, you hear Southpark and Big Lebowski quotes (“I’m not your pal, guy!”).
  • The environment is a lot less structured. You come in whenever, you leave whenever. If you want to go outside and toss around the football, go for it. Good team-builder by the way.
  • The skill sets of my coworkers are all very impressive. It’s the rifle vs shotgun approach.
  • The job area is nice – it’s next to Harvard. Getting there is rough – I have to cut across the city. My 20 minute commute takes about an hour.
  • Developing on a Mac is an easier transition than I thought. I won’t say that I’m in love with it yet, but it’s workable. The biggest pain has been this silly bundled keyboard and mouse. No one else uses them. Also, package management on Mac sucks compared to Linux. I think I would actually prefer to use Linux. Time will tell on this one.
  • The coffee isn’t as good.

An interesting collision of viewpoints occurred my second day at the job, while I was shadowing a coworker on a joint project. He was showing me their (complex) system of bug detection and correction. They write up a use case, file a ticket, branch the code, create a changeset, rebase it, merge it into QA, verify it, then push it back upstream. Not coming from anything near that complex (“Hey Ben – login to the production server and change it!”), I was amazed that they spent so much time on this process. I asked if they ever just ignore a bug that would be too minimal to matter. My coworker asked me to clarify what I meant. I replied with “You know, it’s good enough for government.” He paused and looked at me funny, then reiterated that they address all bugs that are discovered. A bug is a bug. It will take me a while to harden my resolve to be like theirs, and aim for perfection. Perfection wasn’t possible before because we had the typical scenario of overworked, underpaid, and on a deadline.

We are moving into our new building in a few weeks. When we move, there will be a train station across the street from the new building, and I will probably make the transition to riding into work. It’s about the same amount of time, but I would have the ability to sleep, read, surf the Internet, etc., all without causing an accident.

Wish me luck for next week – it’s been a difficult adjustment.

Deploying Ultrasphinx to Production

Recently I rolled out a Rails app that used the Sphinx full-text indexing service in conjunction with the Ruby Ultrasphinx gem. I am very impressed with some aspects of this project, and I wanted to share my experiences for anyone looking for a better search experience with SQL databases.

Why Sphinx? Sphinx is an open-source, stable full-text indexing service. It also has good support in the Rails landscape. Why full-text indexing? In a nutshell, people can spot a crappy search implementation really quickly. Google is at the top of their game because it searches the way people think. Just try implementing the following with just SQL:

  • Conditional logic (&, |, -)
  • Rank-based search results
  • Insensitivity to case (ben vs Ben), punctuation (bens laptop vs ben’s laptop), and plurality (virus vs viruses)
  • Phonetic searching (candy can match candi)
  • Searching across multiple tables with results being in either, but not both
  • 100,000 rank-based results in .02 milliseconds
  • Cached data, with delta scans for minimal performance impact

Yes, you could do all these things – but why? The folks at Sphinx do nothing but this, and have packaged it up for you to use at your whim. There are other niceties that you can include like sorting, pagination, restricting to certain columns, and best of all spell checking via the raspell gem.
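To give a sense of how much work hides behind even one of those bullets, here is a rough sketch of hand-rolled term normalization in plain Ruby. It is only an illustration of the problem, not how Sphinx does it: `normalize_term` and `normalize_query` are made-up names, and the plural rule is deliberately crude (it will get plenty of English words wrong).

```ruby
# Hypothetical helper: reduce a search term to a crude canonical form so that
# "Ben's", "bens" and "BEN" all compare equal. Illustration only.
def normalize_term(term)
  # Lowercase, drop straight and curly apostrophes, then any other punctuation.
  word = term.downcase.gsub(/['\u2019]/, '').gsub(/[^a-z0-9]/, '')

  if word =~ /(s|x|z|ch|sh)es\z/
    word.sub(/es\z/, '')             # viruses -> virus, boxes -> box
  elsif word.end_with?('s') && !word.end_with?('ss', 'us')
    word.chomp('s')                  # bens -> ben, but leave class and virus alone
  else
    word
  end
end

# Split a query on whitespace and normalize each term.
def normalize_query(query)
  query.split.map { |term| normalize_term(term) }.reject(&:empty?)
end

normalize_query("Ben's laptop")  # => ["ben", "laptop"]
normalize_query("bens laptop")   # => ["ben", "laptop"]
normalize_query("Viruses")       # => ["virus"]
```

And that is just one bullet, before ranking, phonetics, or multi-table results enter the picture.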

To begin, you will need a MySQL or PostgreSQL backend – something I just happened to luck out on with this particular application. You should install Sphinx and poke around for a few minutes, to understand what Ultrasphinx provides you.

A note for Windows users – add the Sphinx bin/ folder to your path so you can just call its commands Unix style. Additionally, I had issues running my Rails project in a directory containing spaces. YMMV.

Ultrasphinx provides a Rails-centric way of using Sphinx. Sphinx provides the search service, and Ultrasphinx builds the configuration file, and manages the Sphinx process via rake tasks. Inside your models that will be Sphinx-ified, you will need to indicate which fields are indexable, and sortable. A useful feature of Sphinx/Ultrasphinx is the ability to create associated SQL to join multiple tables on the full-text search. See http://blog.evanweaver.com/files/doc/fauna/ultrasphinx/files/README.html for more information.

Once Ultrasphinx is configured and has created a configuration file, you can start the indexing process, then start your Sphinx service. Notes on doing this in production via Capistrano follow:

desc 'Start Ultrasphinx searchd process'
task :ultrasphinx_start do
  run "cd #{release_path}; rake ultrasphinx:daemon:status RAILS_ENV=production" do |ch, stream, data|
    if data[/stopped/]
      run "cd #{release_path}; rake ultrasphinx:daemon:start RAILS_ENV=production"
    end
  end
end

desc 'Stop Ultrasphinx searchd process'
task :ultrasphinx_stop do
  run "cd #{release_path}; rake ultrasphinx:daemon:status RAILS_ENV=production" do |ch, stream, data|
    if data[/running/]
      run "cd #{release_path}; rake ultrasphinx:daemon:stop RAILS_ENV=production"
    end
  end
end

desc 'Status of Ultrasphinx searchd process'
task :ultrasphinx_status do
  run "cd #{release_path}; rake ultrasphinx:daemon:status RAILS_ENV=production"
end

desc "Reindex Ultrasphinx via indexer process"
task :ultrasphinx_reindex do
  run "cd #{release_path}; rake ultrasphinx:configure RAILS_ENV=production"
  puts "NOTE THAT THIS CAN TAKE A WHILE"
  run "cd #{release_path}; rake ultrasphinx:index RAILS_ENV=production"
end
before :ultrasphinx_reindex, :ultrasphinx_stop
after :ultrasphinx_reindex, :ultrasphinx_start
after 'deploy:update_code', :ultrasphinx_reindex, :roles => [:app, :web]

This Capistrano deploy.rb fragment has four tasks – start, stop, status, and reindex. The anonymous before and after calls ensure that the service is stopped before reindexing occurs, and started again afterward. Note that this is a full reindex, and not a delta scan. My application didn’t have a reliable datetime column to determine new entries with, so I opted to do the full index every three hours instead. The database is a small one, with less than 100MB of data, so I can get away with it here.
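For anyone wondering why the datetime column matters so much: a delta scan only works if you can reliably tell which rows changed since the last index run. The idea, sketched in plain Ruby, looks roughly like this. This is not how Sphinx or Ultrasphinx implement deltas internally; `Entry`, `delta_entries`, and the timestamps are made up for illustration.

```ruby
# Hypothetical record type; a real app would use its own models.
Entry = Struct.new(:id, :updated_at)

# Return only the entries changed since the last successful index run.
# If updated_at is missing or unreliable, this silently skips rows,
# which is why a full reindex is the safer fallback.
def delta_entries(entries, last_indexed_at)
  entries.select { |entry| entry.updated_at > last_indexed_at }
end

last_run = Time.utc(2009, 1, 1, 12, 0, 0)  # made-up time of the previous index run
entries = [
  Entry.new(1, Time.utc(2009, 1, 1, 9, 0, 0)),   # older than the last run
  Entry.new(2, Time.utc(2009, 1, 1, 13, 30, 0))  # changed after the last run
]

delta_entries(entries, last_run).map(&:id)  # => [2]
```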

Additionally, you will want to set up recurring cron tasks in your production server environment:

# Sphinx updates http://blog.evanweaver.com/files/doc/fauna/ultrasphinx/files/DEPLOYMENT_NOTES.html
# This merges delta indexes into main index
0 */3 * * * bash -c 'cd /path/to/app/current/; RAILS_ENV=production rake ultrasphinx:index >> log/ultrasphinx-index.log 2>&1'
# Make sure the service is running
*/3 * * * * bash -c 'cd /path/to/app/current/; RAILS_ENV=production rake ultrasphinx:daemon:start >> log/ultrasphinx-daemon.log 2>&1'
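The two schedules look alike but are quite different: the first has `*/3` in the hour field (with the minute pinned to 0), the second has `*/3` in the minute field. If you want to double-check what a step expression expands to, here is a small sketch. `expand_cron_field` is a made-up helper that only understands `*`, a single number, and `*/N`, which is far less than real cron supports.

```ruby
# Hypothetical helper: expand a cron field of the form "*", "N" or "*/N"
# into the list of values it matches within the given range.
def expand_cron_field(field, range)
  case field
  when '*'
    range.to_a
  when %r{\A\*/(\d+)\z}
    range.step(Regexp.last_match(1).to_i).to_a
  when /\A\d+\z/
    [field.to_i]
  else
    raise ArgumentError, "unsupported cron field: #{field}"
  end
end

# Hour field of the reindex job: every third hour, on the hour.
hours = expand_cron_field('*/3', 0..23)    # => [0, 3, 6, 9, 12, 15, 18, 21]

# Minute field of the daemon check: every third minute of every hour.
minutes = expand_cron_field('*/3', 0..59)
minutes.length                             # => 20
```

So the reindex runs eight times a day, while the "make sure it is running" check fires every three minutes.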

Dedicated Hosting

Just wanted to make a quick shout out to James Sumners for setting up our shiny, new hosting solution. I needed to move for several reasons, one of which is that I got a job in Cambridge, Massachusetts and will no longer be able to use Clayton State’s network. The other reason is it’s just damn time. I need a greater uptime than our network has offered lately.

If you are seeing this post, then welcome to the new and improved Zone of Mr. Frosti.