Computers, Open-source, Ruby, Software

Don’t Use “#” In the Paperclip Gem

I’ve learned a whole lot more about ImageMagick commands last week than I ever really wanted to know. The problem was that our uploaded images were having content cropped off the top, bottom, and sides.  Like many folks in the Rails world, we pass our attachments through Paperclip to handle all of the nitty gritty resizing operations.

I was interested in understanding how we could prevent our content from being cropped off when I came across an interesting idiom in the geometry settings:

has_croppable_attachment :image,
styles: {
:'630x315' => { geometry: "630x315#", format: :jpg },
}

Well take a look at that. There is a “#” symbol suffixed to my image geometry. I went to ImageMagick to lookup what this flag meant. Spoiler alert: It doesn’t exist there. After some digging around, I discovered that this idiom is provided by the Paperclip gem, and translates to the following convert command:

convert '/path/to/source.jpg' -resize "630x" -crop "630x315+0+0" '/path/to/output.jpg'

You can see the resize + crop combination of commands being built by Paperclip according to their documentation:

Paperclip also adds the “#” option (e.g. “50×50#”), which will resize the image to fit maximally inside the dimensions and then crop the rest off (weighted at the center).

Well, that is no good! If an image is not a 2:1 aspect ratio as per my dimensions 630×315 (or whatever aspect ratio you have) YOU WILL LOSE CONTENT! Time to rethink this…

The dimensions are 306x306
The dimensions are 306×306

 

The image was scaled until it was large enough to cover a 630x315 canvas, then the tops and bottoms cropped off
The image was resized until it was large enough to cover a 630×315 canvas, then the top and bottom was cropped off

Instead of resizing maximally (so that an image is a minimum of 630 width AND a minimum of 315 height) lets resize minimally (so that an image is a maximum of 630 width OR 315 height). The aspect ratio is preserved in both scenarios.

We want to resize while preserving aspect ratio , but we also need to make our canvas 630×315. The canvas dimensions are referred to as the extent command. When we do this, we will likely have space on the top and bottom, or sides we need to fill to have exactly these dimensions. What you fill this background with can be a color (in my case white). We also likely want to center the minimally resized image on this canvas. You can pass these convert options into Paperclip like so:

convert_options: {
:'630x315' => " -background white -gravity center -extent 630x315",
}

The resulting command will look something like this:

convert '/path/to/source.jpg' -resize "630x315" -background white -gravity center -extent 630x315 '/path/to/output.jpg'

Notice that our lossy crop flag has been replaced with a nicer extent flag. We can see the results this has on a similarly sized image:

We have an image smaller than the target 630x315
We have an image smaller than the target 630×315

 

We now have a 630x315 image with the sides filled in with a white background to preserve the dimensions
We now have a 630×315 image with the sides filled in with a white background to preserve the dimensions. Note the sides of the image are white.

As a final note, this works for both images larger and smaller than the target outcome dimensions. Looking at the ImageMagick documentation for flags can be helpful, but daunting as the real power lies in chaining multiple flags for a desired effect.

With a little effort I was able to get what is (in my opinion) a better image resize with just a few custom flags.

Computers, Linux, Open-source, Ruby, Software, Web

Migrating from Bamboo to Cedar

Heroku recently sent me a nice email:

You have one or more applications deployed to Heroku’s Bamboo stack. Bamboo went live almost four years ago, and the time has come for us to retire it.

You may have received a previous message about Bamboo deprecation. That message may have had an erroneous app list, but this email has the correct list of Bamboo apps that you own or are collaborated on. We’re sorry for any confusion.

On June 16th 2015, one year from now, we will shut down the Bamboo stack. The following app/apps you own currently use the Bamboo stack and must be migrated:

This is on the heels of an email about ending legacy routing support. It seems they have been quite busy over at Heroku, but I can’t complain too seriously about free app hosting.

Upgrading a legacy Rails application to the Cedar stack did require a few changes. I’ll document some stumbling blocks for posterity.

Foreman

The first big change described in the general purpose upgrading article from Heroku: https://devcenter.heroku.com/articles/cedar-migration was the use of foreman to manage your web services. I luckily have a simple app, and did not need to worry about resque, mailers, etc. This made my Procfile rather straightforward:

web: bundle exec unicorn -p $PORT -E $RACK_ENV -c config/unicorn.rb

Absent from the foreman crash course is any information about the corresponding .env file. I did need to add a few environmental variables to keep on working in development as usual:


RACK_ENV=development
PORT=3000

Once the Procfile was committed (end the .env) file added to .gitignore I then added Unicorn to the gemfile. I’m not sure if Unicorn is strictly necessary over webrick in development, however this is the example in the Cedar upgrade guide, so I wanted to run as close to production as practical to prevent any surprises.

After installing Unicorn, I then needed to touch config/unicorn.rb since it did not exist. Against, its in the example, but I’m not sure if its strictly necessary especially given that its just an empty file for me. To start your Rails application, you now issue foreman instead of of the older rails s

Devise incompatability

Not directly related to the Cedar changes, but a common gem, so worth mentioning. Devise 2.x has removed migration helpers. I stupidly didn’t lock my version of Devise in my Gemfile so I was confused why this was failing for me. I found the last 1.x version of Devise by running this command:

gem list devise --remote --all, and the specifying ‘1.5.3’ as the second argument in my Gemfile on the devise entry.

PostgreSQL

Heroku requires Postgres in production and I was previously using sqlite in my development environment. Again, to mirror production I wanted to use Postgres my development environment so that I could be as close to production setup as practical. I took a quick trip down memory lane to setup and configure PostgreSQL on my Linux development machine: https://mrfrosti.com/2011/11/postgresql-for-ruby-on-rails-on-ubuntu/ . An interesting observation is that in Ubuntu 14.04 LTS, PostgreSQL runs on port 5433, and not the default 5432. This can be verified with netstat -nlp | grep 5432. If no entries come back, PostgreSQL is running on a non-default port. Try to grep for 5433, or other numbers until you find the process.

Backing up your database

Before I made any server changes, I wanted to have an up to date copy of the production databse on my machine to prepare for the worst. This can be done by using the pgbackup commands:


heroku pgbackups:capture //create a backup
heroku pgbacksup:url //make a public URL for that backup

Then I clicked the public URL and downloaded the file – safe and sound to my computer. By the way, you can import this production database into your local instance of PostgreSQL using the following command:


pg_restore --verbose --clean --no-acl --no-owner -d development /path/to/database

Pre-provisioning your database

A quick note on database availability. I got into a chicken and the egg scenario where the app wouldn’t deploy without the database, and the database seemingly couldn’t be created without the app first being deployed. Heroku has an article on pre provisioning and I found it a necessary prerequisite to deploying to my newly created test Cedar stack: https://devcenter.heroku.com/articles/pre-provision-database

To pre-provision your database, run the following:

heroku addons:add heroku-postgresql

You can even import the database from production, etc as part of the heroku utility. I used the public database URL I created above to populate my new cedar stack database:


heroku pgbackups:restore

Migrating to Cedar

Once I had tested on a new app, and in a feature branch I had confidence everything was working as expected. I was ready to make my changes to the production app by migrating it to Cedar. To do this, the command is:


heroku stack:migrate cedar

I either have an old version of the Heroku gem, or this is a discrepency in the gem and the non-gem packaging, but the docs misidentify this command as: heroku set:stack cedar which was not a valid command for me. The migrate command above appears to be analagous.

Once I merged my cedar feature branch back into master I was ready to push to master. And FAIL. It turns out that I needed to precompile my assets, which had a dependency on the database existing. I tried to pre-provision as I had done on my cedar branch, however the results were the same after running this command.

A quick search yielded https://devcenter.heroku.com/articles/rails-asset-pipeline#troubleshooting the advise to add the following line in the config/application.rb file:

config.assets.initialize_on_precompile = false

Summary

I’ve learned quite a bit about Heroku in this upgrade experience. Their changes force me to use the latest software which is nice in some ways. When everything is running on my website, I don’t often worry about upgrading until I get an email like the one above.

The downside of course is that this upgrade process is a pain in the ass, and is error prone, and affects production websites that are running smoothly. If it isn’t broken, you don’t want to fix it. Except this time you have to in order to have it continue to function after June, 2015.

Best of luck to other people upgrading. Just be patient, and test everything in a new app if you have any doubts.

Apple, Computers, Ruby, Software

Upgrading Ruby with Rbenv+Homebrew

Heroku has defaulted to Ruby 2.0 for all applications, so its probably time you updated that crufty old version you have been running. Unfortunately the process is less than straightforward, especially when using a version manager. Assuming you are running rbenv with a Homebrew version of ruby-build, this guide will get you running the latest version of Ruby:

To begin, check which versions of Ruby Rbenv knows about. Rbenv delegates this work to ruby-build:

rbenv install --list

Best case scenario you have a recent version of ruby-build and you see the version of Ruby you want in this listing. At the time of this writing, version 2.0.0-p247 is the most current. If your desired version is present, skip the following steps and just install the Ruby version via:

rbenv install 2.0.0-p247

If your version is not present in the list, you will need to upgrade ruby-build so rbenv knows about the more recent versions of Ruby. Assuming you installed ruby-build via Homebrew, you can update it by issuing:

sudo brew update

Issuing this command complained about having untracked files within the Homebrew directory (which is actually just a git clone of the homebrew project). This may not be the correct way to fix this problem, but I issued the following command to stash these untracked files, and uncommited changes so they don’t interfere with the upgrade process:

cd /usr/local && git add . && git stash

This should now be a clean directory, and you can issue the brew update command again.

Now that brew is updated, you should have the latest “formula” for ruby-build. You can then issue the command to update ruby-build itself:

brew upgrade ruby-build

Once this completes, we can list versions of Ruby via rbenv again to ensure our desired Ruby version is now in the list. Once you see this, you can issue the following command to install a known version of Ruby:

rbenv install 2.0.0-p247

To use your shiny new version of Ruby, you can set this to be the default version:

rbenv global 2.0.0-p247

You can also set this per project, or by setting an environmental variable to override, so don’t worry if not all your projects are Ruby 2.0 ready. You can easily switch between versions – and that’s the point of version management right?

You can confirm you are running the latest Ruby version by issuing:

ruby -v

Note that you will need to re-bundle any gems from your Gemfile against the new Ruby version, as well as rehash any rbenv shims for these gem executables:

gem install bundler && bundle install && rbenv rehash

For more information, check out the rbenv, and ruby-build documentation. To discover the latest stable version of Ruby you can peek at the official Ruby download page to find out the latest version and patch number. Finally, check out the Homebrew docs if you are still stuck.

Computers, Open-source, Ruby, Web

Making Rails Routes Fantastic

I took a week off from work to move into our new house. It was a time of rest, and relaxation despite the chaos around what moving can bring. I’ve had a few personal projects on the back burner, but never seemed to have the time or the energy to make much progress. Recently a talk with a friend reminded me how important completing those pet projects can be for your personal happiness. I’m proud to present the completion of an idea I’ve had a for a while: Routastic.

What is Routastic? It serves as an interactive Rails routes editor. Simply, I got tired of the pattern of modifying config/routes.rb, then running rake routes and grepping for some result. This is completely inefficient. My inspiration came from the beautiful Rubular.com and its interactive regular expression building. Its quick. Its painless. Its a valuable tool for everyday programming.

Please check out http://routastic.herokuapp.com/ and let me know how I can improve it.

Special thanks to Avand Amiri for suggesting the name (despite the name screaming Web 2.0, it actually is quite memorable!)

Apple, Computers, Events, Family, Personal, Ruby, Software, Thoughts, Vacations, Web

Cloudy, Cold and Hip – Two Weeks of Training in Portland

I’ve really enjoyed the last two weeks. My new employer, recently acquired Analog Analytics flew me out to Portland, Oregon for training. Portland is quite an amazing place. Skateboarders, cyclists, and runners abound, but with a laid back attitude. Its the greenest city I have ever visited. Stores seem to only dispense recyclable materials including paper bags, and foods in waxed cardboard containers. The entire city is very walkable without much danger of personal harm. The food was amazing, and the drinks even better. This city knows its coffees, teas, and beers. It has to be home to the most microbreweries of any city. Needless to say I have probably gained 5 pounds, and I am super caffeinated. Also, the proximity to all these hip restaurants is giving me second thoughts about living so far outside of the city limits. No lie, I even glanced at Portland housing prices.

It took me a few days to get oriented to the city and the work environment. The company runs out of the Ford Building, in the heart of quite a few cool restaurants and bars in the Southeast side of the city. In fact, it left me a little jealous considering the hotel is only surrounded by fast food joints.  I got a shiny new MacBook Pro (which I am currently battling to make it as “boring” as possible). I can’t talk too much about the work, but it does hit the sweet spot of what I was looking for – a small team feel with deep pockets, and a launch date.

Kristin and Morrigan joined me for the second week and did their own thing, and they had a blast. They visited OMSI, Powell Books, Finnegans, several parks, and malls, and some tasty food joints. I’m happy they got to experience some of what makes this city awesome.

I’m enjoying several aspects of the job in particular: A remote driven environment, and pair programming. Training isn’t the best test run of this environment, as I am in the office everyday for now. Once I am setup, I pick the hours. People hop online and offline, according to their time zones, availability, etc. Every piece of communication, and workflow is centered around remote teams.

Pair programming makes programming social. Despite the image that telling someone you are a programmer conjures, I really enjoy interacting with people. I remember teaming up with James, John and many others at Clayton State to tackle some large issues with our portal and other systems. Since Clayton State, I have worked on a couple teams, and it was almost always in isolation, save for 5-10 minute high level meetings. The best part is, its actually kind of fun.

Pair programming was a tough adjustment for me. I’m used to presenting a final product and defending its implementation. I have all the answers. I know what the talking points are up front, and I am comfortable because I am the authority on the subject. Pair programming is letting your guard down, and conceding as much as contributing. You are two people working on a problem together, with neither party starting off knowing the complete solution. The work is certainly slower than solo programming, as incorporating input, early refactoring, and general discussion takes up time. This team takes an interesting approach to combat some of the time drain; You can either pair program and merge directly, or work solo but your code requires a peer review before merging. The choice is yours. The solo programming option will probably act as a safety value for those days when I just want some time to myself. They also encourage “switching drivers” to vary the work. Interestingly, being the passenger requires more focus than driving, as you are trying to proactively find issues with the current approach.

I’m still struggling to embrace TDD. I don’t like the zealotry in the community when the topic comes up; presenting the only two options as either you test first, or you are just ignorant, undisciplined, or apathetic to the code you write. The truth is far from it. I figure things out by moving the pieces around – not by staring at it from a distance. That is not to say that there aren’t times when testing first is extremely useful, like when clarifying requirements. The test assertions (even with missing test bodies) is often enough to help solidify an attack plan. The amount of code coverage can be a hindrance though, as real world tests always end up being more tightly coupled than you ideally want them to be. If you make seemingly small code changes, you can end up with quite a bit of the test suite failing (all though with the same few errors repeating). If you mock and stub too much, you aren’t testing much that is useful. Even worse, the workflow doesn’t seem realistic: Write the tests, verify the tests fail, write the code, verify the tests pass. The reality seems to be write the tests (heavily guessing at the exact implementation), verify they fail, write the code, refactor almost all of your tests, and verify they pass. Given the choice, I think I’d still rather write code, then test the code to verify it does what I want in all scenarios. I’ve yet to meet a dyed-in-the-wool TDDer that sees any fault with this extra refactoring step. The subject of pre-written tests needing to be refactored seems to be glossed over. Maybe my opinion will be changed yet.

Things are looking awesome for this next step in my life! I’m keeping my fingers crossed for Railsconf tickets, since they are in my employer’s backyard. There are also a few missed restaurants I am meaning to visit next time I’m back up this way…

Apple, Computers, Linux, Open-source, Ruby, Software, Thoughts, Web

PostgreSQL for Ruby on Rails on Ubuntu

My new desktop came in at work this week, and the installation was painless thanks to the great driver support of Ubuntu 11.10. For anyone setting up a Rails development box based on Linux, I have some tips to get around some pain points when using a PostgresSQL database.

Installation:

Postgres can be quickly and easily installed using apt-get on Debian or Ubuntu based distributions. Issue the command:

apt-get install postgresql

Ruby Driver

In order for Ruby to connect to PostgreSQL databases, you will need to install the pg gem. This gem will need the development package of PostgreSQL to successfully build its native extension. To install the PostgreSQL development package, issue the following command:

apt-get install libpq-dev # EDIT: postgresql-dev was replaced by this package on Ubuntu 11.10

Setup A PostgreSQL Role

You can configure PostgreSQL to allow your account to have superuser access, allowing your Rails tasks to create and drop databases. This is useful for development, but is strongly discouraged for a production. That being said, we can create a PostgreSQL role by logging into psql as postgres as follows:

su postgres -c psql

This will open a PostgreSQL prompt as the database owner postgres. Next, we need to create an account for our user. This should match the response from “whoami”:

create role  superuser login;

We can now exit from psql by issuing “q“. Try to connect to psql directly by issuing the following command from your shell account:

psql postgres

This should allow you to connect to the default database postgres without being prompted for credentials. You should now be able to issue the rake commands for creating, and dropping the database:

rake db:create

Rspec Prompts for Credentials

I was being prompted by Rspec for credentials when running my test suite. If you would like to remove this credential prompt, please read the following:

There are differences in how the PostgreSQL package is configured in Homebrew on OS X, and how it is packaged in the Ubuntu and across other distributions. One difference is in the level of security configured in the pg_hba.conf file. This file is responsible for identifying which sources using which authentication mechanisms should be allowed or denied. By default, Rspec will cause a prompt for a password even if your shell account has trusted permissions. This is because Rspec connects not as a local process, but to localhost. To allow connections to localhost to be trusted, you will need to modify the pg_hba.conf file.

Next, we can modify the pg_hba.conf file located at /etc/postgresql/<version>/main/pg_hba.conf

Comment out the lines any lines at the bottom of the file and append the following:

local   all             all                                      trust
host    all             all              127.0.0.1/32            trust
host    all             all              ::1/128                 trust

This will allow connections from the shell, as well as connections to 127.0.0.1 (localhost) using both IPv4 and IPv6.

You will need to restart PostgreSQL for the changes from this file to take affect:

/etc/init.d/postgresql restart

PostgreSQL Extensions

If you want to make use of any of the additional extensions to Postgres, including fuzzystrmatching, you will need to install the postgresql-contrib package:

apt-get install postgresql-contrib

The extensions will install to /usr/share/postgresql/<version>/extension/

Using the Postgres version 9, you can create these extensions in your database by using the new CREATE EXTENSION syntax. In the case of the fuzzystrmatch extensions, you can issue the following command from inside a PostgresSQL command prompt to load the extensions:

psql ;

Once inside your database:

create extension fuzzystrmatch;
Computers, Open-source, Ruby, Software

Storing Data Using Numerical Representation

Numerical representation is a great way to store a set of boolean values in a single database column. Lets say that you have a user that has multiple roles. We want to store all of a user’s roles inside a single database column. This single-column approach may be more desirable than a separate column to track each role for the user. We can condense this data into a single column using the numerical representation technique.

If you have ever had to work with Unix permissions, you have already been exposed to numerical representation. A file in Unix can have read, write, and/or execute permissions for a user, group, or other. These permissions are represented in notation octal notation. The permissions are as follows:

  • execute: 1 (Base 2: 001)
  • read: 2 (Base 2: 010)
  • write: 4 (Base 2: 100)

If a user has a single permission on a record, then the numerical representation for that data is stored, as you would expect. “Read” would be stored as “2”. To store a combination of permissions, you sum the separate numerical representations. “Read” and “Write” would be stored as “6”. This works because each value is a power of 2, and so has an unambiguous representation. Take the position in the array, and power 2 to this integer. You will end up with the constants 0, 1, 2, 4, 8, 16, etc. Using powers of 2, bitwise operators can help us easily calculate whether a given integer is included in the sum.

So how can we represent this in Ruby? Lets take a look at our scenario from earlier. Lets represent our user model as:

# app/models/user.rb

# == Schema Info
#
# Table name: users
#
#  role                  :integer(11)
class User < ActiveRecord::Base
  # roles
  EXECUTE, READ, WRITE = 1, 2, 4
end

To keep track of the roles, I have defined them as constants, with numerical values that are powers of 2. We can create a custom setter for the role that takes a collection of these constants. These constants have a numerical value that we can sum together to get the integer to store:

# app/models/user.rb

# == Schema Info
#
# Table name: users
#
#  role                  :integer(11)
class User < ActiveRecord::Base
  # roles
  EXECUTE, READ, WRITE = 1, 2, 4

  # Example usage: role=([User::EXECUTE, User::READ])
  # Sets role column in database to sum of array
  def role=(roles)
    super(Array(roles).inject(&:+))
  end
end

In the role= method, we cast our values to an array, then we use inject to sum them all together. The super method provided by ActiveRecord is then called to set the role column in the database to this integer.

Now that our setter is complete, we can use a bitwise operation to determine if our sum contains our constant. Using a bitwise OR operator, if the sum bitwise or’ed against a constant returns the sum, then the constant is included. To express this in code would look like this:

# app/models/user.rb

# == Schema Info
#
# Table name: users
#
#  role                  :integer(11)
class User < ActiveRecord::Base
  # roles
  EXECUTE, READ, WRITE = 1, 2, 4

  # Example usage: has_role?(User::EXECUTE)
  # Returns boolean
  def has_role?(permission)
    (role | self.class.const_get(permission)) == role
  end

  # Example usage: role=([User::EXECUTE, User::READ])
  # Sets role column in database to sum of array
  def role=(roles)
    super(Array(roles).inject(&:+))
  end
end

Here is an example usage:

user = User.first #=> #
user.has_role?(User::READ) #=> false
user.role=([User::READ]) #=> 2
user.has_role?(User::READ) #=> true

There you have it: A simple and easy way to store multiple boolean values in a single column. This solution is very scalable too. As new roles are introduced to the system, just add them as a constant with the next power of 2 value. No database changes are needed to add additional columns. This calculation is even available in SQL when using PostgreSQL, and other database solutions that allow bitwise operators:

select (5 | 1) = 5; -- true
select (5 | 2) = 5; -- false
Computers, Open-source, Ruby

Fuzzy Matching in PostgreSQL with Nicknames

Introduction

If you have had the LIKE comparison in SQL leave you wanting, this post is for you. The reason I say this is because LIKE is the tip of the iceberg when it comes to the searching capabilities of a modern database. Signs that you are ready to move on from LIKE:

  • You are attempting to pad both sides of the LIKE query with wildcards
  • You are joining multiple keywords with a wildcard separator
  • You are using multiple LIKE statements on the same column in your query

 

I have been working on a project lately that uses PostgreSQL as its database, and discovered some interesting ways of locating data by using the contributed fuzzy matching functions. When searching for people in a database there are many things to consider:

  • The search query may contain a typo
  • The user may have an uncommon spelling of their name
  • The user may have a nickname

 

How can we address these scenarios?

Fuzzy Matching Algorithms To the Rescue

Levenshtein is a great algorithm to detect typos in a search query. It operates based on how distant one search term is from another term. Starting with the “source” word, it counts the number of operations (additions, subtractions, substitutions) it takes to arrive at the “destination” word. This make the Levenshtein algorithm particularly good at catching seach typos, and uncommon spellings. Here is an example:

INSERT INTO users (first_name) VALUES ('William'); --Notice the double "L"
SELECT * FROM users WHERE LEVENSHTEIN(LOWER(users.first_name), 'wiliam') < 2;
id first_name last_name created_at updated_at
4 William   2011-08-21 14:54:34.513516 2011-08-21 14:54:34.513516

(1 row)

I am retrieving all records where the result has a distance of two or less from the source word to the target. The number of changes required to go from “wiliam” to “william” is one addition of the letter “L”. You can of course modify the query to change the number of changes allowed. Note that the query is case sensitive. Going from an uppercase letter to the same letter in lowercase will count as 1 substitution. For this reason, make sure you are down-casing everything before comparing to operate on letters only, and not case.

Another choice when locating person records is the phoenetic algorithms of Soundex, Metaphone, and Double Metaphone. These have linguistic awareness of the English language (and other languages in the case of the Metaphone, and DMetaphone algorithms). The Soundex is the oldest phonetic algorithm, and has been deprecated in favor of the more complex Metaphone and DMetaphone algorithms. Nevertheless, if you are working with English only names, Soundex may provide performance benefits that are worth the trade-offs.

Soundex breaks down a word, and assigns the pronunciation a value that can be compared against other pronunciations. Every word when passed through the Soundex function returns a four digit value, resulting in each word being the values of 0 and 4 for least similar, to most similar respectively. DIFFERENCE uses SOUNDEX under the hood. “William” and “Willem” have a pronunciation that is mostly similar. Again, you can adjust this setting by changing the integer that you compare the result of DIFFERENCE against. Here is an example of METAPHONE:

SELECT * FROM users WHERE DIFFERENCE(users.first_name, 'willem') > 2;

Metaphone takes an integer which specifies how specific it should be when calculating the similarity. The higher the integer, the more specific the comparison. While the results are the same as Soundex in this scenario, it is important to remember that the power of METAPHONE, and DMETAPHONE are in the pronunciations of these characters in other languages. This is particularly beneficial when working with non-English names:

SELECT * FROM users WHERE METAPHONE(users.first_name, 2) = METAPHONE('Willem', 2);

Double Metaphone expands on the METAPHONE capabilities, but since it has no additional configuration options, it may not be suitable for your needs. You should experiment with the different fuzzy matching options to get the best result for your application:

SELECT * FROM users WHERE DMETAPHONE(first_name) = DMETAPHONE('Willem');

Nickname Matching

These methods are great when comparing data that is similar in distance, or in pronunciation, however it will not be able to address the concern of the use of nicknames, and alternate names within the system.

Nicknames allow us to match the name “William” with its alternate names of “Bill”, “Bud”, “Will”, and “Willie”. Using the algorithms discussed so far, the name “Will” (and possibly “Willie”) would be the only results of this match. Nicknames represent relationships between names that a database will need to understand to address the last criteria of our search.

In order to accomplish this, I have have leveraged the data from the Nickname and Dominuitive Name Lookup project. I have created a gem that can be added to your Ruby on Rails project that will fetch the nickname data, parse it, and insert it into a nicknames table, along with a foreign key that references related names. This gem will allow you to query nicknames using the following query:

SELECT * from nicknames WHERE nickname_id IN (SELECT nickname_id FROM nicknames WHERE name = 'william');

or in Ruby (check out the project here):

Nickname.for('William').map(&:name)

We can do the following to make a comprehensive search using all methods that we have discussed so far:

-- levenshtein
SELECT * FROM users WHERE LEVENSHTEIN(LOWER(users.first_name), 'wiliam') < 2 
--dmetaphone
OR DMETAPHONE(first_name) = DMETAPHONE('Willem')
--nicknames
OR LOWER(users.first_name) IN (SELECT name from nicknames WHERE nickname_id IN (SELECT nickname_id FROM nicknames WHERE name = 'william'));

or in Ruby:

# app/models/user.rb
class User < ActiveRecord::Base
  scope :levenshtein, lambda {|term| where(["LEVENSHTEIN(LOWER(first_name), LOWER(?)) < 2", term])}
  scope :dmetaphone, lambda {|term| where(["DMETAPHONE(first_name) = DMETAPHONE(?)", term])}
  scope :nicknames, lambda {|term| where(["LOWER(first_name) IN (?)", Nickname.for(term.downcase).map(&:name)])}

  def self.fuzzy_match(term)
    levenshtein(term) | dmetaphone(term) | nicknames(term)
  end 
end

User.fuzzy_match('william')

This will generate separate queries when calling fuzzy_match. This has the benefit of encapsulating each algorithm, but at the cost of worse database performance. If performance is desired over encapsulation, restructure the query to include these comparisons in the same WHERE clause, as is shown in the SQL method.

TL;DR

Stop using LIKE if anything more than the bare minimum comparison is needed. The time involved in setting up these extra functions in PostgreSQL is just a few minutes, and it will greatly enhance the search capabilities available to you and your users.

Additional references:

Computers, Linux, Open-source, Ruby, Software, Web

Setup PostgreSQL with Rails on Linux

Today, I found myself needing to setup a Rails application to work with the PostgreSQL database. I found that the documentation on the PostgreSQL website was like drinking from a fire hose. Worse was every community response for an error message has a slightly different approach to the solution. Lets run through a basic Rails PostgreSQL configuration assuming Rails 3, Postgres 8.x, and Ubuntu 11.04:

Step 1: Installing PostgreSQL and Libraries

Install the PostgresSQL server, the client package to connect (psql), and the pg library needed to compile the Ruby PostgreSQL driver:

$ sudo apt-get install postgresql postgresql-client libpq-dev

After this finishes installing, you can turn to your OS X co-worker and laugh at him while he is still downloading the first tarball file. PostgreSQL will start automatically, under the user postgres. You can verify that the installation is a success by using the psql command line utility to connect as the user postgres. This can be accomplished using the following command:

$ sudo -u postgres psql

This uses sudo to elevate your basic user privileges, and the “-u” switch will execute the following command as an alternate user. As the postgres user, this will run psql. If you connect successfully, you should be at the psql interactive prompt. If not, ensure PostgreSQL is running, and that psql is in the path for postgres.

Note: From the psql interactive prompt, type “q” to exit.

Step 2: Configure a New PostgreSQL database

From the psql prompt, you can run SQL to view the current PostgreSQL users:

select * from pg_user;

You should see a table of database users returned:

usename usesysid usecreatedb usesuper usecatupd passwd valuntil useconfig
postgres 10 t t t ********    

(1 row)

We can see the postgres user that was created automatically during the installation of PostgreSQL. Lets add another user to be an owner for our Rails database. The path of least resistance may be to use your shell account username, since it will keep us from having to change some options in the database configuration file.

$ sudo -u postgres createuser 
# Shall the new role be a superuser? (y/n) n
# Shall the new role be allowed to create databases? (y/n) y
# Shall the new role be allowed to create more new roles? (y/n) n

This will create a new database user (named your shell account name), and grant that user access to login to the database. This will ask you a few questions regarding the user account. It is important for Rails that you answer “y” to whether the user should be able to create databases. If you say no, you will not be able to run any rake tasks that create or drop the database.

We can confirm by selecting from the pg_user table again.

$ sudo -u postgres psql
select * from pg_user;
usename usesysid usecreatedb usesuper usecatupd passwd valuntil useconfig
postgres 10 t t t ********    
<username> 16391 f f f ********    

(2 rows)

Step 3: Configure Rails

Switching to the Rails side, lets configure our application for Postgres. This requires the pg gem. Open your Gemfile and append:

# Gemfile
gem "pg"

Now run bundle install to update your project gems.

$ bundle install

This should compile the Ruby pg database driver, allowing Ruby to talk to Postgres. Now, lets tell our Rails application how to access our database. Open up config/database.yml, and change the adapter line to read “postgresql”. The database name by convention is the name of your project plus “_development”. Finally, your shell username is needed. Because PostgreSQL will authenticate this account locally, you will not need to supply a password option. Delete this line.

# config/database.yml
development:
  adapter: postgresql
  encoding: unicode
  database: _development
  pool: 5
  username: 

To test, run the rake task to create your database:

rake db:create

If everything works, you should have a newly created database owned by your shell account. You can login using psql by passing the name of the database as an option:

$ psql -d _development

Happy migrating!

Troubleshooting

If you get the error: “FATAL: Ident authentication failed for user “, ensure that you can see your newly created account in the pg_user table of the postgres database. (See Step 2 above)

If you get the error: “PGError: ERROR: permission denied to create database”, then ensure that your database user account has been granted the privilege CREATE. This can be done during the “createuser” command line account creation by answering “y” to the corresponding question about this permission.

If you get the error: “FATAL: role is not permitted to log in”, try manually granting the privilege to login on your database user account. This can be done by executing the following as postgres in the psql prompt:

ALTER ROLE  LOGIN;

Notes on Alternative Authentications

PostgreSQL integrates very deeply into the Linux authentication world, allowing for quite an array of connection options. By default passwords are not accepted for local connections. Instead, PostgreSQL is configured to use the “ident sameuser” method of user authentication. See more at http://www.postgresql.org/docs/8.4/static/auth-pg-hba-conf.html.

Computers, Open-source, Personal, Ruby, Software, Thoughts, Web

Another Helping of Abstraction, Please

Rails 3.1 is soon to be released, and with it comes two new abstraction libraries – CoffeeScript, and S(ass)CSS. These libraries are used to generate Javascript code, and CSS definitions respectively. While abstraction libraries are nothing new to Rails, the inclusion of two more got me thinking about the direction that Rails stack is heading.

CoffeeScript’s syntax seems to be to make Javascript as Ruby-ish as possible. It describes Javascript’s curly braces and semicolons as embarrassing.

SCSS aims to address some of the repetitive elements of CSS through the use of variables, nesting, and mixins. This feels more acceptable to me than CoffeeScript, but my first encounter left me burned.

A few other abstraction libraries of relevance: Haml aims to generate HTML without the use of HTML tags. Additionally, Squeel‘s (MetaWhere 2.0) aim is to remove all SQL from your projects.

So what am I bitching about? Abstraction is a good thing right? I see two categories of abstraction. The first being the “good” kind, that allow you to be completely ignorant of the underpinnings. For example, Ruby converting down into machine code.

The “bad” kind of abstraction are the substitution of a language with a DSL. This creates a lot of issues starting with development and debugging. In the case of CoffeeScript and SASS, you have to compile the DSL files into Javascript, and CSS files. I feel like this compile step is a step back from what we gain working with dynamic languages like Ruby, and Javascript to begin with.

Development in these libraries also requires that you understand both the DSL of the library, as well as being familiar with the language of the generated code. This additional skill requirement adds time to a project, and raises the entry bar for new developers. Its difficult to be proficient at a language, and a DSL that generates that language at the same time. A Ruby developer told me yesterday that he was surprised at how rusty his knowledge of SQL had gotten. Its shocking to me that a web developer would struggle with SQL, but I think its an accurate sentiment on which many Rails developers would agree.

Another casualty of abstraction is performance. Not only is the generated code sub-optimized relative to coding it by hand, it is also being run through through more system calls to get there. You can either compile up front (CoffeeScript, SASS), or you can incur this penalty on-the-fly (Haml, Squeel).

While none of the libraries are a requirement of web development, when working on a team that uses these technologies you are expected to produce consistent code. Even though these libraries let you execute “native” code, doing so is discouraged because of the availability of the DSL. The syntax for embedding native code (if its even allowed) is often cumbersome, and loses editor functionality such as syntax highlighting and parsing.

Since when did Ruby on Rails web developers stop working with SQL, CSS, HTML, and Javascript? I am beginning to feel like the Ruby camp is becoming the far left extremists of the web development world. The web is built on these core technologies, and the benefits of abstracting them doesn’t seem to outweigh the costs.