Computers, Open-source, Ruby, Software

Upgrading to Rails 4

Recently, I started a new job and the first big assignment was to upgrade their software stack to a more recent version. They are running Rails 3.2 and wanted to upgrade as far forward as they can. With Rails 3.2 support gone for all but severe bug fixes, and Rails 5 due any month now, this is something they wisely didn’t want to put off.

Its a smaller company, and they have been open to a lot of my feedback and suggestions. I was basically given the reins and told to do what needed to be done to get us upgraded.

So the first task was some research, and I stumbled upon the official Rails upgrade guide pretty quickly. It nicely outlines the breaking changes. Fortunately, the big change was to strong parameters, but this can be deferred by including protected_attributes and kicking this can down the road. We will be logging what controller actions receive which parameters, instead of raising so we will have some time to collect some data before we switch over in one painful release.

The guides stressed that the test suite is critical during the upgrade. I was fortunate enough to have a project with adequate testing coverage. It wasn’t the 80% sweet spot, but it was certainly valuable at ~40%. However, the suite had fallen into disuse, so the first task was to get them back to green.

Once the test suite was green, it became a matter of KEEPING it green. Luck smiled a second time and they had an old CI server that had fallen into disuse. It was powered by CruiseControl.rb and it was little fuss to get it back up and running again. The migrations could no longer be played from the projects inception to the current time.

This is where luck stopped smiling upon me. The project did not track db/schema.rb and the migrations were not playable. The only way to get an instance of the database was to download the schema from production. Not the best practice, so I went about tracking the schema, and getting adoption of this new practice. Further complicating the schema approach was the decision to move all older migrations into subfolders in db/migrate by year (e.g. 2011, 2012, etc). This was done I found out because Textmate doesn’t like big directories. The issue is that db:schema:load isn’t recursive in its retrieval of migration versions. It took me a bit to understand what was happening, and how it was happening. After a failed monkey patch to the migrator logic in ActiveRecord, I decided to just move the migrations back into db/migrate and eliminate the subdirectories. Sorry Textmate!

Now the database could be rapidly provisioned, and I got a seed working with a minimal set of data. Back in CI I reconfigured the build script to use db:schema:load instead of db:migrate and with the green test suite, we got builds working again.

We used a utility called CC Menu to show the build status in the notification bar in OS X: http://ccmenu.org/

To make the builds even more visible, I discovered an integration with Slack to report the build status in our chat. https://github.com/epinault/cruisecontrolrb-to-slack . I made my own fork and added some famous movie quotes for successes and failures since I found the default messages lacking: https://github.com/bsimpson/cruisecontrolrb-to-slack . I didn’t think our female developers would appreciate the “you’re a stud!” message.

Back to the Rails 4 upgrade. The tests are passing in master, so I made a feature branch that will be long lived called “rails-upgrade”. I merge master in daily. The “rails-upgrade” branch will serve as an integration point for other features branches that will merge into it. The plan is to keep any upgrade related changes out of master until its time to deploy. That means separate branches, separate CI builds, and separate staging servers for manual QA.

One lesson I’ve learned is that a deprecation warning may not always be just informational. In particular, Rails 4 requires all scopes to be callable (lambdas, or procs). This was breaking the way that associations with scopes would be built: users.roles.admin.find_or_create! would previously find an associated admin record, or create it. However, in Rails 4, it fails creation because the role’s reference to user is nil. I’m not sure why, but its reproducable, and changing the admin scope on Role to a callable restores this reference back to user.

Ideally, I’d have wanted to get the test suite green before tackling deprecation warnings because I want to change as little as possible before I get back to a known good status. However, not fixing this deprecation warning was actually causing tests to fail.

Now we are down to a handful of failings tests on Rails 4. Most deal with the ActiveRecord syntax changes. Hopeful I can get these knocked out quickly. Then its on to manual QA.

In summary – get your test suite green. Keep it green. Do the upgrade and get it back to green. Then make any changes to remove deprecation warnings, keeping the suite green. The test suite is your guide during an upgrade like this.

Advertisements
Computers, Open-source, Ruby, Web

Making Rails Routes Fantastic

I took a week off from work to move into our new house. It was a time of rest, and relaxation despite the chaos around what moving can bring. I’ve had a few personal projects on the back burner, but never seemed to have the time or the energy to make much progress. Recently a talk with a friend reminded me how important completing those pet projects can be for your personal happiness. I’m proud to present the completion of an idea I’ve had a for a while: Routastic.

What is Routastic? It serves as an interactive Rails routes editor. Simply, I got tired of the pattern of modifying config/routes.rb, then running rake routes and grepping for some result. This is completely inefficient. My inspiration came from the beautiful Rubular.com and its interactive regular expression building. Its quick. Its painless. Its a valuable tool for everyday programming.

Please check out http://routastic.herokuapp.com/ and let me know how I can improve it.

Special thanks to Avand Amiri for suggesting the name (despite the name screaming Web 2.0, it actually is quite memorable!)

Apple, Computers, Linux, Open-source, Ruby, Software, Thoughts, Web

PostgreSQL for Ruby on Rails on Ubuntu

My new desktop came in at work this week, and the installation was painless thanks to the great driver support of Ubuntu 11.10. For anyone setting up a Rails development box based on Linux, I have some tips to get around some pain points when using a PostgresSQL database.

Installation:

Postgres can be quickly and easily installed using apt-get on Debian or Ubuntu based distributions. Issue the command:

apt-get install postgresql

Ruby Driver

In order for Ruby to connect to PostgreSQL databases, you will need to install the pg gem. This gem will need the development package of PostgreSQL to successfully build its native extension. To install the PostgreSQL development package, issue the following command:

apt-get install libpq-dev # EDIT: postgresql-dev was replaced by this package on Ubuntu 11.10

Setup A PostgreSQL Role

You can configure PostgreSQL to allow your account to have superuser access, allowing your Rails tasks to create and drop databases. This is useful for development, but is strongly discouraged for a production. That being said, we can create a PostgreSQL role by logging into psql as postgres as follows:

su postgres -c psql

This will open a PostgreSQL prompt as the database owner postgres. Next, we need to create an account for our user. This should match the response from “whoami”:

create role  superuser login;

We can now exit from psql by issuing “q“. Try to connect to psql directly by issuing the following command from your shell account:

psql postgres

This should allow you to connect to the default database postgres without being prompted for credentials. You should now be able to issue the rake commands for creating, and dropping the database:

rake db:create

Rspec Prompts for Credentials

I was being prompted by Rspec for credentials when running my test suite. If you would like to remove this credential prompt, please read the following:

There are differences in how the PostgreSQL package is configured in Homebrew on OS X, and how it is packaged in the Ubuntu and across other distributions. One difference is in the level of security configured in the pg_hba.conf file. This file is responsible for identifying which sources using which authentication mechanisms should be allowed or denied. By default, Rspec will cause a prompt for a password even if your shell account has trusted permissions. This is because Rspec connects not as a local process, but to localhost. To allow connections to localhost to be trusted, you will need to modify the pg_hba.conf file.

Next, we can modify the pg_hba.conf file located at /etc/postgresql/<version>/main/pg_hba.conf

Comment out the lines any lines at the bottom of the file and append the following:

local   all             all                                      trust
host    all             all              127.0.0.1/32            trust
host    all             all              ::1/128                 trust

This will allow connections from the shell, as well as connections to 127.0.0.1 (localhost) using both IPv4 and IPv6.

You will need to restart PostgreSQL for the changes from this file to take affect:

/etc/init.d/postgresql restart

PostgreSQL Extensions

If you want to make use of any of the additional extensions to Postgres, including fuzzystrmatching, you will need to install the postgresql-contrib package:

apt-get install postgresql-contrib

The extensions will install to /usr/share/postgresql/<version>/extension/

Using the Postgres version 9, you can create these extensions in your database by using the new CREATE EXTENSION syntax. In the case of the fuzzystrmatch extensions, you can issue the following command from inside a PostgresSQL command prompt to load the extensions:

psql ;

Once inside your database:

create extension fuzzystrmatch;
Computers, Open-source, Ruby

Fuzzy Matching in PostgreSQL with Nicknames

Introduction

If you have had the LIKE comparison in SQL leave you wanting, this post is for you. The reason I say this is because LIKE is the tip of the iceberg when it comes to the searching capabilities of a modern database. Signs that you are ready to move on from LIKE:

  • You are attempting to pad both sides of the LIKE query with wildcards
  • You are joining multiple keywords with a wildcard separator
  • You are using multiple LIKE statements on the same column in your query

 

I have been working on a project lately that uses PostgreSQL as its database, and discovered some interesting ways of locating data by using the contributed fuzzy matching functions. When searching for people in a database there are many things to consider:

  • The search query may contain a typo
  • The user may have an uncommon spelling of their name
  • The user may have a nickname

 

How can we address these scenarios?

Fuzzy Matching Algorithms To the Rescue

Levenshtein is a great algorithm to detect typos in a search query. It operates based on how distant one search term is from another term. Starting with the “source” word, it counts the number of operations (additions, subtractions, substitutions) it takes to arrive at the “destination” word. This make the Levenshtein algorithm particularly good at catching seach typos, and uncommon spellings. Here is an example:

INSERT INTO users (first_name) VALUES ('William'); --Notice the double "L"
SELECT * FROM users WHERE LEVENSHTEIN(LOWER(users.first_name), 'wiliam') < 2;
id first_name last_name created_at updated_at
4 William   2011-08-21 14:54:34.513516 2011-08-21 14:54:34.513516

(1 row)

I am retrieving all records where the result has a distance of two or less from the source word to the target. The number of changes required to go from “wiliam” to “william” is one addition of the letter “L”. You can of course modify the query to change the number of changes allowed. Note that the query is case sensitive. Going from an uppercase letter to the same letter in lowercase will count as 1 substitution. For this reason, make sure you are down-casing everything before comparing to operate on letters only, and not case.

Another choice when locating person records is the phoenetic algorithms of Soundex, Metaphone, and Double Metaphone. These have linguistic awareness of the English language (and other languages in the case of the Metaphone, and DMetaphone algorithms). The Soundex is the oldest phonetic algorithm, and has been deprecated in favor of the more complex Metaphone and DMetaphone algorithms. Nevertheless, if you are working with English only names, Soundex may provide performance benefits that are worth the trade-offs.

Soundex breaks down a word, and assigns the pronunciation a value that can be compared against other pronunciations. Every word when passed through the Soundex function returns a four digit value, resulting in each word being the values of 0 and 4 for least similar, to most similar respectively. DIFFERENCE uses SOUNDEX under the hood. “William” and “Willem” have a pronunciation that is mostly similar. Again, you can adjust this setting by changing the integer that you compare the result of DIFFERENCE against. Here is an example of METAPHONE:

SELECT * FROM users WHERE DIFFERENCE(users.first_name, 'willem') > 2;

Metaphone takes an integer which specifies how specific it should be when calculating the similarity. The higher the integer, the more specific the comparison. While the results are the same as Soundex in this scenario, it is important to remember that the power of METAPHONE, and DMETAPHONE are in the pronunciations of these characters in other languages. This is particularly beneficial when working with non-English names:

SELECT * FROM users WHERE METAPHONE(users.first_name, 2) = METAPHONE('Willem', 2);

Double Metaphone expands on the METAPHONE capabilities, but since it has no additional configuration options, it may not be suitable for your needs. You should experiment with the different fuzzy matching options to get the best result for your application:

SELECT * FROM users WHERE DMETAPHONE(first_name) = DMETAPHONE('Willem');

Nickname Matching

These methods are great when comparing data that is similar in distance, or in pronunciation, however it will not be able to address the concern of the use of nicknames, and alternate names within the system.

Nicknames allow us to match the name “William” with its alternate names of “Bill”, “Bud”, “Will”, and “Willie”. Using the algorithms discussed so far, the name “Will” (and possibly “Willie”) would be the only results of this match. Nicknames represent relationships between names that a database will need to understand to address the last criteria of our search.

In order to accomplish this, I have have leveraged the data from the Nickname and Dominuitive Name Lookup project. I have created a gem that can be added to your Ruby on Rails project that will fetch the nickname data, parse it, and insert it into a nicknames table, along with a foreign key that references related names. This gem will allow you to query nicknames using the following query:

SELECT * from nicknames WHERE nickname_id IN (SELECT nickname_id FROM nicknames WHERE name = 'william');

or in Ruby (check out the project here):

Nickname.for('William').map(&:name)

We can do the following to make a comprehensive search using all methods that we have discussed so far:

-- levenshtein
SELECT * FROM users WHERE LEVENSHTEIN(LOWER(users.first_name), 'wiliam') < 2 
--dmetaphone
OR DMETAPHONE(first_name) = DMETAPHONE('Willem')
--nicknames
OR LOWER(users.first_name) IN (SELECT name from nicknames WHERE nickname_id IN (SELECT nickname_id FROM nicknames WHERE name = 'william'));

or in Ruby:

# app/models/user.rb
class User < ActiveRecord::Base
  scope :levenshtein, lambda {|term| where(["LEVENSHTEIN(LOWER(first_name), LOWER(?)) < 2", term])}
  scope :dmetaphone, lambda {|term| where(["DMETAPHONE(first_name) = DMETAPHONE(?)", term])}
  scope :nicknames, lambda {|term| where(["LOWER(first_name) IN (?)", Nickname.for(term.downcase).map(&:name)])}

  def self.fuzzy_match(term)
    levenshtein(term) | dmetaphone(term) | nicknames(term)
  end 
end

User.fuzzy_match('william')

This will generate separate queries when calling fuzzy_match. This has the benefit of encapsulating each algorithm, but at the cost of worse database performance. If performance is desired over encapsulation, restructure the query to include these comparisons in the same WHERE clause, as is shown in the SQL method.

TL;DR

Stop using LIKE if anything more than the bare minimum comparison is needed. The time involved in setting up these extra functions in PostgreSQL is just a few minutes, and it will greatly enhance the search capabilities available to you and your users.

Additional references:

Computers, Linux, Open-source, Ruby, Software, Web

Setup PostgreSQL with Rails on Linux

Today, I found myself needing to setup a Rails application to work with the PostgreSQL database. I found that the documentation on the PostgreSQL website was like drinking from a fire hose. Worse was every community response for an error message has a slightly different approach to the solution. Lets run through a basic Rails PostgreSQL configuration assuming Rails 3, Postgres 8.x, and Ubuntu 11.04:

Step 1: Installing PostgreSQL and Libraries

Install the PostgresSQL server, the client package to connect (psql), and the pg library needed to compile the Ruby PostgreSQL driver:

$ sudo apt-get install postgresql postgresql-client libpq-dev

After this finishes installing, you can turn to your OS X co-worker and laugh at him while he is still downloading the first tarball file. PostgreSQL will start automatically, under the user postgres. You can verify that the installation is a success by using the psql command line utility to connect as the user postgres. This can be accomplished using the following command:

$ sudo -u postgres psql

This uses sudo to elevate your basic user privileges, and the “-u” switch will execute the following command as an alternate user. As the postgres user, this will run psql. If you connect successfully, you should be at the psql interactive prompt. If not, ensure PostgreSQL is running, and that psql is in the path for postgres.

Note: From the psql interactive prompt, type “q” to exit.

Step 2: Configure a New PostgreSQL database

From the psql prompt, you can run SQL to view the current PostgreSQL users:

select * from pg_user;

You should see a table of database users returned:

usename usesysid usecreatedb usesuper usecatupd passwd valuntil useconfig
postgres 10 t t t ********    

(1 row)

We can see the postgres user that was created automatically during the installation of PostgreSQL. Lets add another user to be an owner for our Rails database. The path of least resistance may be to use your shell account username, since it will keep us from having to change some options in the database configuration file.

$ sudo -u postgres createuser 
# Shall the new role be a superuser? (y/n) n
# Shall the new role be allowed to create databases? (y/n) y
# Shall the new role be allowed to create more new roles? (y/n) n

This will create a new database user (named your shell account name), and grant that user access to login to the database. This will ask you a few questions regarding the user account. It is important for Rails that you answer “y” to whether the user should be able to create databases. If you say no, you will not be able to run any rake tasks that create or drop the database.

We can confirm by selecting from the pg_user table again.

$ sudo -u postgres psql
select * from pg_user;
usename usesysid usecreatedb usesuper usecatupd passwd valuntil useconfig
postgres 10 t t t ********    
<username> 16391 f f f ********    

(2 rows)

Step 3: Configure Rails

Switching to the Rails side, lets configure our application for Postgres. This requires the pg gem. Open your Gemfile and append:

# Gemfile
gem "pg"

Now run bundle install to update your project gems.

$ bundle install

This should compile the Ruby pg database driver, allowing Ruby to talk to Postgres. Now, lets tell our Rails application how to access our database. Open up config/database.yml, and change the adapter line to read “postgresql”. The database name by convention is the name of your project plus “_development”. Finally, your shell username is needed. Because PostgreSQL will authenticate this account locally, you will not need to supply a password option. Delete this line.

# config/database.yml
development:
  adapter: postgresql
  encoding: unicode
  database: _development
  pool: 5
  username: 

To test, run the rake task to create your database:

rake db:create

If everything works, you should have a newly created database owned by your shell account. You can login using psql by passing the name of the database as an option:

$ psql -d _development

Happy migrating!

Troubleshooting

If you get the error: “FATAL: Ident authentication failed for user “, ensure that you can see your newly created account in the pg_user table of the postgres database. (See Step 2 above)

If you get the error: “PGError: ERROR: permission denied to create database”, then ensure that your database user account has been granted the privilege CREATE. This can be done during the “createuser” command line account creation by answering “y” to the corresponding question about this permission.

If you get the error: “FATAL: role is not permitted to log in”, try manually granting the privilege to login on your database user account. This can be done by executing the following as postgres in the psql prompt:

ALTER ROLE  LOGIN;

Notes on Alternative Authentications

PostgreSQL integrates very deeply into the Linux authentication world, allowing for quite an array of connection options. By default passwords are not accepted for local connections. Instead, PostgreSQL is configured to use the “ident sameuser” method of user authentication. See more at http://www.postgresql.org/docs/8.4/static/auth-pg-hba-conf.html.

Computers, Open-source, Personal, Ruby, Software, Thoughts, Web

Another Helping of Abstraction, Please

Rails 3.1 is soon to be released, and with it comes two new abstraction libraries – CoffeeScript, and S(ass)CSS. These libraries are used to generate Javascript code, and CSS definitions respectively. While abstraction libraries are nothing new to Rails, the inclusion of two more got me thinking about the direction that Rails stack is heading.

CoffeeScript’s syntax seems to be to make Javascript as Ruby-ish as possible. It describes Javascript’s curly braces and semicolons as embarrassing.

SCSS aims to address some of the repetitive elements of CSS through the use of variables, nesting, and mixins. This feels more acceptable to me than CoffeeScript, but my first encounter left me burned.

A few other abstraction libraries of relevance: Haml aims to generate HTML without the use of HTML tags. Additionally, Squeel‘s (MetaWhere 2.0) aim is to remove all SQL from your projects.

So what am I bitching about? Abstraction is a good thing right? I see two categories of abstraction. The first being the “good” kind, that allow you to be completely ignorant of the underpinnings. For example, Ruby converting down into machine code.

The “bad” kind of abstraction are the substitution of a language with a DSL. This creates a lot of issues starting with development and debugging. In the case of CoffeeScript and SASS, you have to compile the DSL files into Javascript, and CSS files. I feel like this compile step is a step back from what we gain working with dynamic languages like Ruby, and Javascript to begin with.

Development in these libraries also requires that you understand both the DSL of the library, as well as being familiar with the language of the generated code. This additional skill requirement adds time to a project, and raises the entry bar for new developers. Its difficult to be proficient at a language, and a DSL that generates that language at the same time. A Ruby developer told me yesterday that he was surprised at how rusty his knowledge of SQL had gotten. Its shocking to me that a web developer would struggle with SQL, but I think its an accurate sentiment on which many Rails developers would agree.

Another casualty of abstraction is performance. Not only is the generated code sub-optimized relative to coding it by hand, it is also being run through through more system calls to get there. You can either compile up front (CoffeeScript, SASS), or you can incur this penalty on-the-fly (Haml, Squeel).

While none of the libraries are a requirement of web development, when working on a team that uses these technologies you are expected to produce consistent code. Even though these libraries let you execute “native” code, doing so is discouraged because of the availability of the DSL. The syntax for embedding native code (if its even allowed) is often cumbersome, and loses editor functionality such as syntax highlighting and parsing.

Since when did Ruby on Rails web developers stop working with SQL, CSS, HTML, and Javascript? I am beginning to feel like the Ruby camp is becoming the far left extremists of the web development world. The web is built on these core technologies, and the benefits of abstracting them doesn’t seem to outweigh the costs.

Linux, Open-source, Ruby, Software

Rail’s Acts_as_revisable now summarizes changes

If you are a frequent reader, please congratulate me next time you see me on my first ever GiHub commit. I have been a long time user of GitHub, but I haven’t contributed anything back yet. I decided to start with the project acts_as_revisable, because I have some ideas for it that I think are useful.

The commit tonight adds a “revisable_changes” column, which records the attributes “before” and “after” values in the revision record. This data is calculated from the ActiveRecord instance method “changes”. To demonstrate:

# User.create :username => 'ben'
#[nil, "ben"]}>
# User.find(1).update_attributes :username => 'bob'
#["ben", "bob"]}>

This should serve as a good base to build summaries in a human readable format in an application. After tossing around a few different word choices to describe the changes, I decided that the better approach was to just record the data that changed in a structured format and leave it up to the implementation. Such an implementation may look something like this:

  def changes_summary
    self.revisable_changes.map do |attribute, values|
      before, after = values[0], values[1]
      if before.blank? and after
        "Added #{attribute} as #{after}"
      elsif before and after.blank?
        "Removed #{attribute} value of #{before}"
      else
        "Changed #{attribute} from #{before} to #{after}"
      end
    end.join('; ')
  end

Using our user instance above, this method out return: “Changed username from ben to bob”.

Other plans for acts_as_revisable in the future include tracking associations of a model. For example, if a user has many permissions, and the permissions change, the revisable information will include the permissions for that user at that point in time. I haven’t worked out the specifics of this yet, as it could potentially generate a lot of unwanted data. I plan on parsing the data from ActiveRecord’s “reflect_on_all_associations” method to gather the associations. I will then provide a way for a user to configure which associations should, or should not be tracked. Then I will iterate through these objects using ActiveRecord hooks and somehow record the state. Either a shadow table, or by serializing the associations in a column. The trick here will be to marshal the objects back to the live data when a revert is called. More soon…