Computers, Open-source, Ruby, Software

Upgrading to Rails 4

Recently, I started a new job and the first big assignment was to upgrade their software stack to a more recent version. They are running Rails 3.2 and wanted to upgrade as far forward as they can. With Rails 3.2 support gone for all but severe bug fixes, and Rails 5 due any month now, this is something they wisely didn’t want to put off.

It’s a smaller company, and they have been open to a lot of my feedback and suggestions. I was basically given the reins and told to do what needed to be done to get us upgraded.

So the first task was some research, and I stumbled upon the official Rails upgrade guide pretty quickly. It nicely outlines the breaking changes. The big change was the move to strong parameters, but fortunately this can be deferred by including the protected_attributes gem and kicking the can down the road. Instead of raising, we will log which controller actions receive which parameters, so we will have some time to collect data before we switch over in one painful release.
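As a rough illustration of "log instead of raise": Rails 4 has a setting that controls what happens when a controller receives parameters that were not permitted. The fragment below is a sketch, not the configuration this project used. The post does not say how its logging was wired up, and the choice of environment file is an assumption.

```ruby
# Fragment for an environment file such as config/environments/production.rb
# (assumed location; it belongs inside that file's existing configure block).
# :log writes unpermitted parameter names to the log, :raise raises instead.
config.action_controller.action_on_unpermitted_parameters = :log
```

Switching the value to :raise in the test environment is a common way to surface problems early once the logs have gone quiet.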

The guides stressed that the test suite is critical during the upgrade. I was fortunate enough to have a project with adequate test coverage. It wasn’t the 80% sweet spot, but it was certainly valuable at ~40%. However, the suite had fallen into disuse, so the first task was to get it back to green.

Once the test suite was green, it became a matter of KEEPING it green. Luck smiled a second time: they had an old CI server that had fallen into disuse. It was powered by CruiseControl.rb and it was little fuss to get it back up and running again. However, the migrations could no longer be replayed from the project’s inception to the present.

This is where luck stopped smiling upon me. The project did not track db/schema.rb and the migrations were not playable. The only way to get an instance of the database was to download the schema from production. That is not the best practice, so I went about tracking the schema and getting adoption of this new practice. Further complicating the schema approach was the decision to move all older migrations into subfolders in db/migrate by year (e.g. 2011, 2012, etc). This was done, I found out, because TextMate doesn’t like big directories. The issue is that db:schema:load isn’t recursive in its retrieval of migration versions. It took me a bit to understand what was happening, and how it was happening. After a failed monkey patch to the migrator logic in ActiveRecord, I decided to just move the migrations back into db/migrate and eliminate the subdirectories. Sorry TextMate!
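To make the subfolder problem concrete, here is a small plain-Ruby sketch that does not touch ActiveRecord at all. It assumes the version lookup behaves like a non-recursive glob over db/migrate, which matches the symptom described above. The file names and the helper names (migration_versions, compare_lookups) are made up for the illustration.

```ruby
require "tmpdir"
require "fileutils"

# Collect the numeric version prefixes of migration files matching a glob.
def migration_versions(pattern)
  Dir.glob(pattern).map { |path| File.basename(path)[/\A\d+/] }.sort
end

# Build a throwaway db/migrate tree with one top-level migration and one
# migration filed under a year subfolder, then compare both lookups.
def compare_lookups
  Dir.mktmpdir do |root|
    migrate = File.join(root, "db", "migrate")
    FileUtils.mkdir_p(File.join(migrate, "2011"))
    FileUtils.touch(File.join(migrate, "20130101000000_create_users.rb"))
    FileUtils.touch(File.join(migrate, "2011", "20110101000000_create_roles.rb"))

    {
      # Non-recursive: only sees files directly inside db/migrate.
      flat: migration_versions(File.join(migrate, "[0-9]*_*.rb")),
      # Recursive: also descends into the year subfolders.
      recursive: migration_versions(File.join(migrate, "**", "[0-9]*_*.rb"))
    }
  end
end

lookups = compare_lookups
puts "flat:      #{lookups[:flat].inspect}"
puts "recursive: #{lookups[:recursive].inspect}"
```

The flat lookup misses the 2011 migration entirely, which is the kind of gap that makes a freshly loaded schema believe old migrations were never run.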

Now the database could be rapidly provisioned, and I got a seed working with a minimal set of data. Back in CI I reconfigured the build script to use db:schema:load instead of db:migrate and with the green test suite, we got builds working again.

We used a utility called CC Menu to show the build status in the notification bar in OS X: http://ccmenu.org/

To make the builds even more visible, I discovered an integration with Slack to report the build status in our chat. https://github.com/epinault/cruisecontrolrb-to-slack . I made my own fork and added some famous movie quotes for successes and failures since I found the default messages lacking: https://github.com/bsimpson/cruisecontrolrb-to-slack . I didn’t think our female developers would appreciate the “you’re a stud!” message.

Back to the Rails 4 upgrade. The tests are passing in master, so I made a long-lived feature branch called “rails-upgrade”. I merge master in daily. The “rails-upgrade” branch will serve as an integration point for other feature branches that will merge into it. The plan is to keep any upgrade-related changes out of master until it’s time to deploy. That means separate branches, separate CI builds, and separate staging servers for manual QA.

One lesson I’ve learned is that a deprecation warning may not always be just informational. In particular, Rails 4 requires all scopes to be callable (lambdas or procs). This was breaking the way that associations with scopes would be built: users.roles.admin.find_or_create! would previously find an associated admin record, or create it. However, in Rails 4, creation fails because the role’s reference to user is nil. I’m not sure why, but it’s reproducible, and changing the admin scope on Role to a callable restores this reference to user.
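For reference, the scope change looks roughly like the fragment below. It is a sketch of the syntax only: the name column and the belongs_to line are assumptions, since the post does not show the real Role model, and the fragment only makes sense inside a Rails application.

```ruby
# app/models/role.rb (illustrative fragment, column name is hypothetical)
class Role < ActiveRecord::Base
  belongs_to :user

  # Rails 3.2 style, deprecated in Rails 4 (relation built at class load):
  #   scope :admin, where(:name => 'admin')

  # Rails 4 style: the scope body is a callable, evaluated on each use.
  scope :admin, -> { where(:name => 'admin') }
end
```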

Ideally, I’d have wanted to get the test suite green before tackling deprecation warnings because I want to change as little as possible before I get back to a known good status. However, not fixing this deprecation warning was actually causing tests to fail.

Now we are down to a handful of failing tests on Rails 4. Most deal with the ActiveRecord syntax changes. Hopefully I can get these knocked out quickly. Then it’s on to manual QA.

In summary – get your test suite green. Keep it green. Do the upgrade and get it back to green. Then make any changes to remove deprecation warnings, keeping the suite green. The test suite is your guide during an upgrade like this.

Open-source, Ruby, Software

Attack of the Clones

This is an elementary Rails tip, but one that I just recently stumbled into. When working in a project where you have a record that you want to make multiple instances of, you can do it the old-fashioned way:

User.create(:first_name => 'Laurence', :last_name => 'Tureaud', :alias => 'Mr. T')

However, this is quite a bit of typing. If you want multiple users, you can add a loop structure like this:

(1..5).each { User.create(:first_name => 'Laurence', :last_name => 'Tureaud', :alias => 'Mr. T') }

This will create 5 Mr. T’s! Of course, if you have any validations checking for unique values, this might fail. I would recommend using the loop variable to prefix or postfix a unique digit:

(1..5).each { |x| User.create(:first_name => 'Laurence', :last_name => "Tureaud#{x}", :alias => 'Mr. T') }

Notice that the “x” will be a different value in each loop. Now for the grand finale, you can save all this typing and use a method on an ActiveRecord object called “clone”. From the official Rails docs:

Returns a clone of the record that hasn‘t been assigned an id yet and is treated as a new record. Note that this is a “shallow” clone: it copies the object‘s attributes only, not its associations. The extent of a “deep” clone is application-specific and is therefore left to the application to implement according to its need.
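What “shallow” means here can be shown in plain Ruby without a database. The sketch below uses a Struct as a stand-in for a record, so the Person type and the shallow_copy_demo helper are assumptions made for the illustration, not ActiveRecord code. One related caveat: from Rails 3.1 onward, dup is the ActiveRecord method that returns an unsaved copy without an id, while clone keeps the id, so on newer versions the examples in this post would call dup.

```ruby
# A stand-in for a record: one plain attribute and one nested object.
Person = Struct.new(:name, :nicknames)

def shallow_copy_demo
  original = Person.new("Laurence Tureaud", ["Mr. T"])
  copy = original.dup

  copy.name = "Clubber Lang"        # reassigning only changes the copy
  copy.nicknames << "B.A. Baracus"  # mutating the shared array changes both

  [original, copy]
end

original, copy = shallow_copy_demo
puts original.name               # the original keeps its own name
puts original.nicknames.inspect  # but it sees the nickname added via the copy
puts copy.nicknames.equal?(original.nicknames)
```

The copy gets its own top-level attributes, but nested objects are shared until the application copies them itself, which is the part the documentation leaves to you.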

To use this, grab the record you want to clone first:

user = User.find(1)
# => #<User id: 1, ... "Tureaud">

Note that this does not work on a collection of ActiveRecord objects – only an individual object. Now, call the clone method, and this removes the id attribute:

user = User.find(1)
# => #<User id: 1, ... "Tureaud">
new_user = user.clone
# => #<User id: nil, ... "Tureaud">
new_user.save
# => true

I have now cloned my object. With some method chaining and some looping, we can easily mass create fake records:

user = User.find(1)
# => #<User id: 1, ... "Tureaud">
(1..5).each { user.clone.save }

Much less typing! Now, with all that time you saved, go sign up for an Amazon Prime account and buy some cool stuff and get it shipped for free!

Open-source, Ruby, Software, Thoughts, Web

ActiveRecord Callbacks aka How to Keep Data you Don’t Control Fresh

Have you ever been frustrated by having to query data that you don’t control? Especially if the data you want is not accessible in the format you need, you have probably “locally cached” this information. What happens then when this data changes on the source? There are a few approaches:

  • Rebuild the cache at a certain time – This approach allows your code to function in a way that doesn’t care too much about the data being cached. You do your thing, a cron/scheduled job does its thing, and everyone is happy. Well, mostly. The problem with this approach is choosing how often to rebuild the cache. The shorter the interval, the more accurate the data, but the more intensive the application becomes. The longer the interval, the less intensive, but the less accurate your data becomes. In either scenario, you will probably have to worry about mechanisms to manually rebuild the cache.
  • Rebuild the cache on-the-fly – This approach allows your data to be as up-to-date as possible, while preserving the local cache, and not affecting performance too much. A typical scenario would be to insert records into your local cache the first time you pull them from the native source. This takes care of the need to pre-cache objects, since it is done at request time, but it comes at a performance penalty. The first request is the longest, then subsequent requests are quick. You still have the issues of when to refresh the cache, and how to allow manually refreshing it. This also complicates your code; in addition to your logic, you now have relatively meaningless cache logic side-by-side with your meaningful logic.
  • Don’t cache – Just take the performance hit, optimize it as much as possible, and hope that no one cares the operation takes some extra time to complete. The problem with this approach is efficiency. Computers are fast, and people expect this. People may stop using your code altogether if the performance impacts are severe enough to outweigh its usefulness.

So what is a programmer to do? Out of the approaches above, I have opted to perform caching on-the-fly with a twist. That twist takes advantage of ActiveRecord’s callbacks. What is a callback? Think of them as “in-between” steps available for you to hook into as ActiveRecord does its thing. Callbacks are an API that allows you to do this without any ugly hacks or baseline modifications. Callbacks are also known as hooks. From the official Ruby on Rails website:

“Callbacks are methods that get called at certain moments of an object’s lifecycle. With callbacks it’s possible to write code that will run whenever an Active Record object is created, saved, updated, deleted, validated, or loaded from the database.”

Simply put, you can create methods with certain names in an ActiveRecord::Base derived model, and define your cache logic there. For example, if we had a User model, we could query a user like the following:

# app/models/user.rb
 class User  #

This code sample will return the first instance of a user, with their attributes loaded. Now, if this information was pulled from our local cache, the information contained may be different from the original source. For instance, perhaps since the cache was built, this person got married and changed their name. Your cache is now different from your original source, and this needs to be resolved. So let’s implement some cache refreshing via ActiveRecord’s callback method after_find:

# app/models/user.rb
 class User  #

A few things to note. The name “after_find” means that this will be executed immediately following the completion of an ActiveRecord find operation. This includes: first, last, find_by_xxx, all, etc. The method then updates the User instance (local cache) with the data from the other database. ActiveRecord is smart enough to not actually issue a save command unless the data has actually changed, so don’t worry about not being efficient here. Also, you can write this without using the “self” prefix, but it helps me keep track of what is what. Also note that I am using “puts” just to show when this is executed. You can see that after I call find_by_username, this code is run. If there are any changes, they are reflected in the result, transparent to the rest of your application’s logic. This keeps the cache logic out of your “real” logic.

This will execute every time we issue a find command on the User class, so this isn’t really efficient yet. Basically, the cache is always immediately expired. For performance reasons, let’s only check the other database every 10 minutes for a user:

# app/models/user.rb
 class User < ActiveRecord::Base
  attr_accessor :first_name, :last_name, :username

  # ActiveRecord callback
  def after_find
    if self.updated_at.blank? or self.updated_at  #
User.find_by_username('kristin')
=> #
# 10 minutes elapse... (use your imagination)
User.find_by_username('kristin')
refreshing cache
=> #

Now, we can see the cache working. Every 10 minutes, the local cache is checked against the original source, and for all the other requests, it just skips the conditional, and exits. You can obviously change the 10 minute expiration to anything you desire. Better still, throw this value in a YAML config file, and reference it so that this setting can be customized.
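The expiry test itself is small enough to sketch on its own in plain Ruby. This is a minimal sketch of the staleness check described above, not the model code from this post: the cache_stale? helper, its keyword arguments, and the constant name are all hypothetical, and the ten-minute value is the one discussed in the text.

```ruby
# Ten minutes, expressed in seconds (the expiry used in the text).
CACHE_TTL_SECONDS = 10 * 60

# A record needs refreshing if it has never been stamped, or if its
# last update is older than the time-to-live.
def cache_stale?(updated_at, now: Time.now, ttl: CACHE_TTL_SECONDS)
  updated_at.nil? || updated_at < now - ttl
end

now = Time.now
puts cache_stale?(nil, now: now)            # never refreshed
puts cache_stale?(now - 11 * 60, now: now)  # refreshed 11 minutes ago
puts cache_stale?(now - 5 * 60, now: now)   # refreshed 5 minutes ago
```

Passing the current time in as an argument keeps the check easy to test, and the ttl argument is where a value loaded from a YAML config file would go.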

There are many other callback functions that you can use, and they can work together to be a very powerful tool. Check out the following code:

# app/models/user.rb
 class User < ActiveRecord::Base
  attr_accessor :first_name, :last_name, :username

  # ActiveRecord callback
  def after_find
    if self.updated_at.blank? or self.updated_at  #
User.find_by_username('kristin')
=> #
# 10 minutes elapse... (use your imagination)
User.find_by_username('kristin')
refreshing cache
=> #

This allows me to use “find_or_create_by” to generate records with incomplete information. The missing information is filled in at creation time thanks to the before_save method. Just a note, do NOT call “save” from within some of these methods, as this would create an infinite loop – think about it. Before_save calling save, which would call before_save, etc. Be careful.

There is a performance penalty for me creating a record in this manner, and it would be much better if I got all this information in one query. For example:

# pull the first user from the other database, then find or create the local copy
other_user = OtherDatabase::User.find_by_username('ksimpson')
user = User.find_or_create_by_username(:username => other_user.username, :last_name => other_user.last_name, :first_name => other_user.first_name)
user
=> #
# 10 minutes elapse... (use your imagination)
User.find_by_username('kristin')
refreshing cache
=> #

The before_save would have taken care of any missing information (as we saw above), however this comes at the penalty of a second query, and can quickly mean you have unnecessarily doubled your queries.

Computers

ActiveRecord’s Secret find_by_sql Results

Well, it’s not exactly a secret. It sure isn’t well documented, however. Recently, I wanted to return a query that spanned multiple database tables. I decided to go with find_by_sql because of the mind-blowing idiocy with which this legacy database was structured. I will take a watered down version of what I was attempting to do to demonstrate how we can expose some “hidden” functionality of ActiveRecord’s find_by_sql method.

 
Channel table:
-----------------------------------------
id | title | description       | user_id
-----------------------------------------
1  | first | the first channel | 1

User table:
-----------
id | name
-----------
1  | ben

After I constructed my find_by_sql query, it looked something like this:

Channel.find_by_sql("SELECT a.*, b.name
FROM channel a, user b
WHERE a.user_id = b.id")

This query selects all columns from table a (channel), and a single column from table b (user). This is pretty standard, as many queries need to gather values from multiple table columns in a single SELECT operation.

Running this query, you will receive an array of Channel instances with all the attributes filled in for the channel model. Missing however, will be the attributes from any table other than “Channel”:

Channel.find_by_sql("SELECT a.*, b.name
FROM channel a, user b
WHERE a.user_id = b.id")
=> "[#]"

Notice how the “name” column from table b (user) is not present in the display? You can even query this attribute directly:

c = Channel.find_by_sql("SELECT a.*, b.name
FROM channel a, user b
WHERE a.user_id = b.id")
=> "[#]"
c[0].name
=> NoMethodError: undefined method 'name' for #

We could create an attr_accessor for the Channel class, and this would resolve the NoMethodError, but it still won’t be populated for our Channel instance after a find_by_sql.

After some digging around in the source code, and online, I came across this posting, which made the brilliant suggestion of looking in channel.attributes. This method returns a hash of the attributes that ActiveRecord knows about. Take a look at channel.attributes.keys:

c.attributes.keys
=> ["id", "title", "description", "user_id", "name"]

There it is! Our “missing” name attribute from the SELECT query has been located. Accessing the value for this attribute is trivial:

c.attributes["name"]
=> "ben"

We can do this with as many “extra” columns as we want. If two column names conflict (say channels had a column “name”, and users also had a column “name”), the database will return “name”, and “name_1” respectively. This is a really powerful feature of ActiveRecord that will encourage people to stick with the ORM, since they can still write SQL in a pinch.

Bonus: Customizing .to_json to include find_by_sql attributes

In the preceding example, the attribute “name” would not be included in the output of a “.to_json” call, as in the following example:

c.attributes.keys
=> ["id", "title", "description", "user_id", "name"]
c.to_json
=> "{"channel":{"id":1,"title":"first","description":"The first channel"}}"

This is where we can customize what is included in our JSON output. This article showed me that you can use the :methods argument with to_json to explicitly include any custom attributes, such as those that are attr_accessor objects in your class. When passing in the :methods argument, I must specify which attributes to include:

c.attributes.keys
c.to_json
=> "{"channel":{"id":1,"title":"first","description":"The first channel"}}"
c.to_json :methods => :name
=> "{"channel":{"id":1,"title":"first","description":"The first channel","name":"ben"}}"

Good job Rails team! No ugly hacks, or overrides needed today.

Computers, Open-source, Ruby, Software, Web

Rails ActiveRecord Callbacks (Hooks)

Rails has some neat tricks up its sleeve when it comes to its ORM – ActiveRecord. One of the many things it does well is provide the ability to customize what happens at certain stages of the ActiveRecord transaction lifecycle. This means that you can have pre and post events that fire off when you create, save, or destroy records.

Here are the methods that are available inside ActiveRecord::Base derived models:

after_create, after_destroy, after_save, after_update, after_validation, after_validation_on_create, after_validation_on_update, before_create, before_destroy, before_save, before_update, before_validation, before_validation_on_create, before_validation_on_update

So when would you use this?:

Glad you asked! I recently had a need to implement such a thing when I was writing a new website. In this website, a user can create, move, and delete tabs from their interface. The position of the tab was saved in a database table. When I would render the tabs to the user, I would just make my association with “:order => 'position'”. Whatever position the tabs were ordered in would show in the interface.

Adding a tab sounds easy (at first glance). You can just append the new tab to the end of the user’s tab listing. Something like this may be a good first run:

before_save :order_tab

def order_tab
  self.position = self.user.tab_layouts.count
end

Remember not to save your record in the callback methods, as this will cause an infinite loop, as it is saved, triggering the before_save event again, and so on. Before you know it, your CPU is hot enough to fry an egg.

However, we want to be flexible enough to allow the user to rearrange tabs as they please. If we keep this as is, no matter what is rearranged, when the save method is called, it will just override the new position again with the last position because of our code above. You may be thinking of using the before_create callback to get around this, however I wanted a more generic answer in creating / updating tab positions.

After toying around for a while, I came up with this solution:

before_create :order_tabs_on_create

def order_tabs_on_create
  self.position = self.user.tab_layouts.count if self.position.nil?

  ActiveRecord::Base.connection.execute("UPDATE tab_layouts
     SET POSITION = POSITION + 1
   WHERE user_id = #{self.user_id} AND POSITION >= #{self.position}")
end

Let’s look at this line by line. First, we need to determine if a position has been set prior to saving. If it has not, then let’s just throw the tab on the end (or wherever – it really won’t matter soon).

If a position has been specified, then we want to honor that location, and increment the positions of other tabs by one to allow room to insert the new tab. After we increment these positions (leaving a gap), our new tab will fill this in when the method ends, and is saved.

Just a side note: I could have used the “increment!” method instead of executing raw SQL, but there is an “n+1” performance problem to consider. Simply put, if a user inserts a new tab at the beginning of 100 other tabs, you would have one insert statement and 100 update statements to accomplish this callback magic. The single UPDATE statement moves all of the affected rows in one query.

So this will allow you to insert a tab at any location, and increment the positions of the tabs behind it so they are out of the way. Now, what happens when you destroy a tab? If the tab is on the end of the list, everything is fine, because the positions are still sequential. However, if we destroy a tab in the middle, then we have a gap in our positions. Let’s take care of this with a new callback:

before_destroy :order_tabs_on_destroy

def order_tabs_on_destroy
  ActiveRecord::Base.connection.execute("UPDATE tab_layouts
       SET POSITION = POSITION - 1
     WHERE user_id = #{self.user_id} AND POSITION > #{self.position}")
end

This method will fire whenever a record is destroyed. Note the significant difference between calling tab.delete and tab.destroy. If you just call “delete”, the record is gone without ever invoking any of this callback magic. The ActiveRecord authors provide both methods to address performance concerns. Destroy is slower because it runs through the destroy callbacks before deleting the row.

Now, when we destroy a tab, all tabs with positions higher than the current tab are shifted down by one to close the gap.
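The two shifts can be checked without a database. The sketch below models a user’s tabs as a plain hash of tab name to position and mirrors the two UPDATE statements in memory. The insert_tab and destroy_tab helpers and the tab names are hypothetical, and this illustrates the arithmetic only, not the callbacks themselves.

```ruby
# Insert a tab: everything at or after the target position moves up by one,
# then the new tab takes the freed slot. Defaults to the end of the list.
def insert_tab(tabs, name, position = tabs.size)
  shifted = tabs.transform_values { |p| p >= position ? p + 1 : p }
  shifted.merge(name => position)
end

# Destroy a tab: everything after the removed position moves down by one.
def destroy_tab(tabs, name)
  removed = tabs.fetch(name)
  remaining = tabs.reject { |tab_name, _| tab_name == name }
  remaining.transform_values { |p| p > removed ? p - 1 : p }
end

tabs = { "mail" => 0, "news" => 1, "weather" => 2 }
tabs = insert_tab(tabs, "stocks", 1)
puts tabs.inspect
tabs = destroy_tab(tabs, "news")
puts tabs.inspect
```

After both operations the positions are still a gap-free sequence starting at zero, which is the invariant the callbacks are trying to protect.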

Update:

After I wrote the original version of this code this morning, I realized that I didn’t take into account reordering tabs that were already added to a user’s layout. The next piece of code below handles this as well. Additionally, I have a sneaking suspicion that there is a more compact way to do this, but I just couldn’t get my head around all of that logic:

before_update :order_tabs_on_update

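def order_tabs_on_update
  # Look up the position stored in the database before this update
  old_position = TabLayout.find(self.id).position
  if old_position < self.position
    # Moving toward the end: shift the tabs in between down by one
    ActiveRecord::Base.connection.execute("UPDATE tab_layouts
     SET POSITION = POSITION - 1
   WHERE user_id = #{self.user_id} AND POSITION > #{old_position} AND POSITION <= #{self.position}")
  else
    ActiveRecord::Base.connection.execute("UPDATE tab_layouts
     SET POSITION = POSITION + 1
   WHERE user_id = #{self.user_id} AND POSITION = #{self.position}")
  end
end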
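As a way to reason about the reordering rules, here is an in-memory sketch of a move in plain Ruby. It is not the callback above: the move_tab helper and the tab names are hypothetical. It also shifts the whole range of tabs between the old and new positions in both directions, which is an assumption about the intended behavior rather than a transcription of the SQL.

```ruby
# Move one tab to a new position and shift the tabs it passes over.
def move_tab(tabs, name, new_position)
  old_position = tabs.fetch(name)

  moved = tabs.map do |tab_name, position|
    if tab_name == name
      [tab_name, new_position]
    elsif old_position < new_position && position > old_position && position <= new_position
      [tab_name, position - 1]  # moving toward the end: passed tabs slide down
    elsif old_position > new_position && position >= new_position && position < old_position
      [tab_name, position + 1]  # moving toward the front: passed tabs slide up
    else
      [tab_name, position]      # untouched by this move
    end
  end

  moved.to_h
end

tabs = { "a" => 0, "b" => 1, "c" => 2, "d" => 3 }
puts move_tab(tabs, "a", 2).inspect
puts move_tab(tabs, "d", 1).inspect
```

Working the cases out on a hash like this makes it easier to see which rows each UPDATE statement needs to touch before committing the logic to SQL.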