
ActiveRecord Callbacks aka How to Keep Data you Don’t Control Fresh

Have you ever been frustrated by having to query data that you don’t control? If that data isn’t available in the format you want, you have probably “locally cached” it. But what happens when the data changes at the source? There are a few approaches:

  • Rebuild the cache at a fixed time – This approach lets your code mostly ignore the fact that the data is cached. You do your thing, a cron or other scheduled job does its thing, and everyone is happy. Well, mostly. The problem with this approach is how often the cache is rebuilt. The shorter the interval, the more accurate the data, but the more resource-intensive the application becomes. The longer the interval, the lighter the load, but the staler your data becomes. In either case, you will probably also need a mechanism to rebuild the cache manually.
  • Rebuild the cache on-the-fly – This approach keeps your data as up-to-date as possible while preserving the local cache and not hurting performance too much. Typically, you insert a record into your local cache the first time you pull it from the original source. This removes the need to pre-cache objects, since caching happens at request time, but it comes with a performance penalty: the first request is the slowest, and subsequent requests are quick. You still have to decide when to refresh the cache and how to allow manual refreshes. It also complicates your code: alongside your meaningful logic, you now have cache logic that has little to do with the problem you are actually solving.
  • Don’t cache – Just take the performance hit, optimize it as much as possible, and hope that no one minds that the operation takes some extra time. The problem with this approach is efficiency. Computers are fast, and people expect them to be. People may stop using your code altogether if the slowdown is severe enough to outweigh its usefulness.
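To make the second approach concrete, here is a minimal plain-Ruby sketch of an on-the-fly (read-through) cache. The class name `ReadThroughCache` is hypothetical, a Hash stands in for the local cache table, and the block passed to `new` stands in for whatever slow query reaches the data you don’t control.

```ruby
# A minimal read-through cache: the first lookup for a key goes to the slow
# source and stores the result locally; later lookups are served locally.
# ReadThroughCache is a hypothetical illustration, not a Rails class.
class ReadThroughCache
  attr_reader :source_hits

  def initialize(&fetch_from_source)
    @fetch_from_source = fetch_from_source # the expensive lookup
    @store = {}                            # stands in for the local cache table
    @source_hits = 0
  end

  def [](key)
    return @store[key] if @store.key?(key)

    @source_hits += 1
    @store[key] = @fetch_from_source.call(key)
  end
end

# Usage: pretend this block is a slow query against a database you don't control.
users = ReadThroughCache.new { |username| { username: username, last_name: "Simpson" } }
users["ksimpson"] # first request: goes to the source
users["ksimpson"] # second request: served from the local cache
puts users.source_hits # prints 1
```

Note that nothing in this sketch ever refreshes an entry once it is cached, which is exactly the staleness problem the rest of this post deals with.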

So what is a programmer to do? Of the approaches above, I have opted for on-the-fly caching with a twist. That twist takes advantage of ActiveRecord’s callbacks. What is a callback? Think of callbacks as “in-between” steps that you can hook into while ActiveRecord does its thing. Callbacks are an API that lets you do this without ugly hacks or modifications to the framework itself. Callbacks are also known as hooks. From the official Ruby on Rails website:

“Callbacks are methods that get called at certain moments of an object’s lifecycle. With callbacks it’s possible to write code that will run whenever an Active Record object is created, saved, updated, deleted, validated, or loaded from the database.”
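As a toy illustration of the idea (not of how ActiveRecord actually implements it), the plain-Ruby sketch below gives a model class two hook points, one before and one after the “persist” step of `save`. `ToyModel`, `Tab`, and their methods are hypothetical names.

```ruby
# Toy illustration of lifecycle callbacks: the model's save method runs any
# registered before/after hooks around the actual "persist" step.
# ToyModel and its API are hypothetical; ActiveRecord's real machinery differs.
class ToyModel
  def self.hooks
    @hooks ||= Hash.new { |hash, key| hash[key] = [] }
  end

  def self.before_save(method_name)
    hooks[:before_save] << method_name
  end

  def self.after_save(method_name)
    hooks[:after_save] << method_name
  end

  attr_reader :log

  def initialize
    @log = []
  end

  def save
    self.class.hooks[:before_save].each { |name| send(name) }
    @log << :persisted # stands in for the INSERT/UPDATE
    self.class.hooks[:after_save].each { |name| send(name) }
    true
  end
end

class Tab < ToyModel
  before_save :normalize
  after_save :notify

  def normalize
    log << :normalized
  end

  def notify
    log << :notified
  end
end

tab = Tab.new
tab.save
p tab.log # prints [:normalized, :persisted, :notified]
```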

Simply put, you can define methods with certain names in a model derived from ActiveRecord::Base and put your cache logic there. For example, if we had a User model, we could query a user like this:

# app/models/user.rb
class User < ActiveRecord::Base
end
User.first

This code sample returns the first user, with their attributes loaded. Now, if this information was pulled from our local cache, it may differ from the original source. For instance, perhaps since the cache was built, this person got married and changed their name. Your cache is now out of sync with the original source, and this needs to be resolved. So let’s implement some cache refreshing via ActiveRecord’s after_find callback:

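# app/models/user.rb
class User < ActiveRecord::Base
  # ActiveRecord callback: runs after each record is loaded by a finder
  def after_find
    puts "refreshing cache"
    other_user = OtherDatabase::User.find_by_username(self.username)
    self.first_name = other_user.first_name
    self.last_name  = other_user.last_name
    self.save
  end
end

User.find_by_username('kristin')
refreshing cache
=> #<User ...>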

A few things to note. The name “after_find” means that this method is executed immediately after an ActiveRecord find operation completes. This includes first, last, find_by_xxx, all, and so on; for finders that return many records, it runs once per record. The method then updates the User instance (the local cache) with the data from the other database. With partial updates enabled, ActiveRecord does not issue an UPDATE unless an attribute has actually changed, so don’t worry about efficiency here. You can drop the “self” prefix when reading attributes, but not when assigning them: without it, first_name = ... just creates a local variable. Note also that I am using “puts” only to show when the callback runs: you can see that after I call find_by_username, this code is executed. Any changes are reflected in the result, transparently to the rest of your application’s logic. This keeps the cache logic out of your “real” logic.
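The “only write when something changed” behaviour mentioned above can be sketched in plain Ruby: keep a snapshot of the attributes as they were loaded and skip the write when nothing differs. This only mimics the idea behind ActiveRecord’s dirty tracking; `CachedUser` and `refresh_from` are hypothetical names, and a Hash stands in for the source record.

```ruby
# Sketch of "only write when something changed": keep a snapshot of the
# attributes as loaded, and skip the write if nothing differs.
# CachedUser and refresh_from are hypothetical names for illustration.
class CachedUser
  attr_reader :attributes, :writes

  def initialize(attributes)
    @attributes = attributes.dup
    @snapshot = attributes.dup # values as they were when loaded
    @writes = 0
  end

  # Copy fields from the source record (a Hash here) onto the local copy.
  def refresh_from(source)
    source.each { |key, value| @attributes[key] = value }
    save
  end

  def changed?
    @attributes != @snapshot
  end

  def save
    return true unless changed? # nothing to write, so no UPDATE is issued

    @writes += 1                # stands in for the UPDATE statement
    @snapshot = @attributes.dup
    true
  end
end

local = CachedUser.new(username: "kristin", last_name: "Smith")
local.refresh_from(username: "kristin", last_name: "Smith")   # unchanged: no write
local.refresh_from(username: "kristin", last_name: "Simpson") # changed: one write
puts local.writes # prints 1
```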

This runs every time we issue a find on the User class, so it isn’t efficient yet: the cache is effectively always expired. For performance reasons, let’s only check the other database once every 10 minutes per user:

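# app/models/user.rb
class User < ActiveRecord::Base
  # first_name, last_name and username are table columns, so ActiveRecord
  # already defines their accessors (an attr_accessor would shadow them)

  # ActiveRecord callback
  def after_find
    if self.updated_at.blank? or self.updated_at < 10.minutes.ago
      puts "refreshing cache"
      other_user = OtherDatabase::User.find_by_username(self.username)
      self.first_name = other_user.first_name
      self.last_name  = other_user.last_name
      self.save
    end
  end
end

User.find_by_username('kristin')
=> #<User ...>
# 10 minutes elapse... (use your imagination)
User.find_by_username('kristin')
refreshing cache
=> #<User ...>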

Now we can see the cache working. Every 10 minutes, the local copy is checked against the original source; for all other requests, the callback skips the conditional and returns immediately. You can change the 10-minute expiration to anything you like. Better still, put this value in a YAML config file and reference it, so that the setting can be customized.
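Here is a plain-Ruby sketch of that expiry check with the window read from YAML rather than hard-coded. The key `cache_ttl_minutes` is a hypothetical name, and the YAML is inlined as a string to keep the example self-contained; in a Rails app you would more likely load a file from your config directory.

```ruby
require "yaml"
require "time"

# Sketch of the expiry check from the after_find callback, with the window
# read from YAML instead of hard-coded. "cache_ttl_minutes" is a hypothetical key.
CONFIG = YAML.safe_load("cache_ttl_minutes: 10")

def stale?(updated_at, now: Time.now, ttl_minutes: CONFIG.fetch("cache_ttl_minutes"))
  # Never refreshed, or last refreshed longer ago than the configured window.
  updated_at.nil? || updated_at < now - ttl_minutes * 60
end

now = Time.parse("2010-01-01 12:00:00")
puts stale?(nil, now: now)                               # prints true
puts stale?(Time.parse("2010-01-01 11:55:00"), now: now) # prints false
puts stale?(Time.parse("2010-01-01 11:45:00"), now: now) # prints true
```

One caveat worth checking against your Rails version: when partial updates are enabled, a save that changes nothing also leaves updated_at untouched, so an unchanged user would be re-checked on every find once the window has passed. Recording the time of the last check in a column of its own (a hypothetical `checked_at`, say) avoids that.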

There are many other callbacks you can use, and together they can form a very powerful tool. Check out the following code:

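# app/models/user.rb
class User < ActiveRecord::Base
  # ActiveRecord callback: refresh the local copy at most every 10 minutes
  def after_find
    if self.updated_at.blank? or self.updated_at < 10.minutes.ago
      puts "refreshing cache"
      fill_from_other_database
      self.save
    end
  end

  # ActiveRecord callback: fill in missing names before the record is written
  def before_save
    fill_from_other_database if self.first_name.blank? or self.last_name.blank?
  end

  private

  # hypothetical helper name: copies the names over from the other database
  def fill_from_other_database
    other_user = OtherDatabase::User.find_by_username(self.username)
    self.first_name = other_user.first_name
    self.last_name  = other_user.last_name
  end
end

User.find_or_create_by_username('kristin')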

This allows me to use “find_or_create_by” to create records with incomplete information: the missing information is filled in at creation time thanks to the before_save method. One warning: do NOT call “save” from within callbacks such as before_save, as this creates an infinite loop. Think about it: before_save calls save, which calls before_save, and so on. Be careful.
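Both points can be shown without Rails. In the sketch below, `SOURCE`, `LocalUser`, and `LoopingUser` are hypothetical stand-ins: the first class only queries the source when a field is actually missing, and the second demonstrates the infinite loop you get by calling save from inside before_save (Ruby eventually raises SystemStackError).

```ruby
# Sketch of a before_save-style fill-in: only blank fields are looked up in
# the source. SOURCE and LocalUser are hypothetical stand-ins.
SOURCE = { "kristin" => { first_name: "Kristin", last_name: "Simpson" } }

class LocalUser
  attr_accessor :username, :first_name, :last_name
  attr_reader :source_queries

  def initialize(username:, first_name: nil, last_name: nil)
    @username, @first_name, @last_name = username, first_name, last_name
    @source_queries = 0
  end

  def before_save
    return unless first_name.nil? || last_name.nil? # nothing missing

    @source_queries += 1
    remote = SOURCE.fetch(username)
    self.first_name ||= remote[:first_name]
    self.last_name ||= remote[:last_name]
  end

  def save
    before_save # the hook runs first; it must not call save itself
    true
  end
end

user = LocalUser.new(username: "kristin")
user.save
puts user.last_name # prints Simpson

# What goes wrong when a hook calls save: the two methods call each other forever.
class LoopingUser
  def before_save
    save # BAD: save calls before_save, which calls save, ...
  end

  def save
    before_save
    true
  end
end

begin
  LoopingUser.new.save
rescue SystemStackError
  puts "stack level too deep"
end
```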

Creating a record this way carries a performance penalty, and it would be much better to get all of this information in one query. For example:

# pull the user from the other database once...
other_user = OtherDatabase::User.find_by_username('ksimpson')
# ...and create the local copy with every field already filled in
user = User.find_or_create_by_username(:username   => other_user.username,
                                       :first_name => other_user.first_name,
                                       :last_name  => other_user.last_name)
user
=> #<User ...>

The before_save callback would have taken care of any missing information (as we saw above), but at the cost of a second query, which can quickly mean you have needlessly doubled your queries.
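The cost difference is easy to count. In this plain-Ruby sketch, `CountingSource` and `create_local` are hypothetical stand-ins for the other database and for find_or_create plus the before_save fill-in; passing only the username forces a second lookup, while passing every field needs none.

```ruby
# Counting round trips to the source for the two ways of creating a local
# copy. CountingSource and create_local are hypothetical illustrations.
class CountingSource
  attr_reader :queries

  def initialize(rows)
    @rows = rows
    @queries = 0
  end

  def find_by_username(username)
    @queries += 1
    @rows[username]
  end
end

# Mimics a before_save that looks up any missing fields in the source.
def create_local(attrs, source)
  if attrs[:first_name].nil? || attrs[:last_name].nil?
    remote = source.find_by_username(attrs[:username])
    attrs = remote.merge(attrs.compact) # values we already have win
  end
  attrs
end

rows = { "ksimpson" => { username: "ksimpson", first_name: "Kristin", last_name: "Simpson" } }

# Incomplete data: the first lookup's other fields are thrown away, so the
# "before_save" has to ask again.
lazy = CountingSource.new(rows)
remote = lazy.find_by_username("ksimpson")
create_local({ username: remote[:username] }, lazy)
puts lazy.queries # prints 2

# Complete data: the one lookup supplies every field, so no second query.
eager = CountingSource.new(rows)
remote = eager.find_by_username("ksimpson")
create_local(remote.slice(:username, :first_name, :last_name), eager)
puts eager.queries # prints 1
```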


Rails ActiveRecord Callbacks (Hooks)

Rails has some neat tricks up its sleeve when it comes to its ORM – ActiveRecord. One of the many things it does well is let you customize what happens at certain stages of an ActiveRecord object’s lifecycle. This means you can have events that fire before and after you create, save, or destroy records.

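Here are the callback methods available inside ActiveRecord::Base derived models (in Rails 2.x, after_find and after_initialize also exist, but they only run when you define them as methods with those names):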

after_create, after_destroy, after_save, after_update, after_validation, after_validation_on_create, after_validation_on_update, before_create, before_destroy, before_save, before_update, before_validation, before_validation_on_create, before_validation_on_update
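When several of these are defined, order matters. The Rails 2.x API documentation gives the sequence below for saving a new record, with the _on_update and _update variants taking the place of the create ones for an existing record. The sketch simply encodes that order as data; `callback_chain` is a hypothetical helper, not part of ActiveRecord.

```ruby
# The order the Rails 2.x documentation gives for the callbacks listed above.
# callback_chain is a hypothetical helper that just encodes that order.
def callback_chain(new_record)
  kind = new_record ? "create" : "update"
  %W[
    before_validation before_validation_on_#{kind}
    after_validation after_validation_on_#{kind}
    before_save before_#{kind}
    after_#{kind} after_save
  ].map(&:to_sym)
end

puts callback_chain(true).join(" -> ")
```

The actual INSERT or UPDATE happens between before_create/before_update and after_create/after_update, which is why a before_ callback can still change attributes that end up in the database. destroy has its own pair, before_destroy and after_destroy, around the DELETE.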

So when would you use this?

Glad you asked! I recently needed exactly this while writing a new website, in which a user can create, move, and delete tabs in their interface. Each tab’s position is saved in a database table. When rendering the tabs, I just declare the association with “:order => 'position'”, so the tabs appear in the interface in whatever order their positions dictate.

Adding a tab sounds easy at first glance: just append the new tab to the end of the user’s tab list. Something like this may be a good first run:

before_save :order_tab

def order_tab
  self.position = self.user.tab_layouts.count
end

Remember not to save your record inside these callback methods: saving triggers the before_save callback again, which saves again, and so on, in an infinite loop. Before you know it, your CPU is hot enough to fry an egg.

However, we want to be flexible enough to let the user rearrange tabs as they please. If we leave the code as is, then no matter how the tabs are rearranged, every call to save will overwrite the new position with the last position. You may be thinking of using the before_create callback to get around this, but I wanted a more general solution for creating and updating tab positions.

After toying around for a while, I came up with this solution:

before_create :order_tabs_on_create

def order_tabs_on_create
  # no position given: append the tab to the end of the list
  self.position = self.user.tab_layouts.count if self.position.nil?

  # make room: shift this user's tabs at or after the new position up by one
  ActiveRecord::Base.connection.execute("UPDATE tab_layouts
     SET POSITION = POSITION + 1
   WHERE user_id = #{self.user_id} AND POSITION >= #{self.position}")
end

Let’s look at this line by line. First, we need to determine whether a position was set before saving. If it wasn’t, let’s just put the tab at the end (or wherever; it really won’t matter soon).

If a position has been specified, we want to honor it, so we increment the positions of the tabs at or after it by one to make room for the new tab. After we increment these positions (leaving a gap), the new tab fills that gap when the method ends and the record is saved.
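The same bookkeeping can be checked in memory. In this sketch a Hash of tab name to position stands in for one user’s rows in tab_layouts, and `insert_tab` is a hypothetical helper mirroring order_tabs_on_create: append when no position is given, otherwise shift everything at or after the target up by one.

```ruby
# In-memory sketch of order_tabs_on_create: tabs is a Hash of name => position.
# Inserting at a position shifts every tab at or after it up by one, which is
# what the single UPDATE ... SET POSITION = POSITION + 1 does in the database.
def insert_tab(tabs, name, position = nil)
  position ||= tabs.size # no position given: append to the end
  tabs.transform_values! { |pos| pos >= position ? pos + 1 : pos }
  tabs[name] = position
  tabs
end

tabs = { "Home" => 0, "News" => 1, "Mail" => 2 }
insert_tab(tabs, "Sports", 1)
insert_tab(tabs, "Weather")
p tabs.sort_by { |_, pos| pos }.map(&:first) # prints ["Home", "Sports", "News", "Mail", "Weather"]
```

Whatever the insertion point, the positions stay a contiguous run starting at 0, which is what the “:order => 'position'” association relies on.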

As a side note, I could have looped over the affected tabs and called “increment!” on each one instead of executing raw SQL, but that has an “N+1” performance problem: if a user inserts a new tab in front of 100 other tabs, you end up with one INSERT statement plus 100 UPDATE statements, whereas the raw SQL shifts them all with a single UPDATE.

So this lets you insert a tab at any location and increment the positions of the tabs after it so they are out of the way. Now, what happens when you destroy a tab? If the tab is at the end of the list, everything is fine, because the positions are still sequential. However, if we destroy a tab in the middle, we end up with a gap in our positions. Let’s take care of this with a new callback:

before_destroy :order_tabs_on_destroy

def order_tabs_on_destroy
  ActiveRecord::Base.connection.execute("UPDATE tab_layouts
       SET POSITION = POSITION - 1
     WHERE user_id = #{self.user_id} AND POSITION > #{self.position}")
end

This method fires whenever a record is destroyed. Note the significant difference between calling tab.delete and tab.destroy: if you call “delete”, the row is simply removed without ever invoking any of this callback magic. The ActiveRecord authors provide both methods to address performance concerns; destroy is slower because it runs the destroy callbacks before and after removing the record.

Now, when we destroy a tab, every tab with a higher position moves down by one, closing the gap.
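Continuing the in-memory model (a Hash of tab name to position standing in for one user’s rows), `destroy_tab` is a hypothetical helper that mirrors order_tabs_on_destroy: remove the tab, then pull every later tab down by one.

```ruby
# In-memory sketch of order_tabs_on_destroy: remove a tab and pull every
# later tab down by one, like SET POSITION = POSITION - 1 WHERE POSITION > removed.
def destroy_tab(tabs, name)
  removed = tabs.delete(name) or return tabs # unknown tab: nothing to do
  tabs.transform_values! { |pos| pos > removed ? pos - 1 : pos }
end

tabs = { "Home" => 0, "Sports" => 1, "News" => 2, "Mail" => 3 }
destroy_tab(tabs, "Sports")
p tabs.sort_by { |_, pos| pos }.map(&:first) # prints ["Home", "News", "Mail"]
```

Skipping the shift, which is effectively what calling delete instead of destroy does, would leave positions 0, 2, 3: still sortable, but the next append at tabs.size would collide with an existing position.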

Update:

After I wrote the original version of this code this morning, I realized that I hadn’t accounted for reordering tabs that were already in a user’s layout. The code below handles this as well. I also have a sneaking suspicion that there is a more compact way to do this, but I just couldn’t get my head around all of that logic:

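before_update :order_tabs_on_update

def order_tabs_on_update
  old_position = TabLayout.find(self.id).position
  return if old_position == self.position # position unchanged: nothing to shift

  if old_position < self.position
    # moving toward the end: tabs between the old and new spots shift down by one
    ActiveRecord::Base.connection.execute("UPDATE tab_layouts
       SET POSITION = POSITION - 1
     WHERE user_id = #{self.user_id} AND POSITION > #{old_position} AND POSITION <= #{self.position}")
  else
    # moving toward the front: tabs between the new and old spots shift up by one
    ActiveRecord::Base.connection.execute("UPDATE tab_layouts
       SET POSITION = POSITION + 1
     WHERE user_id = #{self.user_id} AND POSITION >= #{self.position} AND POSITION < #{old_position}")
  end
end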
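To convince yourself that the two UPDATE statements keep positions contiguous, here is the same logic on the in-memory model (a Hash of tab name to position for one user). `move_tab` is a hypothetical helper; only the tabs between the old and new positions move, in the direction opposite to the moved tab.

```ruby
# In-memory sketch of order_tabs_on_update: moving a tab from old_position to
# new_position shifts only the tabs in between, mirroring the two UPDATEs above.
def move_tab(tabs, name, new_position)
  old_position = tabs.fetch(name)
  tabs.transform_values! do |pos|
    if old_position < new_position && pos > old_position && pos <= new_position
      pos - 1 # tab moves toward the end: the ones it passes slide toward the front
    elsif old_position > new_position && pos >= new_position && pos < old_position
      pos + 1 # tab moves toward the front: the ones it passes slide toward the end
    else
      pos
    end
  end
  tabs[name] = new_position
  tabs
end

tabs = { "Home" => 0, "Sports" => 1, "News" => 2, "Mail" => 3 }
move_tab(tabs, "Home", 2)
p tabs.sort_by { |_, pos| pos }.map(&:first) # prints ["Sports", "News", "Home", "Mail"]
```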