
Deploying Ultrasphinx to Production

Recently I rolled out a Rails app that used the Sphinx full-text indexing service in conjunction with the Ruby Ultrasphinx gem. I am very impressed with some aspects of this project, and I wanted to share my experiences for anyone looking for a better search experience with SQL databases.

Why Sphinx? Sphinx is a stable, open-source full-text indexing service with good support in the Rails landscape. Why full-text indexing? In a nutshell, people spot a crappy search implementation really quickly. Google is at the top of its game because it searches the way people think. Just try implementing the following with SQL alone:

  • Conditional logic (&, |, -)
  • Rank-based search results
  • Insensitivity to case (ben vs Ben), punctuation (bens laptop vs ben’s laptop), and plurality (virus vs viruses)
  • Phonetic searching (candy can match candi)
  • Searching across multiple tables with results being in either, but not both
  • 100,000 rank-based results in .02 milliseconds
  • Cached data, with delta scans for minimal performance impact

Yes, you could do all these things – but why? The folks at Sphinx do nothing but this, and have packaged it up for you to use at your whim. There are other niceties you can include, like sorting, pagination, restricting searches to certain columns, and, best of all, spell checking via the raspell gem.

To begin, you will need a MySQL or PostgreSQL backend – something I happened to luck out on with this particular application. You should install Sphinx and poke around for a few minutes to understand what Ultrasphinx provides you.

A note for Windows users – add the Sphinx bin/ folder to your path so you can call its commands Unix-style. Additionally, I had issues running my Rails project in a directory containing spaces. YMMV.

Ultrasphinx provides a Rails-centric way of using Sphinx. Sphinx provides the search service, while Ultrasphinx builds the configuration file and manages the Sphinx process via rake tasks. Inside each model that will be Sphinx-ified, you need to indicate which fields are indexable and sortable. A useful feature of Sphinx/Ultrasphinx is the ability to specify associated SQL that joins multiple tables into the full-text search. See http://blog.evanweaver.com/files/doc/fauna/ultrasphinx/files/README.html for more information.
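
As a rough sketch of what that model declaration can look like, the block below marks the fields of a hypothetical Article model as indexed, with one of them also sortable. The model name, the field names, and the IndexedStandIn module are assumptions for illustration: the stand-in exists only so the snippet runs without Rails or the gem, and the option keys follow the Ultrasphinx README as I understand it, so check them against the version you install.

```ruby
# Stand-in so this sketch runs outside Rails. In a real app the model
# inherits from ActiveRecord::Base and Ultrasphinx supplies is_indexed.
module IndexedStandIn
  attr_reader :index_options

  # Record the options instead of generating Sphinx configuration.
  def is_indexed(options = {})
    @index_options = options
  end
end

# Hypothetical model; the field names are made up for illustration.
class Article
  extend IndexedStandIn

  is_indexed :fields => [
    'created_at',
    'body',
    { :field => 'title', :sortable => true } # text field that can also be sorted on
  ]
end

# Show how many fields were declared for indexing.
p Article.index_options[:fields].length
```

In a real project you would drop the stand-in, keep only the is_indexed call inside the ActiveRecord model, and then regenerate the configuration with rake ultrasphinx:configure.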

Once Ultrasphinx is configured and has created a configuration file, you can run the indexing process, then start your Sphinx service. Notes on doing this in production via Capistrano follow:

desc 'Start Ultrasphinx searchd process'
task :ultrasphinx_start do
  run "cd #{release_path}; rake ultrasphinx:daemon:status RAILS_ENV=production" do |ch, stream, data|
    if data[/stopped/]
      run "cd #{release_path}; rake ultrasphinx:daemon:start RAILS_ENV=production"
    end
  end
end

desc 'Stop Ultrasphinx searchd process'
task :ultrasphinx_stop do
  run "cd #{release_path}; rake ultrasphinx:daemon:status RAILS_ENV=production" do |ch, stream, data|
    if data[/running/]
      run "cd #{release_path}; rake ultrasphinx:daemon:stop RAILS_ENV=production"
    end
  end
end

desc 'Status of Ultrasphinx searchd process'
task :ultrasphinx_status do
  run "cd #{release_path}; rake ultrasphinx:daemon:status RAILS_ENV=production"
end

desc "Reindex Ultrasphinx via indexer process"
task :ultrasphinx_reindex do
  run "cd #{release_path}; rake ultrasphinx:configure RAILS_ENV=production"
  puts "NOTE THAT THIS CAN TAKE A WHILE"
  run "cd #{release_path}; rake ultrasphinx:index RAILS_ENV=production"
end
before :ultrasphinx_reindex, :ultrasphinx_stop
after :ultrasphinx_reindex, :ultrasphinx_start
after 'deploy:update_code', :ultrasphinx_reindex, :roles => [:app, :web]

This Capistrano deploy.rb fragment has four tasks – start, stop, status, and reindex. The before and after callbacks ensure that the service is stopped before reindexing occurs and started again afterwards. Note that this is a full reindex, and not a delta scan. My application didn’t have a reliable datetime column to determine new entries with, so I opted to do the full index every three hours instead. The database is a small one, with less than 100MB of data, so I can get away with it here.
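
The four tasks repeat the same command string, and the start/stop checks hinge on matching the status output. Here is a minimal sketch of pulling both into plain Ruby helpers that deploy.rb could call. The helper names are hypothetical, and the "running" and "stopped" wording is assumed from the regexes in the tasks above rather than taken from the Ultrasphinx docs.

```ruby
# Build the shell command for an Ultrasphinx rake task,
# in the same "cd ...; rake ... RAILS_ENV=..." shape the tasks use.
def ultrasphinx_rake(task, release_path, rails_env = 'production')
  "cd #{release_path}; rake ultrasphinx:#{task} RAILS_ENV=#{rails_env}"
end

# Interpret the text printed by daemon:status.
# Returns :stopped, :running, or :unknown when neither word appears.
def daemon_state(status_output)
  case status_output
  when /stopped/ then :stopped
  when /running/ then :running
  else :unknown
  end
end

# Print the command a status task would hand to Capistrano's run.
puts ultrasphinx_rake('daemon:status', '/path/to/app/current')
```

Inside a task, run ultrasphinx_rake('daemon:start', release_path) would then replace the hand-written string, and the :unknown case gives you somewhere to log output you did not expect.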

Additionally, you will want to set up recurring cron tasks in your production server environment:

# Sphinx updates http://blog.evanweaver.com/files/doc/fauna/ultrasphinx/files/DEPLOYMENT_NOTES.html
# Rebuild the full index every three hours (no delta index in this setup)
0 */3 * * * bash -c 'cd /path/to/app/current/; RAILS_ENV=production rake ultrasphinx:index >> log/ultrasphinx-index.log 2>&1'
# Make sure the service is running
*/3 * * * * bash -c 'cd /path/to/app/current/; RAILS_ENV=production rake ultrasphinx:daemon:start >> log/ultrasphinx-daemon.log 2>&1'