Hardware, Linux, Open-source, Ruby, Software, Uncategorized

Delayed Job Performance Tuning

We found a bug. A bug that affected a lot of historical records that we now have the pleasure of reprocessing. Fortunately we already had an async job infrastructure in place with the delayed job gem. Unfortunately, this gem is intended for fairly small batches of records and isn’t tuned to handle 1M+ records in the delayed_jobs table.

After reading some best practices we decided on a queue based approach. To keep our day to day async jobs running we would use a “default” queue. And to reprocess our old records we used a new queue. Starting the workers up with a “–queue” flag did the trick. We had hardware dedicated for day-to-day operations, and new hardware dedicated to our new queue operations. Now it was simply a matter of filling up the queue with the jobs to reprocess the records.

Our initial approach maxed out the CPU on our database server. This was largely due to us not tuning our SQL in our async jobs. Because the volume we processed was always low, this was never really a noticeable problem. But when we threw lots of new jobs into the queues, it became very noticeable. The workers would start up, then mysteriously die. After some digging in /var/log/kern.log we discovered the workers were being killed due to an out of memory manager. Attaching a small swap partition helped, but once you’ve hit swap, things become horribly slow. Whats the point in keeping the worker alive if its running a thousand times slower? Clearly we needed to profile and tune. So we did. (The specifics are out of scope for this article, but it involved consolidating N+1 queries and limiting columns returned by the SELECT).

With our newly tuned SQL our spirits were high as we cranked up the workers again. Only to reach the next process bottleneck. And this is where databases gets frustrating. Delayed job workers run a query each time they are looking for a new job to find out which job to pick up. It puts a mutex lock on the record by setting locked_at and locked_by. The query looks like this:

UPDATE delayed_jobs
SET `delayed_jobs`.`locked_at` = '2016-06-05 11:48:28', 
 `delayed_jobs`.`locked_by` = 'delayed_job.2 host:ip-10-203-174-216 pid:3226' 
WHERE ((run_at <= '2016-06-05 11:48:28' 
AND (locked_at IS NULL OR locked_at < '2016-06-05 07:48:28') OR locked_by = 'delayed_job.2 host:ip-10-203-174-216 pid:3226') 
AND failed_at IS NULL) 
ORDER BY priority ASC, run_at ASC 
LIMIT 1;

The UPDATE does an ORDER which results in a filesort. Filesorts are typically something an index can resolve. So I optimistically added the following:

CREATE INDEX delayed_job_priority
ON delayed_jobs(priority,run_at);

Sadly, this index was completely ignored when I ran an EXPLAIN on my UPDATE. And the reason is that MySQL doesn’t execute an UPDATE query the same way as if you did a SELECT with the same conditions. The index probably made things worse, because now with each record update, we now also have an index update as well. I could probably fork the code and probably use some type of isolation level in a transaction to get the best of both worlds with an index based SELECT, and a quick UPDATE on a single record by id. But there are easier solutions to try first.

My UPDATE statements were pushing 40 seconds in some cases according to MySQL. Eventually the lock wait timeout is exceeded and you see an error in the delayed_jobs.log:

Error while reserving job: Mysql2::Error: Lock wait timeout exceeded; 
try restarting transaction

Jobs were moving very slowly, and throwing more workers at it didn’t make an improvement. This is because each time a worker picks up a job, it was waiting 40+ seconds. The UPDATE was doing a filesort, and any index was being ignored. (And MySQL doesn’t support UPDATE hints). It was pretty clear that all of the jobs from the reprocessing queue needed to find a new home that didn’t negatively impact my filesort. I settled on the following solution:

CREATE TABLE delayed_jobs_backup LIKE delayed_jobs;

INSERT INTO delayed_jobs_backup
SELECT * FROM delayed_jobs WHERE queue='new_queue';

DELETE FROM delayed_jobs WHERE queue='new_queue';

This creates a new database table with the structure of the existing delayed_jobs table. The table is then populated with the jobs that needed to find a new home (All 1M+ of them). And finally, deleted from the original delayed_jobs table. Be careful doing this, and do some SELECT/EXPLAIN queries in between to ensure you are doing what you think you are doing. (Deleting 1M+ records from a production database makes me sit up in my chair a bit).

Looking at MySQL’s process list I no longer have System locks on my UPDATE statements (presumably because the table size is small enough that the filesort is mostly painless):

mysql> SHOW FULL PROCESSLIST;
# Id, User, Host, db, Command, Time, State, Info
1, user, 0.0.0.0:34346, localhost, Query, 0, System lock, UPDATE ...

The important columns here are the Time (in seconds), State, and Info. This proves that my job locking was happening quickly. I was seeing Time values of 40+ seconds before. I kept referring back to this process list to verify that the UPDATES were remaining snappy while I modified the number of workers running, and the number of jobs in the queue. I had a goal of keeping the UPDATE system lock times under 2 seconds. Adding more workers pushed the times up. Adding more jobs to the queue pushed the times up. Its a balance that probably depends very much on what you are processing, how much your database can handle, and what your memory constraints are on your worker servers.

To conclude – my job over the next few days will be to run the following command to put some jobs into the delayed_jobs table 10,000 at a time:

INSERT INTO delayed_jobs
SELECT * FROM delayed_jobs_backup LIMIT 10000;

DELETE FROM delayed_jobs_backup LIMIT 10000;

You can of course automate this. But my objective was met. The jobs can reprocess old records without impacting day to day jobs in the default queue. When the delayed_jobs table is almost empty, I move over another batch of jobs from the delayed_jobs_backup table. Rinse and repeat until there are no more jobs left to process. Its a bit more painful, but day to day operations continue to function, and I can cross of the reprocessing task from my list of things to do. All without any code changes!

I’ve been reading up on transaction isolation levels thinking something like a SELECT FOR UPDATE lock might be worthy contribution to the delayed_job codebase: http://dev.mysql.com/doc/refman/5.7/en/innodb-transaction-isolation-levels.html

Ruby, Uncategorized

Testing and External Calls

Why Are External Calls Bad?

When writing unit tests, the principle is to test each component in isolation from other components. If your tests fail because of concerns outside of your subject, then this is unintentional coupling. A call to an external network resource also (S3, Pusher, Facebook, Google, etc) introduces brittleness, and will slow down your test. Network requests are very expensive compared to memory and local disk access. These quickly add up and cause a slow running test suite.

VCR

Reference a Ruby gem in the testing Gemfile group of your Rails project for VCR. It is responsible for recording request/response traffic (complete with headers, params, etc) and serializing it as a testing fixture. This fixture can then be used by the test instead of making an external network call.
VCR documentation at: https://relishapp.com/vcr/vcr/docs

Webmock

External network calls via a number of avenues are disabled in testing via the Webmock gem. These methods include Net:HTTP, Curl::Easy, etc. If you can think it up, its probably blocked. This is intentional for the reasons listed earlier. Exceptions should not be made in tests.
Webmock documentation at: https://github.com/bblimke/webmock

How to Write a Test Using VCR

Take for example an Image class. Pretend it is responsible (in part) for uploading a file to Amazon’s S3 service. This is hosted on AWS and is an external network call. Currently, creating an image in a test will fail with an exception similar to the following:

it 'uploads an image' do
  FactoryGirl.create(:image) # this fails
end
VCR::Errors::UnhandledHTTPRequestError:
================================================================================
An HTTP request has been made that VCR does not know how to handle:
  PUT https://bwtesting.s3.amazonaws.com/uploads/image/file/1/logo.png

The error explains that there is currently no cassette to play. We can create a cassette to be used going forward by temporarily enabling external network requests:

# spec/vcr_helper.rb
require 'vcr'

VCR.configure do |config|
  config.cassette_library_dir = &quot;spec/fixtures&quot;
  config.hook_into :webmock
  config.allow_http_connections_when_no_cassette = true
  config.configure_rspec_metadata!
end

Uncomment the line

config.allow_http_connections_when_no_cassette = true

Now instruct your test to use VCR by adding an Rspec metadata tag of :vcr:

it 'uploads an image', :vcr do
  FactoryGirl.create(:image)
end

Now you can run your test again with a passing result:
1 example, 0 failures, 1 passed

You will also note a new file has been created under spec/fixtures. The specific path and file name are based on the location of your test. (Note if you rename your test, be sure to rename the fixture accordingly). Look at the fixture and you will see something like:

---
http_interactions:
...
- request:
    method: put
    uri: https://bwtesting.s3.amazonaws.com/uploads/image/file/1/logo.png
...

One fixture can contain multiple network call representations.
If we comment back out in spec/vcr_helper.rb we should now be able to run our spec without a network connection:

# spec/vcr_helper.rb
require 'vcr'

VCR.configure do |config|
  config.cassette_library_dir = &quot;spec/fixtures&quot;
  config.hook_into :webmock
  # config.allow_http_connections_when_no_cassette = true
  config.configure_rspec_metadata!
end

Matching a Cassette

How does VCR know which cassette recording to use when mocking an external network request? The default behavior is to match on method (GET, POST, PUT, etc) and the URI (the full URL of the resource)

You might have guessed that this can cause an issue with RESTful resources, or anything with a unique identifier in the URL. Take this example:

it 'uploads an image', :vcr do
  FactoryGirl.create(:image)
  FactoryGirl.create(:image)
end

We create two images. This will now cause our test to fail with an error from VCR:

VCR::Errors::UnhandledHTTPRequestError:
================================================================================
An HTTP request has been made that VCR does not know how to handle:
  PUT https://bwtesting.s3.amazonaws.com/uploads/image/file/2/logo.png
VCR is currently using the following cassette:
  - spec/fixtures/Image/uploads_an_image.yml
  - :record =&gt; :once
  - :match_requests_on =&gt; [:method, :uri]

This failed because the URI we previously recorded was for
https://bwtesting.s3.amazonaws.com/uploads/image/file/1/logo.png
and now we have the URI
https://bwtesting.s3.amazonaws.com/uploads/image/file/2/logo.png

Changing Matching behaviors

VCR fortunately provides flexibility in how to match an HTTP request. As part of the metadata, you can specify what to match on. For our test, we don’t really care about the specific URL, just that we simulate a good response from Amazon. This means its sufficient to match on method and host. In your test you can specify by doing:

it 'uploads an image' do
  VCR.use_cassette('spec/fixtures/Image/uploads_an_image', match_requests_on: [:method, :host]) do
    FactoryGirl.create(:image)
    FactoryGirl.create(:image)
  end
end

Note that :vcr is gone from our Rspec metadata. We have wrapped the code responsible for issuing the external network call in an explicit VCR block that allows us to match on different criteria of the HTTP request. This test should now pass.

Custom Matching

Maybe the built in match_requests_on attributes aren’t fine grained enough for your test. In that case, you can write your own matcher. See the example below for how to match on part of the path of a URI:

it 'uploads an image' do
  VCR.use_cassette('spec/fixtures/Image/uploads_an_image', match_requests_on: [:method, :host, method(:s3_matcher)]) do
    FactoryGirl.create(:image)
    FactoryGirl.create(:image)
  end
end

def s3_matcher(request_1, request_2)
  !!URI(request_1.uri).path[/\/uploads\/image\/file\/\d+/]
end

I’ve added a new method into the array of values passed to
match_requests_on
This method takes two arguments (it iterates over each request in the fixture and compares it to this specific request until it finds a match). The return value of this method must be a boolean. If its true, then the match is made against one of the requests. If the match is false, it tries the next HTTP request until it runs out of options, and fails.

Computers, Open-source, Ruby

Rspec Bisect

During a journey to upgrade to Rails 4, the test suite was crucial for success. I ran the test suite and recorded all of the test failures. Then I went through test by test fixing them until I was all done. Barely containing my excitement I pushed to remote and waited. Sadly, our CI server reported a build failure. What the heck?! The tests all passed in isolation. Why do they not pass when run together? Well…

The mark of a good test is that they can be run in any order and still pass. In order to accomplish this, we need each test (or the suite at large) to be responsible for setting up its own data, and tearing down that data when its finished. You want to leave the state of everything exactly like how you found it before the test was run. This can include clearing caches, truncating database tables, or rolling back transactional changes. Many test suites including Rspec have the capability to do these things for you without much headache.

But what happens when your tests are run in a different order and fail? Or what happens when your tests pass in isolation, but fail when run in part of the larger test suite? Likely something in an unknown test is causing your test failure. And finding which other test is being naughty can be a real pain. Fortunately, this is the case for a new feature in Rspec 3.3 called bisect.

In case you are not familiar with bisect its a simple, but powerful algorithm. Suppose you want to drop an egg off a building at different floors to discover the lowest floor on which the egg will break. There are multiple ways to accomplish this task. The most wasteful way would be to start at floor one, drop the egg, if it doesn’t break, go to floor two and repeat. When you drop the egg and it breaks, then you have your answer. This is slow. In Big O notation, this would be O(n) where n is the number of floors in the building.

We can do better. What if we could cut the floors that we test on in half each time we perform our egg drop test? For argument sake, if the building has 100 floors (and we have an unusually strong egg!) we might bisect the floors and start at floor 50. If we drop the egg and it breaks, we know that the answer is somewhere between 1 and 50. If it doesn’t break, we know the answer is between 51 and 100. Lets assume it breaks. We can then do our next test (with a new egg) on floor 25. If it breaks, we know our answer is between 1 and 25. If it doesn’t break, we know our answer is between floors 26 and 50. Just by dropping two eggs on different floors, we’ve cut our sample size from 100 to 25. In Big O notation, this would be O(n/2). The number of times we have to do this test is the number of powers of 2 we can divide into our number of floors. On a 100 floor building, we would have our answer in ~6 tests (2^6). We are going to save a lot of eggs.

So why are we talking about eggs when we have test failures to fix? Rspec offers a sweet utility called “bisect” that will do something analogous to our egg drop algorithm in our test suite. We have a known test failure in the larger suite, but it passes in isolation. And we want to know which currently unknown test is interfering and causing our known test failure. How would you tackle this? Well, you could remove half your tests (keeping your failing test) and see if you still get a test error when running the test suite. If you do, you remove half again, and repeat your testing. Do this until you get down to two tests – the known failing test, and the previously unknown (now known) passing test. But that is tedious work to do manually.

Lets take a look at Rspec bisect:

$ bundle exec rspec --seed 1234 --bisect
Bisect started using options: "--seed 1234"
Running suite to find failures... (3 minutes 14.1 seconds)
Starting bisect with 3 failing examples and 2475 non-failing examples.
Checking that failure(s) are order-dependent... failure appears to be order-dependent

Round 1: bisecting over non-failing examples 1-2475 . ignoring examples 1-1238 (1 minute 38.37 seconds)
Round 2: bisecting over non-failing examples 1239-2475 . ignoring examples 1239-1857 (57.25 seconds)
Round 3: bisecting over non-failing examples 1858-2475 . ignoring examples 1858-2166 (29.98 seconds)
Round 4: bisecting over non-failing examples 2167-2475 .. ignoring examples 2322-2475 (46.11 seconds)
Round 5: bisecting over non-failing examples 2167-2321 .. ignoring examples 2245-2321 (43.61 seconds)
Round 6: bisecting over non-failing examples 2167-2244 .. ignoring examples 2206-2244 (42.92 seconds)
Round 7: bisecting over non-failing examples 2167-2205 .. ignoring examples 2187-2205 (43.65 seconds)
Round 8: bisecting over non-failing examples 2167-2186 .. ignoring examples 2177-2186 (41.96 seconds)
Round 9: bisecting over non-failing examples 2167-2176 . ignoring examples 2167-2171 (24.15 seconds)
Round 10: bisecting over non-failing examples 2172-2176 .. ignoring examples 2175-2176 (36.95 seconds)
Round 11: bisecting over non-failing examples 2172-2174 .. ignoring example 2174 (37.3 seconds)
Round 12: bisecting over non-failing examples 2172-2173 . ignoring example 2172 (24.68 seconds)
Bisect complete! Reduced necessary non-failing examples from 2475 to 1 in 9 minutes 1 second.

The minimal reproduction command is:
  rspec ./spec/models/file_1.rb[1:4:3] ./spec/models/file_2.rb[1:1:1,1:1:2,1:1:3] --seed 1234

That is a lot of output. Lets break it down:

  • We call rspec with a given --seed. What this seed value does is determine the order of the tests. When --seed is omitted a random number is chosen which makes the tests order random. This is normally a good thing. But when bisecting, its a bad thing because we want to reproduce a certain arrangement of tests to reliably generate our failure.
  • The test suite initially takes 3 minutes 14 seconds to run. This is the longest test run because it is discovering which test failures we have so it must run the full suite.
  • The first round we ignore the first half of our tests. This takes half the time to run because we are running half the tests.
  • The second round we detected the failure from round one, so we know the half we tested includes the failure. We’ve just eliminated 50% of our tests from being the culprit. The next bisect will ignore the first half of the remaining pool and test again. We half the test run time again (tests don’t have exactly the same duration)
  • The third round we repeat what we did in round two, but with the remaining pool, and so on. If the failure is no longer detected, we test on the half we most recently removed.
  • Finally we get a completed message with a command to reproduce the failure. I can copy and paste this into a terminal and see the two files that are causing a failure.

rspec ./spec/models/file_1.rb[1:4:3] ./spec/models/file_2.rb[1:1:1,1:1:2,1:1:3] --seed 1234 What is that identity notation you might ask? ./spec/models/file_1.rb is straightforward. This is just the filename. The next part is a bit more interesting: [1:4:3]. This is telling us the nesting of the assertion that is failing when combined with the other file. Not only does Rspec bisect tell us the two files that are interfering each other, but which tests within those two files are interfering! The notation 1:4:3 is telling us its the first describe context, then the fourth context within that context, and finally the third test within those previous two contexts. In the case of the second notation: ./spec/models/file_2.rb[1:1:1,1:1:2,1:1:3] its just telling us that we have multiple tests that are failing (3 to be precise), separated by commas. Notice that the command also includes the seed value we used when bisecting. This is still important because your failure might be order dependent.

And what if your tests fail in a different order? Same principle. When you get an intermittent test failure when running the test suite that you suspect is related to order, note the seed value that was used. Perform the same rspec bisect algorithm (and the same command) and use this seed value instead of 1234. This will reproduce the test failure you got by running the same tests in the same order as when the failure was recorded.

The Rspec team did an awesome job with the bisect utility. This used to be a dreaded problem to face as someone running tests. To verify your fix you had to run the entire test suite just to know if your change worked. This would have been a 4 minute wait in the example above, but could easily have been much longer. When reproducing failures, you want the shortest interval possible. Running 4 tests is certainly faster than running 2475 tests!

Another shout out goes to the authors of the transpec gem that is used when upgrading versions of Rspec. This automatically changes your deprecated code with the newer style syntax. These two features together show that the Rspec team has a powerful and mature product.

More information:
* https://relishapp.com/rspec/rspec-core/docs/command-line/bisect
* http://rspec.info/blog/2015/06/rspec-3-3-has-been-released/
* https://en.wikipedia.org/wiki/Big_O_notation

Computers, Open-source, Ruby, Software

Upgrading to Rails 4

Recently, I started a new job and the first big assignment was to upgrade their software stack to a more recent version. They are running Rails 3.2 and wanted to upgrade as far forward as they can. With Rails 3.2 support gone for all but severe bug fixes, and Rails 5 due any month now, this is something they wisely didn’t want to put off.

Its a smaller company, and they have been open to a lot of my feedback and suggestions. I was basically given the reins and told to do what needed to be done to get us upgraded.

So the first task was some research, and I stumbled upon the official Rails upgrade guide pretty quickly. It nicely outlines the breaking changes. Fortunately, the big change was to strong parameters, but this can be deferred by including protected_attributes and kicking this can down the road. We will be logging what controller actions receive which parameters, instead of raising so we will have some time to collect some data before we switch over in one painful release.

The guides stressed that the test suite is critical during the upgrade. I was fortunate enough to have a project with adequate testing coverage. It wasn’t the 80% sweet spot, but it was certainly valuable at ~40%. However, the suite had fallen into disuse, so the first task was to get them back to green.

Once the test suite was green, it became a matter of KEEPING it green. Luck smiled a second time and they had an old CI server that had fallen into disuse. It was powered by CruiseControl.rb and it was little fuss to get it back up and running again. The migrations could no longer be played from the projects inception to the current time.

This is where luck stopped smiling upon me. The project did not track db/schema.rb and the migrations were not playable. The only way to get an instance of the database was to download the schema from production. Not the best practice, so I went about tracking the schema, and getting adoption of this new practice. Further complicating the schema approach was the decision to move all older migrations into subfolders in db/migrate by year (e.g. 2011, 2012, etc). This was done I found out because Textmate doesn’t like big directories. The issue is that db:schema:load isn’t recursive in its retrieval of migration versions. It took me a bit to understand what was happening, and how it was happening. After a failed monkey patch to the migrator logic in ActiveRecord, I decided to just move the migrations back into db/migrate and eliminate the subdirectories. Sorry Textmate!

Now the database could be rapidly provisioned, and I got a seed working with a minimal set of data. Back in CI I reconfigured the build script to use db:schema:load instead of db:migrate and with the green test suite, we got builds working again.

We used a utility called CC Menu to show the build status in the notification bar in OS X: http://ccmenu.org/

To make the builds even more visible, I discovered an integration with Slack to report the build status in our chat. https://github.com/epinault/cruisecontrolrb-to-slack . I made my own fork and added some famous movie quotes for successes and failures since I found the default messages lacking: https://github.com/bsimpson/cruisecontrolrb-to-slack . I didn’t think our female developers would appreciate the “you’re a stud!” message.

Back to the Rails 4 upgrade. The tests are passing in master, so I made a feature branch that will be long lived called “rails-upgrade”. I merge master in daily. The “rails-upgrade” branch will serve as an integration point for other features branches that will merge into it. The plan is to keep any upgrade related changes out of master until its time to deploy. That means separate branches, separate CI builds, and separate staging servers for manual QA.

One lesson I’ve learned is that a deprecation warning may not always be just informational. In particular, Rails 4 requires all scopes to be callable (lambdas, or procs). This was breaking the way that associations with scopes would be built: users.roles.admin.find_or_create! would previously find an associated admin record, or create it. However, in Rails 4, it fails creation because the role’s reference to user is nil. I’m not sure why, but its reproducable, and changing the admin scope on Role to a callable restores this reference back to user.

Ideally, I’d have wanted to get the test suite green before tackling deprecation warnings because I want to change as little as possible before I get back to a known good status. However, not fixing this deprecation warning was actually causing tests to fail.

Now we are down to a handful of failings tests on Rails 4. Most deal with the ActiveRecord syntax changes. Hopeful I can get these knocked out quickly. Then its on to manual QA.

In summary – get your test suite green. Keep it green. Do the upgrade and get it back to green. Then make any changes to remove deprecation warnings, keeping the suite green. The test suite is your guide during an upgrade like this.

Computers, Events, Family, Ruby

First day at Influence Health

Influence Health
October 21st 2015 it was announced that Barclays would be laying us off, and shutting down our division. I’ve never been laid off before, and the emotions went from shock, disbelief, anger, and even sometimes relief. It was finally over. All of the technical debt, and poor business decisions were something in the past. Over the next few months I would be interviewing, relaxing and getting ready to work at a new place, with new people.

I had a lot of decisions to make. The first big decision was whether I was going to work full time remote again. There are a lot of perks, but there are downsides too. I felt lonely, isolated, disconnected. My wife didn’t understand what I did all day. I had no coworkers I could get drinks with, or swap horror stories with over lunch. And with two kids in the house, it has become quite noisy, and I find myself getting frustrated when our youngest has a bad day and spends it crying. And it can be stressful on a marriage to be around each other 24/7. Nothing ever changes. Little things start to become big things.

I decided that for general health, I would be looking for a job with a flexible work schedule. A few days at home, a few days in the office. Once I narrowed down what type of arrangement I wanted, then it was on to the culture. This time around I have enough experience that I felt comfortable donning the title of a senior software developer. I’ve seen enough good and bad that I can tell the differences between them.

Next was the work culture. Did I want another big company with more people than I could ever memorize? A startup struggling to become solvent? Something in between? The size of the company dictates the culture to some extent. The bank culture was slow, fearful, and inefficient. I wanted to get my hands dirty, but I wasn’t ready for 80 hour work weeks in a startup trying to get off the ground. I wanted something smaller, but established. And with a laid back culture. I didn’t want to wear a suit everyday, and I’d like the freedom to not be stuck in a cubicle.

I settled on a fun looking company called Influence Health (formerly BrightWhistle) in Midtown, Atlanta. They are right next to Piedmont Park, and its quite a drive to get here from Johns Creek. But with flexible hours, I miss the rush hour madness, and with 3 days at home, I’m not sitting in the car that often. Though, it is fun to sit in the Mazda RX-8!

The first day was a 9:30 (worst possible time due to traffic) orientation. Then it was on to unpacking my equipment while I met the team, and then I spent the rest of the day juggling orientation tasks and setting up my new laptop the way that I like it.

I decided to use Git to manage my dot files as I’ve read about a few other developers doing. You can see my progress here: https://github.com/bsimpson/.vim

I like the incremental approach, and I’ve expanded it a bit to make it super easy.

The first difficult task (outside of basic environment setup like Rbenv, Ruby, MySQL, Sphinx, etc) was the bundling of gems. This led to quite a few dependencies that couldn’t install because it couldn’t find the native libraries. This was a hit and miss adventure, but I’m reaching the end and I’ve taken notes. I’ve asked if there is a developer wiki somewhere to keep all of this painful learning. Maybe I will push to implement one if I can get some support behind the idea.

The laptop was imaged, but it provides enough freedom to configure however I would like. They use Microsoft Office, which isn’t great, but the Office365 web client has superior searching than its desktop counterpart Outlook so I guess I will just stick with that. They use Slack for communications, and the Mac app makes juggling all of my accounts very simple.

Overall I’m pleased with the team, the culture, and the equipment they have provided to me. I feel rested, focused, and ready to tackle this new job head on. Its been strange to sit in an office with other people after four years of remote work, but I’m adjusting. My voice is worn out, since I never typically speak at home.

Anyway, stay tuned for the next episodic blog post of my adventures in Midtown!

Open-source, Ruby, Software, Thoughts, Uncategorized

Implementing a Configuration

Have you ever seen a configuration in Ruby that yields to a block where properties are set on the yielded object? Rails does this with its environments files:

Your::Application.configure do
  config.cache_classes                     = true
  config.consider_all_requests_local       = false
  config.action_controller.perform_caching = true
  ...
end

Devise is another great example of this pattern:

Devise.setup do |config|
  config.secret_key = '38c8e4958385982971f'
  config.mailer_sender = "noreply@email.com"
  config.mailer = "AuthenticationMailer"
  ...
end

This is an established way to pass in configuration options in the Ruby world. How does this work? What are the alternatives?

Under the Hood

Lets start with our own implementation from scratch. We know that we are calling a method (Application.configure, or Devise.setup) and that method yields to a block:

module Example
  class << self
    def configure
      yield Config
    end
  end
end

You can see that this will yield the class Config when we call Example.configure and pass in block. Config will need to define our properties:

module Example
  class Config
    class << self
      attr_accessor :api_url, :consumer_key, :consumer_secret
    end
  end
end

We can now call our configure method and pass in a block:

Example.configure do |config|
  config.api_url = "http://api.com/v1/"
end

If we try to set a property that does not exist in our Example::Config class, then we get a helpful NoMethodError:

Example.configure do |config|
  config.not_a_real_config_setting = "http://api.com/v1/"
end

# => NoMethodError: undefined method `not_a_real_config_setting=' for Example::Config:Class

This helps to define your configuration interface. Options are explicitly declared. They cannot be mistyped, or dynamically added without an explicit runtime error.

In practice, The definitions of Example, and Example::Class would live in a gem. The Example.configure invokation would live within the code that wants to configure the properties of that gem with things such as credentials, URLs, etc. This seperation of concerns makes sense. Gems should not know anything about the business logic of your application. It should be instantiated with configuration options passed in from the project. However the project should only be aware of the public interface of the gem. We are agreeing upon where these pieces of information are moving from one domain to another. The project doesn’t know where this information will be used, and the gem doesn’t know what the information is until we pass it. So far so good!

Using this Configuration

Now that we’ve passed our information inside the configuration block, we can reference these class level (static) properties in our gem:

module Example
  class Request
    def self.get(path)
      request = Net::HTTP.get(URI.parse(ExampleConfig.api_url + path))
      request['Authorization'] = auth_header(:get, ExampleConfig.api_url)
    end

    private

    def self.auth_header(request_type, uri)
      SimpleOAuth::Header.new(request_type, uri.to_s, {},
                              {consumer_key: Config.consumer_key,
                              consumer_secret: Config.consumer_secret}).to_s
    end
  end
end

This will do a simple GET request passing in SimpleOAuth headers. Inside the get method we call Config.api_url to know where the API lives. This was set by us earlier using the Config object. SimpleOAuth headers are supplied by again calling the Config. You would invoke it like so:

Example.configure do |config|
  config.api_url = "http://api.com/v1/"
  consumer_key = "1895-1192-1234"
  consumer_secret = '76asdfh3heasd8f6akj3hea0a9s76df'
end

Example::Request.get('/products') # => {products: [product1, product2, etc]...}"
Example::Request.get('/users') # => "{users: [user1, user2, etc]...}"

Example::Config becomes the holding location for your configuration information. And by making the properties static, you don’t have to worry about passing around a reference to the instance.

Alternatives

If the yielding to a block is a little too clever for you, you can always instantiate a class and pass in the configuration as part of the constructor:

class Example
  class << self
    attr_accessor :api_url, :consumer_key, :consumer_secret

    def config(api_url:, consumer_key:, consumer_secret:)
      self.api_url = api_url
      self.consumer_key = consumer_key
      self.consumer_secret = consumer_secret
    end
  end
end

This can be instantiated like so:

Example.config(
  api_url: "http://api.com/v1/"
  consumer_key: "1895-1192-1234"
  consumer_secret: '76asdfh3heasd8f6akj3hea0a9s76df'
)

Example.api_url # => "http://api.com/v1/"

This feels less encapsulated to me. Instead of having an interface for our configuration settings, we are just settings properties directly onto a class.

What are your thoughts? What advantages does the block style configure offer over the alternative above?

For more information on the Ruby gem configuration pattern see this excellent blog post:

Computers, Ruby, Software

Lets Talk Strategy

Have you ever used a promotional code on an e-commerce website? How does the promotional code know what items are eligible? Is it a fixed amount off, or a percentage? Does it include shipping and handling? Was there a minimum purchase amount you had to meet to be eligible? These are all design considerations that needed to be taken into account, and the strategy pattern was a great way to represent this in my codebase.

My new requirement was to incorporate a promotional code that took 10% off the shopping cart total (excluding certain items of course – why can requirements never be simple?!) The system already had an existing promotional code system, but it was rudimentary, and only worked for fixed amounts off the total (e.g. $5 off your total).

The strategy pattern is defined as having these qualities:

  • A strategy can be thought of as an algorithm that is conditionally applied at runtime based on conditions.
  • The strategy encapsulates the execution of that algorithm.
  • Finally a strategy is an interface for the algorithm allowing them to be interchangeable.

Implementing the Strategy Pattern

Lets look at the existing promotional code logic before I implement the logic to take a percentage discount:

# app/models/shopping_cart.rb
...
has_many :items

# Returns the most expensive item in the shopping cart
def costliest_item
  items.map { |a,b| b.price <=> a.price}.first
end

def apply_promotional_code!(promotional_code)
  costliest_item.apply_promotional_code! promotional_code
  shopping_cart.total -= promotional_code.amount
end

For added context, a promotional code has a code we can lookup, and an amount property that tells us how much that promotional code is valued at.

We have a ShoppingCart which has many Items associated with it. When we call apply_promotional_code! and pass in a promotional code (e.g. “HOTSUMMER”), it returns the costliest item in the shopping cart (based on price) and associates the promotional code against it.

Why lookup the costliest item at all? Because we want to show an itemized receipt of each item total in the shopping cart. We then subtract this promotional code amount from the shopping basket total, which is the amount that will be charged to the customer’s credit card.

Adding a percentage based promotional code

If we were to change this to a percentage based discount (e.g. 10%) instead of a fixed amount (e.g. $5) we would need to change our calculation in the apply_promotional_code! method to something more complex:

def apply_promotional_code!(promotional_code)
  costliest_item.apply_promotional_code! promotional_code
  if promotional_code.amount.present?
    shopping_cart.total -= promotional_code.amount
  else
    shopping_cart.total -= shopping_cart.total * (promotional_code.percentage / 100.0)
  end
end

We have also changed the number of items this can be applied against (shown in our itemized receipt). If the promotional code is 10% off of the total, then it is 10% off of each item in the shopping cart. This means we will have to revisit our costliest_item method to make it return an array of zero or more items, instead of a single item:

# Returns an array of eligible items for the promotional code
def eligible_items
  if promotional_code.amount.present?
    Array(items.map { |a,b| b.price <=> a.price}.first)
  else
    items
  end
end

def apply_promotional_code!(promotional_code)
  eligible_items.each {|item| item.apply_promotional_code! promotional_code }
  if promotional_code.amount.present?
    shopping_cart.total -= promotional_code.amount
  else
    shopping_cart.total -= shopping_cart.total * (promotional_code.percentage / 100.0)
  end
end

With the addition of a percentage based promotional code alongside the fixed amount promotional code, we have introduced more complexity. apply_promotional_code! now has to be aware of how to apply both a fixed, and a percentage based amount. Further, the eligible_items (formerly costliest_item) now needs to be aware of the type of promotional code. Our promotional logic is spilling over into areas that need not be concerned with promotional logic.

Encapsulating via a strategy

Instead of concerning apply_promotional_code! with the knowledge of all possible types of promotional code types, lets decompose this into two separate strategies. The first will be our original fixed amount against the costliest item in the shopping cart. The second will be our new percentage based discount off the entire shopping cart:

# app/models/promotional_codes/stategies/fixed_amount_against_costliest_item.rb
class FixedAmountAgainstCostliestItem
  attr_accessible :shopping_cart, :promotional_code

  def initialize(items, promotional_code)
    self.shopping_cart = shopping_cart
    self.promotional_code = promotional_code
  end

  def discount
    promotional_code.amount
  end

  def eligible_items
    shopping_cart.eligible_items
  end
end

class PercentageAgainstMultipleItems
  attr_accessible :shopping_cart, :promotional_code

  def initialize(items, promotional_code)
    self.shopping_cart = shopping_cart
    self.promotional_code = promotional_code
  end

  def discount
    shopping_cart.total * (promotional_code.percentage / 100.0)
  end

  def eligible_items
    shopping_cart.eligible_items
  end
end

A few things to note: Firstly there are now two classes, and each is responsible for a single type of promotional code application. Secondly, there is similarity in these two classes. This is a good sign that we want to refactor and make a common super class that these inherit from. Similarity means there is a good chance we can accomplish interchangeability. Lastly, we have proxied the logic of what items are eligible through our two new strategies. As a next pass we will move the eligible_items logic to reside with the strategy. This keeps our promotional code logic in one place:

class BasePromotionalStrategy
  attr_accessible :shopping_cart, :promotional_code

  def initialize(items, promotional_code)
    self.shopping_cart = shopping_cart
    self.promotional_code = promotional_code
  end

  def discount
    raise NotImplementedError
  end

  def eligible_items
    raise NotImplementedError
  end
end

class FixedAmountAgainstCostliestItem < BasePromotionalStrategy
  def discount
    promotional_code.amount
  end

  def eligible_items
    Array(items.map { |a,b| b.price <=> a.price}.select { |item| item.price >= promotional_code.amount }.first)
  end
end

class PercentageAgainstMultipleItems < BasePromotionalStrategy
  def discount
    promotional_code.amount
  end

  def eligible_items
    items
  end
end

Now that we have extracted the common elements into a BasePromotionalStrategyclass, both FixedAmountAgainstCostliestItem and PercentageAgainstMultipleItemscan extend it. With the boilerplate constructor logic out of the way, we can define our interface with method placeholders. As an example of what eligible_items might guard against, we will check that the item cost is greater than or equal to the promotional_code.amount to ensure our shopping cart has a positive balance. (So much for early retirement by buying thousands of paperclips one at a time with a $5 off coupon!). Percentage based promotional codes have a sliding value so there is no minimum spend we need to guard against. You can see that we just return all items as eligible_items.

Regardless of whether we instantiate FixedAmountAgainstCostliestItem or aPercentageAgainstMultipleItems, we can call discount to calculate the amount taken off the shopping cart, and we can call eligible_items to return a list of items that the itemized list should show the promotional discount applying to. In the case of theFixedAmountAgainstCostliestItem it will always be an empty array, or an array with one item, but we consistently return an array type regardless of the number of entities it contains. Consistent return types are an interface must!

Calling our new strategies

Now that we have implemented (and written tests – right?) for our strategy, we can revisit our caller to take advantage of this new logic:

# app/models/promotional_code.rb
class PromotionalCode
  ...
  def strategy
    if amount.present?
      PromotionalCode::Strategies::FixedAmountAgainstCostliestItem
    else
      PromotionalCode::Strategies::PercentageAgainstMultipleItems
    end
  end
end

# app/models/shopping_cart.rb
...
has_many :items

def apply_promotional_code!(promotional_code)
  strategy = promotional_code.strategy.new(self, promotional_code)
  strategy.eligible_items.each { |item| item.apply_promotional_code!(promotional_code) }
  shopping_cart.total -= strategy.discount
end

The PromotionalCode is responsible for knowing which strategy will be executed when it is applied to a shopping cart. This strategy method returns a class reference. This reference is then instantiated passing in the necessary arguments of the shopping_basket, and the promotional_code. This is everything the shopping cart needs to know about executing a promotional code strategy. The ShoppingCart does not know which strategy to instantiate, but simply which arguments to pass to the return result of strategy.

Further, ShoppingBasket now knows nothing about what items are eligible for a PromotionalCode. That logic again resides within the strategy definition. Our promotional code domain is nicely encapsulated. This adheres to the open/closed principle of good object oriented design. Adding a new type of promotional code (e.g. buy one get one free, or a promotional code with a minimum spend) would be as easy as adding a new strategy, and telling the PromotionalCode#strategy when to apply it.

NullStrategy

You might have noticed that we use an if...else as opposed to an if...elsif (for simplicity sake). This means that the percentage based promotional code strategy is returned more often than it should be. Ideally, we’d want to return an inert strategy that confirms to our interface but doesn’t actually give any discounts to our shopping cart. This is where a Null Object pattern could be used in conjunction with a strategy:


# app/models/promotional_codes/stategies/null_strategy.rb
class NullStrategy < BasePromotionalStrategy
  def discount
    0
  end

  def eligible_items
    []
  end
end

# app/models/promotional_code.rb
class PromotionalCode
  ...
  def strategy
    if amount.present?
      PromotionalCode::Strategies::FixedAmountAgainstCostliestItem
    elsif percentage.present?
      PromotionalCode::Strategies::PercentageAgainstMultipleItems
    else
      PromotionalCode::Strategies::NullStrategy
    end
  end
end

One additional benefit of this strategy pattern is the ease of testing. PromotionalCode#strategy could easily be stubbed to return this NullStrategy and still remain consistent with the interface we’ve established in our other strategies. Now in testing, we can simply stub the NullStrategy and continue to chain our interface method calls:

 PromotionalCode.any_instance.stubs(strategy: PromotionalCode::Strategies::NullStrategy) 
 $> PromotionalCode.new.strategy # => PromotionalCode::Strategies::NullStrategy

 $> PromotionalCode.new.discount # => 0
 $> PromotionalCode.new.eligible_items # => []
 ...
 $> shopping_basket.total -= promotional_code.discount

Conclusion

Strategies can be a great way to split different algorithms apart. An indicator that you could benefit from a formalized strategy pattern is when you find conditionals in your methods that change the way something is calculated based on context. In our case, we had a conditional (if...else) that checked whether amount was set on a PromotionalCode. Another indicator was the need to encapsulate the concern of a PromotionalCode. Its neither a concern of ShoppingBasket nor Item. It needed its own home. By implementing a strategy class, we give this logic an obvious place to live.

A strategy can be a great tool when you have a requirement that lives alongside an existing requirement that changes how a calculation is done at runtime.

Additional Information on strategies can be found at https://sourcemaking.com/design_patterns/strategy