Ruby, Uncategorized

Testing and External Calls

Why Are External Calls Bad?

When writing unit tests, the principle is to test each component in isolation from other components. If your tests fail because of concerns outside of your subject, that is unintentional coupling. A call to an external network resource (S3, Pusher, Facebook, Google, etc.) also introduces brittleness and will slow down your tests. Network requests are very expensive compared to memory and local disk access. They quickly add up and produce a slow-running test suite.

VCR

Add the VCR gem to the test group of your Rails project’s Gemfile. It is responsible for recording request/response traffic (complete with headers, params, etc.) and serializing it as a test fixture, which VCR calls a cassette. This fixture can then be used by the test instead of making an external network call.
VCR documentation at: https://relishapp.com/vcr/vcr/docs

Webmock

The Webmock gem disables external network calls in tests across a number of HTTP libraries, including Net::HTTP and Curl::Easy. If you can think it up, it’s probably blocked. This is intentional for the reasons listed earlier. Exceptions should not be made in tests.
Webmock documentation at: https://github.com/bblimke/webmock

How to Write a Test Using VCR

Take for example an Image class. Pretend it is responsible (in part) for uploading a file to Amazon’s S3 service. This is hosted on AWS and is an external network call. Currently, creating an image in a test will fail with an exception similar to the following:

it 'uploads an image' do
  FactoryGirl.create(:image) # this fails
end

VCR::Errors::UnhandledHTTPRequestError:
================================================================================
An HTTP request has been made that VCR does not know how to handle:
  PUT https://bwtesting.s3.amazonaws.com/uploads/image/file/1/logo.png

The error explains that there is currently no cassette to play. We can create a cassette to be used going forward by temporarily enabling external network requests:

# spec/vcr_helper.rb
require 'vcr'

VCR.configure do |config|
  config.cassette_library_dir = "spec/fixtures"
  config.hook_into :webmock
  config.allow_http_connections_when_no_cassette = true
  config.configure_rspec_metadata!
end

Uncomment the line

config.allow_http_connections_when_no_cassette = true

Now instruct your test to use VCR by adding an Rspec metadata tag of :vcr:

it 'uploads an image', :vcr do
  FactoryGirl.create(:image)
end

Now you can run your test again with a passing result:
1 example, 0 failures, 1 passed

You will also note a new file has been created under spec/fixtures. The specific path and file name are based on the descriptions of your example group and test. (Note: if you rename your test, be sure to rename the fixture accordingly.) Look at the fixture and you will see something like:

---
http_interactions:
...
- request:
    method: put
    uri: https://bwtesting.s3.amazonaws.com/uploads/image/file/1/logo.png
...

One fixture can contain multiple network call representations.
If we comment that line back out in spec/vcr_helper.rb, we should now be able to run our spec without a network connection:

# spec/vcr_helper.rb
require 'vcr'

VCR.configure do |config|
  config.cassette_library_dir = "spec/fixtures"
  config.hook_into :webmock
  # config.allow_http_connections_when_no_cassette = true
  config.configure_rspec_metadata!
end

Matching a Cassette

How does VCR know which cassette recording to use when mocking an external network request? The default behavior is to match on the method (GET, POST, PUT, etc.) and the URI (the full URL of the resource).

You might have guessed that this can cause an issue with RESTful resources, or anything with a unique identifier in the URL. Take this example:

it 'uploads an image', :vcr do
  FactoryGirl.create(:image)
  FactoryGirl.create(:image)
end

We create two images. This will now cause our test to fail with an error from VCR:

VCR::Errors::UnhandledHTTPRequestError:
================================================================================
An HTTP request has been made that VCR does not know how to handle:
  PUT https://bwtesting.s3.amazonaws.com/uploads/image/file/2/logo.png
VCR is currently using the following cassette:
  - spec/fixtures/Image/uploads_an_image.yml
  - :record => :once
  - :match_requests_on => [:method, :uri]

This failed because the URI we previously recorded was for
https://bwtesting.s3.amazonaws.com/uploads/image/file/1/logo.png
and now we have the URI
https://bwtesting.s3.amazonaws.com/uploads/image/file/2/logo.png

Changing Matching behaviors

VCR fortunately provides flexibility in how to match an HTTP request. As part of the metadata, you can specify what to match on. For our test, we don’t really care about the specific URL, just that we simulate a good response from Amazon. This means it’s sufficient to match on method and host. In your test you can specify this as follows:

it 'uploads an image' do
  VCR.use_cassette('Image/uploads_an_image', match_requests_on: [:method, :host]) do # name is relative to cassette_library_dir
    FactoryGirl.create(:image)
    FactoryGirl.create(:image)
  end
end

Note that :vcr is gone from our Rspec metadata. We have wrapped the code responsible for issuing the external network call in an explicit VCR block that allows us to match on different criteria of the HTTP request. This test should now pass.
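
To make the difference between the two matching strategies concrete, here is a minimal plain-Ruby sketch. It does not use VCR at all: the Request struct, the MATCHERS table, and requests_match? are hypothetical names invented for illustration, and VCR’s real implementation may differ. It only shows why comparing the full URI rejects the second upload while comparing the host accepts it.

```ruby
require 'uri'

# Hypothetical stand-in for a recorded or live HTTP request; not VCR's own class.
Request = Struct.new(:http_method, :uri)

# Each matcher compares one attribute of two requests.
MATCHERS = {
  method: ->(a, b) { a.http_method == b.http_method },
  uri:    ->(a, b) { a.uri == b.uri },
  host:   ->(a, b) { URI(a.uri).host == URI(b.uri).host }
}.freeze

# Two requests match only when every named matcher agrees.
def requests_match?(recorded, live, match_on)
  match_on.all? { |name| MATCHERS.fetch(name).call(recorded, live) }
end

recorded = Request.new(:put, 'https://bwtesting.s3.amazonaws.com/uploads/image/file/1/logo.png')
live     = Request.new(:put, 'https://bwtesting.s3.amazonaws.com/uploads/image/file/2/logo.png')

puts requests_match?(recorded, live, [:method, :uri])  # false
puts requests_match?(recorded, live, [:method, :host]) # true
```

The trade-off is precision: matching on host alone will replay the recorded response for any PUT to that host, which is fine here because the test only needs a successful upload response.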

Custom Matching

Maybe the built-in match_requests_on attributes aren’t fine-grained enough for your test. In that case, you can write your own matcher. See the example below for how to match on part of the path of a URI:

it 'uploads an image' do
  VCR.use_cassette('Image/uploads_an_image', match_requests_on: [:method, :host, method(:s3_matcher)]) do # name is relative to cassette_library_dir
    FactoryGirl.create(:image)
    FactoryGirl.create(:image)
  end
end

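# Matches when both requests target an uploaded image path, whatever the record ID.
def s3_matcher(request_1, request_2)
  pattern = %r{/uploads/image/file/\d+}
  [request_1, request_2].all? { |request| URI(request.uri).path =~ pattern }
end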

I’ve added a new method to the array of values passed to match_requests_on. This method takes two arguments: VCR calls it with the current request and each request recorded in the fixture in turn until it finds a match. The return value of this method must be a boolean. If it’s true, the recorded request is treated as a match. If it’s false, VCR tries the next recorded HTTP request until it runs out of options, and fails.
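
Because a matcher is just a Ruby method, it can be exercised without VCR. The sketch below is an illustration under assumptions: FakeRequest is a hypothetical struct standing in for VCR’s request object (only its uri attribute is used), and this variant of s3_matcher checks both requests against the upload path.

```ruby
require 'uri'

# Hypothetical stand-in exposing only the #uri attribute the matcher reads.
FakeRequest = Struct.new(:uri)

UPLOAD_PATH = %r{\A/uploads/image/file/\d+/}

# True when both requests point at an uploaded image, whatever the record ID.
def s3_matcher(request_1, request_2)
  [request_1, request_2].all? { |request| URI(request.uri).path.match?(UPLOAD_PATH) }
end

first  = FakeRequest.new('https://bwtesting.s3.amazonaws.com/uploads/image/file/1/logo.png')
second = FakeRequest.new('https://bwtesting.s3.amazonaws.com/uploads/image/file/2/logo.png')
other  = FakeRequest.new('https://bwtesting.s3.amazonaws.com/avatars/2/logo.png')

puts s3_matcher(first, second) # true
puts s3_matcher(first, other)  # false
```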

Computers, Open-source, Ruby

Rspec Bisect

During a journey to upgrade to Rails 4, the test suite was crucial for success. I ran the test suite and recorded all of the test failures. Then I went through test by test fixing them until I was all done. Barely containing my excitement I pushed to remote and waited. Sadly, our CI server reported a build failure. What the heck?! The tests all passed in isolation. Why do they not pass when run together? Well…

The mark of a good test suite is that its tests can be run in any order and still pass. In order to accomplish this, we need each test (or the suite at large) to be responsible for setting up its own data, and tearing down that data when it’s finished. You want to leave the state of everything exactly how you found it before the test was run. This can include clearing caches, truncating database tables, or rolling back transactional changes. Many test frameworks, including Rspec, have the capability to do these things for you without much headache.

But what happens when your tests are run in a different order and fail? Or what happens when your tests pass in isolation, but fail when run as part of the larger test suite? Likely something in an unknown test is causing your test failure. And finding which other test is being naughty can be a real pain. Fortunately, this is exactly the use case for a new feature in Rspec 3.3 called bisect.
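
As a rough illustration of how one test can break another, here is a plain-Ruby sketch with no test framework involved. CACHE, polluting_test, fragile_test, and run_in_order are all hypothetical names made up for this example; the point is only that shared state left behind by one test changes the outcome of the next.

```ruby
# Hypothetical shared state that neither "test" cleans up.
CACHE = {}

# Leaves a value behind in the shared cache.
polluting_test = -> { CACHE[:locale] = :fr; true }

# Passes only while the cache is still empty.
fragile_test = -> { CACHE.fetch(:locale, :en) == :en }

# Runs the tests in the given order from a clean state; returns pass/fail per test.
def run_in_order(tests)
  CACHE.clear
  tests.map(&:call)
end

p run_in_order([fragile_test, polluting_test]) # [true, true]
p run_in_order([polluting_test, fragile_test]) # [true, false]
```

Both tests pass in isolation, and the suite passes in one order, yet the fragile test fails whenever the polluting test happens to run first.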

In case you are not familiar with bisect, it’s a simple but powerful algorithm. Suppose you want to drop an egg off a building at different floors to discover the lowest floor on which the egg will break. There are multiple ways to accomplish this task. The most wasteful way would be to start at floor one, drop the egg, and if it doesn’t break, go to floor two and repeat. When you drop the egg and it breaks, then you have your answer. This is slow. In Big O notation, this would be O(n) where n is the number of floors in the building.

We can do better. What if we could cut the floors that we test on in half each time we perform our egg drop test? For argument’s sake, if the building has 100 floors (and we have an unusually strong egg!) we might bisect the floors and start at floor 50. If we drop the egg and it breaks, we know that the answer is somewhere between 1 and 50. If it doesn’t break, we know the answer is between 51 and 100. Let’s assume it breaks. We can then do our next test (with a new egg) on floor 25. If it breaks, we know our answer is between 1 and 25. If it doesn’t break, we know our answer is between floors 26 and 50. Just by dropping two eggs on different floors, we’ve cut our sample size from 100 to 25. In Big O notation, this would be O(log n). The number of times we have to do this test is the number of times we can halve our number of floors before a single floor remains. On a 100 floor building, we would have our answer in at most 7 tests (2^7 = 128, the first power of 2 that covers 100 floors). We are going to save a lot of eggs.
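
The egg drop search can be sketched in a few lines of Ruby. This is a minimal illustration, not anything Rspec ships: lowest_breaking_floor is a hypothetical helper, the block stands in for actually dropping an egg, and the breaking floor of 37 is an arbitrary choice for the demo.

```ruby
# Finds the lowest floor at which the egg breaks, counting the drops used.
# `breaks_at` is a hypothetical predicate standing in for a real drop;
# the sketch assumes the egg breaks on at least one floor.
def lowest_breaking_floor(floors, &breaks_at)
  low = 1
  high = floors
  drops = 0
  while low < high
    mid = (low + high) / 2
    drops += 1
    if breaks_at.call(mid)
      high = mid      # breaks here, so the answer is at or below mid
    else
      low = mid + 1   # survives here, so the answer is above mid
    end
  end
  [low, drops]
end

floor, drops = lowest_breaking_floor(100) { |f| f >= 37 }
puts "floor #{floor} found in #{drops} drops" # floor 37 found in 7 drops
```

Each pass through the loop discards half of the remaining floors, which is why the number of drops grows with the logarithm of the building height rather than with the height itself.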

So why are we talking about eggs when we have test failures to fix? Rspec offers a sweet utility called “bisect” that will do something analogous to our egg drop algorithm in our test suite. We have a known test failure in the larger suite, but it passes in isolation. And we want to know which currently unknown test is interfering and causing our known test failure. How would you tackle this? Well, you could remove half your tests (keeping your failing test) and see if you still get a test error when running the test suite. If you do, you remove half again, and repeat your testing. Do this until you get down to two tests – the known failing test, and the previously unknown (now known) passing test. But that is tedious work to do manually.

Let’s take a look at Rspec bisect:

$ bundle exec rspec --seed 1234 --bisect
Bisect started using options: "--seed 1234"
Running suite to find failures... (3 minutes 14.1 seconds)
Starting bisect with 3 failing examples and 2475 non-failing examples.
Checking that failure(s) are order-dependent... failure appears to be order-dependent

Round 1: bisecting over non-failing examples 1-2475 . ignoring examples 1-1238 (1 minute 38.37 seconds)
Round 2: bisecting over non-failing examples 1239-2475 . ignoring examples 1239-1857 (57.25 seconds)
Round 3: bisecting over non-failing examples 1858-2475 . ignoring examples 1858-2166 (29.98 seconds)
Round 4: bisecting over non-failing examples 2167-2475 .. ignoring examples 2322-2475 (46.11 seconds)
Round 5: bisecting over non-failing examples 2167-2321 .. ignoring examples 2245-2321 (43.61 seconds)
Round 6: bisecting over non-failing examples 2167-2244 .. ignoring examples 2206-2244 (42.92 seconds)
Round 7: bisecting over non-failing examples 2167-2205 .. ignoring examples 2187-2205 (43.65 seconds)
Round 8: bisecting over non-failing examples 2167-2186 .. ignoring examples 2177-2186 (41.96 seconds)
Round 9: bisecting over non-failing examples 2167-2176 . ignoring examples 2167-2171 (24.15 seconds)
Round 10: bisecting over non-failing examples 2172-2176 .. ignoring examples 2175-2176 (36.95 seconds)
Round 11: bisecting over non-failing examples 2172-2174 .. ignoring example 2174 (37.3 seconds)
Round 12: bisecting over non-failing examples 2172-2173 . ignoring example 2172 (24.68 seconds)
Bisect complete! Reduced necessary non-failing examples from 2475 to 1 in 9 minutes 1 second.

The minimal reproduction command is:
  rspec ./spec/models/file_1.rb[1:4:3] ./spec/models/file_2.rb[1:1:1,1:1:2,1:1:3] --seed 1234

That is a lot of output. Let’s break it down:

  • We call rspec with a given --seed. What this seed value does is determine the order of the tests. When --seed is omitted, a random number is chosen, which makes the test order random. This is normally a good thing. But when bisecting, it’s a bad thing because we want to reproduce a certain arrangement of tests to reliably generate our failure.
  • The test suite initially takes 3 minutes 14 seconds to run. This is the longest test run because it is discovering which test failures we have so it must run the full suite.
  • The first round we ignore the first half of our tests. This takes half the time to run because we are running half the tests.
  • The second round we detected the failure from round one, so we know the half we tested includes the failure. We’ve just eliminated 50% of our tests from being the culprit. The next bisect will ignore the first half of the remaining pool and test again. We roughly halve the test run time again (tests don’t all have exactly the same duration).
  • The third round we repeat what we did in round two, but with the remaining pool, and so on. If the failure is no longer detected, we test on the half we most recently removed.
  • Finally we get a completed message with a command to reproduce the failure. I can copy and paste this into a terminal and see the two files that are causing a failure.
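
The role of the seed in the first point can be illustrated with plain Ruby. This is not Rspec’s actual ordering code, just a sketch of the underlying idea: a seeded random number generator produces the same shuffle every time. The file names are made up for the example.

```ruby
examples = %w[image_spec user_spec order_spec cart_spec]

# The same seed always produces the same ordering.
first_run  = examples.shuffle(random: Random.new(1234))
second_run = examples.shuffle(random: Random.new(1234))

puts first_run == second_run # true
```

That repeatability is what lets bisect rerun the same arrangement of tests round after round.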

rspec ./spec/models/file_1.rb[1:4:3] ./spec/models/file_2.rb[1:1:1,1:1:2,1:1:3] --seed 1234

What is that identity notation, you might ask? ./spec/models/file_1.rb is straightforward. This is just the filename. The next part is a bit more interesting: [1:4:3]. This tells us the nesting of the test that triggers the failure when run with the other file. Not only does Rspec bisect tell us the two files that are interfering with each other, but also which tests within those two files are interfering! The notation 1:4:3 tells us it’s the first describe context, then the fourth context within that context, and finally the third test within those previous two contexts. In the case of the second notation, ./spec/models/file_2.rb[1:1:1,1:1:2,1:1:3], it’s just telling us that we have multiple tests that are failing (3 to be precise), separated by commas. Notice that the command also includes the seed value we used when bisecting. This is still important because your failure might be order dependent.
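
If it helps to see the notation taken apart, the following sketch splits one of those arguments into its file path and its list of nested positions. parse_example_id is a hypothetical helper written for this post, not part of Rspec.

```ruby
# Splits "path[ids]" into the file path and a list of index paths.
def parse_example_id(arg)
  match = arg.match(/\A(?<path>.+)\[(?<ids>[\d:,]+)\]\z/)
  raise ArgumentError, "not an example ID: #{arg}" unless match

  ids = match[:ids].split(',').map { |id| id.split(':').map(&:to_i) }
  [match[:path], ids]
end

path, ids = parse_example_id('./spec/models/file_2.rb[1:1:1,1:1:2,1:1:3]')
puts path # ./spec/models/file_2.rb
p ids     # [[1, 1, 1], [1, 1, 2], [1, 1, 3]]
```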

And what if your tests fail in a different order? Same principle. When you get an intermittent test failure when running the test suite that you suspect is related to order, note the seed value that was used. Perform the same rspec bisect algorithm (and the same command) and use this seed value instead of 1234. This will reproduce the test failure you got by running the same tests in the same order as when the failure was recorded.

The Rspec team did an awesome job with the bisect utility. This used to be a dreaded problem to face as someone running tests. To verify your fix you had to run the entire test suite just to know if your change worked. This would have been a 4 minute wait in the example above, but could easily have been much longer. When reproducing failures, you want the shortest interval possible. Running 4 tests is certainly faster than running 2475 tests!

Another shout out goes to the authors of the transpec gem that is used when upgrading versions of Rspec. It automatically replaces your deprecated code with the newer style syntax. These two features together show that the Rspec team has a powerful and mature product.

More information:
* https://relishapp.com/rspec/rspec-core/docs/command-line/bisect
* http://rspec.info/blog/2015/06/rspec-3-3-has-been-released/
* https://en.wikipedia.org/wiki/Big_O_notation

Apple, Computers, Linux, Open-source, Ruby, Software, Thoughts, Web

PostgreSQL for Ruby on Rails on Ubuntu

My new desktop came in at work this week, and the installation was painless thanks to the great driver support of Ubuntu 11.10. For anyone setting up a Rails development box based on Linux, I have some tips to get around some pain points when using a PostgreSQL database.

Installation:

Postgres can be quickly and easily installed using apt-get on Debian or Ubuntu based distributions. Issue the command:

apt-get install postgresql

Ruby Driver

In order for Ruby to connect to PostgreSQL databases, you will need to install the pg gem. This gem will need the development package of PostgreSQL to successfully build its native extension. To install the PostgreSQL development package, issue the following command:

apt-get install libpq-dev # EDIT: postgresql-dev was replaced by this package on Ubuntu 11.10

Setup A PostgreSQL Role

You can configure PostgreSQL to allow your account to have superuser access, allowing your Rails tasks to create and drop databases. This is useful for development, but is strongly discouraged for production. That being said, we can create a PostgreSQL role by logging into psql as postgres as follows:

su postgres -c psql

This will open a PostgreSQL prompt as the database owner postgres. Next, we need to create an account for our user. This should match the response from “whoami”:

create role <username> superuser login;

We can now exit from psql by issuing “\q”. Try to connect to psql directly by issuing the following command from your shell account:

psql postgres

This should allow you to connect to the default database postgres without being prompted for credentials. You should now be able to issue the rake commands for creating and dropping the database:

rake db:create

Rspec Prompts for Credentials

I was being prompted by Rspec for credentials when running my test suite. If you would like to remove this credential prompt, please read the following:

There are differences in how the PostgreSQL package is configured in Homebrew on OS X and how it is packaged in Ubuntu and other distributions. One difference is in the level of security configured in the pg_hba.conf file. This file is responsible for identifying which sources using which authentication mechanisms should be allowed or denied. By default, Rspec will cause a prompt for a password even if your shell account has trusted permissions. This is because Rspec connects not as a local process (over the Unix socket), but over TCP to localhost. To allow connections to localhost to be trusted, you will need to modify the pg_hba.conf file.

Next, we can modify the pg_hba.conf file located at /etc/postgresql/<version>/main/pg_hba.conf

Comment out any existing local and host lines at the bottom of the file and append the following:

local   all             all                                      trust
host    all             all              127.0.0.1/32            trust
host    all             all              ::1/128                 trust

This will allow connections from the shell, as well as connections to 127.0.0.1 (localhost) using both IPv4 and IPv6.

You will need to restart PostgreSQL for the changes from this file to take effect:

/etc/init.d/postgresql restart

PostgreSQL Extensions

If you want to make use of any of the additional extensions to Postgres, including fuzzystrmatch, you will need to install the postgresql-contrib package:

apt-get install postgresql-contrib

The extensions will install to /usr/share/postgresql/<version>/extension/

Using PostgreSQL 9.1 or later, you can create these extensions in your database by using the CREATE EXTENSION syntax. In the case of the fuzzystrmatch extension, open a PostgreSQL command prompt on your database to load it:

psql <database>

Once inside your database:

create extension fuzzystrmatch;