Open-source, Ruby, Software, Thoughts, Uncategorized

Implementing a Configuration

Have you ever seen a configuration in Ruby that yields to a block where properties are set on the yielded object? Rails does this with its environments files:

Your::Application.configure do
  config.cache_classes                     = true
  config.consider_all_requests_local       = false
  config.action_controller.perform_caching = true
  ...
end

Devise is another great example of this pattern:

Devise.setup do |config|
  config.secret_key = '38c8e4958385982971f'
  config.mailer_sender = "noreply@email.com"
  config.mailer = "AuthenticationMailer"
  ...
end

This is an established way to pass in configuration options in the Ruby world. How does this work? What are the alternatives?

Under the Hood

Let's start with our own implementation from scratch. We know that we are calling a method (Application.configure, or Devise.setup) and that method yields to a block:

module Example
  class << self
    def configure
      yield Config
    end
  end
end

You can see that this will yield the class Config when we call Example.configure and pass in a block. Config will need to define our properties:

module Example
  class Config
    class << self
      attr_accessor :api_url, :consumer_key, :consumer_secret
    end
  end
end

We can now call our configure method and pass in a block:

Example.configure do |config|
  config.api_url = "http://api.com/v1/"
end

If we try to set a property that does not exist in our Example::Config class, then we get a helpful NoMethodError:

Example.configure do |config|
  config.not_a_real_config_setting = "http://api.com/v1/"
end

# => NoMethodError: undefined method `not_a_real_config_setting=' for Example::Config:Class

This helps to define your configuration interface. Options are explicitly declared, and any option that is mistyped or not declared raises an explicit runtime error.

In practice, the definitions of Example and Example::Config would live in a gem. The Example.configure invocation would live within the code that wants to configure the properties of that gem with things such as credentials, URLs, etc. This separation of concerns makes sense. A gem should not know anything about the business logic of your application; it should be handed configuration options from the project. Likewise, the project should only be aware of the public interface of the gem. The configure block is the agreed-upon place where these pieces of information move from one domain to the other: the project doesn't know where this information will be used, and the gem doesn't know what the information is until we pass it. So far so good!

Using this Configuration

Now that we’ve passed our information inside the configuration block, we can reference these class level (static) properties in our gem:

require 'net/http'
require 'simple_oauth'

module Example
  class Request
    def self.get(path)
      uri = URI.parse(Config.api_url + path)
      # Build the request and sign it before it is sent
      request = Net::HTTP::Get.new(uri.request_uri)
      request['Authorization'] = auth_header(:get, uri)
      Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
    end

    def self.auth_header(request_type, uri)
      SimpleOAuth::Header.new(request_type, uri.to_s, {},
                              consumer_key: Config.consumer_key,
                              consumer_secret: Config.consumer_secret).to_s
    end
    private_class_method :auth_header
  end
end

This will perform a simple GET request with a SimpleOAuth Authorization header attached. Inside the get method we call Config.api_url to know where the API lives; this was set by us earlier through the configure block. The SimpleOAuth header is likewise built from the consumer credentials held in Config. You would invoke it like so:

Example.configure do |config|
  config.api_url = "http://api.com/v1/"
  config.consumer_key = "1895-1192-1234"
  config.consumer_secret = '76asdfh3heasd8f6akj3hea0a9s76df'
end

Example::Request.get('/products') # => "{products: [product1, product2, etc]...}"
Example::Request.get('/users')    # => "{users: [user1, user2, etc]...}"

Example::Config becomes the holding location for your configuration information. And by making the properties static, you don’t have to worry about passing around a reference to the instance.

Alternatives

If yielding to a block is a little too clever for you, you can always pass the configuration in as keyword arguments to a single class-level setup method:

class Example
  class << self
    attr_accessor :api_url, :consumer_key, :consumer_secret

    def config(api_url:, consumer_key:, consumer_secret:)
      self.api_url = api_url
      self.consumer_key = consumer_key
      self.consumer_secret = consumer_secret
    end
  end
end

It can be called like so:

Example.config(
  api_url: "http://api.com/v1/",
  consumer_key: "1895-1192-1234",
  consumer_secret: '76asdfh3heasd8f6akj3hea0a9s76df'
)

Example.api_url # => "http://api.com/v1/"

This feels less encapsulated to me. Instead of having an interface for our configuration settings, we are just setting properties directly onto a class.

What are your thoughts? What advantages does the block style configure offer over the alternative above?

For more information on the Ruby gem configuration pattern see this excellent blog post:

Computers, Open-source, Software, Thoughts

Git Pro Tips

What makes a good commit message? What makes for good commit contents? I present on how to reword commits to provide context, and how to structure commit contents to be the most meaningful for posterity, using git rebase.

Computers, Open-source, Ruby, Software

Don’t Use “#” In the Paperclip Gem

I learned a whole lot more about ImageMagick commands last week than I ever really wanted to know. The problem was that our uploaded images were having content cropped off the top, bottom, and sides. Like many folks in the Rails world, we pass our attachments through Paperclip to handle all of the nitty gritty resizing operations.

I was interested in understanding how we could prevent our content from being cropped off when I came across an interesting idiom in the geometry settings:

has_croppable_attachment :image,
  styles: {
    :'630x315' => { geometry: "630x315#", format: :jpg }
  }

Well, take a look at that. There is a "#" symbol suffixed to my image geometry. I went to the ImageMagick documentation to look up what this flag meant. Spoiler alert: it doesn't exist there. After some digging around, I discovered that this idiom is provided by the Paperclip gem, and translates to the following convert command:

convert '/path/to/source.jpg' -resize "630x" -crop "630x315+0+0" '/path/to/output.jpg'

You can see the resize + crop combination of commands being built by Paperclip according to their documentation:

Paperclip also adds the “#” option (e.g. “50×50#”), which will resize the image to fit maximally inside the dimensions and then crop the rest off (weighted at the center).

Well, that is no good! If an image is not a 2:1 aspect ratio as per my dimensions 630×315 (or whatever aspect ratio you have) YOU WILL LOSE CONTENT! Time to rethink this…

The dimensions are 306×306

The image was resized until it was large enough to cover a 630×315 canvas, then the top and bottom were cropped off

Instead of resizing maximally (so that an image is a minimum of 630 width AND a minimum of 315 height), let's resize minimally (so that an image is a maximum of 630 width OR 315 height). The aspect ratio is preserved in both scenarios.

We want to resize while preserving the aspect ratio, but we also need to make our canvas 630×315. The canvas dimensions are set with the extent option. When we do this, we will likely have space on the top and bottom, or on the sides, that we need to fill to reach exactly these dimensions. What you fill this background with can be a color (in my case white). We also likely want to center the minimally resized image on this canvas. You can pass these convert options into Paperclip like so:

convert_options: {
  :'630x315' => " -background white -gravity center -extent 630x315"
}
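
Putting it together, here is a sketch of how the style and convert options might sit side by side in the model (the model class and attachment name are illustrative, assuming a standard Paperclip has_attached_file setup):

class Post < ActiveRecord::Base
  # Resize minimally (no "#" crop flag), then pad the result out to a
  # 630x315 white canvas with the image centered.
  has_attached_file :image,
    styles: {
      :'630x315' => { geometry: "630x315", format: :jpg }
    },
    convert_options: {
      :'630x315' => " -background white -gravity center -extent 630x315"
    }
end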

The resulting command will look something like this:

convert '/path/to/source.jpg' -resize "630x315" -background white -gravity center -extent 630x315 '/path/to/output.jpg'

Notice that our lossy crop flag has been replaced with a nicer extent flag. We can see the results this has on a similarly sized image:

We have an image smaller than the target 630×315

We now have a 630×315 image with the sides filled in with a white background to preserve the dimensions. Note the sides of the image are white.

As a final note, this works for both images larger and smaller than the target outcome dimensions. Looking at the ImageMagick documentation for flags can be helpful, but daunting as the real power lies in chaining multiple flags for a desired effect.

With a little effort I was able to get what is (in my opinion) a better image resize with just a few custom flags.

Computers, Linux, Open-source, Ruby, Software, Web

Migrating from Bamboo to Cedar

Heroku recently sent me a nice email:

You have one or more applications deployed to Heroku’s Bamboo stack. Bamboo went live almost four years ago, and the time has come for us to retire it.

You may have received a previous message about Bamboo deprecation. That message may have had an erroneous app list, but this email has the correct list of Bamboo apps that you own or are collaborated on. We’re sorry for any confusion.

On June 16th 2015, one year from now, we will shut down the Bamboo stack. The following app/apps you own currently use the Bamboo stack and must be migrated:

This is on the heels of an email about ending legacy routing support. It seems they have been quite busy over at Heroku, but I can’t complain too seriously about free app hosting.

Upgrading a legacy Rails application to the Cedar stack did require a few changes. I’ll document some stumbling blocks for posterity.

Foreman

The first big change, described in Heroku's general purpose upgrading article (https://devcenter.heroku.com/articles/cedar-migration), was the use of foreman to manage your web services. I luckily have a simple app, and did not need to worry about resque, mailers, etc. This made my Procfile rather straightforward:

web: bundle exec unicorn -p $PORT -E $RACK_ENV -c config/unicorn.rb

Absent from the foreman crash course is any information about the corresponding .env file. I did need to add a few environment variables to keep working in development as usual:


RACK_ENV=development
PORT=3000

Once the Procfile was committed (and the .env file added to .gitignore), I then added Unicorn to the Gemfile. I'm not sure if Unicorn is strictly necessary over WEBrick in development, however this is the example in the Cedar upgrade guide, so I wanted to run as close to production as practical to prevent any surprises.

After installing Unicorn, I then needed to touch config/unicorn.rb since it did not exist. Again, it's in the example, but I'm not sure if it's strictly necessary, especially given that it's just an empty file for me. To start your Rails application, you now run foreman start instead of the older rails s.

Devise incompatibility

Not directly related to the Cedar changes, but a common gem, so worth mentioning. Devise 2.x has removed migration helpers. I stupidly didn’t lock my version of Devise in my Gemfile so I was confused why this was failing for me. I found the last 1.x version of Devise by running this command:

gem list devise --remote --all, and then specified '1.5.3' as the second argument on the devise entry in my Gemfile.
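
The resulting Gemfile entry looks something like this:

gem 'devise', '1.5.3'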

PostgreSQL

Heroku requires Postgres in production, and I was previously using SQLite in my development environment. Again, to mirror production, I wanted to use Postgres in my development environment so that I could be as close to the production setup as practical. I took a quick trip down memory lane to set up and configure PostgreSQL on my Linux development machine: https://mrfrosti.com/2011/11/postgresql-for-ruby-on-rails-on-ubuntu/ . An interesting observation is that in Ubuntu 14.04 LTS, PostgreSQL runs on port 5433, and not the default 5432. This can be verified with netstat -nlp | grep 5432. If no entries come back, PostgreSQL is running on a non-default port. Try grepping for 5433, or other numbers, until you find the process.

Backing up your database

Before I made any server changes, I wanted to have an up to date copy of the production database on my machine to prepare for the worst. This can be done using the pgbackups commands:


heroku pgbackups:capture   # create a backup
heroku pgbackups:url       # make a public URL for that backup

Then I clicked the public URL and downloaded the file – safe and sound to my computer. By the way, you can import this production database into your local instance of PostgreSQL using the following command:


pg_restore --verbose --clean --no-acl --no-owner -d development /path/to/database

Pre-provisioning your database

A quick note on database availability. I got into a chicken and the egg scenario where the app wouldn’t deploy without the database, and the database seemingly couldn’t be created without the app first being deployed. Heroku has an article on pre provisioning and I found it a necessary prerequisite to deploying to my newly created test Cedar stack: https://devcenter.heroku.com/articles/pre-provision-database

To pre-provision your database, run the following:

heroku addons:add heroku-postgresql

You can even import the database from production as part of the heroku utility. I used the public database URL I created above to populate my new Cedar stack database:


heroku pgbackups:restore

Migrating to Cedar

Once I had tested on a new app and in a feature branch, I had confidence everything was working as expected. I was ready to make my changes to the production app by migrating it to Cedar. To do this, the command is:


heroku stack:migrate cedar

I either have an old version of the Heroku gem, or this is a discrepancy between the gem and the non-gem packaging, but the docs misidentify this command as heroku set:stack cedar, which was not a valid command for me. The migrate command above appears to be analogous.

Once I merged my cedar feature branch back into master I was ready to push to master. And FAIL. It turns out that I needed to precompile my assets, which had a dependency on the database existing. I tried to pre-provision as I had done on my cedar branch; however, the results were the same after running this command.

A quick search yielded https://devcenter.heroku.com/articles/rails-asset-pipeline#troubleshooting and the advice to add the following line to the config/application.rb file:

config.assets.initialize_on_precompile = false

Summary

I’ve learned quite a bit about Heroku in this upgrade experience. Their changes force me to use the latest software which is nice in some ways. When everything is running on my website, I don’t often worry about upgrading until I get an email like the one above.

The downside of course is that this upgrade process is a pain in the ass, is error prone, and affects production websites that are running smoothly. If it isn't broken, you don't want to fix it. Except this time you have to, in order to have it continue to function after June 2015.

Best of luck to other people upgrading. Just be patient, and test everything in a new app if you have any doubts.

Computers, Open-source, Software, Thoughts

AngularJS File Uploads with HTML5 FileAPI

AngularJS has an interesting gap in functionality that can make working with file uploads difficult. You might expect attaching a file to an <input type="file"> to trigger the ng-change event; however, this does not happen. There are a number of Stackoverflow questions on the subject, with a popular answer being to use a native onchange attribute and call into Angular's internals (e.g. onchange="angular.element(this).scope().fileNameChanged()").

This solution feels brittle, and relies on some unsupported Angular interactions from the template. To work around this issue, Github user danialfarid has provided the awesome angular-file-upload library to simplify this process by extending Angular’s attributes to include ng-file-select. This is a cleaner implementation. This library also includes an injectable $upload object and its documentation shows how this abstracts the file upload process in the controller. This abstraction (if used) sends the uploaded file to the server immediately, and without the contents of the rest of the form. I wanted to submit this file change with the traditional all-at-once approach that HTML forms take. This way, the user can abandon form changes by neglecting to press the submit button, and keep the original file attachment unmodified.

In order to achieve this, I’ve created a solution that uses the HTML5 FileAPI to base64 encode the contents of the file, and attach it to the form. Instead of reinventing the ng-file-select event, I opted to use the angular-file-upload library described above. However instead of using the injected $upload functionality referenced in its README, we will serialize the attachment with a base64 encoded string.

To begin, create an AngularJS module for your application, and include the angularFileUpload dependency:

window.MyApp = angular.module('MyApp',
  [
    'angularFileUpload'
  ]
)

Next, we will create our AngularJS template and include our HTML input tags:

<div ng-controller="MyCtrl">
  <form ng-submit="save()">
    <input type="file" ng-file-select="onFileSelect($files)" />
    <input type="submit" />
  </form>
</div>

Now we can create our AngularJS controller, and define the onFileSelect function referenced in the ng-file-select attribute:

class exports.MyCtrl
  @$inject: ['$scope', '$http']

  constructor: (@scope, @$http) ->
    @scope.onFileSelect = @onFileSelect

  onFileSelect: ($files) =>
    angular.forEach $files, (file) =>
      reader = new FileReader()
      reader.onload = (e) =>
        @scope.attachment = e.target.result
      reader.readAsDataURL file

  save: =>
    @$http(
      method: 'POST',
      url: "/path/to/handler",
      data:
        $.param(
          attachment: @scope.attachment
        )
      headers:
        'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8'
        'Accept': 'text/javascript'
    )

Our controller is now in place. When the input's attachment changes, onFileSelect is called, which iterates through the collection of files (if multiple) and creates a FileReader instance for each one. Each reader has an onload handler attached that assigns the result to an attribute on our @scope object. The call to readAsDataURL starts reading the file and creates a data: URL representing the file's data as a base64 encoded string.

Once the form is submitted, the save function is called from the value of ng-submit on our form tag. This performs a standard AngularJS XHR action, and includes the attachment assignment in the params. I have adjusted the Content-Type in the headers to communicate to the server that the content contains URL encoded data. If we had other form fields, we could serialize and append them to the params collection to send to the server alongside the attachment in the same place in the code.

Image Attachments

For added feedback to the user on image attachments, the img tag’s src attribute can accept a base64 encoded string as a value. Since we have this value from our FileReader object, we can update the view instantly with the file without doing any server side processing. To achieve this, we can add an image tag to our HTML file:

<div ng-controller="MyCtrl">
  <form ng-submit="save()">
    <img ng-src="{{attachment}}" />
    <input type="file" ng-file-select="onFileSelect($files)" />
    <input type="submit" />
  </form>
</div>

Next, we can make a few modifications to our onFileSelect function:

onFileSelect: ($files) =>
  angular.forEach $files, (file) =>
    if file.type in ["image/jpeg", "image/png", "image/gif"]
      reader = new FileReader()
      reader.onload = (e) =>
        @scope.$apply () =>
          @scope.attachment = e.target.result
      reader.readAsDataURL file

AngularJS two way data binding takes care of the messy details for us. The template is bound to @scope.attachment. We do some safety checks that the filetype is an image, and then we assign the attachment key to the base64 encoded image. The call to $apply() will repaint the screen, and the user will see the image they have attached displayed.

Thanks to Nick Karpenske, and Robert Lasch for help with the implementation!

Open-source, Software, Vacations

Colocation Week in Dallas, TX

Working remote has some unconventional consequences. A big one is that it's very common to have never met in person the people you work side by side with. It's a strange sensation to recognize a voice you've heard almost daily but not be able to put a face to it. It turns out that many people don't look at all like what you have imagined them to be.

To have some fun and meet our coworkers, our company hosted its first "colocation week" in Dallas, TX. After a 24 hour meet and greet, we all sat down with our new teams at the same table for the first time. It was a blast! Aside from a few major issues (hotel Internet, and Dallas being a dry county come to mind!), we did a lot of good.

This evening marked the end of a 30-hour hackathon to compete for the grand prize of taking home a Google Glass dev kit.

It was hard work, with us stopping the night before at 2am, only to get a few hours of sleep and jump right back into coding. Our team pitched the idea of brick and mortar stores integrating iBeacon (Bluetooth LE) devices to target proximity based offers and suggestions. The resulting app had some fun mechanics that I'd love to see make it into stores:

  • Personalization and announcement when you walk into the store with your device
  • Assistance in locating goods at an aisle level
  • Scan as you go shopping
  • Integration with online payments to avoid checkout lines

 

There were strong tie-ins for the business side as well, with foot traffic analysis and hyper relevant offer targeting. The screenshot shown is the Android activity returned as a user enters the geofence of the first shop's aisle.

It was tough to jump back into Android development after a few years, but it came back. Java is the language that just won’t die.

We had an awesome team, and it's wonderful to work for a company where everyone is as motivated as you are to deliver something kickass. Hopefully we will get a chance to work with some of these technologies.

Open-source, Software, Web

Inserting Large Data Sets in MySQL

It's always interesting for me to work with large data sets. The solutions that work in lower orders of magnitude don't always scale, and I am left with unusable solutions in production. Often the problems require clever refactoring that at a cursory glance appears identical, but somehow skirts around some expensive operation.

I had a requirement to tune a script that was responsible for inserting 300k records in a database table. The implemented solution of iterating through a collection and calling ‘INSERT’ was not scaling very well and the operation was taking long enough to time out in some runs. This gave me the opportunity to learn about a few things in MySQL including the profiler, and (spoiler!) the INSERT multiple records syntax.

I needed some real numbers to compare the changes I would be making. My plan was to change one thing at a time and run a benchmark to tell if the performance was 1) better 2) worse, or 3) not impacted. MySQL has an easy to use profiler for getting this information. Inside of the MySQL CLI, you can issue the command:

SET profiling=1;

Any subsequent queries you run will now be profiled. You can see a listing of queries you want to know more about by typing:

SHOW profiles;

This command will show an index of queries that have run, along with their associated Query_ID. To view more information about a particular query, issue the following command replacing x with the Query_ID:

SHOW profile FOR QUERY x

Here is an example output:

+------------------------------+----------+
| Status                       | Duration |
+------------------------------+----------+
| starting                     | 0.000094 |
| checking permissions         | 0.000003 |
| checking permissions         | 0.000002 |
| checking permissions         | 0.000001 |
| checking permissions         | 0.000003 |
| Opening tables               | 0.000021 |
| System lock                  | 0.000008 |
| init                         | 0.000039 |
| optimizing                   | 0.000012 |
| statistics                   | 0.000717 |
| preparing                    | 0.000014 |
| Creating tmp table           | 0.000023 |
| executing                    | 0.000002 |
| Copying to tmp table         | 0.016192 |
| converting HEAP to MyISAM    | 0.026860 |
| Copying to tmp table on disk | 2.491668 |
| Sorting result               | 0.269554 |
| Sending data                 | 0.001139 |
| end                          | 0.000003 |
| removing tmp table           | 0.066401 |
| end                          | 0.000009 |
| query end                    | 0.000005 |
| closing tables               | 0.000011 |
| freeing items                | 0.000040 |
| logging slow query           | 0.000002 |
| cleaning up                  | 0.000015 |
+------------------------------+----------+

In one iteration of my SQL query, I was spending an excessive amount of time "copying to tmp table". After reading the article http://www.dbtuna.com/article/55/Copying_to_tmp_table_-_MySQL_thread_states, I was able to isolate the cause of this to an ORDER BY clause in my query that wasn't strictly necessary. In this example, not too much exciting is going on, which is a Good Thing.

For a comprehensive listing of thread states listed in the Status column, view: http://dev.mysql.com/doc/refman/5.0/en/general-thread-states.html

Now that I know my query is as optimized as it can be, it's time to pull out the bigger guns. On to plan B – consolidating those INSERT statements!

An INSERT statement, though seemingly executing instantaneously under small loads, is composed of many smaller operations, each with its own cost. The relative expense of these operations is roughly the following (http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html):

  • Connecting: (3)
  • Sending query to server: (2)
  • Parsing query: (2)
  • Inserting row: (1 × size of row)
  • Inserting indexes: (1 × number of indexes)
  • Closing: (1)

As you can see, connecting to the server, sending the query, and parsing are relatively expensive operations. In the script I was modifying, 300k INSERT statements were generating 300k records. Fortunately for us, MySQL doesn’t force our records to be 1:1 with our INSERT statements thanks to allowing multiple insertions per INSERT. To use this feature instead of having 3 INSERT statements:

INSERT INTO foo (col1, col2) VALUES (1, 1);
INSERT INTO foo (col1, col2) VALUES (2, 2);
INSERT INTO foo (col1, col2) VALUES (3, 3);

We can instead coalesce them into a single INSERT statement

INSERT INTO foo (col1, col2) VALUES (1, 1), (2, 2), (3, 3);

How many values can we coalesce into the same INSERT statement? This isn't driven by a maximum number of records, but rather by the bulk_insert_buffer_size server system variable: http://dev.mysql.com/doc/refman/5.0/en/server-system-variables.html#sysvar_bulk_insert_buffer_size This can be modified, but the default is 8388608 bytes. The exact number of records will vary depending on the number of columns, and the amount of data being inserted into those columns. I conservatively chose to coalesce 5k records at a time. I tried to bump this to 10k, but I encountered an exception when I exceeded this server system variable's maximum.
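
As a rough sketch of what the coalescing step can look like (assuming a Ruby script and a connection object that exposes an execute method, such as an ActiveRecord connection; neither detail comes from the original script):

# Hypothetical sketch: coalesce rows into multi-row INSERT statements,
# 5,000 records per statement. Real code should escape/quote values
# appropriately for non-integer columns.
def bulk_insert(connection, rows, batch_size = 5_000)
  rows.each_slice(batch_size) do |batch|
    values = batch.map { |col1, col2| "(#{col1.to_i}, #{col2.to_i})" }.join(", ")
    connection.execute("INSERT INTO foo (col1, col2) VALUES #{values}")
  end
end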

With my INSERTS coalesced, I was able to reduce my total number of INSERT statements to 60 (300k / 5k). This yielded massive performance boosts. I was able to take the query from over an hour to run to completing in just 2 minutes. Quite a nice trick, considering the data is unchanged.

Is there room for improvement? Absolutely. A statement executed 60 times may be worth preparing, or wrapping inside of a transactional block. My real world tests didn't yield a significant enough performance boost to make these complexities worth implementing. This may not be true with data in higher orders of magnitude, or with different schema layouts. MySQL also understands INDEX hints, which allow you to suggest indexes that may be missed by the query planner, or to force the inclusion or exclusion of beneficial or detrimental indexes despite what the query planner thinks! (http://dev.mysql.com/doc/refman/5.0/en/index-hints.html)

Speaking of indexes, any UNIQUE or BTREE indexes can be dropped while the mass INSERT is running, then added back afterward to side-step the per-row index maintenance cost (the 1 × number of indexes term above).

In the next order of magnitude, I will probably have to rethink my approach of using INSERT statements to load data. According to the MySQL documentation, LOAD DATA INFILE is “roughly 20 times faster” than a MySQL INSERT statement. My script would no longer generate statements, but rather output to a file in a comma delimited format. This could then be loaded assuming appropriate permissions are in place.
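
As a rough sketch of that future approach (again assuming a Ruby script; the file path and rows are placeholders):

require 'csv'

rows = [[1, 1], [2, 2], [3, 3]] # stand-in for the real record collection

# Write the rows out as a comma delimited file that LOAD DATA INFILE
# can consume, instead of generating INSERT statements.
CSV.open("/tmp/foo.csv", "w") do |csv|
  rows.each { |row| csv << row }
end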

Happy profiling!