Computers, Ruby, Software

Let's Talk Strategy

Have you ever used a promotional code on an e-commerce website? How does the promotional code know what items are eligible? Is it a fixed amount off, or a percentage? Does it include shipping and handling? Was there a minimum purchase amount you had to meet to be eligible? These are all design considerations that needed to be taken into account, and the strategy pattern was a great way to represent this in my codebase.

My new requirement was to incorporate a promotional code that took 10% off the shopping cart total (excluding certain items of course – why can requirements never be simple?!). The application already had a promotional code system, but it was rudimentary, and only worked for fixed amounts off the total (e.g. $5 off your total).

The strategy pattern is defined as having these qualities:

  • A strategy can be thought of as an algorithm that is conditionally applied at runtime.
  • The strategy encapsulates the execution of that algorithm.
  • Finally, a strategy is an interface for the algorithm, allowing implementations to be interchangeable.
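Stripped of our domain for a moment, a minimal Ruby sketch of these qualities might look like this (illustrative only – the class names and the 10%/$5 figures are invented for the example):

total = 100.0

# Two interchangeable strategies sharing the same interface: #discount
class FixedDiscount
  def discount(total)
    5.0
  end
end

class PercentageDiscount
  def discount(total)
    total * 0.10
  end
end

# The algorithm is selected at runtime; the caller only knows the interface
strategy = total >= 50 ? PercentageDiscount.new : FixedDiscount.new
total -= strategy.discount(total)

Both objects respond to discount, so the caller never cares which one it received.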

Implementing the Strategy Pattern

Let's look at the existing promotional code logic before I implement the logic to take a percentage discount:

# app/models/shopping_cart.rb
...
has_many :items

# Returns the most expensive item in the shopping cart
def costliest_item
  items.sort { |a, b| b.price <=> a.price }.first
end

def apply_promotional_code!(promotional_code)
  costliest_item.apply_promotional_code! promotional_code
  self.total -= promotional_code.amount
end

For added context, a promotional code has a code we can lookup, and an amount property that tells us how much that promotional code is valued at.

We have a ShoppingCart which has many Items associated with it. When we call apply_promotional_code! and pass in a promotional code (e.g. “HOTSUMMER”), it finds the costliest item in the shopping cart (based on price) and associates the promotional code with it.

Why look up the costliest item at all? Because we want to show an itemized receipt with each item's total in the shopping cart. We then subtract the promotional code amount from the shopping cart total, which is the amount that will be charged to the customer’s credit card.
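Concretely, applying a $5 code might look like this from the console (a sketch – the attribute values and the find_by_code lookup are invented for illustration):

cart = ShoppingCart.create(total: 100.00)
promo = PromotionalCode.find_by_code("HOTSUMMER") # amount: 5.00
cart.apply_promotional_code!(promo)
cart.total # => 95.0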

Adding a percentage-based promotional code

If we were to support a percentage-based discount (e.g. 10%) alongside the fixed amount (e.g. $5), we would need to change our calculation in the apply_promotional_code! method to something more complex:

def apply_promotional_code!(promotional_code)
  costliest_item.apply_promotional_code! promotional_code
  if promotional_code.amount.present?
    self.total -= promotional_code.amount
  else
    self.total -= self.total * (promotional_code.percentage / 100.0)
  end
end

We have also changed the number of items the code can be applied against (shown in our itemized receipt). If the promotional code is 10% off the total, then it is 10% off each item in the shopping cart. This means we will have to revisit our costliest_item method to make it return an array of zero or more items, instead of a single item:

# Returns an array of eligible items for the promotional code
def eligible_items(promotional_code)
  if promotional_code.amount.present?
    Array(items.sort { |a, b| b.price <=> a.price }.first)
  else
    items
  end
end

def apply_promotional_code!(promotional_code)
  eligible_items(promotional_code).each { |item| item.apply_promotional_code! promotional_code }
  if promotional_code.amount.present?
    self.total -= promotional_code.amount
  else
    self.total -= self.total * (promotional_code.percentage / 100.0)
  end
end

With the addition of a percentage-based promotional code alongside the fixed amount promotional code, we have introduced more complexity. apply_promotional_code! now has to be aware of how to apply both a fixed and a percentage-based amount. Further, eligible_items (formerly costliest_item) now needs to be aware of the type of promotional code. Our promotional logic is spilling over into areas that need not be concerned with promotional logic.

Encapsulating via a strategy

Instead of burdening apply_promotional_code! with knowledge of every possible type of promotional code, let's decompose this into two separate strategies. The first will be our original fixed amount against the costliest item in the shopping cart. The second will be our new percentage-based discount off the entire shopping cart:

# app/models/promotional_code/strategies/fixed_amount_against_costliest_item.rb
class FixedAmountAgainstCostliestItem
  attr_accessor :shopping_cart, :promotional_code

  def initialize(shopping_cart, promotional_code)
    self.shopping_cart = shopping_cart
    self.promotional_code = promotional_code
  end

  def discount
    promotional_code.amount
  end

  def eligible_items
    shopping_cart.eligible_items(promotional_code)
  end
end

class PercentageAgainstMultipleItems
  attr_accessor :shopping_cart, :promotional_code

  def initialize(shopping_cart, promotional_code)
    self.shopping_cart = shopping_cart
    self.promotional_code = promotional_code
  end

  def discount
    shopping_cart.total * (promotional_code.percentage / 100.0)
  end

  def eligible_items
    shopping_cart.eligible_items(promotional_code)
  end
end

A few things to note: First, there are now two classes, each responsible for a single type of promotional code application. Second, there is similarity in these two classes. This is a good sign that we want to refactor and make a common superclass that these inherit from. Similarity means there is a good chance we can accomplish interchangeability. Lastly, we have proxied the logic of what items are eligible through our two new strategies. As a next pass, we will move the eligible_items logic to reside with the strategy. This keeps our promotional code logic in one place:

class BasePromotionalStrategy
  attr_accessor :shopping_cart, :promotional_code

  def initialize(shopping_cart, promotional_code)
    self.shopping_cart = shopping_cart
    self.promotional_code = promotional_code
  end

  def discount
    raise NotImplementedError
  end

  def eligible_items
    raise NotImplementedError
  end
end

class FixedAmountAgainstCostliestItem < BasePromotionalStrategy
  def discount
    promotional_code.amount
  end

  def eligible_items
    Array(shopping_cart.items.sort { |a, b| b.price <=> a.price }.select { |item| item.price >= promotional_code.amount }.first)
  end
end

class PercentageAgainstMultipleItems < BasePromotionalStrategy
  def discount
    shopping_cart.total * (promotional_code.percentage / 100.0)
  end

  def eligible_items
    shopping_cart.items
  end
end

Now that we have extracted the common elements into a BasePromotionalStrategy class, both FixedAmountAgainstCostliestItem and PercentageAgainstMultipleItems can extend it. With the boilerplate constructor logic out of the way, we can define our interface with method placeholders. As an example of what eligible_items might guard against, we will check that the item cost is greater than or equal to the promotional_code.amount to ensure our shopping cart has a positive balance. (So much for early retirement by buying thousands of paperclips one at a time with a $5 off coupon!) Percentage-based promotional codes have a sliding value so there is no minimum spend we need to guard against. You can see that we just return all items as eligible_items.

Regardless of whether we instantiate FixedAmountAgainstCostliestItem or a PercentageAgainstMultipleItems, we can call discount to calculate the amount taken off the shopping cart, and we can call eligible_items to return a list of items that the itemized list should show the promotional discount applying to. In the case of the FixedAmountAgainstCostliestItem it will always be an empty array, or an array with one item, but we consistently return an array type regardless of the number of entities it contains. Consistent return types are an interface must!

Calling our new strategies

Now that we have implemented (and written tests for – right?) our strategy, we can revisit our caller to take advantage of this new logic:

# app/models/promotional_code.rb
class PromotionalCode
  ...
  def strategy
    if amount.present?
      PromotionalCode::Strategies::FixedAmountAgainstCostliestItem
    else
      PromotionalCode::Strategies::PercentageAgainstMultipleItems
    end
  end
end

# app/models/shopping_cart.rb
...
has_many :items

def apply_promotional_code!(promotional_code)
  strategy = promotional_code.strategy.new(self, promotional_code)
  strategy.eligible_items.each { |item| item.apply_promotional_code!(promotional_code) }
  self.total -= strategy.discount
end

The PromotionalCode is responsible for knowing which strategy will be executed when it is applied to a shopping cart. This strategy method returns a class reference. This reference is then instantiated, passing in the necessary arguments: the shopping_cart and the promotional_code. This is everything the shopping cart needs to know about executing a promotional code strategy. The ShoppingCart does not know which strategy to instantiate, but simply which arguments to pass to the class returned by strategy.

Further, ShoppingCart now knows nothing about what items are eligible for a PromotionalCode. That logic again resides within the strategy definition. Our promotional code domain is nicely encapsulated. This adheres to the open/closed principle of good object oriented design. Adding a new type of promotional code (e.g. buy one get one free, or a promotional code with a minimum spend) would be as easy as adding a new strategy and telling PromotionalCode#strategy when to apply it, as sketched below.
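For example, a minimum-spend variant might be sketched like so (hypothetical – MinimumSpendPercentage and the minimum_spend attribute are not part of the codebase above):

class MinimumSpendPercentage < BasePromotionalStrategy
  # No discount until the cart total clears the hypothetical minimum spend
  def discount
    return 0 if shopping_cart.total < promotional_code.minimum_spend
    shopping_cart.total * (promotional_code.percentage / 100.0)
  end

  def eligible_items
    shopping_cart.total < promotional_code.minimum_spend ? [] : shopping_cart.items
  end
end

PromotionalCode#strategy would gain one more branch to return it, and ShoppingCart#apply_promotional_code! would not change at all.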

NullStrategy

You might have noticed that we use an if...else as opposed to an if...elsif (for simplicity's sake). This means that the percentage based promotional code strategy is returned more often than it should be. Ideally, we’d want to return an inert strategy that conforms to our interface but doesn’t actually give any discounts to our shopping cart. This is where a Null Object pattern can be used in conjunction with a strategy:


# app/models/promotional_code/strategies/null_strategy.rb
class NullStrategy < BasePromotionalStrategy
  def discount
    0
  end

  def eligible_items
    []
  end
end

# app/models/promotional_code.rb
class PromotionalCode
  ...
  def strategy
    if amount.present?
      PromotionalCode::Strategies::FixedAmountAgainstCostliestItem
    elsif percentage.present?
      PromotionalCode::Strategies::PercentageAgainstMultipleItems
    else
      PromotionalCode::Strategies::NullStrategy
    end
  end
end

One additional benefit of this strategy pattern is the ease of testing. PromotionalCode#strategy can easily be stubbed to return this NullStrategy and still remain consistent with the interface we’ve established in our other strategies. In testing, we can simply stub in the NullStrategy and continue to chain our interface method calls:

PromotionalCode.any_instance.stubs(strategy: PromotionalCode::Strategies::NullStrategy)

promotional_code = PromotionalCode.new
promotional_code.strategy # => PromotionalCode::Strategies::NullStrategy

strategy = promotional_code.strategy.new(shopping_cart, promotional_code)
strategy.discount # => 0
strategy.eligible_items # => []
...
shopping_cart.total -= strategy.discount

Conclusion

Strategies can be a great way to split different algorithms apart. An indicator that you could benefit from a formalized strategy pattern is finding conditionals in your methods that change the way something is calculated based on context. In our case, we had a conditional (if...else) that checked whether amount was set on a PromotionalCode. Another indicator was the need to encapsulate the concern of a PromotionalCode. It's a concern of neither ShoppingCart nor Item; it needed its own home. By implementing a strategy class, we give this logic an obvious place to live.

A strategy can be a great tool when a new requirement lives alongside an existing one and changes how a calculation is done at runtime.

Additional Information on strategies can be found at https://sourcemaking.com/design_patterns/strategy

Computers, Open-source, Software, Thoughts

Git Pro Tips

What makes a good commit message? What makes for good commit contents? I present how to reword commits to provide context, and how to structure commit contents to be the most meaningful for posterity, using git rebase.

Computers, Open-source, Ruby, Software

Don’t Use “#” In the Paperclip Gem

I learned a whole lot more about ImageMagick commands last week than I ever really wanted to know. The problem was that our uploaded images were having content cropped off the top, bottom, and sides. Like many folks in the Rails world, we pass our attachments through Paperclip to handle all of the nitty gritty resizing operations.

I was interested in understanding how we could prevent our content from being cropped off when I came across an interesting idiom in the geometry settings:

has_croppable_attachment :image,
  styles: {
    :'630x315' => { geometry: "630x315#", format: :jpg },
  }

Well, take a look at that. There is a “#” symbol suffixed to my image geometry. I went to the ImageMagick documentation to look up what this flag meant. Spoiler alert: it doesn’t exist there. After some digging around, I discovered that this idiom is provided by the Paperclip gem, and translates to the following convert command:

convert '/path/to/source.jpg' -resize "630x" -crop "630x315+0+0" '/path/to/output.jpg'

You can see the resize + crop combination of commands being built by Paperclip according to their documentation:

Paperclip also adds the “#” option (e.g. “50x50#”), which will resize the image to fit maximally inside the dimensions and then crop the rest off (weighted at the center).

Well, that is no good! If an image is not a 2:1 aspect ratio as per my dimensions 630x315 (or whatever aspect ratio you have) YOU WILL LOSE CONTENT! Time to rethink this…

The dimensions are 306x306

The image was resized until it was large enough to cover a 630x315 canvas, then the top and bottom were cropped off

Instead of resizing maximally (so that an image is a minimum of 630 width AND a minimum of 315 height), let's resize minimally (so that an image is a maximum of 630 width OR 315 height). The aspect ratio is preserved in both scenarios.

We want to resize while preserving aspect ratio, but we also need to make our canvas 630x315. The canvas dimensions are set with the extent option. When we do this, we will likely have space on the top and bottom, or on the sides, that we need to fill to get exactly these dimensions. You can fill this background with a color (in my case white). We also likely want to center the minimally resized image on this canvas. You can pass these convert options into Paperclip like so:

convert_options: {
  :'630x315' => "-background white -gravity center -extent 630x315",
}
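Putting it together, the full attachment declaration looks roughly like the following with the plain has_attached_file macro (a sketch – our has_croppable_attachment wrapper forwards these options, and your attachment name and styles will differ):

has_attached_file :image,
  styles: {
    :'630x315' => { geometry: "630x315", format: :jpg },
  },
  convert_options: {
    :'630x315' => "-background white -gravity center -extent 630x315",
  }

Note that the geometry no longer carries the “#” suffix, so Paperclip performs a plain aspect-preserving resize before our extent options pad the canvas.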

The resulting command will look something like this:

convert '/path/to/source.jpg' -resize "630x315" -background white -gravity center -extent 630x315 '/path/to/output.jpg'

Notice that our lossy crop flag has been replaced with a nicer extent flag. We can see the results this has on a similarly sized image:

We have an image smaller than the target 630x315

We now have a 630x315 image with the sides filled in with a white background to preserve the dimensions. Note the sides of the image are white.

As a final note, this works for images both larger and smaller than the target dimensions. Looking at the ImageMagick documentation for flags can be helpful, but daunting, as the real power lies in chaining multiple flags for a desired effect.

With a little effort I was able to get what is (in my opinion) a better image resize with just a few custom flags.

Computers, Linux, Open-source, Ruby, Software, Web

Migrating from Bamboo to Cedar

Heroku recently sent me a nice email:

You have one or more applications deployed to Heroku’s Bamboo stack. Bamboo went live almost four years ago, and the time has come for us to retire it.

You may have received a previous message about Bamboo deprecation. That message may have had an erroneous app list, but this email has the correct list of Bamboo apps that you own or are collaborated on. We’re sorry for any confusion.

On June 16th 2015, one year from now, we will shut down the Bamboo stack. The following app/apps you own currently use the Bamboo stack and must be migrated:

This is on the heels of an email about ending legacy routing support. It seems they have been quite busy over at Heroku, but I can’t complain too seriously about free app hosting.

Upgrading a legacy Rails application to the Cedar stack did require a few changes. I’ll document some stumbling blocks for posterity.

Foreman

The first big change, described in Heroku's general purpose upgrade article (https://devcenter.heroku.com/articles/cedar-migration), was the use of foreman to manage your web services. I luckily have a simple app, and did not need to worry about resque, mailers, etc. This made my Procfile rather straightforward:

web: bundle exec unicorn -p $PORT -E $RACK_ENV -c config/unicorn.rb

Absent from the foreman crash course is any information about the corresponding .env file. I did need to add a few environment variables to keep working in development as usual:


RACK_ENV=development
PORT=3000

Once the Procfile was committed (and the .env file added to .gitignore), I added Unicorn to the Gemfile. I’m not sure if Unicorn is strictly necessary over WEBrick in development, however this is the example in the Cedar upgrade guide, so I wanted to run as close to production as practical to prevent any surprises.

After installing Unicorn, I then needed to touch config/unicorn.rb since it did not exist. Again, it's in the example, but I'm not sure it's strictly necessary, especially given that it's just an empty file for me. To start your Rails application, you now issue foreman start instead of the older rails s.

Devise incompatibility

Not directly related to the Cedar changes, but a common gem, so worth mentioning. Devise 2.x has removed migration helpers. I stupidly didn’t lock my version of Devise in my Gemfile so I was confused why this was failing for me. I found the last 1.x version of Devise by running this command:

gem list devise --remote --all

I then pinned the devise entry in my Gemfile to ‘1.5.3’.
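For reference, the pinned entry looks like this:

gem 'devise', '1.5.3'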

PostgreSQL

Heroku requires Postgres in production, and I was previously using SQLite in my development environment. Again, to mirror production, I wanted to use Postgres in my development environment so that I could be as close to the production setup as practical. I took a quick trip down memory lane to set up and configure PostgreSQL on my Linux development machine: https://mrfrosti.com/2011/11/postgresql-for-ruby-on-rails-on-ubuntu/ . An interesting observation is that in Ubuntu 14.04 LTS, PostgreSQL runs on port 5433, and not the default 5432. This can be verified with netstat -nlp | grep 5432. If no entries come back, PostgreSQL is running on a non-default port. Try to grep for 5433, or other numbers, until you find the process.

Backing up your database

Before I made any server changes, I wanted to have an up to date copy of the production database on my machine to prepare for the worst. This can be done by using the pgbackups commands:


heroku pgbackups:capture   # create a backup
heroku pgbackups:url       # make a public URL for that backup

Then I clicked the public URL and downloaded the file – safe and sound to my computer. By the way, you can import this production database into your local instance of PostgreSQL using the following command:


pg_restore --verbose --clean --no-acl --no-owner -d development /path/to/database

Pre-provisioning your database

A quick note on database availability. I got into a chicken-and-egg scenario where the app wouldn’t deploy without the database, and the database seemingly couldn’t be created without the app first being deployed. Heroku has an article on pre-provisioning, and I found it a necessary prerequisite to deploying to my newly created test Cedar stack: https://devcenter.heroku.com/articles/pre-provision-database

To pre-provision your database, run the following:

heroku addons:add heroku-postgresql

You can even import the database from production as part of the heroku utility. I used the public database URL I created above to populate my new Cedar stack database:


heroku pgbackups:restore

Migrating to Cedar

Once I had tested on a new app in a feature branch, I had confidence everything was working as expected. I was ready to make my changes to the production app by migrating it to Cedar. To do this, the command is:


heroku stack:migrate cedar

I either have an old version of the Heroku gem, or this is a discrepancy between the gem and the non-gem packaging, but the docs misidentify this command as heroku set:stack cedar, which was not a valid command for me. The migrate command above appears to be analogous.

Once I merged my cedar feature branch back into master I was ready to push to master. And FAIL. It turns out that I needed to precompile my assets, which had a dependency on the database existing. I tried to pre-provision as I had done on my cedar branch, however the results were the same after running this command.

A quick search yielded https://devcenter.heroku.com/articles/rails-asset-pipeline#troubleshooting, which advises adding the following line to config/application.rb:

config.assets.initialize_on_precompile = false

Summary

I’ve learned quite a bit about Heroku in this upgrade experience. Their changes force me to use the latest software which is nice in some ways. When everything is running on my website, I don’t often worry about upgrading until I get an email like the one above.

The downside, of course, is that this upgrade process is a pain in the ass, error prone, and affects production websites that are running smoothly. If it isn’t broken, you don’t want to fix it. Except this time you have to, in order to keep it functioning after June 2015.

Best of luck to other people upgrading. Just be patient, and test everything in a new app if you have any doubts.

Computers, Open-source, Software, Thoughts

AngularJS File Uploads with HTML5 FileAPI

AngularJS has an interesting gap in functionality that can make working with file uploads difficult. You might expect attaching a file to an <input type="file"> to trigger the ng-change event, however this does not happen. There are a number of Stack Overflow questions on the subject, with a popular answer being to use a native onchange attribute and call into Angular’s internals (e.g. onchange="angular.element(this).scope().fileNameChanged()").

This solution feels brittle, and relies on some unsupported Angular interactions from the template. To work around this issue, GitHub user danialfarid has provided the awesome angular-file-upload library, which simplifies this process by extending Angular’s attributes to include ng-file-select. This is a cleaner implementation. The library also includes an injectable $upload object, and its documentation shows how this abstracts the file upload process in the controller. This abstraction (if used) sends the uploaded file to the server immediately, and without the contents of the rest of the form. I wanted to submit this file change with the traditional all-at-once approach that HTML forms take. This way, the user can abandon form changes by neglecting to press the submit button, and keep the original file attachment unmodified.

In order to achieve this, I’ve created a solution that uses the HTML5 FileAPI to base64 encode the contents of the file, and attach it to the form. Instead of reinventing the ng-file-select event, I opted to use the angular-file-upload library described above. However instead of using the injected $upload functionality referenced in its README, we will serialize the attachment with a base64 encoded string.

To begin, create an AngularJS module for your application, and include the angularFileUpload dependency:

window.MyApp = angular.module('MyApp',
  [
    'angularFileUpload'
  ]
)

Next, we will create our AngularJS template and include our HTML input tags:

<div ng-controller="MyCtrl">
  <form ng-submit="save()">
    <input type="file" ng-file-select="onFileSelect($files)" />
    <input type="submit" />
  </form>
</div>

Now we can create our AngularJS controller, and define the onFileSelect function referenced in the ng-file-select attribute:

class exports.MyCtrl
  @$inject: ['$scope', '$http']

  constructor: (@scope, @$http) ->
    @scope.onFileSelect = @onFileSelect

  onFileSelect: ($files) =>
    angular.forEach $files, (file) =>
      reader = new FileReader()
      reader.onload = (e) =>
        @scope.attachment = e.target.result
      reader.readAsDataURL file

  save: =>
    @$http(
      method: 'POST',
      url: "/path/to/handler",
      data:
        $.param(
          attachment: @scope.attachment
        )
      headers:
        'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8'
        'Accept': 'text/javascript'
    )

Our controller is now in place. When the input’s attachment changes, onFileSelect is called, which iterates through the collection of files (if multiple) and creates a FileReader instance for each one. Each reader has an onload handler attached that assigns the result to an attribute on our @scope object. The call to readAsDataURL starts reading the file and creates a data: URL representing the file’s data as a base64 encoded string.

Once the form is submitted, the save function is called from the value of ng-submit on our form tag. This performs a standard AngularJS XHR action, and includes the attachment assignment in the params. I have adjusted the Content-Type in the headers to communicate to the server that the content contains URL encoded data. If we had other form fields, we could serialize and append them to the params collection to send to the server alongside the attachment in the same place in the code.
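On the server side, the attachment param arrives as a data: URL string. Here is a rough sketch of how a Rails handler might split off and decode the base64 payload (the helper name and storage path are mine, invented for illustration):

require 'base64'

# A data: URL looks like "data:image/png;base64,iVBORw0KGgo..."
def decode_attachment(data_url)
  metadata, encoded = data_url.split(',', 2)
  content_type = metadata[%r{data:(.*?);base64}, 1] # e.g. "image/png"
  [content_type, Base64.decode64(encoded)]
end

# In a controller action:
# content_type, contents = decode_attachment(params[:attachment])
# File.binwrite('/path/to/storage', contents)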

Image Attachments

For added feedback to the user on image attachments, the img tag’s src attribute can accept a base64 encoded string as a value. Since we have this value from our FileReader object, we can update the view instantly with the file without doing any server side processing. To achieve this, we can add an image tag to our HTML file:

<div ng-controller="MyCtrl">
  <form ng-submit="save()">
    <img ng-src="{{attachment}}" />
    <input type="file" ng-file-select="onFileSelect($files)" />
    <input type="submit" />
  </form>
</div>

Next, we can make a few modifications to our onFileSelect function:

onFileSelect: ($files) =>
  angular.forEach $files, (file) =>
    if file.type in ["image/jpeg", "image/png", "image/gif"]
      reader = new FileReader()
      reader.onload = (e) =>
        @scope.$apply () =>
          @scope.attachment = e.target.result
      reader.readAsDataURL file

AngularJS two way data binding takes care of the messy details for us. The template is bound to @scope.attachment. We do a safety check that the file type is an image, and then we assign the attachment key to the base64 encoded image. The call to @scope.$apply() will repaint the screen, and the user will see the image they have attached displayed.

Thanks to Nick Karpenske and Robert Lasch for help with the implementation!

Computers, Personal, Software, Thoughts

“The Deadline”: Part 3 on a Series on Leadership in Technology

Quite a bit has changed since my last post on leadership. I’ve been promoted to full time team lead after my mentor and manager left to work for another company. I was learning a great deal every day, and being turned loose to manage the team on my own has been a baptism by fire. I’ve applied what I’ve learned from my previous mentorships, and I’ve learned a great deal more over the last few weeks.

High bandwidth communication is key

I would have never gotten through the last few weeks on email and instant messenger alone. It's very important to have communications by voice to keep everyone updated and on the same page while respecting each other’s time. Several times in the past week I’ve watched an email thread bounce back and forth between many participants, only to get boiled down into a concise message at a standup meeting. A five minute verbal Q&A would take hours via email, especially across different time zones.

Delegate, delegate, delegate

When moving into a managerial capacity, the content of your day to day work shifts dramatically. The first few days without being in the code left me anxious. But you know what? We have good guys that understand even more than I do, and if you give them a problem, they come back with a solution. It's hard to trust that it will happen, but it did. Over and over again. The key is not in just throwing out a problem and coming back the next day to find a solution. You should be available for questions and clarification continuously throughout the development of the solution. I often found that checking in a few times a day on each developer was sufficient to answer any questions, understand the progress, and get a rough idea of when something would be delivered. A few times they would casually mention that they were stuck trying to figure out ‘x’. Turns out I knew about ‘x’, and after a brief chat, with some pointing to places in the codebase, they got it squared away.

Be crystal clear about your requirements

We had a new screen we wanted to develop. We had a mockup done by our front end guys. Our BA loved it. Everyone was on the same page. Until we began to implement it and all of these tiny edge cases popped up. I’d assume a course of action was the correct one to take, only to discover that our stories were getting rejected by QA. Turns out we don’t share the same brain, and what I call working, someone else will call a defect. That creates a lot of extra work to resolve what is now a defect. Closer to the deadline of the deliverable, when I switched over to voice communication instead of IM, I was able to lock in requirements quickly and get instant feedback on how we should handle certain edge cases. Don’t be afraid to bother the higher ups with technical questions, because it is up to them to provide the definitive answer. You aren’t doing yourself any favors by shielding them from it and substituting their would be solution with something you invented.

Front load work you are less certain of

If you aren’t clear on how something is going to be executed, that is a problem. It's your job as a manager to find out! You don’t need to know every technical detail, but you do need a clear understanding of who the key players are, what the testing process is like, and how you will get this feature through QA and accepted by the business. We had some work on a feature that impacted a few systems downstream. Did we put this off until we took care of the easy stuff? Nope – we started with the pain in the butt feature. And good thing too, because we were working on it up until the last minute. We didn’t realize how much latency there would be in getting just an ‘it's working’ or ‘it's not working’ from all the parties downstream from us. If you aren’t sure, err on the side of caution and start immediately. The easy work you have a clear understanding of, and it should take a backseat to anything that isn’t as clear.

Pad your estimates

This entire iteration I was too optimistic with what we could turn out in a day. Estimating is hard, and I’m convinced we all suck at it. I’d like to believe my guys can come up to speed on a new technology stack and crank out some impressive code but there is a learning curve. Sometimes the only way forward is a painfully slow dive into the documentation. There are test suites that take forever to run. There are rejections when it comes time to merge, meaning the suite has to be run again. There are the seemingly simple issues that once you dive into become much more complex than anticipated. Rarely did we close a feature or bug in the amount of time I assumed it would take. Because of this I’m going to start padding my estimates to compensate for all of these little details that add up to something bigger.

And finally – know when to ask for help!

There was a certain point in the sprint when I knew we weren’t in good shape to hit our deadline. I mulled it over in my head and stressed about it, and let my pride get in the way a bit, convinced that asking for help wouldn’t be necessary. It was a mentality of “if I can just get through these few stories here we will be back on top”. But of course, I’m sucked into a meeting, or new bugs pop up, or the feature I’m working on runs way over on time. Don’t hesitate to ask for help when you first identify the need for it. Forget how bad you think it will look. Forget the anxiety of revealing rough new code to outsiders. It's worse if you don’t ask and end up missing your target. I was surprised at how strong the support was once we asked for help. I had 10 extra people jump in, and had them all working in parallel on the feature. My entire day was pointing people to new work, answering questions, and following up on when things would be merged. And the astounding thing was that by NOT touching the code, we were delivering more than if I had put on headphones and jumped in myself. And for the record, there was a minimum of sniping at our technical solutions from outsiders. It felt good to know we were a team, and it didn’t need to be a perfect solution before you let outside people in to help.

Open-source, Software, Vacations

Colocation Week in Dallas, TX

Working remotely has some unconventional consequences. A big one is that it's very common to have never met in person the people you have worked side by side with. It's a strange sensation to recognize a voice you’ve heard almost daily but not be able to apply a face to it. It turns out that many people don’t look at all like what you have imagined them to be.

To have some fun and meet our coworkers, our company hosted its first “colocation week” in Dallas, TX. After a 24-hour meet and greet, we all sat down with our new teams at the same table for the first time. It was a blast! Aside from a few major issues (hotel Internet, and Dallas being a dry county come to mind!), we did a lot of good.

This evening marked the end of our 30-hour hackathon to compete for the grand prize of taking home a Google Glass dev kit.

It was hard work, with us stopping the night before at 2am, only to get a few hours’ sleep and jump right back into coding. Our team pitched the idea of brick and mortar stores integrating iBeacon (Bluetooth LE) devices to target proximity based offers and suggestions. The resulting app had some fun mechanics that I’d love to see make it into stores:

  • Personalization and announcement when you walk into the store with your device
  • Assistance in locating goods at an aisle level
  • Scan as you go shopping
  • Integration with online payments to avoid checkout lines

 

There were strong tie-ins for the business side as well, with foot traffic analysis and hyper-relevant offer targeting. The screenshot shows the Android activity returned as a user enters the geofence of the first shop’s aisle.

It was tough to jump back into Android development after a few years, but it came back. Java is the language that just won’t die.

We had an awesome team, and it's wonderful to work for a company where everyone is as motivated as you to deliver something kickass. Hopefully we will get a chance to work with some of these technologies.

Open-source, Software, Web

Inserting Large Data Sets in MySQL

It's always interesting for me to work with large data sets. The solutions that work in lower orders of magnitude don’t always scale, and I am left with unusable solutions in production. Often the problems require clever refactorings that at a cursory glance appear identical, but somehow skirt around some expensive operation.

I had a requirement to tune a script that was responsible for inserting 300k records in a database table. The implemented solution of iterating through a collection and calling ‘INSERT’ was not scaling very well and the operation was taking long enough to time out in some runs. This gave me the opportunity to learn about a few things in MySQL including the profiler, and (spoiler!) the INSERT multiple records syntax.

I needed some real numbers to compare the changes I would be making. My plan was to change one thing at a time and run a benchmark to tell if the performance was 1) better, 2) worse, or 3) not impacted. MySQL has an easy to use profiler for getting this information. Inside of the MySQL CLI, you can issue the command:

SET profiling=1;

Any subsequent queries you run will now be profiled. You can see a listing of queries you want to know more about by typing:

SHOW profiles;

This command will show an index of queries that have run, along with their associated Query_ID. To view more information about a particular query, issue the following command replacing x with the Query_ID:

SHOW PROFILE FOR QUERY x;

Here is an example output:

+------------------------------+----------+
| Status                       | Duration |
+------------------------------+----------+
| starting                     | 0.000094 |
| checking permissions         | 0.000003 |
| checking permissions         | 0.000002 |
| checking permissions         | 0.000001 |
| checking permissions         | 0.000003 |
| Opening tables               | 0.000021 |
| System lock                  | 0.000008 |
| init                         | 0.000039 |
| optimizing                   | 0.000012 |
| statistics                   | 0.000717 |
| preparing                    | 0.000014 |
| Creating tmp table           | 0.000023 |
| executing                    | 0.000002 |
| Copying to tmp table         | 0.016192 |
| converting HEAP to MyISAM    | 0.026860 |
| Copying to tmp table on disk | 2.491668 |
| Sorting result               | 0.269554 |
| Sending data                 | 0.001139 |
| end                          | 0.000003 |
| removing tmp table           | 0.066401 |
| end                          | 0.000009 |
| query end                    | 0.000005 |
| closing tables               | 0.000011 |
| freeing items                | 0.000040 |
| logging slow query           | 0.000002 |
| cleaning up                  | 0.000015 |
+------------------------------+----------+

In one iteration of my SQL query, I was spending an excessive amount of time “copying to tmp table”. After reading the article http://www.dbtuna.com/article/55/Copying_to_tmp_table_-_MySQL_thread_states, I was able to isolate the cause to an ORDER BY clause in my query that wasn’t strictly necessary. Beyond that, not too much exciting is going on in this example, which is a Good Thing.

For a comprehensive listing of thread states listed in the Status column, view: http://dev.mysql.com/doc/refman/5.0/en/general-thread-states.html

Now that I know my query is as optimized as it can be, it's time to pull out the bigger guns. On to plan B – consolidating those INSERT statements!

An INSERT statement, though seemingly instantaneous under small loads, is composed of many smaller operations, each with its own cost. The relative expense of these operations is roughly the following (http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html):

  • Connecting: (3)
  • Sending query to server: (2)
  • Parsing query: (2)
  • Inserting row: (1 × size of row)
  • Inserting indexes: (1 × number of indexes)
  • Closing: (1)
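To put rough numbers on it: the fixed per-statement overhead (connecting at 3, sending at 2, parsing at 2) costs about 7 units per INSERT before any row work happens. Pay that 300k times and it dominates the load; pay it only a handful of times and it all but disappears.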

As you can see, connecting to the server, sending the query, and parsing are relatively expensive operations. In the script I was modifying, 300k INSERT statements were generating 300k records. Fortunately for us, MySQL doesn’t force our records to be 1:1 with our INSERT statements, thanks to allowing multiple insertions per INSERT. To use this feature, instead of having three INSERT statements:

INSERT INTO foo (col1, col2) VALUES (1, 1);
INSERT INTO foo (col1, col2) VALUES (2, 2);
INSERT INTO foo (col1, col2) VALUES (3, 3);

We can instead coalesce them into a single INSERT statement:

INSERT INTO foo (col1, col2) VALUES (1, 1), (2, 2), (3, 3);

How many values can we coalesce into the same INSERT statement? This isn’t driven by a maximum number of records, but rather by the server system variable bulk_insert_buffer_size (http://dev.mysql.com/doc/refman/5.0/en/server-system-variables.html#sysvar_bulk_insert_buffer_size). This can be modified, but the default is 8388608 bytes. The exact number of records will vary depending on the number of columns, and the amount of data being inserted into those columns. I conservatively chose to coalesce 5k records at a time. I tried to bump this to 10k, but I encountered an exception when I exceeded this server system variable's maximum.
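In Ruby, the coalescing itself is a small change with each_slice – a sketch (the records collection, table, columns, and connection object are illustrative, and real code should quote and escape its values):

BATCH_SIZE = 5_000

records.each_slice(BATCH_SIZE) do |batch|
  # Build one multi-row INSERT per 5k records
  values = batch.map { |record| "(#{record.col1}, #{record.col2})" }.join(', ')
  connection.execute("INSERT INTO foo (col1, col2) VALUES #{values};")
end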

With my INSERTs coalesced, I was able to reduce my total number of INSERT statements to 60 (300k / 5k). This yielded a massive performance boost: the query went from taking over an hour to completing in just two minutes. Quite a nice trick, considering the data is unchanged.

Is there room for improvement? Absolutely. A statement executed 60 times may be worth preparing, or wrapping inside of a transactional block. My real world tests didn’t yield a significant enough performance boost to make these complexities worth implementing. This may not hold true with data in higher orders of magnitude, or with different schema layouts. MySQL also understands INDEX hints, which allow you to suggest indexes that may be missed by the query planner, or to force the inclusion or exclusion of indexes despite what the query planner thinks (http://dev.mysql.com/doc/refman/5.0/en/index-hints.html).

Speaking of indexes, any UNIQUE or BTREE indexes can be dropped while the mass INSERT runs, then added back afterward, side-stepping the (1 × number of indexes) cost on every inserted row.

In the next order of magnitude, I will probably have to rethink my approach of using INSERT statements to load data. According to the MySQL documentation, LOAD DATA INFILE is “roughly 20 times faster” than a MySQL INSERT statement. My script would no longer generate statements, but rather output to a file in a comma delimited format. This could then be loaded assuming appropriate permissions are in place.
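If I go that route, the script's output step becomes a CSV write – a sketch using Ruby's standard csv library (the file path and columns are illustrative):

require 'csv'

CSV.open('/tmp/foo.csv', 'w') do |csv|
  records.each { |record| csv << [record.col1, record.col2] }
end

# Then, from MySQL (requires the FILE privilege):
# LOAD DATA INFILE '/tmp/foo.csv' INTO TABLE foo
#   FIELDS TERMINATED BY ','
#   (col1, col2);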

Happy profiling!

Apple, Computers, Ruby, Software

Upgrading Ruby with Rbenv+Homebrew

Heroku has defaulted to Ruby 2.0 for all applications, so it's probably time you updated that crufty old version you have been running. Unfortunately the process is less than straightforward, especially when using a version manager. Assuming you are running rbenv with a Homebrew version of ruby-build, this guide will get you running the latest version of Ruby.

To begin, check which versions of Ruby rbenv knows about. rbenv delegates this work to ruby-build:

rbenv install --list

Best case scenario you have a recent version of ruby-build and you see the version of Ruby you want in this listing. At the time of this writing, version 2.0.0-p247 is the most current. If your desired version is present, skip the following steps and just install the Ruby version via:

rbenv install 2.0.0-p247

If your version is not present in the list, you will need to upgrade ruby-build so rbenv knows about the more recent versions of Ruby. Assuming you installed ruby-build via Homebrew, you can update it by issuing:

sudo brew update

Issuing this command complained about having untracked files within the Homebrew directory (which is actually just a git clone of the Homebrew project). This may not be the correct way to fix the problem, but I issued the following command to stash these untracked files and uncommitted changes so they don’t interfere with the upgrade process:

cd /usr/local && git add . && git stash

This should now be a clean directory, and you can issue the brew update command again.

Now that brew is updated, you should have the latest “formula” for ruby-build. You can then issue the command to upgrade ruby-build itself:

brew upgrade ruby-build

Once this completes, we can list versions of Ruby via rbenv again to ensure our desired Ruby version is now in the list. Once you see this, you can issue the following command to install a known version of Ruby:

rbenv install 2.0.0-p247

To use your shiny new version of Ruby, you can set this to be the default version:

rbenv global 2.0.0-p247

You can also set this per project (rbenv local writes a version file into the project directory), or override it with an environment variable, so don’t worry if not all of your projects are Ruby 2.0 ready. You can easily switch between versions – and that’s the point of version management, right?

You can confirm you are running the latest Ruby version by issuing:

ruby -v

Note that you will need to re-bundle any gems from your Gemfile against the new Ruby version, as well as rehash any rbenv shims for these gem executables:

gem install bundler && bundle install && rbenv rehash

For more information, check out the rbenv, and ruby-build documentation. To discover the latest stable version of Ruby you can peek at the official Ruby download page to find out the latest version and patch number. Finally, check out the Homebrew docs if you are still stuck.