Computers, Open-source, Software, Thoughts

AngularJS File Uploads with HTML5 FileAPI

AngularJS has an interesting gap in functionality that can make working with file uploads difficult. You might expect attaching a file to an <input type="file"> to trigger the ng-change event; however, this does not happen. There are a number of Stack Overflow questions on the subject, with a popular answer being to use a native onchange attribute and call into Angular's internals (e.g. onchange="angular.element(this).scope().fileNameChanged()").

This solution feels brittle, and it relies on unsupported Angular interactions from the template. To work around the issue, GitHub user danialfarid has provided the awesome angular-file-upload library, which simplifies this process by extending Angular's attributes to include ng-file-select. This is a much cleaner implementation. The library also includes an injectable $upload object, and its documentation shows how this abstracts the file upload process in the controller. This abstraction (if used) sends the uploaded file to the server immediately, without the contents of the rest of the form. I wanted to submit this file change with the traditional all-at-once approach that HTML forms take. This way, the user can abandon form changes by neglecting to press the submit button, keeping the original file attachment unmodified.

In order to achieve this, I've created a solution that uses the HTML5 FileAPI to base64 encode the contents of the file and attach it to the form. Instead of reinventing the ng-file-select event, I opted to use the angular-file-upload library described above. However, instead of using the injected $upload functionality referenced in its README, we will serialize the attachment as a base64 encoded string.

To begin, create an AngularJS module for your application, and include the angularFileUpload dependency:

window.MyApp = angular.module('MyApp',
  [
    'angularFileUpload'
  ]
)

Next, we will create our AngularJS template and include our HTML input tags:

<div ng-controller="MyCtrl">
  <form ng-submit="save()">
    <input type="file" ng-file-select="onFileSelect($files)" />
    <input type="submit" />
  </form>
</div>

Now we can create our AngularJS controller, and define the onFileSelect function referenced in the ng-file-select attribute:

class exports.MyCtrl
  @$inject: ['$scope', '$http']

  constructor: (@scope, @$http) ->
    @scope.onFileSelect = @onFileSelect
    @scope.save = @save

  onFileSelect: ($files) =>
    angular.forEach $files, (file) =>
      reader = new FileReader()
      reader.onload = (e) =>
        @scope.attachment = e.target.result
      reader.readAsDataURL file

  save: =>
    @$http(
      method: 'POST',
      url: "/path/to/handler",
      data:
        $.param(
          attachment: @scope.attachment
        )
      headers:
        'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8'
        'Accept': 'text/javascript'
    )

Our controller is now in place. Note that both onFileSelect and save are bound onto the scope in the constructor so the template can reach them. When the input's attachment changes, onFileSelect is called, which iterates through the collection of files (if multiple are selected) and creates a FileReader instance for each one. The reader's onload event is assigned a handler that stores the result in an attribute on our @scope object. The call to readAsDataURL starts reading the file and creates a data: URL representing the file's contents as a base64 encoded string.

Once the form is submitted, the save function is called via the value of ng-submit on our form tag. This performs a standard AngularJS XHR action and includes the attachment in the params. I have adjusted the Content-Type header to communicate to the server that the content contains URL encoded data. If we had other form fields, we could serialize and append them to the params collection in the same place in the code, sending them to the server alongside the attachment.
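For example, if the form also collected a title field bound to @scope.title (a hypothetical field for illustration), the params collection could be extended in the same spot:

save: =>
  @$http(
    method: 'POST',
    url: "/path/to/handler",
    data:
      $.param(
        attachment: @scope.attachment
        title: @scope.title   # hypothetical additional field bound elsewhere in the form
      )
    headers:
      'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8'
      'Accept': 'text/javascript'
  )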

Image Attachments

For added feedback to the user on image attachments, the img tag's src attribute can accept a base64 encoded string as a value. Since we already have this value from our FileReader object, we can update the view instantly with the attached file, without any server side processing. To achieve this, we can add an image tag to our HTML file:

<div ng-controller="MyCtrl">
  <form ng-submit="save()">
    <img ng-src="{{attachment}}" />
    <input type="file" ng-file-select="onFileSelect($files)" />
    <input type="submit" />
  </form>
</div>

Next, we can make a few modifications to our onFileSelect function:

onFileSelect: ($files) =>
  angular.forEach $files, (file) =>
    if file.type in ["image/jpeg", "image/png", "image/gif"]
      reader = new FileReader()
      reader.onload = (e) =>
        @scope.$apply () =>
          @scope.attachment = e.target.result
      reader.readAsDataURL file

AngularJS two way data binding takes care of the messy details for us. The template is bound to @scope.attachment. We do a safety check that the file type is an image, and then we assign the attachment key to the base64 encoded image. Because the FileReader callback fires outside of Angular's digest cycle, we wrap the assignment in a call to @scope.$apply(), which repaints the screen so the user sees the image they have attached displayed instantly.

Thanks to Nick Karpenske, and Robert Lasch for help with the implementation!

Computers, Personal, Software, Thoughts

“The Deadline”: Part 3 in a Series on Leadership in Technology

Quite a bit has changed since my last post on leadership. I've been promoted to full time team lead after my mentor and manager left to work for another company. I was learning a great deal every day, and being turned loose to manage the team on my own has been a baptism by fire. I've already applied what I learned from my previous mentorships, and I've learned a great deal more over the last few weeks.

High bandwidth communication is key

I would never have gotten through the last few weeks on email and instant messenger alone. It's very important to communicate by voice to keep everyone updated and on the same page while respecting each other's time. Several times in the past week I've watched an email thread bounce back and forth between many participants, only to get boiled down into a concise message at a standup meeting. A five minute verbal Q&A can replace hours of email, especially across different timezones.

Delegate, delegate, delegate

When moving into a managerial capacity, the content of your day to day work shifts dramatically. The first few days without being in the code left me anxious. But you know what? We have good people that understand even more than I do, and if you give them a problem, they come back with a solution. It's hard to trust that it will happen, but it did, over and over again. The key is not just throwing out a problem and coming back the next day to find a solution. You should be available for questions and clarification continuously throughout the development of the solution. I often found that checking in a few times a day with each developer was sufficient to answer any questions, understand the progress, and get a rough idea of when something would be delivered. A few times they would casually mention that they were stuck trying to figure out 'x'. It turned out I knew about 'x', and after a brief chat, with some pointing to places in the codebase, they got it squared away.

Be crystal clear about your requirements

We had a new screen we wanted to develop. We had a mockup done by our front end guys. Our BA loved it. Everyone was on the same page. Until we began to implement it, and all of these tiny edge cases popped up. I'd assume a course of action was the correct one to take, only to discover that our stories were getting rejected by QA. It turns out we don't share the same brain, and what I call working, someone else will call a defect. That creates a lot of extra work to resolve what is now a defect. Closer to the deadline of the deliverable, when I switched to voice communication instead of IM, I was able to lock in requirements quickly and get instant feedback on how we should handle certain edge cases. Don't be afraid to bother the higher-ups with technical questions, because it is up to them to provide the definitive answer. You aren't doing yourself any favors by shielding them from a question and substituting their would-be answer with something you invented.

Front load work you are less certain of

If you aren't clear on how something is going to be executed, that is a problem. It's your job as a manager to find out! You don't need to know every technical detail, but you do need a clear understanding of who the key players are, what the testing process is like, and how you will get the feature through QA and accepted by the business. We had some work on a feature that impacted a few systems downstream. Did we put it off until we took care of the easy stuff? Nope: we started with the pain in the butt feature. And good thing too, because we were working on it up until the last minute. We didn't realize how much latency there would be in getting just an 'it's working' or 'it's not working' from all the parties downstream from us. If you aren't sure, begin immediately and err on the side of caution. The easy work you have a clear understanding of should take a backseat to anything that isn't as clear.

Pad your estimates

This entire iteration I was too optimistic about what we could turn out in a day. Estimating is hard, and I'm convinced we all suck at it. I'd like to believe my guys can come up to speed on a new technology stack and crank out some impressive code, but there is a learning curve. Sometimes the only way forward is a painfully slow dive into the documentation. There are test suites that take forever to run. There are rejections when it comes time to merge, meaning the suite has to be run again. There are seemingly simple issues that become much more complex than anticipated once you dive in. Rarely did we close a feature or bug in the amount of time I assumed it would take. Because of this, I'm going to start padding my estimates to compensate for all of these little details that add up to something bigger.

And finally – know when to ask for help!

There was a certain point in the sprint when I knew we weren't in good shape to hit our deadline. I mulled it over in my head, stressed about it, let my pride get in the way a bit, and convinced myself that asking for help wouldn't be necessary. It was a mentality of "if I can just get through these few stories here, we will be back on top". But of course, I'd get sucked into a meeting, or new bugs would pop up, or the feature I was working on would run way over on time. Don't hesitate to ask for help when you first identify the need for it. Forget how bad you think it will look. Forget the anxiety of revealing rough new code to outsiders. It's worse if you don't ask and end up missing your target. I was surprised at how strong the support was once we asked for help. I had 10 extra people jump in, and had them all working in parallel on the feature. My entire day was spent pointing people to new work, answering questions, and following up on when things would be merged. And the astounding thing was that by NOT touching the code, we were delivering more than if I had put on headphones and jumped in myself. And for the record, there was a minimum of sniping at our technical solutions from outsiders. It felt good to know we were a team, and that a solution didn't need to be perfect before letting outside people in to help.

Open-source, Software, Vacations

Colocation Week in Dallas, TX

Working remotely has some unconventional consequences. A big one is that it's very common to have never met in person the people you work side by side with. It's a strange sensation to recognize a voice you've heard almost daily but not be able to put a face to it. It turns out that many people don't look at all like what you have imagined.

To have some fun and meet our coworkers, our company hosted its first “colocation week” in Dallas, TX. After a 24 hour meet and greet, we all sat down with our new teams at the same table for the first time. It was a blast! Aside from a few major issues (hotel Internet, and Dallas being a dry county come to mind!), we did a lot of good.

This evening marked the end of our 30-hour hackathon to compete for the grand prize of taking home a Google Glass dev kit.

It was hard work, with us stopping the night before at 2am, only to get a few hours of sleep and jump right back into coding. Our team pitched the idea of brick and mortar stores integrating iBeacon (Bluetooth LE) devices to target proximity based offers and suggestions. The resulting app had some fun mechanics that I'd love to see make it into stores:

  • Personalization and announcement when you walk into the store with your device
  • Assistance in locating goods at an aisle level
  • Scan as you go shopping
  • Integration with online payments to avoid checkout lines

There were strong tie-ins for the business side as well, with foot traffic analysis and hyper relevant offer targeting. The screen shown is the Android activity returned as a user enters the geofence of the first shop's aisle.

It was tough to jump back into Android development after a few years, but it came back. Java is the language that just won’t die.

We had an awesome team, and it's wonderful to work for a company where everyone is as motivated as you are to deliver something kickass. Hopefully we will get a chance to work with some of these technologies.

Open-source, Software, Web

Inserting Large Data Sets in MySQL

It's always interesting for me to work with large data sets. The solutions that work at lower orders of magnitude don't always scale, and I can be left with unusable solutions in production. Often the problems require clever refactorings that at a cursory glance appear identical, but somehow skirt around some expensive operation.

I had a requirement to tune a script that was responsible for inserting 300k records into a database table. The implemented solution of iterating through a collection and calling 'INSERT' for each row was not scaling well, and the operation was taking long enough to time out in some runs. This gave me the opportunity to learn about a few things in MySQL, including the profiler and (spoiler!) the multiple-row INSERT syntax.

I needed some real numbers to compare the changes I would be making. My plan was to change one thing at a time and run a benchmark to tell if the performance was 1) better, 2) worse, or 3) not impacted. MySQL has an easy to use profiler for getting this information. Inside the MySQL CLI, you can issue the command:

SET profiling=1;

Any subsequent queries you run will now be profiled. You can see a listing of queries you want to know more about by typing:

SHOW profiles;

This command will show an index of queries that have run, along with their associated Query_ID. To view more information about a particular query, issue the following command, replacing x with the Query_ID:

SHOW PROFILE FOR QUERY x;

Here is an example output:

+------------------------------+----------+
| Status                       | Duration |
+------------------------------+----------+
| starting                     | 0.000094 |
| checking permissions         | 0.000003 |
| checking permissions         | 0.000002 |
| checking permissions         | 0.000001 |
| checking permissions         | 0.000003 |
| Opening tables               | 0.000021 |
| System lock                  | 0.000008 |
| init                         | 0.000039 |
| optimizing                   | 0.000012 |
| statistics                   | 0.000717 |
| preparing                    | 0.000014 |
| Creating tmp table           | 0.000023 |
| executing                    | 0.000002 |
| Copying to tmp table         | 0.016192 |
| converting HEAP to MyISAM    | 0.026860 |
| Copying to tmp table on disk | 2.491668 |
| Sorting result               | 0.269554 |
| Sending data                 | 0.001139 |
| end                          | 0.000003 |
| removing tmp table           | 0.066401 |
| end                          | 0.000009 |
| query end                    | 0.000005 |
| closing tables               | 0.000011 |
| freeing items                | 0.000040 |
| logging slow query           | 0.000002 |
| cleaning up                  | 0.000015 |
+------------------------------+----------+

In one iteration of my SQL query, I was spending an excessive amount of time in the "Copying to tmp table" state. After reading the article http://www.dbtuna.com/article/55/Copying_to_tmp_table_-_MySQL_thread_states, I was able to isolate the cause to an ORDER BY clause in my query that wasn't strictly necessary. With that clause removed, not much exciting is going on in the profile, which is a Good Thing.
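As a sketch of the shape of that change (table and column names hypothetical):

-- Before: the unnecessary sort forces a copy into a temporary table
SELECT col1, col2 FROM big_table WHERE col3 = 'foo' ORDER BY col1;

-- After: same rows, no sort, no expensive tmp table copy
SELECT col1, col2 FROM big_table WHERE col3 = 'foo';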

For a comprehensive listing of thread states listed in the Status column, view: http://dev.mysql.com/doc/refman/5.0/en/general-thread-states.html

Now that I know my query is as optimized as it can be, it's time to pull out the bigger guns. On to plan B: consolidating those INSERT statements!

An INSERT statement, though it seems to execute instantaneously under small loads, is composed of many smaller operations, each with its own cost. The approximate costs of these operations are as follows (http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html):

  • Connecting: (3)
  • Sending query to server: (2)
  • Parsing query: (2)
  • Inserting row: (1 × size of row)
  • Inserting indexes: (1 × number of indexes)
  • Closing: (1)

As you can see, connecting to the server, sending the query, and parsing it are relatively expensive operations. In the script I was modifying, 300k INSERT statements were generating 300k records. Fortunately for us, MySQL doesn't force our records to be 1:1 with our INSERT statements, thanks to allowing multiple rows per INSERT. So instead of having 3 INSERT statements:

INSERT INTO foo (col1, col2) VALUES (1, 1);
INSERT INTO foo (col1, col2) VALUES (2, 2);
INSERT INTO foo (col1, col2) VALUES (3, 3);

We can instead coalesce them into a single INSERT statement:

INSERT INTO foo (col1, col2) VALUES (1, 1), (2, 2), (3, 3);

How many rows can we coalesce into the same INSERT statement? This isn't driven by a maximum number of records, but rather by the server system variable bulk_insert_buffer_size (http://dev.mysql.com/doc/refman/5.0/en/server-system-variables.html#sysvar_bulk_insert_buffer_size). This can be modified, but the default is 8388608 bytes (8MB). The exact number of records will vary depending on the number of columns and the amount of data being inserted into those columns. I conservatively chose to coalesce 5k records at a time. I tried to bump this to 10k, but I encountered an exception when I exceeded this server system variable's maximum.
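The batching logic itself is simple. Here is a minimal Ruby sketch, where records (an array of row hashes) and connection (a database handle with an execute method) are hypothetical stand-ins for whatever your script uses:

# Batch records into multi-row INSERT statements, 5k rows per statement
BATCH_SIZE = 5_000

records.each_slice(BATCH_SIZE) do |batch|
  # Real code must quote/escape values; numeric literals are assumed here
  values = batch.map { |row| "(#{row[:col1]}, #{row[:col2]})" }.join(", ")
  connection.execute("INSERT INTO foo (col1, col2) VALUES #{values}")
end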

With my INSERTs coalesced, I was able to reduce my total number of INSERT statements to 60 (300k / 5k). This yielded a massive performance boost: the operation went from taking over an hour to completing in just 2 minutes. Quite a nice trick, considering the data is unchanged.

Is there room for improvement? Absolutely. A statement executed 60 times may be worth preparing, or wrapping inside of a transactional block. My real world tests didn't yield a significant enough performance boost to make these complexities worth implementing. This may not hold true with data in higher orders of magnitude, or with different schema layouts. MySQL also understands INDEX hints, which allow you to suggest indexes that may be missed by the query planner, or to force the inclusion or exclusion of indexes despite what the query planner thinks (http://dev.mysql.com/doc/refman/5.0/en/index-hints.html).
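For example, a hint goes directly after the table name (the index name here is hypothetical):

-- FORCE INDEX and IGNORE INDEX follow the same syntax
SELECT col1, col2 FROM foo USE INDEX (idx_col1) WHERE col1 = 1;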

Speaking of indexes, any UNIQUE or BTREE indexes can be dropped while the mass INSERT is running, then added back afterwards, side-stepping the per-row index maintenance cost (the "1 × number of indexes" operation above).

In the next order of magnitude, I will probably have to rethink my approach of using INSERT statements to load data. According to the MySQL documentation, LOAD DATA INFILE is "roughly 20 times faster" than a MySQL INSERT statement. My script would no longer generate statements, but rather output to a file in a comma delimited format. This file could then be loaded, assuming appropriate permissions are in place.
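A minimal sketch of what that load would look like (the file path is hypothetical, and the FILE privilege is assumed):

LOAD DATA INFILE '/tmp/records.csv'
INTO TABLE foo
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(col1, col2);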

Happy profiling!

Apple, Computers, Ruby, Software

Upgrading Ruby with Rbenv+Homebrew

Heroku has defaulted to Ruby 2.0 for all applications, so it's probably time you updated that crufty old version you have been running. Unfortunately the process is less than straightforward, especially when using a version manager. Assuming you are running rbenv with a Homebrew-installed ruby-build, this guide will get you running the latest version of Ruby.

To begin, check which versions of Ruby rbenv knows about. rbenv delegates this work to ruby-build:

rbenv install --list

Best case scenario, you have a recent version of ruby-build and you see the version of Ruby you want in this listing. At the time of this writing, version 2.0.0-p247 is the most current. If your desired version is present, skip the following steps and just install it via:

rbenv install 2.0.0-p247

If your version is not present in the list, you will need to upgrade ruby-build so rbenv knows about the more recent versions of Ruby. Assuming you installed ruby-build via Homebrew, you can update it by issuing:

brew update

In my case, issuing this command complained about untracked files within the Homebrew directory (which is actually just a git clone of the Homebrew project). This may not be the canonical way to fix the problem, but I issued the following command to stash these untracked files and uncommitted changes so they don't interfere with the upgrade process:

cd /usr/local && git add . && git stash

You should now have a clean directory, and you can issue the brew update command again.

Now that brew is updated, you should have the latest "formula" for ruby-build. You can then issue the command to upgrade ruby-build itself:

brew upgrade ruby-build

Once this completes, we can list versions of Ruby via rbenv again to ensure our desired Ruby version is now in the list. Once you see this, you can issue the following command to install a known version of Ruby:

rbenv install 2.0.0-p247

To use your shiny new version of Ruby, you can set this to be the default version:

rbenv global 2.0.0-p247

You can also set the version per project, or override it with an environment variable, so don't worry if not all of your projects are Ruby 2.0 ready. You can easily switch between versions; that's the point of version management, right?
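For example (the project path is a placeholder):

cd /path/to/project && rbenv local 2.0.0-p247    # writes a .ruby-version file for this project
RBENV_VERSION=2.0.0-p247 ruby -v                 # one-off override for a single command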

You can confirm you are running the latest Ruby version by issuing:

ruby -v

Note that you will need to re-bundle any gems from your Gemfile against the new Ruby version, and rehash the rbenv shims for these gems' executables:

gem install bundler && bundle install && rbenv rehash

For more information, check out the rbenv and ruby-build documentation. To discover the latest stable version of Ruby, peek at the official Ruby download page for the latest version and patch number. Finally, check out the Homebrew docs if you are still stuck.

Computers, Software

The Regular Expression Behind Currency Formatting

I came across an interesting requirement today to format a floating point number as currency. In the US and many other countries, currency amounts of 1000 units or more ($1,000.00) are typically broken up into triads via a comma delimiter. My initial reaction was to use a recursive function, but I glanced at Rails' ActiveSupport gem to see how its number_with_delimiter method performed the same task. I was surprised to learn that this is handled via a clever regular expression.
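The technique boils down to a lookahead-based substitution. Here is a minimal Ruby sketch of the idea (the method name and structure are mine; ActiveSupport's actual implementation differs in its details):

def number_with_delimiter(number)
  integer, fraction = number.to_s.split(".")
  # After each digit that is followed by one or more complete groups of
  # three digits (with no digit after the final group), insert a comma
  integer.gsub!(/(\d)(?=(\d{3})+(?!\d))/, '\1,')
  [integer, fraction].compact.join(".")
end

number_with_delimiter(1234567.89)   # => "1,234,567.89"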

View at https://coderwall.com/p/uccfpq