Apple, Computers, Hardware, Linux, Software, Windows

Using Synergy Over VPN

I've been watching a lot of Linus Tech Tips, and one of their sponsors sells a product called Synergy. More information is on the product page: https://symless.com/synergy . To summarize, it is a software KVM (no video, so I guess it is "KM"?) solution. The use case was perfect for me. On my desk I have an Apple work laptop, a Windows desktop, and a Linux personal laptop. Anyone who has done extensive work on a laptop keyboard and touchpad knows that it isn't optimal. I didn't want multiple keyboards and mice cluttering the top of my desk, so I dropped some money on Synergy, and it just works!

[Image: 20170731_110826_HDR.jpg]

That is, until I had to connect to our company VPN the next week. They use a full-tunnel solution: when I connect, I lose everything. I can't print, I can't access my NAS, and most importantly I can't access my keyboard and mouse. (The video is fine because it is hard-wired to an external monitor.) What to do?

SSH to the rescue! What is SSH? SSH (Secure Shell) is a protocol that lets one computer securely interface with another. Here, though, we will just be using it for port forwarding, not for an interactive session. The goal is to take the OS X machine (Synergy client) and SSH into the Windows machine (Synergy server). Within this SSH connection we can forward ports: a tunnel running inside the SSH session. This exposes a local port 24800 on OS X that actually points to port 24800 on the remote server, which is the port Synergy uses for its connections.

You will need a few tools and a little patience. Having just gone through this, I'm sharing for posterity, or maybe for anyone who has thrown in the towel over how badly a VPN cripples access to home devices.

I have the following Synergy setup:

  • Windows 10 Synergy server (keyboard and mouse are physically connected to the desktop)
  • OS X Synergy Client
  • Linux Synergy Client
  • Router with a local area network all these devices share
  • Admin access to the router for port forwarding
  • Autossh package for OS X (available via brew)

First step: get Windows 10 up to speed with SSH. How this isn't built in as a service in the year 2017, I have no idea. Grab the OpenSSH server package for Windows from https://www.mls-software.com/opensshd.html . After downloading, extract and run the setup file. This creates a new Windows service for OpenSSH that runs on port 22, and it prompts you to generate an SSH host key for the server.
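
Before moving on, it helps to confirm from the OS X side that the new service is actually listening. A minimal reachability check (a sketch; it assumes the Windows server is at 192.168.1.95, the example address used below, and uses the netcat that ships with OS X):

# Check that the Windows OpenSSH service is reachable on port 22
nc -vz 192.168.1.95 22
# A "succeeded" message means the service is up; a timeout or refusal means it isn't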

Once the server is running, you will need to add your user to the list of SSH users. Open PowerShell as an administrator, change into the C:\Program Files\OpenSSH\bin directory, and run the following commands:

mkgroup -l >> ..\etc\group    # export the local Windows groups into OpenSSH's group file
mkpasswd -l >> ..\etc\passwd  # export the local Windows users into OpenSSH's passwd file

Try to connect to your SSH server from the OS X client:

ssh <user>@<server IP> # e.g. ssh Ben@192.168.1.95

You should be prompted for your Windows password. Once you can successfully log in to the server, we can set up public key authentication. This removes the need for you to type in your password because you identify yourself with an SSH public key. From your OS X machine, get your public key:

cat ~/.ssh/id_rsa.pub
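
If that file doesn't exist yet, generate a key pair first. A quick sketch (the default file locations are assumed; set a passphrase or not as you prefer):

# Generate an RSA key pair at the default location (~/.ssh/id_rsa and ~/.ssh/id_rsa.pub)
ssh-keygen -t rsa
# Then print the public half again
cat ~/.ssh/id_rsa.pub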

Put the contents of this file on your SSH server in C:\Program Files\OpenSSH\home\<user>\.ssh\authorized_keys (the home directory here is actually a symlink to C:\Users\<user>). If the .ssh directory doesn't exist, you will need to create it first. Now we need to configure the server to allow public key authentication. Edit the C:\Program Files\OpenSSH\etc\sshd_config file and change the following lines:

StrictModes no
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys

Restart the OpenSSH server for the changes to take effect:

net stop opensshd
net start opensshd

You should now be able to SSH into the server the same as before, but without being prompted for a password.

Now we are ready to create an SSH tunnel. Before we incorporate autossh (which handles retries and monitoring), we will do a naive SSH attempt. In the following command:

  • -f backgrounds the process
  • -L forwards a port, in the format <local port>:<remote host>:<remote port>
  • -N runs no command, just tunnels the port

ssh -f <user>@<remote public IP> -L 24800:<remote public IP>:24800 -N

If this works, you should see a [LISTEN] entry for port 24800 when you list open files:

lsof -n -i | grep 24800

You may need to set your server as the DMZ host on your network, or, to be safer, simply set up port forwarding. We need ports 22 and 24800 to resolve to the Windows server. The instructions for doing this vary widely by router vendor, but the setting is typically under a WAN section and prompts for a port, a destination IP, a destination port, and a protocol. You want ports 22 and 24800 to route to your server's IP for both TCP and UDP.

Configure your Synergy client to use localhost instead of the remote IP. You should now be able to operate your client from the server’s peripherals via Synergy.

Everything works great until the VPN connection is made, because that severs the SSH connection. To recover automatically, I added autossh to persist this tunnel. On the OS X client, instead of running ssh directly, run the following:

AUTOSSH_POLL=10 autossh -M 20000 -f -N <user>@<remote public IP> -L 24800:<remote public IP>:24800

Now when a VPN connection is made, or a disconnection happens, autossh will detect that the tunnel is no longer alive and retry. Because Synergy's software also retries, your connectivity should come back after a few seconds.
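
If you get tired of typing that, one option is to drop the invocation into a tiny wrapper script and run it at login. A sketch, assuming a hypothetical ~/bin/synergy-tunnel.sh and the same placeholders as above:

#!/bin/bash
# ~/bin/synergy-tunnel.sh (hypothetical) - keep the Synergy tunnel alive via autossh
# AUTOSSH_POLL=10 checks the tunnel every 10 seconds; -M 20000 is autossh's monitoring port
AUTOSSH_POLL=10 autossh -M 20000 -f -N <user>@<remote public IP> -L 24800:<remote public IP>:24800

Mark it executable with chmod +x ~/bin/synergy-tunnel.sh and run it whenever you sit down at the desk, or wire it into your login items.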

Thanks to Synergy for making a solid product, and for having first class Linux support.

Hardware, Linux, Open-source, Ruby, Software, Uncategorized

Delayed Job Performance Tuning

We found a bug. A bug that affected a lot of historical records that we now have the pleasure of reprocessing. Fortunately we already had an async job infrastructure in place with the delayed job gem. Unfortunately, this gem is intended for fairly small batches of records and isn’t tuned to handle 1M+ records in the delayed_jobs table.

After reading some best practices, we decided on a queue-based approach. To keep our day-to-day async jobs running we would use a "default" queue, and to reprocess the old records we used a new queue. Starting the workers up with a "--queue" flag did the trick. We had hardware dedicated to day-to-day operations, and new hardware dedicated to the new queue. Now it was simply a matter of filling up the queue with the jobs to reprocess the records.
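
For reference, the worker invocations looked roughly like this (a sketch; the exact script name, bin/delayed_job versus script/delayed_job, and the worker counts depend on how delayed_job is installed in your app):

# Day-to-day workers stay on the default queue
RAILS_ENV=production bin/delayed_job --queue=default -n 2 start
# Dedicated hardware chews through the reprocessing queue
RAILS_ENV=production bin/delayed_job --queue=new_queue -n 4 start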

Our initial approach maxed out the CPU on our database server, largely because we had never tuned the SQL in our async jobs. Because the volume we processed was always low, it was never a noticeable problem, but when we threw lots of new jobs into the queues it became very noticeable. The workers would start up, then mysteriously die. After some digging in /var/log/kern.log we discovered the workers were being killed by the kernel's out-of-memory (OOM) killer. Attaching a small swap partition helped, but once you've hit swap, things become horribly slow. What's the point in keeping the worker alive if it's running a thousand times slower? Clearly we needed to profile and tune. So we did. (The specifics are out of scope for this article, but it involved consolidating N+1 queries and limiting the columns returned by the SELECT.)

With our newly tuned SQL, our spirits were high as we cranked up the workers again, only to reach the next bottleneck. And this is where databases get frustrating. Delayed Job workers run a query each time they look for work to find out which job to pick up, and they take a mutex lock on the chosen record by setting locked_at and locked_by. The query looks like this:

UPDATE delayed_jobs
SET `delayed_jobs`.`locked_at` = '2016-06-05 11:48:28', 
 `delayed_jobs`.`locked_by` = 'delayed_job.2 host:ip-10-203-174-216 pid:3226' 
WHERE ((run_at <= '2016-06-05 11:48:28' 
AND (locked_at IS NULL OR locked_at < '2016-06-05 07:48:28') OR locked_by = 'delayed_job.2 host:ip-10-203-174-216 pid:3226') 
AND failed_at IS NULL) 
ORDER BY priority ASC, run_at ASC 
LIMIT 1;

The UPDATE does an ORDER BY, which results in a filesort. Filesorts are typically something an index can resolve, so I optimistically added the following:

CREATE INDEX delayed_job_priority
ON delayed_jobs(priority,run_at);

Sadly, this index was completely ignored when I ran an EXPLAIN on my UPDATE. The reason is that MySQL doesn't execute an UPDATE query the same way it would a SELECT with the same conditions. The index probably made things worse, because now each record update also requires an index update. I could probably fork the code and use some type of isolation level in a transaction to get the best of both worlds: an index-based SELECT, then a quick UPDATE on a single record by id. But there are easier solutions to try first.
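
If you want to see this for yourself, EXPLAIN the reservation query's conditions (MySQL 5.6+ can EXPLAIN an UPDATE directly; on older versions a SELECT with the same WHERE and ORDER BY is close enough). A sketch from the shell, with placeholder credentials:

# Check whether the delayed_job_priority index would even be used
mysql -u <user> -p <database> -e "
  EXPLAIN SELECT * FROM delayed_jobs
  WHERE run_at <= NOW()
    AND (locked_at IS NULL OR locked_at < NOW() - INTERVAL 4 HOUR)
    AND failed_at IS NULL
  ORDER BY priority ASC, run_at ASC
  LIMIT 1;"
# Look for 'Using filesort' under Extra, and whether delayed_job_priority shows up under key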

My UPDATE statements were pushing 40 seconds in some cases according to MySQL. Eventually the lock wait timeout is exceeded and you see an error in the delayed_jobs.log:

Error while reserving job: Mysql2::Error: Lock wait timeout exceeded; 
try restarting transaction

Jobs were moving very slowly, and throwing more workers at it didn't help, because each time a worker picked up a job it waited 40+ seconds. The UPDATE was doing a filesort, and the index was being ignored (and MySQL doesn't support index hints on UPDATE). It was pretty clear that all of the jobs from the reprocessing queue needed to find a new home so they would stop weighing down the filesort. I settled on the following solution:

CREATE TABLE delayed_jobs_backup LIKE delayed_jobs;

INSERT INTO delayed_jobs_backup
SELECT * FROM delayed_jobs WHERE queue='new_queue';

DELETE FROM delayed_jobs WHERE queue='new_queue';

This creates a new table with the same structure as the existing delayed_jobs table, populates it with the jobs that needed to find a new home (all 1M+ of them), and finally deletes those jobs from the original delayed_jobs table. Be careful doing this, and run some SELECT/EXPLAIN queries in between to ensure you are doing what you think you are doing. (Deleting 1M+ records from a production database makes me sit up in my chair a bit.)
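
In the spirit of that advice, a couple of quick sanity checks (a sketch; swap in your own credentials and database name):

# Row counts before and after the copy should line up
mysql -u <user> -p <database> -e "SELECT COUNT(*) FROM delayed_jobs WHERE queue='new_queue';"
mysql -u <user> -p <database> -e "SELECT COUNT(*) FROM delayed_jobs_backup;"

Once the backup table's count matches the number of 'new_queue' jobs, the DELETE is the only destructive step left.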

Looking at MySQL's process list, I no longer see long-lived System locks on my UPDATE statements (presumably because the table is now small enough that the filesort is mostly painless):

mysql> SHOW FULL PROCESSLIST;
# Id, User, Host, db, Command, Time, State, Info
1, user, 0.0.0.0:34346, localhost, Query, 0, System lock, UPDATE ...

The important columns here are Time (in seconds), State, and Info. This proves that the job locking was happening quickly; I was seeing Time values of 40+ seconds before. I kept referring back to this process list to verify that the UPDATEs remained snappy while I varied the number of workers running and the number of jobs in the queue. I had a goal of keeping the UPDATE system lock times under 2 seconds. Adding more workers pushed the times up; adding more jobs to the queue pushed the times up. It's a balance that probably depends very much on what you are processing, how much your database can handle, and what your memory constraints are on your worker servers.

To conclude: my job over the next few days will be to run the following commands to move jobs back into the delayed_jobs table 10,000 at a time:

-- Use a consistent ORDER BY so the same 10,000 rows are copied and then deleted
INSERT INTO delayed_jobs
SELECT * FROM delayed_jobs_backup ORDER BY id LIMIT 10000;

DELETE FROM delayed_jobs_backup ORDER BY id LIMIT 10000;

You can of course automate this, but my objective was met: the jobs can reprocess old records without impacting day-to-day jobs in the default queue. When the delayed_jobs table is almost empty, I move over another batch from the delayed_jobs_backup table. Rinse and repeat until there are no more jobs left to process. It's a bit more manual, but day-to-day operations continue to function, and I can cross the reprocessing task off my list of things to do. All without any code changes!
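
If you do want to automate the shuffle, a small loop from the shell is enough. A sketch, assuming the same 10,000-row batch size, placeholder credentials, and that it's acceptable to sleep between batches while the workers drain the queue:

#!/bin/bash
# Move jobs back from delayed_jobs_backup in 10,000-row batches (hypothetical helper)
while true; do
  REMAINING=$(mysql -N -u <user> -p<password> <database> -e "SELECT COUNT(*) FROM delayed_jobs_backup;")
  [ "$REMAINING" -eq 0 ] && break
  mysql -u <user> -p<password> <database> -e "
    INSERT INTO delayed_jobs SELECT * FROM delayed_jobs_backup ORDER BY id LIMIT 10000;
    DELETE FROM delayed_jobs_backup ORDER BY id LIMIT 10000;"
  sleep 600   # give the workers time to drain the batch before adding more
done

The -N flag suppresses column headers so the count can be compared numerically.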

I've been reading up on transaction isolation levels, thinking something like a SELECT ... FOR UPDATE lock might be a worthy contribution to the delayed_job codebase: http://dev.mysql.com/doc/refman/5.7/en/innodb-transaction-isolation-levels.html

Computers, Linux, Open-source, Ruby, Software, Web

Migrating from Bamboo to Cedar

Heroku recently sent me a nice email:

You have one or more applications deployed to Heroku’s Bamboo stack. Bamboo went live almost four years ago, and the time has come for us to retire it.

You may have received a previous message about Bamboo deprecation. That message may have had an erroneous app list, but this email has the correct list of Bamboo apps that you own or are collaborated on. We’re sorry for any confusion.

On June 16th 2015, one year from now, we will shut down the Bamboo stack. The following app/apps you own currently use the Bamboo stack and must be migrated:

This is on the heels of an email about ending legacy routing support. It seems they have been quite busy over at Heroku, but I can’t complain too seriously about free app hosting.

Upgrading a legacy Rails application to the Cedar stack did require a few changes. I’ll document some stumbling blocks for posterity.

Foreman

The first big change, described in Heroku's general-purpose upgrading article ( https://devcenter.heroku.com/articles/cedar-migration ), was the use of foreman to manage your web services. I luckily have a simple app and did not need to worry about resque, mailers, etc. This made my Procfile rather straightforward:

web: bundle exec unicorn -p $PORT -E $RACK_ENV -c config/unicorn.rb

Absent from the foreman crash course is any information about the corresponding .env file. I did need to add a few environment variables to keep development working as usual:


RACK_ENV=development
PORT=3000

Once the Procfile was committed (and the .env file added to .gitignore), I added Unicorn to the Gemfile. I'm not sure if Unicorn is strictly necessary over WEBrick in development; however, this is the example in the Cedar upgrade guide, so I wanted to run as close to production as practical to prevent any surprises.

After installing Unicorn, I needed to touch config/unicorn.rb since it did not exist. Again, it's in the example, but I'm not sure it's strictly necessary, especially given that it's just an empty file for me. To start your Rails application, you now issue foreman start instead of the older rails s.

Devise incompatibility

Not directly related to the Cedar changes, but a common gem, so worth mentioning: Devise 2.x removed its migration helpers. I stupidly didn't lock my version of Devise in my Gemfile, so I was confused about why this was failing for me. I found the last 1.x version of Devise by running:

gem list devise --remote --all

and then specifying '1.5.3' as the second argument on the devise entry in my Gemfile (gem 'devise', '1.5.3').

PostgreSQL

Heroku requires Postgres in production, and I was previously using SQLite in my development environment. Again, to mirror production, I wanted to use Postgres in my development environment so that I could be as close to the production setup as practical. I took a quick trip down memory lane to set up and configure PostgreSQL on my Linux development machine: https://mrfrosti.com/2011/11/postgresql-for-ruby-on-rails-on-ubuntu/ . An interesting observation is that on my Ubuntu 14.04 LTS install, PostgreSQL runs on port 5433, not the default 5432. This can be verified with netstat -nlp | grep 5432. If no entries come back, PostgreSQL is running on a non-default port; try grepping for 5433, or other ports, until you find the process.

Backing up your database

Before I made any server changes, I wanted an up-to-date copy of the production database on my machine to prepare for the worst. This can be done with the pgbackups commands:


heroku pgbackups:capture   # create a backup
heroku pgbackups:url       # get a public URL for that backup

Then I clicked the public URL and downloaded the file – safe and sound to my computer. By the way, you can import this production database into your local instance of PostgreSQL using the following command:


pg_restore --verbose --clean --no-acl --no-owner -d development /path/to/database

Pre-provisioning your database

A quick note on database availability. I got into a chicken-and-egg scenario where the app wouldn't deploy without the database, and the database seemingly couldn't be created without the app first being deployed. Heroku has an article on pre-provisioning, and I found it a necessary prerequisite to deploying to my newly created test Cedar stack: https://devcenter.heroku.com/articles/pre-provision-database

To pre-provision your database, run the following:

heroku addons:add heroku-postgresql

You can even import the database from production as part of the heroku utility. I used the public database URL I created above to populate my new Cedar stack database:


heroku pgbackups:restore
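
For the record, the fuller form of that restore looked roughly like this (a sketch: the legacy pgbackups plugin took the config var name and a quoted URL, and the --app flag targets the new app; your values will differ):

# Restore the new Cedar app's database from the public backup URL captured earlier
heroku pgbackups:restore DATABASE '<public backup URL>' --app <new-cedar-app>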

Migrating to Cedar

Once I had tested on a new app and in a feature branch, I had confidence everything was working as expected. I was ready to make my changes to the production app by migrating it to Cedar. To do this, the command is:


heroku stack:migrate cedar

I either have an old version of the Heroku gem, or this is a discrepancy between the gem and the non-gem packaging, but the docs list this command as heroku set:stack cedar, which was not a valid command for me. The migrate command above appears to be analogous.

Once I merged my Cedar feature branch back into master, I was ready to push. And FAIL. It turns out that I needed to precompile my assets, which had a dependency on the database existing. I tried to pre-provision as I had done on my Cedar branch; however, the results were the same after running this command.

A quick search yielded https://devcenter.heroku.com/articles/rails-asset-pipeline#troubleshooting and the advice to add the following line to the config/application.rb file:

config.assets.initialize_on_precompile = false

Summary

I’ve learned quite a bit about Heroku in this upgrade experience. Their changes force me to use the latest software which is nice in some ways. When everything is running on my website, I don’t often worry about upgrading until I get an email like the one above.

The downside, of course, is that this upgrade process is a pain in the ass, is error prone, and affects production websites that are running smoothly. If it isn't broken, you don't want to fix it. Except this time you have to, in order for it to keep functioning after June 2015.

Best of luck to other people upgrading. Just be patient, and test everything in a new app if you have any doubts.

Computers, Hardware, Linux, Software

Trials and Tribulations of the D-Link DNS-320

A funny noise is something you never want to hear coming out of your primary disk drive. If you are like me, you probably never think too much about backing up your important data before it becomes a looming emergency. I realized, while listening to that noise and watching the red HDD indicator light stay on constantly, that I had some pretty important stuff on that drive: wedding photos, baby photos, videos, documents, and other things that you can't just re-download. Why not cloud backup? I'm a bit skeptical of entrusting others with my data. At the end of the day, a failure on their end may net you a refund, and you could even take them to court, but no matter what, you are never getting that content back. It was time to take matters into my own hands.

The Setup: I bought a two-bay D-Link DNS-320 and two Western Digital 2TB Red hard disk drives. The plan was to set up the disks in RAID 1, so the odds of a simultaneous failure were as statistically low as my finances would allow. I liked the DNS-320 because it had compelling features and a two-drive system for a fraction of the cost of other names like Synology. I wanted CIFS and NFS sharing, and RAID 1; everything else was a perk. Pleasantly enough, the DNS-320 also comes with a UPnP server and has some nice SMART monitoring options, which will send me an email in the event that errors are detected. I would create one 2TB partition out of the disks and share it over the network with restricted access. My wife and I would both be able to connect from our computers to back up any data we wanted.

Configuration: CIFS setup was a breeze, but NFS required a bit more poking around. This post is dedicated to overcoming some of the issues I had. Within the NAS web interface, partition your drives and grab a cup of coffee. Straightforward stuff. I created a single partition labeled "Volume_1". After this is complete, go to "Management" -> "Application Management" -> "NFS Service", check "Enable", then click "Save Settings".

Go into “Management” -> “Account Management” -> “Users / Groups” and create a user account if you have not already done so. Within “Account Management”, click on “Network Shares”, and click the “New” button. This will launch the wizard for setting up a share. Select the appropriate users, groups, settings, and on the “Step 2-1: Assign Privileges – Access Methods”, ensure that the “NFS” checkbox is checked.

Move along by clicking "Next" until you reach "Step 2-1-2: NFS Settings". You will need to specify the host IP address of the client that will be connecting to this share; this whitelists the supplied IP address as a client location. I'm not positive what format you would use to denote multiple IP addresses; however, an asterisk allows all hosts. I connect with multiple machines via NFS, so a single IP address is not sufficient. Despite being accessible from any IP address, the client will still need to authenticate using the credentials entered in "Step 1" and "Step 1-2". I consider this good enough. Also, ensure the "Write" checkbox is checked if you wish to be able to write files to this location.

Client Configuration: You will need to install the nfs package for your distribution of Linux. I am running Ubuntu 12.10, so the package is named “nfs-client”. Install it using “sudo apt-get install nfs-client”.

Now that you have the nfs-client package, you can use the "showmount" utility to list the shares on the NAS device: "showmount -e NAS_IP_ADDRESS" (e.g. "showmount -e 192.168.1.1"). Depending on how you have the disks partitioned and shared on the NAS device, this path will differ.

Export list for 192.168.1.1:
/mnt/HD/HD_a2 *

This path should be consistent with the information in the “Network Shares Information” dialog. This can be accessed by clicking the magnifying glass icon underneath the NFS column in the “Network Shares” interface.

You can now mount your device using the “mount” command: “mount -t nfs NAS_IP_ADDRESS:/REAL_PATH /path/to/mount_point“. The NAS_IP_ADDRESS is the IP address of the NAS device. The REAL_PATH is the information obtained either via showmount, or the “Network Shares Information” dialog. The “/path/to/mount_point” is just an empty directory somewhere on your local machine.
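
Putting the pieces from this post together, the concrete command on my network would look like this (a sketch; it assumes /media/nas is an empty directory you want to use as the mount point):

# Create a mount point and mount the NAS export shown by showmount
sudo mkdir -p /media/nas
sudo mount -t nfs 192.168.1.1:/mnt/HD/HD_a2 /media/nas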

You can also set this mounting option up to be persistent on reboots using the “/etc/fstab” file. Add a new line to this file, and format your entry similar to as follows:

# /etc/fstab: static file system information.
# ...
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
# ...
192.168.1.1:/mnt/HD/HD_a2  /media/nas  nfs rw,hard,intr 0  0

The values used are identical to the values used in the mount command preceding this example. The options specify changes to the mount behavior: "rw" requests read/write permissions, and "hard" retries requests indefinitely. Coupled with "intr", this allows requests to retry until the NFS server becomes unreachable, at which point the retries stop. I would recommend these options when copying large amounts of data, or when on a wireless network, as failed transmissions will be silently repeated without raising an error on temporary timeouts.

Note on Permissions: Initially, when I copied my content over to the NAS, I did so via NFS, and I was not able to view the contents via CIFS (Windows sharing). The problem came down to permissions. The directories did not have the execute bit set for "other", so permission was denied when a request was made to list the contents of a directory. This was difficult to track down because the UPnP server showed all my media without any permission issues. A quick review of UNIX permissions reminded me that the execute bit is necessary to list directory contents, and that the NAS accesses content created via NFS as "other" (neither user nor group permissions apply). You can recursively grant the execute bit on any existing content by issuing the following command from the top directory: "chmod o+x -R /root_directory", where "root_directory" is the folder you want to change. The "-R" flag will recursively apply this permission to all content within.
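
One refinement worth mentioning: chmod o+x -R also marks regular files as executable, which is harmless here but untidy. If you would rather touch only the directories, find can do the filtering (a sketch, under the same assumption that root_directory is the share's top folder):

# Grant the "other" execute bit on directories only, leaving files untouched
find /root_directory -type d -exec chmod o+x {} +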

Lessons Learned: I have about 50GB of pictures and video, and another 20GB in purchased content. I underestimated how long this would take to transmit over an 802.11g connection. 54Mbps is just under 7MB/s in theory, which means a 50GB transfer would take over two hours to complete, and that is not counting temporary speed drops, hard drive access times, retries, etc. When working with large amounts of data, I recommend a 1Gbps Ethernet connection. I will probably be investing in a new router soon that can accommodate these higher speeds.

The D-Link DNS-320 is a solid first-time NAS device for under $150. Other than a few gotchas when setting up the NAS (most self-inflicted), this device is NFS friendly and has made a fine addition to my hardware ecosystem.

Computers, Hardware, Linux, Personal, Software, Thoughts

Self Realizations – Part I

During World War II, when you needed to get communications between two points, you often had to run a telegraph wire through enemy territory. I’m picturing the scene from Enemy at the Gates – where a soldier puts on a helmet, gets a spool of wire, and crawls on his belly through the mud, dodging enemy fire, and landmines. The goal is to not get picked off before your reach your destination because everyone is counting on you to make the connection.

Lately I have been engrossed in a side project that has given me an opportunity to work with the Android SDK. I have been so tickled at figuring out everything for the first time. Though I am moving at a snail’s pace, and it can be painful to have to constantly reference the documentation, StackOverflow, and Google at large, it has been a fun experience. Small things like talking to a database, or rotating a bitmap feel like big achievements, and make the struggling worth it. Seeing the Java side of the world puts some things about Ruby into perspective too. I know I am better having tinkered with it, and I had fun while doing it.

I have come to realize that this is why I love programming. I love running that first line across unknown territory. It is proof that I can accomplish what I set out to do, even with almost no prior knowledge of an environment. It is the same rush I get when tinkering with my car, building computers, installing a ceiling fan, compiling a kernel, or raising a kid. It is about creating something to solve a problem using common tools, and applying knowledge to make something awesome of it all. If I didn't program, I'm not sure what other career would give me this same chance to tinker with new stuff.

As part of this self-realization, I have discovered, through my child-like excitement at these accomplishments, how much I miss this in my current work capacity. I'm not building new things anymore. I'm just polishing the same things, and the details don't really excite me the way the prototypes do. I like "broad strokes". We need people who do the detail work too, but it's decidedly not for me.

So find out what it is that you love, and make it happen. Your job and your passion aren't always in phase, but don't let your passion die out just because you are getting paid to do something else.

Apple, Computers, Linux, Open-source, Ruby, Software, Thoughts, Web

PostgreSQL for Ruby on Rails on Ubuntu

My new desktop came in at work this week, and the installation was painless thanks to the great driver support in Ubuntu 11.10. For anyone setting up a Rails development box on Linux, I have some tips to get around a few pain points when using a PostgreSQL database.

Installation:

Postgres can be quickly and easily installed using apt-get on Debian or Ubuntu based distributions. Issue the command:

apt-get install postgresql

Ruby Driver

In order for Ruby to connect to PostgreSQL databases, you will need to install the pg gem. This gem will need the development package of PostgreSQL to successfully build its native extension. To install the PostgreSQL development package, issue the following command:

apt-get install libpq-dev # EDIT: postgresql-dev was replaced by this package on Ubuntu 11.10

Setup A PostgreSQL Role

You can configure PostgreSQL to allow your account to have superuser access, allowing your Rails tasks to create and drop databases. This is useful for development, but strongly discouraged for production. That said, we can create a PostgreSQL role by logging into psql as postgres as follows:

su postgres -c psql

This will open a PostgreSQL prompt as the database owner postgres. Next, we need to create an account for our user. This should match the response from “whoami”:

create role <username> superuser login;

We can now exit from psql by issuing "\q". Try to connect to psql directly by issuing the following command from your shell account:

psql postgres

This should allow you to connect to the default database postgres without being prompted for credentials. You should now be able to issue the rake commands for creating, and dropping the database:

rake db:create

RSpec Prompts for Credentials

I was being prompted by RSpec for credentials when running my test suite. If you would like to remove this credential prompt, read the following:

There are differences between how the PostgreSQL package is configured by Homebrew on OS X and how it is packaged on Ubuntu and other distributions. One difference is the level of security configured in the pg_hba.conf file. This file identifies which sources, using which authentication mechanisms, should be allowed or denied. By default, RSpec will trigger a password prompt even if your shell account has trusted permissions. This is because RSpec connects not over the local socket, but to localhost. To allow connections to localhost to be trusted, you will need to modify the pg_hba.conf file.

To do so, modify the pg_hba.conf file located at /etc/postgresql/<version>/main/pg_hba.conf

Comment out any existing entries at the bottom of the file and append the following:

local   all             all                                      trust
host    all             all              127.0.0.1/32            trust
host    all             all              ::1/128                 trust

This will allow connections from the shell, as well as connections to 127.0.0.1 (localhost) using both IPv4 and IPv6.

You will need to restart PostgreSQL for the changes in this file to take effect:

/etc/init.d/postgresql restart
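
To confirm the new rules behave the way RSpec needs, try connecting over TCP rather than the local socket (a sketch; the postgres database is used here just as a connection target):

# Connecting to localhost should now succeed without a password prompt
psql -h 127.0.0.1 postgres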

PostgreSQL Extensions

If you want to make use of any of the additional extensions to Postgres, including fuzzystrmatch, you will need to install the postgresql-contrib package:

apt-get install postgresql-contrib

The extensions will install to /usr/share/postgresql/<version>/extension/

With Postgres version 9, you can create these extensions in your database using the new CREATE EXTENSION syntax. In the case of the fuzzystrmatch extension, first connect to your database with psql:

psql <database_name>

Once inside your database:

create extension fuzzystrmatch;

Computers, Linux, Open-source, Ruby, Software, Web

Setup PostgreSQL with Rails on Linux

Today I found myself needing to set up a Rails application to work with a PostgreSQL database. I found that the documentation on the PostgreSQL website was like drinking from a fire hose, and worse, every community response to an error message takes a slightly different approach to the solution. Let's run through a basic Rails PostgreSQL configuration assuming Rails 3, Postgres 8.x, and Ubuntu 11.04:

Step 1: Installing PostgreSQL and Libraries

Install the PostgreSQL server, the client package used to connect (psql), and the libpq development package needed to compile the Ruby PostgreSQL driver:

$ sudo apt-get install postgresql postgresql-client libpq-dev

After this finishes installing, you can turn to your OS X co-worker and laugh at him while he is still downloading the first tarball file. PostgreSQL will start automatically, under the user postgres. You can verify that the installation is a success by using the psql command line utility to connect as the user postgres. This can be accomplished using the following command:

$ sudo -u postgres psql

This uses sudo to elevate your basic user privileges, and the “-u” switch will execute the following command as an alternate user. As the postgres user, this will run psql. If you connect successfully, you should be at the psql interactive prompt. If not, ensure PostgreSQL is running, and that psql is in the path for postgres.

Note: From the psql interactive prompt, type "\q" to exit.

Step 2: Configure a New PostgreSQL database

From the psql prompt, you can run SQL to view the current PostgreSQL users:

select * from pg_user;

You should see a table of database users returned:

 usename  | usesysid | usecreatedb | usesuper | usecatupd | passwd   | valuntil | useconfig
----------+----------+-------------+----------+-----------+----------+----------+-----------
 postgres |       10 | t           | t        | t         | ******** |          |
(1 row)

We can see the postgres user that was created automatically during the installation of PostgreSQL. Let's add another user to be the owner of our Rails database. The path of least resistance may be to use your shell account username, since it will keep us from having to change some options in the database configuration file.

$ sudo -u postgres createuser <username>
# Shall the new role be a superuser? (y/n) n
# Shall the new role be allowed to create databases? (y/n) y
# Shall the new role be allowed to create more new roles? (y/n) n

This will create a new database user (named after your shell account) and grant that user access to log in to the database. The command will ask you a few questions about the account. It is important for Rails that you answer "y" to whether the user should be able to create databases; if you say no, you will not be able to run any rake tasks that create or drop the database.

We can confirm by selecting from the pg_user table again.

$ sudo -u postgres psql
select * from pg_user;
 usename    | usesysid | usecreatedb | usesuper | usecatupd | passwd   | valuntil | useconfig
------------+----------+-------------+----------+-----------+----------+----------+-----------
 postgres   |       10 | t           | t        | t         | ******** |          |
 <username> |    16391 | f           | f        | f         | ******** |          |
(2 rows)

Step 3: Configure Rails

Switching to the Rails side, let's configure our application for Postgres. This requires the pg gem. Open your Gemfile and append:

# Gemfile
gem "pg"

Now run bundle install to update your project gems.

$ bundle install

This should compile the Ruby pg database driver, allowing Ruby to talk to Postgres. Now let's tell our Rails application how to access the database. Open up config/database.yml and change the adapter line to read "postgresql". The database name is, by convention, the name of your project plus "_development". Finally, your shell username is needed. Because PostgreSQL will authenticate this account locally, you will not need to supply a password option; delete that line.

# config/database.yml
development:
  adapter: postgresql
  encoding: unicode
  database: <app_name>_development
  pool: 5
  username: <username>

To test, run the rake task to create your database:

rake db:create

If everything works, you should have a newly created database owned by your shell account. You can log in using psql by passing the name of the database as an option:

$ psql -d <app_name>_development

Happy migrating!

Troubleshooting

If you get the error "FATAL: Ident authentication failed for user <username>", ensure that you can see your newly created account in the pg_user table of the postgres database. (See Step 2 above)

If you get the error "PGError: ERROR: permission denied to create database", ensure that your database user account has been granted the privilege to create databases. This can be done during the "createuser" command line account creation by answering "y" to the corresponding question about this permission.
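
If the account already exists and you answered "n" during creation, you don't have to start over; the privilege can be granted after the fact. A sketch (swap in your actual role name for the placeholder):

# Grant the ability to create databases to an existing role
sudo -u postgres psql -c "ALTER ROLE <username> CREATEDB;"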

If you get the error "FATAL: role <username> is not permitted to log in", try manually granting your database user account the privilege to log in. This can be done by executing the following as postgres at the psql prompt:

ALTER ROLE <username> LOGIN;

Notes on Alternative Authentications

PostgreSQL integrates very deeply with Linux authentication, allowing for quite an array of connection options. By default, passwords are not accepted for local connections; instead, PostgreSQL is configured to use the "ident sameuser" method of user authentication. See more at http://www.postgresql.org/docs/8.4/static/auth-pg-hba-conf.html.