Rails and PostgreSQL job at Paperless Post

Posted by Tom Copeland Tue, 10 Aug 2010 17:17:00 GMT

Another Rails and PostgreSQL job, this time at Paperless Post (via the GitHub jobs board). Looks like they have a few more open positions, too. And you'd work with Aaron Quint, who's their CTO. Good times!

Rails and PostgreSQL job at elevenlearning.com

Posted by Tom Copeland Fri, 06 Aug 2010 13:33:00 GMT

PGCon 2010 talk on Rails and PostgreSQL

Posted by Tom Copeland Tue, 03 Aug 2010 21:04:00 GMT

A while back I posted a link to a talk by Gleb Arshinov that he gave at the SF PUG. This talk was on "PostgreSQL for high performance Rails apps", and was full of fine suggestions from their experiences with their Rails apps.

Gleb is back again, this time on May 21, 2010, at PGCon, where he and Alexander Dymo talked about PostgreSQL as a secret weapon for Rails apps. Some of the same ground is covered (use SQL DDL rather than ActiveRecord create_table, etc.), but there's lots of new information too. Here are some notes:


  • 1:10 They're using PostgreSQL 8.4, nginx, and Mongrel

  • 4:00-6:00 Talks about dropping down into SQL via ActiveRecord

  • 6:30 Use :include to eliminate N+1 queries.

  • 7:30 Watch for things like acts_as_tree that reintroduce lots of queries in exchange for the improvement in abstraction.

  • 9:00 One query, 12 joins - complicated, but query time goes from 8 seconds to 60 ms.

  • 14:00-17:00 A technique for recording SQL queries; this helps ensure you're not running unexpected queries

  • 19:00 Suggests using straight SQL for DDL rather than the ActiveRecord DSL

  • 20:00 Use constraints, FKs, etc to preserve data integrity - "anything you don't have a constraint on will get corrupted"

  • 23:00 Don't use CASCADE, since the app won't know about the deletions

  • 28:00 Keep a log of times for the most frequent user requests. Alex suggests using integration tests for this; code is at 29:10 and 29:30.

  • 32:30 A technique for loading data with ActiveRecord's select option with PostgreSQL arrays to save on object creation. Questions from the audience about normalization vs efficiency.

  • 38:50 Role/user/privilege checking can be slow; shows a technique for using PostgreSQL's bool_or and GROUP BY to get the data in one fell swoop. Query time went from 2+ seconds to 64 ms.

  • 42:00 Do analytics in the database. They saw a speedup from 90s to 5s and saved tons of RAM.

  • 44:40 Some excellent new PostgreSQL features that are either here now or are on the way (replication, windowing functions)

  • 46:30 Demonstrates a problem with PostgreSQL's LIMIT and OFFSET when used with subselects. Some discussion of pagination with the audience. Here's an excellent discussion of pagination alternatives written by Justin French.

  • 50:30 How to force PostgreSQL to use a subselect vs a join; the example goes from 605ms to 325 ms.

  • 52:20 Be careful with generate_series. Apparently set-returning functions like this can't give the planner useful row estimates.

  • 55:30 General props to PostgreSQL community.

  • 59:40 Test queries in both a cold state and a hot state; they saw a 14x speed difference.

  • 1:01:40 Tune PostgreSQL - shared_buffers, work_mem, autovacuum, etc. Rely on community knowledge for initial configuration.
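The 38:50 privilege-checking item is easier to picture with an example. What follows is a minimal sketch, not the query from the talk: user_roles, role_privileges, and the granted column are hypothetical names. The SQL string shows the single grouped query using PostgreSQL's bool_or aggregate. The Ruby method computes the same answer in memory, which can be handy for checking expectations in a test.

```ruby
# Hypothetical schema (not from the talk):
#   user_roles(user_id, role_id)
#   role_privileges(role_id, privilege, granted boolean)
# One grouped query answers "which privileges does this user have?"
PRIVILEGES_SQL = <<-SQL
  SELECT rp.privilege, bool_or(rp.granted) AS allowed
    FROM user_roles ur
    JOIN role_privileges rp ON rp.role_id = ur.role_id
   WHERE ur.user_id = $1
   GROUP BY rp.privilege
SQL

# In-memory equivalent of the query: group the user's role grants by
# privilege, then OR the flags together the way bool_or does.
# Simplification: nil is treated as false, whereas bool_or ignores NULLs
# and returns NULL only when every input is NULL.
def privileges_for(user_id, user_roles, role_privileges)
  role_ids = user_roles.select { |ur| ur[:user_id] == user_id }.map { |ur| ur[:role_id] }
  grants = role_privileges.select { |rp| role_ids.include?(rp[:role_id]) }
  result = {}
  grants.group_by { |rp| rp[:privilege] }.each do |privilege, rows|
    result[privilege] = rows.any? { |rp| rp[:granted] }
  end
  result
end

sample_roles = [{ :user_id => 1, :role_id => 10 }, { :user_id => 1, :role_id => 11 }]
sample_grants = [
  { :role_id => 10, :privilege => "edit",   :granted => false },
  { :role_id => 11, :privilege => "edit",   :granted => true },
  { :role_id => 10, :privilege => "delete", :granted => false }
]
p privileges_for(1, sample_roles, sample_grants)
```

The point of the database version is that all the role rows are folded together in one round trip instead of being loaded and checked one by one in Ruby.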

Lots of good stuff there, enjoy!

Pivotal Labs Talk - Scaling a Rails App with Postgres

Posted by Tom Copeland Sat, 24 Jul 2010 01:01:00 GMT

I'm slowly catching up with my podcast backlog and came across a Pivotal Labs talk from May 2009. In this talk Josh Susser and Damon McCormick present on Scaling a Rails App with Postgres. It's a little dated now (the talk was given when PostgreSQL 8.4 was in beta), but there's still lots of good stuff. Here are some notes:

  • They started with an existing Rails app with lots of data, so they had some constraints - not greenfield development.

  • Around the 5-6 minute mark there's a good discussion of PostgreSQL's query optimizer and how it analyzes a table's data distribution. One takeaway (mentioned around 16:20) is to run vacuum more often on a particular table if there are a lot of writes.

  • 10:00 How to set the STATISTICS target for a particular table's columns.

  • 11:00 Using partial indexes.

  • 14:00 Indexing on expressions.

  • 18:10-23:00 A nice discussion of the EXPLAIN output.

  • 23:45 Here they talk about wide columns. I've seen this in MySQL as well, where splitting text data out into a separate table yielded some good speedups.

  • 26:10 Some discussion of pgbench.

  • 30:20 Discusses the PostgreSQL log analyzer pgFouine.

  • 35:30 How long does it take to add an index to large tables? They saw times of up to an hour for tables with millions of rows.

  • 36:30 Clustering your data so that PostgreSQL lays it out more efficiently on disk.

  • 37:30-48:00 A thorough discussion of partitioning tables via table inheritance. They used an ActiveRecord model (39:23) with a bunch of utility methods, plus a cron job to periodically create new partitions. At 45:15 they make a nice distinction between using partial indexes and partitions; one advantage of partitions is that a partition's indexes can be different from its parent's indexes. At 49:00 they mention possibly releasing a plugin; I'm not sure whether that happened.

  • 52:00 Some discussion of full text search via tsearch.

  • 53:00 PostgreSQL's lack of built-in replication outside of WAL shipping, Slony, etc. Thank goodness 9.0 will address this!

  • 54:00 Some props to Engine Yard on their PostgreSQL support.
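The partitioning discussion (37:30-48:00) mentions a cron job that creates new partitions ahead of time. Here's a minimal sketch of the DDL such a job might generate for monthly partitions using table inheritance, the approach available in PostgreSQL 8.x. The events table, the created_at column, and the helper name are hypothetical, and this isn't the Pivotal team's model code. You'd still need a trigger or rule on the parent to route inserts into the right child, and constraint_exclusion enabled so the planner can skip partitions whose CHECK constraints rule them out.

```ruby
require "date"

# Build the DDL for one monthly child table of a (hypothetical) parent table.
# The CHECK constraint lets the planner exclude this partition from queries
# whose WHERE clause falls outside the month. The index is created on the
# child itself, since indexes are not inherited from the parent.
def monthly_partition_sql(parent, column, month)
  first = Date.new(month.year, month.month, 1)
  nxt   = first >> 1 # Date#>> advances by whole months
  child = format("%s_%04d_%02d", parent, first.year, first.month)
  [
    "CREATE TABLE #{child} (",
    "  CHECK (#{column} >= DATE '#{first}' AND #{column} < DATE '#{nxt}')",
    ") INHERITS (#{parent});",
    "CREATE INDEX #{child}_#{column}_idx ON #{child} (#{column});"
  ].join("\n")
end

# A cron job could run this for next month so the partition exists before data arrives.
puts monthly_partition_sql("events", "created_at", Date.today >> 1)
```

Because each child gets its own CREATE INDEX, this is also where the 45:15 point shows up: nothing stops you from indexing older partitions differently from the current one.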

Good stuff all around, and thanks to Pivotal for posting these great talks!

Intro to PostGIS

Posted by Doug Cole Thu, 05 Nov 2009 06:25:00 GMT

If you are planning on building a Rails application that uses spatial data in any way, then you owe it to yourself to take the time to investigate PostGIS. Out of the box you’ll be able to perform an array of powerful functions on your spatial data, from bounding boxes and distance queries to polygon area calculations.

Installation

I won’t go into setup in much detail. Under most Linux distributions PostGIS is a simple package install, and it’s in MacPorts if you’re a Mac user, so either way installation shouldn’t be too hard. For integration with Rails I recommend the GeoRuby gem and spatial_adapter. From the docs: “GeoRuby provides data types intended to hold data returned from PostGIS…”. spatial_adapter is a Rails plugin that extends ActiveRecord so that it understands geometric columns, transparently converting them to subclasses of the GeoRuby::Geometry class, supporting geometry columns in migrations, etc. spatial_adapter is hosted on GitHub: http://github.com/fragility/spatial_adapter.

Data Storage

PostGIS supports lots of datatypes. For the purpose of this blog post we’ll focus on points; all the other datatypes are composed of points, and most of the concepts carry over easily. A point is made up of at least three pieces of data: x, y, and a Spatial Reference System Identifier (SRID). The SRID describes the coordinate system the point uses. My GIS background is poor, so I’ll just suggest that if you are planning on working in lat/long values, SRID 4269 is probably the right choice for you (or maybe 4326, see the comments). The one downside of using lat/long data is that it makes distance queries more difficult. One degree isn’t a fixed number of meters, so you can’t directly ask the database for all rows within 100 meters of a given point, or at least not without having the database run a sequential scan of all rows. If people are interested in distance queries, I’m happy to write about the subject.
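To make the lat/long distance problem concrete, here’s a small calculation. It uses a spherical-Earth approximation that I’m adding for illustration; it isn’t something PostGIS does for you, and EARTH_RADIUS_M and the helper names are my own. One degree of longitude is about 111 km at the equator but only half that at 60° latitude, so a fixed “within 100 meters” radius covers a different number of degrees depending on where you are.

```ruby
# Spherical-Earth approximation: fine for seeing the effect, not for surveying.
EARTH_RADIUS_M = 6_371_008.8 # mean Earth radius in meters

def deg_to_rad(deg)
  deg * Math::PI / 180.0
end

# On a sphere, one degree of latitude is the same length everywhere.
def meters_per_degree_latitude
  deg_to_rad(1.0) * EARTH_RADIUS_M
end

# One degree of longitude shrinks with the cosine of the latitude.
def meters_per_degree_longitude(lat_deg)
  deg_to_rad(1.0) * EARTH_RADIUS_M * Math.cos(deg_to_rad(lat_deg))
end

# How many degrees a distance spans at a given latitude, as [d_lon, d_lat].
def degree_deltas_for(meters, lat_deg)
  [meters / meters_per_degree_longitude(lat_deg), meters / meters_per_degree_latitude]
end

[0, 47.5123, 60].each do |lat|
  printf("lat %8.4f: 1 degree of longitude = %9.1f m\n", lat, meters_per_degree_longitude(lat))
end
```

degree_deltas_for is one way to turn a distance into a search box for the fast && query below. Rows inside that box still need an exact distance check, because a box is only a rough stand-in for a circle.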

Queries

The most basic query is the bounding box: which geometries lie within the given box? PostGIS spells it && (strictly speaking, it tests whether two bounding boxes overlap), and because it can use spatial indexes directly it is very fast. Beyond that, most queries take the form of a function and are well documented in the PostGIS documentation. But enough talk, let’s start an example.
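For points, “within the box” and “bounding boxes overlap” mean the same thing. For lines and polygons, && only compares bounding boxes, so it can return rows whose actual shapes don’t touch the search area. Functions like ST_Intersects handle the exact test. The sketch below mimics the box comparison in plain Ruby; BBox and point_box are illustrative names, not GeoRuby or PostGIS API.

```ruby
# In-memory illustration of what PostGIS's && operator checks:
# do two 2D bounding boxes intersect? Touching edges count as intersecting.
BBox = Struct.new(:x_min, :y_min, :x_max, :y_max) do
  def intersects?(other)
    x_min <= other.x_max && other.x_min <= x_max &&
      y_min <= other.y_max && other.y_min <= y_max
  end
end

# A point's bounding box is degenerate: min and max are the same coordinate.
def point_box(x, y)
  BBox.new(x, y, x, y)
end

search = BBox.new(-122.5, 47.4, -122.3, 47.6)
point_box(-122.39, 47.5123).intersects?(search) # => true  (inside the box)
point_box(-122.0, 47.5).intersects?(search)     # => false (east of the box)
```

The real operator is fast because a spatial (GiST) index stores exactly these boxes, so the comparison can happen without reading every row.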

Example

Let’s assume we have a table of restaurants that we want to display on a map. First let’s add a geometry column to the restaurants table, with spatial_adapter it is a simple migration:

add_column :restaurants, :the_geom, :point, :srid => 4269
add_index :restaurants, :the_geom, :spatial => true

If you open up a psql console and look at the table definition, you’ll see that this adds the column the_geom with the type geometry, as well as three new table constraints: the SRID is 4269, the number of dimensions is 2 (PostGIS supports more), and the geometry type is a point. Handy! You can see we’ve also added an index; spatial_adapter adds the :spatial option to add_index to simplify the creation of spatial indexes.
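If you’re curious what that migration amounts to in raw SQL, here’s a sketch of the PostGIS 1.x statements it roughly corresponds to. I haven’t checked spatial_adapter’s exact output. The helper is hypothetical, and the index name just follows the Rails convention. AddGeometryColumn is the PostGIS function that registers the column in geometry_columns and adds the SRID, dimension, and geometry type constraints you see in psql.

```ruby
# Roughly the raw PostGIS SQL behind a spatial point column plus its index.
# AddGeometryColumn(table, column, srid, type, dimension) adds the column and
# its enforcement constraints; the GIST index is what makes && fast.
def spatial_point_column_sql(table, column, srid)
  [
    "SELECT AddGeometryColumn('#{table}', '#{column}', #{srid}, 'POINT', 2);",
    "CREATE INDEX index_#{table}_on_#{column} ON #{table} USING GIST (#{column});"
  ]
end

puts spatial_point_column_sql("restaurants", "the_geom", 4269)
```

This could also be useful if you prefer to keep DDL as straight SQL in migrations via ActiveRecord::Base.connection.execute.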

Now to add points let’s open up a console and add geometry data to our restaurants:

r = Restaurant.first
r.the_geom = Point.from_x_y(-122.39, 47.5123, 4269)
r.save!

Of course in real life you aren’t going to just make up data; you’ll likely want to use a geocoding service like Google’s Geocoding API to determine the correct lat/long information. You’ll probably also want to store your chosen SRID in a constant somewhere, but now I’m nitpicking. Let’s query our new data:

box = Polygon.from_coordinates([[[x_min, y_min], [x_min, y_max], [x_max, y_max], [x_max, y_min], [x_min, y_min]]], 4269)
Restaurant.all(:conditions => ["the_geom && ?", box])
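If you’d rather not hand-write that nested coordinate array every time, a tiny helper keeps the ring closed (the first and last points must match). This is my own hypothetical helper, not part of GeoRuby. The WKT variant is for dropping down to raw SQL with PostGIS’s ST_GeomFromText. The ? placeholder in the comment assumes ActiveRecord-style conditions.

```ruby
# Build the closed ring for an axis-aligned search box, nested the way
# Polygon.from_coordinates takes it: [ring, ...], where each ring is a list
# of [x, y] pairs whose last pair repeats the first.
def box_rings(x_min, y_min, x_max, y_max)
  [[[x_min, y_min], [x_min, y_max], [x_max, y_max], [x_max, y_min], [x_min, y_min]]]
end

# The same box as WKT, e.g. for raw SQL such as
#   the_geom && ST_GeomFromText(?, 4269)
def box_wkt(x_min, y_min, x_max, y_max)
  points = box_rings(x_min, y_min, x_max, y_max).first.map { |x, y| "#{x} #{y}" }
  "POLYGON((#{points.join(', ')}))"
end

puts box_wkt(-122.5, 47.4, -122.3, 47.6)
```

With spatial_adapter loaded, box_rings(x_min, y_min, x_max, y_max) can be passed straight to Polygon.from_coordinates along with the SRID.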

Simple! I hope this helps show the basics of using PostGIS with Rails. This post barely scratches the surface; luckily PostGIS and GeoRuby have excellent documentation, and if you have any more questions, don’t hesitate to ask in the comments and I’ll try to help.

Doug Cole is the CTO of www.estately.com a real estate search website. Interested in working with Estately? Let us know!

Rails and PostgreSQL job in Honolulu

Posted by Tom Copeland Tue, 03 Nov 2009 23:20:00 GMT

Saw this advertisement for a Ruby/Rails/PostgreSQL job in Honolulu today. It's for eggup.com, which unfortunately is completely protected by an HTTP basic authentication challenge. First thing to do if you get that job: put up a nice "coming soon" page!

RailsOnPg by Alexander Tretyakov

Posted by Tom Copeland Fri, 23 Oct 2009 04:39:00 GMT

Thanks to Robby on Rails I heard about Alexander Tretyakov's interesting RailsOnPg plugin. This plugin makes it a bit easier to create PostgreSQL functions, triggers, views, and foreign keys by providing a nicer front end to calls to ActiveRecord::Base.connection.execute.

For example, here's a migration to add a foreign key to a Comment model that belongs to a User:

class AddForeignKeyFromCommentsToUsers < ActiveRecord::Migration 
  def self.up
    add_foreign_key :comments, :user_id, :users
  end
  def self.down
    remove_foreign_key :comments, :user_id, :users
  end
end

This results in the following SQL:

ALTER TABLE comments
                 ADD CONSTRAINT fk_comments_user_id
                 FOREIGN KEY (user_id)
                 REFERENCES users(id) 
                 ON UPDATE NO ACTION 
                 ON DELETE NO ACTION 

As you can see, this provides some sensible defaults and a consistent naming scheme so that you can reliably roll back a migration that created a foreign key.
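Here's a sketch of why the deterministic name matters. If the constraint name is derived from the table and column, the down migration can rebuild it without storing anything. The fk_<table>_<column> pattern mirrors the SQL above. The helper methods are hypothetical illustrations, not RailsOnPg's actual code.

```ruby
# Derive the constraint name from table and column, so add and remove agree.
def fk_name(table, column)
  "fk_#{table}_#{column}"
end

def add_fk_sql(table, column, ref_table)
  "ALTER TABLE #{table} ADD CONSTRAINT #{fk_name(table, column)} " \
    "FOREIGN KEY (#{column}) REFERENCES #{ref_table}(id) " \
    "ON UPDATE NO ACTION ON DELETE NO ACTION"
end

# Dropping only needs the same inputs, because the name is recomputed.
def remove_fk_sql(table, column)
  "ALTER TABLE #{table} DROP CONSTRAINT #{fk_name(table, column)}"
end

puts add_fk_sql(:comments, :user_id, :users)
puts remove_fk_sql(:comments, :user_id)
```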

I ran into some problems when creating a function; my migration failed with a PGError. Turns out that the plugin attempts to execute CREATE LANGUAGE plpgsql before it creates a function; in my case that language was already in place. I commented out line 16 of railsonpg/lib/functions.rb (the call to setlang) and everything worked fine. It looks like this need for a CREATE LANGUAGE IF NOT EXISTS (or something) has come up before, but I'm not sure what the status is. I'm using PostgreSQL 8.4.1 and that statement doesn't seem to be supported.
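For anyone hitting the same CREATE LANGUAGE error on 8.4, one workaround is to check the pg_language catalog before creating the language. This is my own sketch, not a fix from the plugin. The query and execute steps are passed in as callables so the sketch runs without a database. In a real migration you'd wrap the connection's select_value (which returns nil when no row matches) and execute.

```ruby
# Hypothetical guard for PostgreSQL 8.4, which has no
# CREATE LANGUAGE IF NOT EXISTS: look in pg_language first and only
# create plpgsql when it's missing.
def ensure_plpgsql(select_value, execute)
  exists = select_value.call("SELECT 1 FROM pg_language WHERE lanname = 'plpgsql'")
  execute.call("CREATE LANGUAGE plpgsql") unless exists
  !exists # true if the language had to be created
end

# Usage with stand-ins for the database calls:
ran = []
ensure_plpgsql(lambda { |_sql| nil }, lambda { |sql| ran << sql })
p ran
```

Inside a migration that might look something like ensure_plpgsql(lambda { |sql| select_value(sql) }, lambda { |sql| execute(sql) }), relying on the migration passing those calls through to its connection.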

At any rate, this looks like a handy plugin that could remove a lot of raw SQL from your migrations. Good stuff!

PGConf West 2007 video - Best Practices with Rails and PostgreSQL

Posted by Tom Copeland Tue, 29 Sep 2009 04:20:00 GMT

This is kind of a blast from the past - it's a talk by Bricolage lead developer David Wheeler at PostgreSQL Conference West 2007. It's mainly an introduction to Rails, but David's a real PostgreSQL guru (and had a Rails app that was bought by Twitter) and thus brings out some interesting points. Here are some highlights from an initial listen:

  • Some basics on Rails competitors and philosophy. Someone in the audience mentions Grails.

  • ActiveRecord validations and callbacks. A few minutes on ActionView and ActionController.

  • Migrations. This was prior to timestamped migrations, so he talks about numbered migrations. Discusses creating indexes and the lack of built-in support for views. Suggests using ActiveRecord::Base.connection.execute to just run raw SQL as necessary. Shows an example that uses SET DEFAULT CURRENT_TIMESTAMP.

  • Talks about the araddconstraint plugin. Probably foreigner (as mentioned on Ruby5) is the current leader for adding foreign key constraints, although I haven't used it. Looks good though.

  • Demonstrates a class-level finder that uses PostgreSQL-specific SQL - specifically, the LOWER function. This talk predates named_scope, so, there ya go.

  • At around 28:00 he notes a problem with using Slony and migrations: how do you get at the SQL so you can send it off to your Slony instance? In his case they just stopped using Slony and went to a warm standby, probably with WAL shipping.

  • Talks about loading large data sets using COPY. I've found that this is the right way to get lots of data in a PostgreSQL database as well. Don't waste time using models for stuff like that.

  • Discusses skinny controllers and fat models.

  • At 32:30 talks about associations and some ActiveRecord conventions. has_many :through was new stuff then, I think; he uses has_and_belongs_to_many in the example.

  • At 40:00 he talks about created_at and updated_at and time zones. He suggests that you always store times in UTC, which is pretty standard. I think some of the possible complexities here are now built in to Rails, but I'm not sure.

  • At 44:30 he talks about reopening classes. This seems to be a new topic for the audience and he gets some pushback. Someone refers to it as "Ruby's GOTO." The class he reopens is the PostgreSQLAdapter; he plugs in his own version of quoted_date.

  • Around 50:00 he asks if someone could please update the PostgreSQL driver. Jeff Davis is in the audience and responds to him about the updates he's doing. Today this has all been taken care of as I noted in a previous post on Ruby PostgreSQL drivers.
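On the COPY point, here's a minimal sketch of preparing bulk data without instantiating models. It generates CSV with Ruby's standard library and pairs it with a COPY ... FROM STDIN statement. The users table and its columns are hypothetical. How you actually stream the payload depends on your driver; the pg gem exposes COPY support on its connection (put_copy_data / put_copy_end), but check its docs for details.

```ruby
require "csv"

# Hypothetical target table and columns for the bulk load.
COPY_SQL = "COPY users (name, email) FROM STDIN WITH CSV"

# Turn an array of rows into a CSV payload for COPY. CSV quotes fields that
# contain commas, and writes nil as an empty unquoted field, which COPY's
# CSV mode reads as NULL.
def copy_payload(rows)
  CSV.generate { |csv| rows.each { |row| csv << row } }
end

rows = [["Alice", "alice@example.com"], ["Bob, Jr.", nil]]
print copy_payload(rows)
```

The database parses the whole payload in one statement, which is why this beats saving models row by row for large loads.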

It's a nice presentation in front of a small group, with a nice feel to it. The audio quality is decent, although the slides are a little hard to read. Also, thanks to David for the nice email regarding this post. Enjoy!

Rails and PostgreSQL job in Denver

Posted by Tom Copeland Tue, 22 Sep 2009 20:19:00 GMT

Just noticed this job posting about an opening at Zerista. It asks for Rails and PostgreSQL experience, thus the mention here. The person to contact is Charlie Savage, who's done great work on improving the libxml-ruby gem. So you'd be working with smart folks.

I googled around and didn't find any tech interviews or videos or whatever... if someone from Zerista is reading this and wants to share information about how you're using Rails and PostgreSQL in interesting ways, please post a comment or contact me!

Rails apps using PostgreSQL in production 6

Posted by Tom Copeland Thu, 17 Sep 2009 19:58:00 GMT

Occasionally I see a job description or interview where someone will mention that they're using PostgreSQL + Rails in production for some big application. I'd like to do more detailed writeups on these... but here are some that I've seen:

  • Ryan Heneise writes in to say that Donor Tools runs on PostgreSQL.

  • Doug Cole wrote in to confirm that the nifty real estate service estately.com runs on Rails + PostgreSQL. They use PostgreSQL full text search and PostGIS. Doug also noted that (as of 9/21/09) they're hiring developers, so, give them a holler if you're looking. You can also see a nifty PostGIS presentation they did at Seattle Tech Startups.

  • Mark Tremblay adds screenlight.tv to the list.

  • Nathen Harvey writes in that VisualCV is a Rails + PostgreSQL app.

  • Eric Hodel says that he's using Rails + PostgreSQL on rubypan.org. He uses PostgreSQL full text search via the texticle gem.

  • This job advertisement indicates that Yammer is using Rails with PostgreSQL. From that advertisement, they're also using PostgreSQL's full text search.

  • Heroku provides each Rails app with a PostgreSQL database. I've googled all over for more information but haven't dug up any other interesting details, although I bet there's some neat stuff going on there.

That's all that comes to mind at the moment. If anyone has more details on any of these, or more examples, please let me know; it would be great to have some more detailed information!