Finding and Indexing: You May Be Doing It Wrong



Written By : Kevin Lawver


May 22, 2012

Rails makes building dynamic web apps a lot easier than it used to be. ActiveRecord lulls us into a false sense of security. We relax, thinking that ActiveRecord will always be there to do the right thing and fill our lives with SQL-scented unicorns and rainbows. Oh, but there’s a dark side to ActiveRecord, dear reader! Yes, it makes it easier to put thing in our database, get them back out, update and destroy them, but it also makes shooting yourself in the foot easier too. How, you ask? I’ll tell you!

Like I said in my last post, I’m the new guy here at Rails Machine. I’ve been here a little over a month, and in that time, I’ve seen two ActiveRecord no- no’s that I thought everyone knew not to do. So, in an effort to make the web a better place, I’m going to tell you about them so you can fix your code too.

Let’s assume that we have an ActiveRecord class called Thing. Over and over again, I see folks using Thing.find(:all) to loop through everything. This is frequently in background jobs, but I’ve seen it in controllers too. Why is this bad? Well, you could have a million things in your database, and you’ve just loaded them all into memory. It’s a potentially unbounded query, which is always a bad thing. I can’t tell you how many times I’ve seen queries like this bring otherwise hefty servers to their knees, bowed under Rails processes using up all available system memory.

So, unbounded queries are evil naughty things and we should never do them. We all agree on this now. But what should we do instead? If you really need to select all of the records in your table, you should use find_each or find_in_batches (these are both available in Rails 2.3). If you’re exposing these records to your users, you should use a gem like will_paginate that handles paginating results for you.

You might think this only applies to ActiveRecord, but it’s true for any database. I’ve seen developers do it with MongoMapper too.

Got it? Good. No more find(:all), please!

Next, we need to talk about indexes. This is a topic that can get complex fairly quickly, so I’m just going to cover the basics. I’ll go deeper into indexes in another blog post. Whenever you select anything from your database, it has to find the rows to return. Depending on what columns you select on it, you’ll either hit an index or cause the database to do a table scan. Table scans are bad. Why? Because they crawl the file system looking for rows to return, which gets slower the more records you have.

And this is the problem with both the .find(:all) and index problems - they will probably never bite you in development or testing because you’ll rarely create enough rows to trigger performance problems. It usually takes tens of thousands of rows before either issue causes problems, but when it does, the problems are large and unpleasant. So, it’s better not to have them in the first place, right?

Sorry, back to indexes. The basics rule I use for indexes is this: if I’m going to select on a column, it needs an index. So, if you have an association, that column needs an index! Let’s go back to our Thing model. Let’s say each Thing belongs to a User. That means you’ll have a user_id column. Whenever you want to get all the Things that belong to a specific user, you’ll be selecting from the things table where user_id equals that User’s id. That means you need to create an index on things.user_id!

Now, you don’t need to worry about creating indexes on your primary keys (the id column in your tables), but you should create an index for every _id you use for associations. Creating indexes is easy. From the command line in your Rails app directory run:

script/rails generate migration add_index_things_user_id

For indexes, I try to keep that naming convention: “add_index_TABLENAME_COLUMN”. Oh, for Rails 2.3, the first part of that command is script/generate. That will create a new migration file in the db/migrate that should look like this:

class AddIndexThingsUserId < ActiveRecord::Migration
  def up
  end

  def down
  end
end

Now we need to add the index. Your migration should end up looking like this:

class AddIndexThingsUserId < ActiveRecord::Migration
  def up
    add_index :things, :user_id
  end

  def down
    remove_index :things, :user_id
  end
end

Now you just need to run the migration and you should be good to go! A small warning… if your things table has tens of thousands of rows in it, this migration could take a while to complete so you should probably put your site into maintenance mode while the index is being added. Also, for some more reading on indexes, check out this great post from our own Ernie Miller

There you go, two simple things that will now never bite you after you launch your app. Will you miss some indexes? Undoubtedly. But, we’ve fixed the simple ones. And now that you’re aware that you need them, you should be able to figure out where else you need to add them!

I hope this saves you some sleepless nights trying to figure out how things got horrible and keep your app and database happy and healthy! If you’ve got questions or ideas for things you’d like us to cover in future blog posts, let us know in the comments!