homeASCIIcasts

181: Include vs Joins 

(view original Railscast)

Other translations: Cn It

The find method in ActiveRecord can take a number of different options. Two of these options, the ones we’ll focus on in this episode, are include and joins. These are often confused as they perform similar tasks, but there are key times when we should choose one over the other.

To explain the differences between include and joins we’ll use a Rails application in which users can write comments. The two models are joined with a standard has_many / belongs_to relationship.

class User < ActiveRecord::Base
  has_many :comments
end
class Comment < ActiveRecord::Base
  belongs_to :user
end

The application has a page that shows all of the comments that have been made. For each comment the author’s name is shown, along with the text “(admin)” if that user is an admin.

The application's comments page.

Let’s say that we want to change the page to show only the comments that have been written by admin users. To do so we’ll have to modify the find in the index action of the CommentsController.

def index
  @comments = Comment.all(:order => "comments.created_at desc")
end

We want to filter the comments on an attribute in the users table so we’ll have to join the users table into a single query with our Comments find. But should we use joins or include for this? To work it out we’ll compare the two in the console.

First we’ll try using joins, adding a condition so that only the admin users are returned.

>> c = Comment.all(:joins => :user, :conditions => { :users => { :admin => true } })  Comment Load (1.2ms)   SELECT "comments".* FROM "comments" INNER JOIN "users" ON "users".id = "comments".user_id WHERE ("users"."admin" = 't') 
+----+-----------------------+---------+-----------------------+-----------------------+
| id | content               | user_id | created_at            | updated_at            |
+----+-----------------------+---------+-----------------------+-----------------------+
| 3  | Some people, when ... | 1       | 2009-09-28 19:00:3... | 2009-09-28 19:00:3... |
| 5  | Write the code as ... | 2       | 2009-09-28 19:44:0... | 2009-09-28 19:44:0... |
| 6  | Walking on water a... | 1       | 2009-09-28 19:46:2... | 2009-09-28 19:46:2... |
| 8  | It should be noted... | 2       | 2009-09-28 19:49:3... | 2009-09-28 19:49:3... |
+----+-----------------------+---------+-----------------------+-----------------------+
4 rows in set

This performs a single query that only returns attributes from the comments table. If we try to get the first comment’s User then another database call will have to be made as none of the users’ attributes were retrieved by the query above.

>> c.first.user
  User Load (0.3ms)   SELECT * FROM "users" WHERE ("users"."id" = 1) 
+----+--------+-------+-------------------------+-------------------------+
| id | name   | admin | created_at              | updated_at              |
+----+--------+-------+-------------------------+-------------------------+
| 1  | Eifion | true  | 2009-09-28 18:51:53 UTC | 2009-09-28 18:51:53 UTC |
+----+--------+-------+-------------------------+-------------------------+
1 row in set

So let’s try doing the same thing again, but with include instead of joins.

>> c = Comment.all(:include => :user, :conditions => { :users => { :admin => true } })
  Comment Load Including Associations (0.7ms)   SELECT "comments"."id" AS t0_r0, "comments"."content" AS t0_r1, "comments"."user_id" AS t0_r2, "comments"."created_at" AS t0_r3, "comments"."updated_at" AS t0_r4, "users"."id" AS t1_r0, "users"."name" AS t1_r1, "users"."admin" AS t1_r2, "users"."created_at" AS t1_r3, "users"."updated_at" AS t1_r4 FROM "comments" LEFT OUTER JOIN "users" ON "users".id = "comments".user_id WHERE ("users"."admin" = 't') 
+----+-----------------------+---------+-----------------------+-----------------------+
| id | content               | user_id | created_at            | updated_at            |
+----+-----------------------+---------+-----------------------+-----------------------+
| 3  | Some people, when ... | 1       | 2009-09-28 19:00:3... | 2009-09-28 19:00:3... |
| 5  | Write the code as ... | 2       | 2009-09-28 19:44:0... | 2009-09-28 19:44:0... |
| 6  | Walking on water a... | 1       | 2009-09-28 19:46:2... | 2009-09-28 19:46:2... |
| 8  | It should be noted... | 2       | 2009-09-28 19:49:3... | 2009-09-28 19:49:3... |
+----+-----------------------+---------+-----------------------+-----------------------+
4 rows in set

This time a much more complicated SELECT query is run and columns from both the comments and users tables are retrieved. The related User models are stored in memory so when we get the first comment’s User this time there’s no need to make another database call.

>> c.first.user
+----+--------+-------+-------------------------+-------------------------+
| id | name   | admin | created_at              | updated_at              |
+----+--------+-------+-------------------------+-------------------------+
| 1  | Eifion | true  | 2009-09-28 18:51:53 UTC | 2009-09-28 18:51:53 UTC |
+----+--------+-------+-------------------------+-------------------------+
1 row in set

Changing The Comments Page

Now that we have some understanding of the differences between include and joins, which one should we choose for our comments page? The question we need to ask is “are we using any of the related model’s attributes?” In our case the answer is “yes” as we’re showing the user’s name against each comment. This means that we want to get the users at the same time as we retrieve the comments and so we should be using include here.

Back in our CommentsController we’ll modify the index action so that it gets the users along with the comments.

def index
  @comments = Comment.all(:include => :user, :conditions => { :users => { :admin => true} }, :order => "comments.created_at desc")
end

This find is looking a little complex now, so we’d probably move parts of it into a named scope if this was an application we were putting into production, but we’ll leave it as it is.

What if the situation was a little different and we weren’t showing the users’ names with their comments? Well let’s see. First we’ll remove the part that shows the user’s name (and whether they’re an admin) from the comment partial.

<div class="comment">
  <%= simple_format comment.content %>
  <p class="author">
    <%= h comment.user.name %>
    <% if comment.user.admin? %>(admin)<% end %>
  </p>
  <p class="actions">
    <%= link_to "edit", edit_comment_path(comment) %> |
    <%= link_to "destroy", comment, :method => :delete, :confirm => "Are you sure?" %>
  </p>
</div>

Reloading the page now shows that the user names have disappeared.

The comments page with the users’s names removed.

We aren’t displaying any user information on the page any more so our include is now rather inefficient as we’re getting all of the related User information but not using it. In this case the correct option to use is joins; this way we’re not getting User information we don’t need. All we have to do is replace include with joins in the find.

def index
  @comments = Comment.all(:joins => :user, :conditions => { :users => { :admin => true} }, :order => "comments.created_at desc")
end

This way we’re only using the users table to perform conditions on so our comments page will be more performance and memory efficient.

Another Example

Let’s take another look at the SQL statement that was generated when we executed our find with include in the console earlier.

SELECT "comments"."id" AS t0_r0, "comments"."content" AS t0_r1, "comments"."user_id" AS t0_r2, "comments"."created_at" AS t0_r3, "comments"."updated_at" AS t0_r4, "users"."id" AS t1_r0, "users"."name" AS t1_r1, "users"."admin" AS t1_r2, "users"."created_at" AS t1_r3, "users"."updated_at" AS t1_r4 FROM "comments" LEFT OUTER JOIN "users" ON "users".id = "comments".user_id WHERE ("users"."admin" = 't')

This statement is fairly complicated as it gets each column from both the comments and users table, renaming them all. This means that the include option doesn’t really work with the select option as we don’t have control over how the first part of the SELECT statement is generated. If you need control over the fields in the SELECT then you should use joins over include.

So where might this be useful? On our users page we have a list of the users and, against each user, a count of the number of the comments that user has made.

The users page.

In the view file for the page the number of comments is retrieved using this line of code.

<%= pluralize user.comments.count, "comment" %>

This code will perform a separate query for every user in the list, which is not ideal. It would be better if we could retrieve the number of comments at the same time we get the rest of the information about the users.

We can do this by using joins with the select option. We’ll use the console again to demonstrate how.

This time we’ll fetch all of the users and use joins to join with the comments table. We’ll use the select option to restrict the columns to all of the columns from the users table and to get a count of the comments, and also the group option that ActiveRecord provides to group the results by each user’s id.

>> User.all(:joins => :comments, :select => "users.*, count(comments.id) as comments_count", :group => "users.id")

When we run this we get all of the details for the users and a count of the number of comments each one has made.

User Load (1.3ms)   SELECT users.*, count(comments.id) as comments_count FROM "users" INNER JOIN "comments" ON comments.user_id = users.id GROUP BY users.id
+----+--------+-------+------------------+------------------+----------------+
| id | name   | admin | created_at       | updated_at       | comments_count |
+----+--------+-------+------------------+------------------+----------------+
| 1  | Eifion | true  | 2009-09-28 18... | 2009-09-28 18... | 2              |
| 2  | Susan  | true  | 2009-09-28 18... | 2009-09-28 18... | 2              |
| 3  | Paul   | false | 2009-09-28 18... | 2009-09-28 18... | 3              |
| 4  | John   | false | 2009-09-28 18... | 2009-09-28 18... | 1              |
+----+--------+-------+------------------+------------------+----------------+
4 rows in set

Now that we can get the users and the number of comments each one has made in one query we can modify the users index page to use our new query. We’ll need to make two small changes. In the controller we’ll replace User.all with our new find.

def index
  @users = User.all(:joins => :comments, :select => "users.*, count(comments.id) as comments_count", :group => "users.id")
end

And in the index view we can use the comments_count field to display the number of comments each user has written, replacing the comments.count code that caused another query to be made for each user on the page.

<%= pluralize user.comments_count, "comment" %>

The users page will look exactly the same when we reload it now, but will be much more efficient in the way it accesses the database as it’s performing everything in a single query.

Another Use For joins

We’ll finish this episode by showing another good candidate for using joins over include. Below are the User and Comment models we’ve been using, along with two new ones: Group and Membership.

class Group < ActiveRecord::Base
  has_many :memberships
  has_many :users, :through => :memberships
end

class Membership < ActiveRecord::Base
  belongs_to :user
  belongs_to :group
end

class User < ActiveRecord::Base
  has_many :memberships
  has_many :groups, :through => :memberships
  has_many :comments
end

class Comment < ActiveRecord::Base
  belongs_to :user
end

In this setup User and Group have a many-to-many relationship with each other through Membership. We want to show all of the comments that have been made by users who belong to a specific group. Ideally we’d want some kind of association between Group and Comment, maybe something like this.

class Group < ActiveRecord::Base
  has_many :membership
  has_many :users, :through => :memberships
  has_many :comments, :through => :users
end

Ruby doesn’t support nested has_many :through associations like this, though, so we’ll need to find another way to do this and happily we can use joins again here.

Here’s what the page we’re building, the GroupController’s show view, looks like so far. We have a list of the groups members but we’ve not yet added their comments.

The groups page.

We’ll go back to the console again to work out the code we’ll need to display the comments. First we’ll get our group.

>> g = Group.first
  Group Load (0.4ms)   SELECT * FROM "groups" LIMIT 1
+----+------------------+-------------------------+-------------------------+
| id | name             | created_at              | updated_at              |
+----+------------------+-------------------------+-------------------------+
| 1  | Musician's Guild | 2009-10-01 20:09:11 UTC | 2009-10-01 20:09:11 UTC |
+----+------------------+-------------------------+-------------------------+
1 row in set

And then we’ll use joins to get the comments for the group’s members. We need to join the users and memberships tables so we can join both the User and Memberships associations. Then we can add conditions and restrict the memberships to those whose group_id is the id of our group.

>> Comment.all(:joins => { :user => :memberships }, 
:conditions => { :memberships => { :group_id => g.id } } )
  Comment Load (0.7ms)   SELECT "comments".* FROM "comments" 
INNER JOIN "users" ON "users".id = "comments".user_id 
INNER JOIN "memberships" ON memberships.user_id = users.id 
WHERE ("memberships"."group_id" = 1) 
+----+--------------------+---------+--------------------+--------------------+
| id | content            | user_id | created_at         | updated_at         |
+----+--------------------+---------+--------------------+--------------------+
| 1  | I have always w... | 3       | 2009-09-28 18:5... | 2009-09-28 18:5... |
| 3  | Some people, wh... | 1       | 2009-09-28 19:0... | 2009-09-28 19:0... |
| 4  | Java is to Java... | 3       | 2009-09-28 19:0... | 2009-09-28 19:0... |
| 5  | Write the code ... | 2       | 2009-09-28 19:4... | 2009-09-28 19:4... |
| 6  | Walking on wate... | 1       | 2009-09-28 19:4... | 2009-09-28 19:4... |
| 7  | Never trust a c... | 3       | 2009-09-28 19:4... | 2009-09-28 19:4... |
| 8  | It should be no... | 2       | 2009-09-28 19:4... | 2009-09-28 19:4... |
+----+--------------------+---------+--------------------+--------------------+
7 rows in set

Running the query shows all of the comments from the users in our group so we can use it to finish off our group page. We’ll use it in a new comments method in our Group model.

class Group < ActiveRecord::Base
  has_many :memberships
  has_many :users, :through => :memberships

  def comments
    Comment.all(:joins => { :user => :memberships}, :conditions => { :memberships => { :group_id => id } } )
  end
end

We still need to display the comments on our page so we’ll have to update our view. We already have a comment partial so all we need to do is render the comments.

<h2>Comments</h2>
<%= render @group.comments %>

If we reload the page now we’ll see the comments below the list of users.

The groups page with the comments added.

Looking again at our Group model’s comments method it would appear to work if we replace the joins with include and these two options often appear to be interchangeable. Bear in mind though that using include here would load the user and memberships associated models into memory which in this case we don’t want.

Our comments method is almost creating another association and in these cases we can used scoped instead of all. This will behave almost like a named scope, but one which is generated dynamically. The advantage is that we can then chain other scopes to it to further limit and change the limit of the find.

If you found this episode useful and you’d like more information on performing ActiveRecord queries Ryan Bates has produced a series of screencasts called “Everyday Active Record” that go into more depth on the areas we’ve covered.