Sql server Blog Forum Helping SQL server DBAs and Developers

5Oct/1134

Merge join Vs Hash join Vs Nested loop join

 

This month’s TSQL Tuesday party is being hosted by Stuart R Ainsworth (Blog| Twitter). I am very glad to write my first blog post as t-SQL Tuesday post on my newly designed website.

SQL server has three types of internal joins. I know most of folks never heard this join type because it’s not logical join and it’s not often used in their codes.

Then, when it will be used?

Well the answer is “it depends”.

This means it depends upon the record sets and indexes. The query optimizer will be smart and always try to pick up the most optimal physical joins. As we know SQL optimizer creates a plan cost based and depends upon the query cost it will choose the best join.

How the query optimizer will choose the join type internally?

Well, there is some algorithm has written internally for the query optimizer to choose the join type.

Let’s go for some practical examples and will finally summarize it.
First I will give some basic idea how the join will work and when/How the optimizer will decide to use anyone of the internal join (Physical join).

• It depends upon the table size
• It depends upon the index on the join column
• It depends upon the Sorted order on the join column

Test:

The test has done by following configuration.

RAM: 4 GB
Server : SQL server 2008 (RTM)

 

create table tableA (id int identity ,name varchar(50))
declare @i int
set @i=0
while (@i<100)
begin
insert into tableA (name)
select name from master.dbo.spt_values
set @i=@i+1
end
--select COUNT(*) from dbo.tableA --250600
go
create table tableB (id int identity ,name varchar(50))
declare @i int
set @i=0
while (@i<100)
begin
insert into tableB (name)
select name from master.dbo.spt_values
set @i=@i+1
end
-- select COUNT(*) from dbo.tableB --250600
select * from dbo.tableA A join tableB B
on (a.id=b.id)

 

Test1: Without Index

Let's create a clustered index

create unique clustered index cx_tableA on tableA (id)
create unique clustered index cx_tableB on tableB (id)

Test1: With Index

If either of the table has indexed then it goes hash join. I haven’t shown this picture here. You can drop either of the table indexes and test it.

Test2: Without Index

Let's create a medium table

 

create table tableC (id int identity,name varchar(50))
insert into tableC (name)
select name from master.dbo.spt_values
-- select COUNT(*) from dbo.tableC --2506
create table tableD (id int identity,name varchar(50))
insert into tableD (name)
select name from master.dbo.spt_values
select * from dbo.tableC C join tableD D
on (C.id=D.id)
-- select COUNT(*) from dbo.tableD --2506

 

Test2: With Index

Let's create a clustered index

create unique clustered index cx_tableC on tableC (id)
create unique clustered index cx_tableD on tableD (id)

If either of the table has indexed then it goes merge join. I haven’t shown this picture here. You can drop either of the table indexes and test it.

Test3: Without Index

 

create table tableE (id int identity,name varchar(50))
insert into tableE (name)
select top 10 name from master.dbo.spt_values
-- select COUNT(*) from dbo.tableE --10
create table tableF (id int identity,name varchar(50))
insert into tableF (name)
select top 10 name from master.dbo.spt_values
-- select COUNT(*) from dbo.tableF --10

 

Let's create a clustered index

 

create unique clustered index cx_tableE on tableE (id)
create unique clustered index cx_tableF on tableF (id)

Test3: With Index

If either of the table has indexed then it goes Nested loop join. I haven’t shown this picture here. You can drop either of the table indexes and test it.

You can also join tables vice versa like big table Vs Medium table Vs small table

select * from dbo.tableA A join tableC C
on (a.id=C.id)

select * from dbo.tableA A join tableE E
on (a.id=E.id)

select * from dbo.tableC C join tableE E
on (C.id=E.id)

 

 

In this case if all the table has indexed then it goes Nested loop join. If they don’t then hash join. If either of the table has indexed then it goes Nested loop join. I haven’t shown this picture here.

Still you can force optimizer to use any one of the internal joins, but it's not good practice. The query optimizer is smart it will dynamically choose the best one.

Here just I used the merge hint so the optimizer goes to merge join instead of a hash join (Test1 without an index)

 

select * from dbo.tableA A join tableB B
on (A.id=B.id)option (merge join)

select * from dbo.tableA A inner merge join tableB B
on (A.id=B.id)

 

 

 

Table 1: Test uses a unique clustered index

From the table diagram:

  • If both the tables have NO index then the query optimizer will choose “Hash joins” internally.
  • If both the tables have indexes then the query optimizer will choose “Merge (For big tables) /Nested loop (For small tables)” internally.
  • If either of the tables have indexes then the query optimizer will choose “Merge (For medium tables) /Hash (For big tables) /Nested loop (For small & big Vs small tables)” internally.

Table 1: Test using clustered index

(create clustered index cx_tableA on tableA (id))
Table size With index (Both) Without Index(Both) Either of table has index
Big (Both) HASH HASH HASH
Medium (Both) HASH HASH HASH
Small (Both) NESTED LOOP NESTED LOOP HASH
Big Vs Small(medium) HASH HASH HASH

 

 

From the table diagram:

This test has done using without a unique clustered index. See if the index created without unique keyword then there is no guarantee SQL will not know its unique (data) so it will create 4-byte integer GUID unique-identifier by default.

Look the diagram there is no MERGE join if the clustered index created without unique.

Thanks @Dave for your email 🙂 . Now the second diagram added.

 

Conclusion:

Merge Join

Merge join is possible for the tables have an index on the join column. The index either clustered or covering non-clustered index. It's best join for this circumstance. Why it’s best join? Because it needs an index for both the tables. So it’s already sorted and easily match and return the data.

Hash Join

Hash joins is possible for tables with no index (or) either of the big tables has indexed. It's best join for this circumstance. Why it’s best join? Because it’s worked great for big tables with no index and run the query parallel (more than one processor) and give the best performance. Most of folk says its heavy lifter join.

Nested loop Join

Nested loop join is possible for small tables with index (or) either of the big tables have indexed. It's best join for this circumstance. Why it’s best join? Because it works great for small tables like, compares each row from one table to each row from the other table ‘looping’.

For more: Read Gail Shaw’s blog. It has Craig Freedman’s link too. Why I gave Gail’s link instead of directly give Craig‘s link. It’s worth reading and most of my readers are from India and they have to read all her posts.

I hope now you can understand how the query optimizer will choose the most optimal join types.

Muthukkumaran Kaliyamoorthy

I’m currently working as a SQL server DBA in one of the top MNC. I’m passionate about SQL Server And I’m specialized in Administration and Performance tuning. I’m an active member of SQL server Central and MSDN forum. I also write articles in SQL server Central. For more Click here

More Posts

Comments (34) Trackbacks (0)
  1. Hi, hope you don’t mind me using your comment form to get in touch? It seemed the quickest and most pertinent method. As a fellow blogger, I couldn’t help noticing your site isn’t optimised for mobile users. Did you know the mobile market is 100 times bigger than the whole of the internet? There’s a video here that explains it all pretty well: http://ow.ly/6M4OF
    If you have a few minutes, do have a look, I think you might be surprised at what you’re missing.

  2. Thanks for the post.

  3. It is hard to find knowledgeable people on this specific topic, but you sound like you know very well what you’re writing about! Cheers

  4. awesome blog, do you have twitter or facebook? i will bookmark this page thanks.

  5. Hello muthukkumaran

    Thanks for the cool post. I run your scripts but its seems missed some lines in your first test script. Fix it.

  6. Test1: With Index

    I wrote a while loop as same like you but for me it shows hash join.

    I want to confirm. can you fix your script!

  7. Excellent! it is the word that comes to my mind when I look at your blog and when I read what you have to say. Your opinions are very much alike mine. I happy that I decided to enter your website and find out what you are interested in.

  8. Hi Thanks! Mine 2008 r2.

    I have tested the same code but I created clustered index without unique and row counts are vary (250800).

    Yep you are right if I created the clustered index as unique it shows merge.

  9. Unusual icons on my desktop?

  10. I really liked your blog! It helped me alot… Awesome. Exactly what I was looking for. Thanks!

  11. whoah this blog is fantastic i love reading your articles. Keep up the good work! You know, a lot of people are looking around for this info, you can aid them greatly.

  12. Thanks Really Very good Explanation..

  13. Very useful article, well written and informative – thanks !

  14. Hey very nice info…just learned abt them today and when tried more info for knowledge, was routed to ur blog..I really liked the comparison idea..good article..thanks for sharing the knowledge.

  15. Awesome blog…

  16. Hi, Muthukkumaran kaliyamoorthy
    I am much thankful of this post. You solve my confusions. God bless you.

  17. awesome

  18. Good one with explanation using example,help to understand clearly,Thank U 🙂


Leave a comment

No trackbacks yet.