Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion GroupsGeneralPHPASPPerlColdFusionFlashHTML, CSS, ScriptsBrowsers

Webmaster Forum / ASP / Database Access / October 2008



Tip: Looking for answers? Try searching our database.

Which is better... IN() or EXISTS()

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
Mike Wazowski - 07 Oct 2008 12:35 GMT
Hello

We're trying to resolve some performance issues and would be grateful for
your thoughts on the benefits of each of these methods to get the same
result.

Which is more performant?

 SELECT Fields
  FROM Table
  WHERE FKeyField IN(
     SELECT KeyField FROM OtherTable WHERE OtherValue = 'FilterValue'
  )

or...

 SELECT Fields
  FROM Table
  WHERE EXISTS(
     SELECT Anything FROM OtherTable WHERE KeyField = Table.FKeyField AND
OtherValue = 'FilterValue'
  )

Many thanks

Mike
Uri Dimant - 07 Oct 2008 12:40 GMT
Mike
I prefer EXISTS/NOT EXISTS
For example if you use NOT IN and the column has a NULL value you are about
to get a wrong result

SELECT * FROM tbl WHERE col NOT IN (SELECT col FROM anothertbl ...)

> Hello
>
[quoted text clipped - 22 lines]
>
> Mike
Frank Uray - 07 Oct 2008 12:54 GMT
Hi Mike

The meaning of IN/NOT IN is different to
EXISTS/NOT EXISTS ...

You will use IN when the SubQuery returns more than
one result. In fact, SQL Server is just joining the tables.

In you case you can also use "WHERE FKeyField = (SELECT ...)"

Regards
Frank

> Hello
>
[quoted text clipped - 22 lines]
>
> Mike
Dan Guzman - 07 Oct 2008 13:38 GMT
> The meaning of IN/NOT IN is different to
> EXISTS/NOT EXISTS ...

True but EXISTS and IN can often be expressed as a JOIN or correlated
subquery.  I get identical execution plans with all of the methods below:

CREATE TABLE dbo.MyTable(
MyTableKey int NOT NULL
 CONSTRAINT PK_MyTable PRIMARY KEY,
Fields int NOT NULL,
FKeyField int NOT NULL
);

CREATE TABLE dbo.OtherTable(
KeyField int NOT NULL
 CONSTRAINT PK_OtherTable PRIMARY KEY,
OtherValue varchar(20) NOT NULL
);

CREATE INDEX index1 ON dbo.OtherTable(
OtherValue, KeyField);

ALTER TABLE dbo.MyTable
ADD CONSTRAINT FK_MyTable_OtherTable
FOREIGN KEY (FKeyField)
REFERENCES dbo.OtherTable(KeyField);

SELECT Fields
FROM dbo.MyTable
WHERE FKeyField IN(
SELECT OtherTable.KeyField
FROM dbo.OtherTable
WHERE OtherValue = 'FilterValue'
  );

SELECT Fields
FROM dbo.MyTable
WHERE EXISTS(
SELECT *
FROM dbo.OtherTable
WHERE OtherTable.KeyField = MyTable.FKeyField AND
 OtherTable.OtherValue = 'FilterValue'
  );

SELECT Fields
FROM dbo.MyTable
JOIN dbo.OtherTable ON
OtherTable.KeyField = MyTable.FKeyField
AND OtherTable.OtherValue = 'FilterValue';

SELECT Fields
FROM dbo.MyTable
WHERE FKeyField =
(
SELECT KeyField
FROM dbo.OtherTable
WHERE
 OtherTable.OtherValue = 'FilterValue'
 AND OtherTable.KeyField = MyTable.FKeyField
  );

Signature

Hope this helps.

Dan Guzman
SQL Server MVP
http://weblogs.sqlteam.com/dang/

> Hi Mike
>
[quoted text clipped - 5 lines]
>
> In you case you can also use "WHERE FKeyField = (SELECT ...)"
Aaron Bertrand [SQL Server MVP] - 07 Oct 2008 14:11 GMT
> You will use IN when the SubQuery returns more than
> one result.

Do you mean compared to EXISTS, or compared to = ?  You can also use EXISTS
regardless of the number of results returned by the subquery.

For the OP, I prefer EXISTS for a couple of reasons.  One is the reason that
Uri pointed out; if your subquery can return NULLs, then IN/NOT IN will not
necessarily produce correct results.  The other is that EXISTS has the
possibility of performing better; it can stop "looking" once it has found
its first result, and you can also set it to return a constant (e.g. SELECT
1) instead of a column... This makes it easier to spot as a
non-data-returning element in your query.

A
Mike Wazowski - 07 Oct 2008 15:17 GMT
Thanks Aaron.

It was my thoughts too that EXISTS does not need to retrieve or return
actual data, so would perform better.

>> You will use IN when the SubQuery returns more than
>> one result.
[quoted text clipped - 15 lines]
>
> A
Dan Guzman - 07 Oct 2008 13:04 GMT
> Which is more performant?

Note that SQL is a declarative rather than a procedural language.  If the
queries are semantically the same, the SQL Server cost-based optimizer ought
to be able to come up with the same optimal execution plan in both cases.

Proper index use is the key to performance.  I suggest you view the query
execution plan (in SQL Server Management Studio, select the query text and
press ctrl-L) to ensure that indexes are used as expected.  I would expect a
composite one on KeyField and OtherValue would be optimal for the subquery
performance and perhaps one on KeyField too.

Signature

Hope this helps.

Dan Guzman
SQL Server MVP
http://weblogs.sqlteam.com/dang/

> Hello
>
[quoted text clipped - 22 lines]
>
> Mike
Mike Wazowski - 07 Oct 2008 14:14 GMT
Thank you all for your responses.

I have been advised that the first method, using IN(), results in one
subquery where the results are then used to filter the parent result set.
Allegedly using the second EXISTS method, a subquery is effectively
performed for each record in the parent result set, before any other filter
criteria is applied.

Just going to do some more research in Query Analyser.  Thanks again.

Mike

> Hello
>
[quoted text clipped - 22 lines]
>
> Mike
Aaron Bertrand [SQL Server MVP] - 07 Oct 2008 14:23 GMT
I'd love to see a case where IN performs better.  It sounds to me like your
advisor is either guessing, or making facts up, or speaking to one very
specific case... If they are looking at estimated execution plan, for
example, remind them that they should be checking statistics i/o and actual
execution plan.  There may be some cases where this is true but in general
you should always test and compare instead of building some imaginary
"always use x" rule.  The only rule is, it always depends.

> Thank you all for your responses.
>
[quoted text clipped - 7 lines]
>
> Mike
Tibor Karaszi - 07 Oct 2008 21:13 GMT
My guess is that the person who offered that explanation confused the logical query flow from the
physical execution plan. It might be true that *logically* a query is performed in this or that way,
but since we have optimizers, we, with a reasonable optimizer, tend to end up with identical
execution plan in many of the cases (some type of join)...

Signature

Tibor Karaszi, SQL Server MVP
http://www.karaszi.com/sqlserver/default.asp
http://sqlblog.com/blogs/tibor_karaszi

> I'd love to see a case where IN performs better.  It sounds to me like your
> advisor is either guessing, or making facts up, or speaking to one very
[quoted text clipped - 15 lines]
>>
>> Mike
Mike Wazowski - 07 Oct 2008 15:18 GMT
Hi Guys

Thanks again for the comments of those following this thread.

I produced execution plans for both versions and both plans are the same!  I
guess now I need to dig in to the statistics mentioned by Aaron to see if I
can get a definitive 'best method' for my DB.

Mike

> Hello
>
[quoted text clipped - 22 lines]
>
> Mike
Roy Harvey (SQL Server MVP) - 07 Oct 2008 16:32 GMT
>I produced execution plans for both versions and both plans are the same!  I
>guess now I need to dig in to the statistics mentioned by Aaron to see if I
>can get a definitive 'best method' for my DB.

One more thought on the IN vs EXISTS question.  IN only works with
single-column keys.  EXISTS works with single or multiple column keys.
My tendency is to want to use a consistent style, and EXISTS lets me
use the same approach in both cases.

Roy Harvey
Beacon Falls, CT
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2009 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.