sql-server – 检查是否存在EXISTS优于COUNT! ……不是吗?
|
副标题[/!--empirenews.page--]
我经常阅读当必须检查行的存在时,应始终使用EXISTS而不是COUNT. 然而,在最近的几个场景中,我测量了使用计数时的性能提升. LEFT JOIN (
SELECT
someID,COUNT(*)
FROM someTable
GROUP BY someID
) AS Alias ON (
Alias.someID = mainTable.ID
)
我不熟悉告诉SQL Server“内部”发生了什么的方法,所以我想知道是否存在一个带有EXISTS的无法解决的缺陷,这对我已经完成的测量非常有意义(可以说是RBAR吗?!). 你对这种现象有一些解释吗? 编辑: 这是您可以运行的完整脚本: SET NOCOUNT ON
SET STATISTICS IO OFF
DECLARE @tmp1 TABLE (
ID INT UNIQUE
)
DECLARE @tmp2 TABLE (
ID INT,X INT IDENTITY,UNIQUE (ID,X)
)
; WITH T(n) AS (
SELECT
ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM master.dbo.spt_values AS S
),tally(n) AS (
SELECT
T2.n * 100 + T1.n
FROM T AS T1
CROSS JOIN T AS T2
WHERE T1.n <= 100
AND T2.n <= 100
)
INSERT @tmp1
SELECT n
FROM tally AS T1
WHERE n < 10000
; WITH T(n) AS (
SELECT
ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM master.dbo.spt_values AS S
),tally(n) AS (
SELECT
T2.n * 100 + T1.n
FROM T AS T1
CROSS JOIN T AS T2
WHERE T1.n <= 100
AND T2.n <= 100
)
INSERT @tmp2
SELECT T1.n
FROM tally AS T1
CROSS JOIN T AS T2
WHERE T1.n < 10000
AND T1.n % 3 <> 0
AND T2.n < 1 + T1.n % 15
PRINT '
COUNT Version:
'
WAITFOR DELAY '00:00:01'
SET STATISTICS IO ON
SET STATISTICS TIME ON
SELECT
T1.ID,CASE WHEN n > 0 THEN 1 ELSE 0 END AS DoesExist
FROM @tmp1 AS T1
LEFT JOIN (
SELECT
T2.ID,COUNT(*) AS n
FROM @tmp2 AS T2
GROUP BY T2.ID
) AS T2 ON (
T2.ID = T1.ID
)
WHERE T1.ID BETWEEN 5000 AND 7000
OPTION (RECOMPILE) -- Required since table are filled within the same scope
SET STATISTICS TIME OFF
PRINT '
EXISTS Version:'
WAITFOR DELAY '00:00:01'
SET STATISTICS TIME ON
SELECT
T1.ID,CASE WHEN EXISTS (
SELECT 1
FROM @tmp2 AS T2
WHERE T2.ID = T1.ID
) THEN 1 ELSE 0 END AS DoesExist
FROM @tmp1 AS T1
WHERE T1.ID BETWEEN 5000 AND 7000
OPTION (RECOMPILE) -- Required since table are filled within the same scope
SET STATISTICS TIME OFF
在SQL Server 2008R2(七个64位)上我得到了这个结果 COUNT版本:
EXISTS版本:
解决方法
任何事情都是非常罕见的,特别是涉及到数据库时.在SQL中有许多表达相同语义的方法.如果有一个有用的经验法则,可能是使用最自然的语法编写查询(并且,是的,这是主观的),并且只有在您获得的查询计划或性能不可接受时才考虑重写. 对于它的价值,我自己对这个问题的看法是存在查询最自然地使用EXISTS来表达.这也是我的经验,EXISTS tends to optimize better比OUTER JOIN拒绝NULL替代.使用COUNT(*)和过滤= 0是另一种选择,恰好在SQL Server查询优化器中有一些支持,但我个人发现这在更复杂的查询中是不可靠的.无论如何,对于我来说,EXISTS似乎比任何一种替代品更自然.
您的特定示例很有趣,因为它突出了优化程序处理CASE表达式(特别是EXISTS测试)中的子查询的方式. CASE表达式中的子查询 考虑以下(完全合法的)查询: DECLARE @Base AS TABLE (a integer NULL);
DECLARE @When AS TABLE (b integer NULL);
DECLARE @Then AS TABLE (c integer NULL);
DECLARE @Else AS TABLE (d integer NULL);
SELECT
CASE
WHEN (SELECT W.b FROM @When AS W) = 1
THEN (SELECT T.c FROM @Then AS T)
ELSE (SELECT E.d FROM @Else AS E)
END
FROM @Base AS B;
semantics of 只有当传递谓词返回false时,才会计算嵌套循环连接的内侧.总体效果是CASE表达式按顺序进行测试,并且仅在没有满足先前表达式的情况下才评估子查询. 带有EXISTS子查询的CASE表达式 在CASE子查询使用EXISTS的情况下,逻辑存在测试实现为半连接,但是在后面的子句需要时,必须保留通常被半连接拒绝的行.流经这种特殊类型的半连接的行获取一个标志,以指示半连接是否找到匹配.此标志称为探测列. 实现的细节是逻辑子查询被相关联接(‘apply’)替换为探测列.该工作由查询优化器中的简化规则执行,该规则称为RemoveSubqInPrj(在投影中删除子查询).我们可以使用跟踪标志8606查看详细信息: SELECT
T1.ID,CASE
WHEN EXISTS
(
SELECT 1
FROM #T2 AS T2
WHERE T2.ID = T1.ID
) THEN 1
ELSE 0
END AS DoesExist
FROM #T1 AS T1
WHERE T1.ID BETWEEN 5000 AND 7000
OPTION (QUERYTRACEON 3604,QUERYTRACEON 8606);
(编辑:邯郸站长网) 【声明】本站内容均来自网络,其相关言论仅代表作者个人观点,不代表本站立场。若无意侵犯到您的权利,请及时与联系站长删除相关内容! |

