SQL Count Query Using Non-Index Column

Posted 2023-03-30

【Title】: SQL Count Query Using Non-Index Column 【Posted】: 2013-09-05 15:54:35 【Question】:

I have a query similar to the one below, where I need to find the number of transactions a specific customer had within a certain time frame:

select customer_id, count(transactions)
from transactions
where customer_id = 'FKJ90838485'
and purchase_date between '01-JAN-13' and '31-AUG-13'
group by customer_id

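The table transactions is not indexed on customer_id but on another field called transaction_id. customer_id is a character type, while transaction_id is numeric.

The accounting_month field is also indexed. This field stores only the month in which the transaction occurred, i.e., purchase_date = '03-MAR-13' would have accounting_month = '01-MAR-13'.

The transactions table has about 20 million records in the time frame from '01-JAN-13' to '31-AUG-13'.

When I run the query above, it takes more than 40 minutes to come back. Any ideas or tips?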

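【Question comments】:

Can you add a covering index?

This query, as written, will produce an error. If you want real answers, add the actual query. Although I'd say any answer will involve adding an index, which is something you can't do.

The best index to add is probably a composite index on (customer_id, purchase_date). But as ypercube said, your query gives an error: you have an aggregate function but no "group by" clause.

Why count(transactions) rather than count(*)? Is there also a column named transactions?

Contact the database administrator. Ask them to add an index on CustomerId.

Please don't rely on implicit data type conversion. '01-JAN-13' is a string literal, not a date. This will fail if run from a machine with different NLS settings.

【Answer 1】: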

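As others have already commented, the best option is to add an index that covers the query, so:

Contact the database administrator and ask them to add an index on (customer_id, purchase_date); without it, the query performs a full table scan.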
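As a rough sketch of what that request amounts to, the statements below create the composite index and then check the execution plan. The index name transactions_cust_date_ix is made up for illustration; the table and column names come from the question. Putting the equality column (customer_id) first and the range column (purchase_date) second lets a count for one customer be answered from a single range of index entries.

```sql
-- Hypothetical index name; columns are the ones named in the question.
-- Equality column first, range column second.
CREATE INDEX transactions_cust_date_ix
    ON transactions (customer_id, purchase_date);

-- Check whether the optimizer picks the index up.
EXPLAIN PLAN FOR
select count(*)
from transactions
where customer_id = 'FKJ90838485'
  and purchase_date between DATE '2013-01-01' and DATE '2013-08-31';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

If the plan still shows a full scan of transactions after the index exists, stale optimizer statistics are one possible cause worth raising with the DBA.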

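Side notes:

Use date literals rather than string literals (you may already know this and do it; it is noted here for future readers).

You don't have to put customer_id in the SELECT list, and if you remove it from there, it can also be removed from the GROUP BY, so the query becomes: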

select count(*) as number_of_transactions
from transactions
where customer_id = 'FKJ90838485'
  and purchase_date between DATE '2013-01-01' and DATE '2013-08-31' ;

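If you have no WHERE condition on customer_id, you can keep it in both the GROUP BY and the SELECT list to write a query that counts the number of transactions per customer. The index suggested above will help with that query too: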

select customer_id, count(*) as number_of_transactions
from transactions
where purchase_date between DATE '2013-01-01' and DATE '2013-08-31' 
group by customer_id  ;
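To illustrate the side note about date literals, here is a small sketch of two ways to write a date that does not depend on session NLS settings, followed by a half-open range variant of the count. The variant rests on an assumption the question does not confirm: that purchase_date may carry a time-of-day component. An Oracle DATE always stores one, and BETWEEN ... AND DATE '2013-08-31' stops at midnight at the start of 31 August, so later rows on that day would be missed.

```sql
-- ANSI date literal: always written as YYYY-MM-DD.
select DATE '2013-01-01' from dual;

-- TO_DATE with an explicit format mask and date language.
select TO_DATE('01-JAN-13', 'DD-MON-RR', 'NLS_DATE_LANGUAGE = English') from dual;

-- Half-open range: includes every moment of 31 August 2013.
select count(*) as number_of_transactions
from transactions
where customer_id = 'FKJ90838485'
  and purchase_date >= DATE '2013-01-01'
  and purchase_date <  DATE '2013-09-01';
```

If purchase_date is always stored with a midnight time, the BETWEEN form and the half-open form return the same count.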

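【Comments】:

【Answer 2】:

This is just an idea that occurred to me. It might work; try running it and see whether it improves on what you currently have.

I'm trying to make as much use as possible of transaction_id, which you said is indexed.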

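WITH min_transaction (tran_id)
AS (
   -- lowest transaction_ID of this customer on or after the start date
   SELECT MIN(transaction_ID)
   FROM TRANSACTIONS
   WHERE
      CUSTOMER_ID = 'FKJ90838485'
      AND purchase_date >= DATE '2013-01-01'
   ), max_transaction (tran_id)
AS (
   -- highest transaction_ID of this customer on or before the end date
   SELECT MAX(transaction_ID)
   FROM TRANSACTIONS
   WHERE 
      CUSTOMER_ID = 'FKJ90838485'
      AND purchase_date <= DATE '2013-08-31'
   )
-- The CTEs are read through scalar subqueries; a bare
-- min_transaction.tran_id is invalid because the CTE is not in the FROM clause.
SELECT customer_id, count(transaction_id)
FROM transactions
WHERE
   customer_id = 'FKJ90838485'
   AND transaction_id BETWEEN (SELECT tran_id FROM min_transaction)
                          AND (SELECT tran_id FROM max_transaction)
GROUP BY customer_ID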
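【Comments】:

Why all the 1 = 1s? I would expect this to run slower, since I assume neither CTE can use an index, so you end up with two table scans instead of one.

And 1=1 is completely useless (I know it is a habit carried over from generated code, but in hand-written code it is not needed at all).

I was thinking the same thing. If you can't add a new index, see whether there is any way to use an existing index on customer_ID to get the distinct list (or the range, as above) of transaction_ids.

@CharlesBretana Just force of habit. 1=1 is a neat shortcut that lets you comment parameters in and out without having to fix up a missing AND, and so on.

【Answer 3】:

This might run faster because it looks at a range of transaction_id rather than purchase_date. I have also taken into account that accounting_month is indexed: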

select customer_id, count(*)
from transactions
where customer_id = 'FKJ90838485'
and transaction_id between (select min(transaction_id)
                            from transactions
                           where accounting_month = '01-JAN-13' 
                           )  and
                           (select max(transaction_id)
                            from transactions
                           where accounting_month = '01-AUG-13' 
                           ) 
group by customer_id

Maybe you could also try:

select customer_id, count(*)
from transactions
where customer_id = 'FKJ90838485'
and accounting_month between '01-JAN-13' and '01-AUG-13'
group by customer_id
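The accounting_month filter above matches the original purchase_date filter only because the requested range starts and ends on month boundaries. For a range that does not, one possible variant is to keep accounting_month as a coarse, indexed filter and add purchase_date as the exact one. The sketch below uses a made-up range (10 January to 20 August 2013) and assumes accounting_month is a DATE holding the first day of the month, as the question describes.

```sql
select count(*) as number_of_transactions
from transactions
where customer_id = 'FKJ90838485'
  -- coarse filter on the indexed month column
  and accounting_month between DATE '2013-01-01' and DATE '2013-08-01'
  -- exact filter on the unindexed date column (hypothetical range)
  and purchase_date >= DATE '2013-01-10'
  and purchase_date <  DATE '2013-08-21';
```

Whether the optimizer actually uses the accounting_month index depends on how selective the month range is. With about 20 million rows falling inside the range, it may still prefer a full scan.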

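【Comments】:

That's Oracle; there is most likely no column named "transaction_id" (only one named transaction_id or "TRANSACTION_ID").

This assumes that transactions can only be entered in chronological order.

@GarethD We can only assume what we get from the OP.

Also, you would need two table scans on transactions to get the min and max IDs.

As I said: transaction_id is not the same as "transaction_id" in Oracle. The double quotes make the name case-sensitive, whereas unquoted names are folded to uppercase. So if transaction_id works for the OP, the real column name is TRANSACTION_ID, and "transaction_id" will not work for the OP. See an example here: sqlfiddle.com/#!4/ab2ce/1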
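The identifier-folding point in the last comment can be reproduced with a small sketch. The table demo_tx is hypothetical and exists only for this illustration.

```sql
-- Hypothetical demo table; the unquoted column name is stored as TRANSACTION_ID.
create table demo_tx (transaction_id number);

select transaction_id   from demo_tx;  -- works: unquoted name is folded to uppercase
select "TRANSACTION_ID" from demo_tx;  -- works: exact match of the stored name
select "transaction_id" from demo_tx;  -- fails with ORA-00904 (invalid identifier)
```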
