How to accurately use aggregate functions when querying multiple tables?
Posted: 2017-10-11 13:51:39
[Question]: Writing a query that uses aggregate functions, pulls from multiple tables and still returns accurate numbers has been harder than I expected. I'm hoping to get some help.
SQL Fiddle
Example category table
The categories I need to report on:
|----|-----------|
| id | name      |
|----|-----------|
| 1  | furniture |
| 2  | music     |
| 3  | kitchen   |
| 4  | adventure |
|----|-----------|
Example product table
|-----|----------------|-------------|
| id  | name           | category_id |
|-----|----------------|-------------|
| 101 | couch          | 1           |
| 102 | chair          | 1           |
| 103 | drum           | 2           |
| 104 | flute          | 2           |
| 105 | pot            | 3           |
| 106 | pan            | 3           |
| 107 | kitchen sink   | 3           |
| 108 | unicorn saddle | 4           |
| 109 | unicorn shoes  | 4           |
| 110 | horse shampoo  | 4           |
|-----|----------------|-------------|
Example activity table
The views data in the activity table that we want to sum (by category):
|----|------------|-------|
| id | product_id | views |
|----|------------|-------|
| 1  | 101        | 1000  |
| 2  | 102        | 2000  |
| 3  | 103        | 3000  |
| 4  | 104        | 4000  |
| 5  | 105        | 5000  |
| 6  | 106        | 6000  |
| 7  | 107        | 7000  |
| 8  | 108        | 8000  |
| 9  | 109        | 9000  |
| 10 | 110        | 10000 |
|----|------------|-------|
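Example sales table
We want to query the sales table for average sales (again, by category). Note that vendor_id matters, because a single product can be carried by multiple vendors. I've left out the vendor table because this question doesn't need it (we can query by vendor ID in the later examples).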
|----|------------|-----------|--------|
| id | product_id | vendor_id | amount |
|----|------------|-----------|--------|
| 1  | 101        | 1         | 1000   |
| 2  | 102        | 1         | 900    |
| 3  | 103        | 1         | 2000   |
| 4  | 105        | 1         | 3000   |
| 5  | 107        | 1         | 5000   |
| 6  | 101        | 2         | 600    |
| 7  | 103        | 2         | 7000   |
| 8  | 105        | 2         | 8000   |
| 9  | 107        | 2         | 1000   |
| 10 | 108        | 1         | 500    |
| 11 | 109        | 1         | 600    |
| 12 | 108        | 2         | 400    |
| 13 | 109        | 2         | 500    |
|----|------------|-----------|--------|
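Desired output
Here is the output I'm after:
Note: some vendors don't carry certain products, so those products have no average sales. Put another way, some products in the product table have no records in the sales table (for example, no vendor carries horse shampoo). For this reason, I want to make sure that any average or sum I use is actually accurate. Specifically, if there is ...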
|-----------|----------------|-----------|---------------|-------------------------------|------------------------|
| category  | count_products | sum_views | average_sales | sum_views_where_sales_=>_1000 | sum_views_sales_<_1000 |
|-----------|----------------|-----------|---------------|-------------------------------|------------------------|
| adventure | 3              | 27000     | 500           | 0                             | 27000                  |
| furniture | 2              | 3000      | 833           | 0                             | 3000                   |
| kitchen   | 3              | 18000     | 3000          | 6000                          | 12000                  |
| music     | 2              | 7000      | 5000          | 7000                          | 0                      |
|-----------|----------------|-----------|---------------|-------------------------------|------------------------|
Current state of the query
First, getting accurate product and view counts:
SELECT cat.name AS category,
       count(distinct p.name) AS product,
       sum(a.views) AS views
FROM
    category AS cat,
    product AS p,
    activity AS a
WHERE
    cat.id=p.category_id
    AND
    p.id=a.product_id
GROUP BY
    category;
Side note: I'd rather not have to use distinct in the query above. Any ideas on that would be great.
Accurate results showing views by category:
|-----------|---------|-------|
| category  | product | views |
|-----------|---------|-------|
| Adventure | 3       | 27000 |
| Furniture | 2       | 3000  |
| Kitchen   | 3       | 18000 |
| Music     | 2       | 7000  |
|-----------|---------|-------|
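Here is a minimal sketch of the first query rewritten with explicit JOIN syntax. It also addresses the side note about DISTINCT: if activity is collapsed to one row per product before the join, COUNT(p.id) counts each product exactly once. The sketch runs the SQL through Python's built-in sqlite3 module against an in-memory copy of the sample tables. This assumes SQLite behaves like MySQL for these joins. make_db is a hypothetical helper. The LEFT JOIN to the activity aggregate is a design choice that keeps products that have no activity rows.

```python
import sqlite3


def make_db():
    """In-memory SQLite copy of the question's sample tables (stand-in for MySQL)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE category (id INTEGER, name TEXT);
        CREATE TABLE product  (id INTEGER, name TEXT, category_id INTEGER);
        CREATE TABLE activity (id INTEGER, product_id INTEGER, views INTEGER);
        CREATE TABLE sales    (id INTEGER, product_id INTEGER,
                               vendor_id INTEGER, amount INTEGER);
    """)
    con.executemany("INSERT INTO category VALUES (?, ?)", [
        (1, "furniture"), (2, "music"), (3, "kitchen"), (4, "adventure")])
    con.executemany("INSERT INTO product VALUES (?, ?, ?)", [
        (101, "couch", 1), (102, "chair", 1), (103, "drum", 2),
        (104, "flute", 2), (105, "pot", 3), (106, "pan", 3),
        (107, "kitchen sink", 3), (108, "unicorn saddle", 4),
        (109, "unicorn shoes", 4), (110, "horse shampoo", 4)])
    # Activity row i belongs to product 100 + i and has 1000 * i views.
    con.executemany("INSERT INTO activity VALUES (?, ?, ?)",
                    [(i, 100 + i, 1000 * i) for i in range(1, 11)])
    con.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
        (1, 101, 1, 1000), (2, 102, 1, 900), (3, 103, 1, 2000),
        (4, 105, 1, 3000), (5, 107, 1, 5000), (6, 101, 2, 600),
        (7, 103, 2, 7000), (8, 105, 2, 8000), (9, 107, 2, 1000),
        (10, 108, 1, 500), (11, 109, 1, 600), (12, 108, 2, 400),
        (13, 109, 2, 500)])
    return con


con = make_db()

# Explicit JOINs; activity is pre-aggregated so each product yields one row,
# which means COUNT(p.id) needs no DISTINCT.
rows = con.execute("""
    SELECT c.name                    AS category,
           COUNT(p.id)               AS product,
           COALESCE(SUM(a.views), 0) AS views
    FROM category c
    JOIN product p ON p.category_id = c.id
    LEFT JOIN (SELECT product_id, SUM(views) AS views
               FROM activity
               GROUP BY product_id) a ON a.product_id = p.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(rows)
```

On the sample data this gives the same counts and view totals as the accurate results above, with category names in lower case.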
Everything looks fine until I start joining the other tables:
SELECT cat.name AS category,
       count(distinct p.name) AS product,
       sum(a.views) AS views,
       round(avg(s.amount)) AS sales_amount
FROM
    category AS cat,
    product AS p,
    activity AS a,
    sales AS s
WHERE
    cat.id=p.category_id
    AND
    p.id=a.product_id
    AND
    p.id=s.product_id
    AND
    s.vendor_id=1
GROUP BY
    category;
Problem output
|-----------|---------|-------|------------------|
| category  | product | views | avg_sales_amount |
|-----------|---------|-------|------------------|
| Adventure | 2       | 17000 | 550              |
| Furniture | 2       | 3000  | 950              |
| Kitchen   | 2       | 12000 | 4000             |
| Music     | 1       | 3000  | 2000             |
|-----------|---------|-------|------------------|
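As you can see, once I start filtering by vendor_id to get average sales, the results drift further from the desired output. Specifically, the product column no longer returns the correct number of products. Not every vendor carries the same products, which makes the s.vendor_id=1 filter tricky. I need that filter so I can break these reports down by vendor while still getting accurate sums in the views field.
I've tried the query above with LEFT JOIN, but the results are still inaccurate, and I'm not sure what needs to change. Maybe some kind of subquery?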
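Here is a minimal sketch that reproduces the problem and shows one way around it. It assumes Python's sqlite3 with an in-memory copy of the sample tables as a stand-in for MySQL. make_db is a hypothetical helper.

The first query keeps s.vendor_id = 1 in the WHERE clause, so products vendor 1 does not carry drop out before grouping. The second query applies the vendor filter inside a per-product sales aggregate that is LEFT JOINed instead. That keeps the product count and views intact, while the average only covers that vendor's sales. The vendor id is passed as a query parameter.

```python
import sqlite3


def make_db():
    """In-memory SQLite copy of the question's sample tables (stand-in for MySQL)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE category (id INTEGER, name TEXT);
        CREATE TABLE product  (id INTEGER, name TEXT, category_id INTEGER);
        CREATE TABLE activity (id INTEGER, product_id INTEGER, views INTEGER);
        CREATE TABLE sales    (id INTEGER, product_id INTEGER,
                               vendor_id INTEGER, amount INTEGER);
    """)
    con.executemany("INSERT INTO category VALUES (?, ?)", [
        (1, "furniture"), (2, "music"), (3, "kitchen"), (4, "adventure")])
    con.executemany("INSERT INTO product VALUES (?, ?, ?)", [
        (101, "couch", 1), (102, "chair", 1), (103, "drum", 2),
        (104, "flute", 2), (105, "pot", 3), (106, "pan", 3),
        (107, "kitchen sink", 3), (108, "unicorn saddle", 4),
        (109, "unicorn shoes", 4), (110, "horse shampoo", 4)])
    # Activity row i belongs to product 100 + i and has 1000 * i views.
    con.executemany("INSERT INTO activity VALUES (?, ?, ?)",
                    [(i, 100 + i, 1000 * i) for i in range(1, 11)])
    con.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
        (1, 101, 1, 1000), (2, 102, 1, 900), (3, 103, 1, 2000),
        (4, 105, 1, 3000), (5, 107, 1, 5000), (6, 101, 2, 600),
        (7, 103, 2, 7000), (8, 105, 2, 8000), (9, 107, 2, 1000),
        (10, 108, 1, 500), (11, 109, 1, 600), (12, 108, 2, 400),
        (13, 109, 2, 500)])
    return con


con = make_db()

# Mirrors the question's second query: the vendor filter sits in WHERE,
# so products that vendor 1 does not carry vanish before grouping.
inner = con.execute("""
    SELECT c.name, COUNT(DISTINCT p.name), SUM(a.views), ROUND(AVG(s.amount))
    FROM category c
    JOIN product  p ON c.id = p.category_id
    JOIN activity a ON p.id = a.product_id
    JOIN sales    s ON p.id = s.product_id
    WHERE s.vendor_id = ?
    GROUP BY c.name
    ORDER BY c.name
""", (1,)).fetchall()

# Alternative: aggregate the chosen vendor's sales per product first and
# LEFT JOIN that, so every product still contributes its row and views.
fixed = con.execute("""
    SELECT c.name,
           COUNT(*)     AS products,   -- one row per product (one activity row each)
           SUM(a.views) AS views,
           ROUND(SUM(v.total) * 1.0 / SUM(v.n)) AS avg_sales
    FROM category c
    JOIN product  p ON c.id = p.category_id
    JOIN activity a ON p.id = a.product_id
    LEFT JOIN (SELECT product_id, SUM(amount) AS total, COUNT(*) AS n
               FROM sales
               WHERE vendor_id = ?
               GROUP BY product_id) v ON p.id = v.product_id
    GROUP BY c.name
    ORDER BY c.name
""", (1,)).fetchall()

print(inner)
print(fixed)
```

With vendor 1, the first query returns the same numbers as the problem output (for example 2 products, 17000 views and 550 for adventure). The second query keeps 3 products and 27000 views for adventure while still averaging only vendor 1's sales.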
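[Comments]:
Tip of the day: switch to modern, explicit JOIN syntax. It is easier to write (without mistakes), easier to read (and maintain), and easier to convert to an outer join when needed.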
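Can a Category have no Products?
Why isn't views a field of Product?
Shouldn't your average_sales value for kitchen be 4250? And shouldn't the average_sales value for music be 4500?
If you changed the amount value for product_id = 102 to 1900, would the average sales for that product be 950 or 1900?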
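The figures questioned in the comments can be checked with plain arithmetic. This small Python sketch assumes "average sales" means the mean over every sales row in the category, regardless of vendor:

- kitchen: (3000 + 5000 + 8000 + 1000) / 4 = 4250
- music: (2000 + 7000) / 2 = 4500
- furniture: 2500 / 3, about 833, which matches the desired output
- adventure: 2000 / 4 = 500, which also matches

```python
from collections import defaultdict

# product_id -> category name, from the sample product table
category_of = {101: "furniture", 102: "furniture", 103: "music", 104: "music",
               105: "kitchen", 106: "kitchen", 107: "kitchen",
               108: "adventure", 109: "adventure", 110: "adventure"}

# (product_id, amount) pairs from the sample sales table; vendor is ignored
sales = [(101, 1000), (102, 900), (103, 2000), (105, 3000), (107, 5000),
         (101, 600), (103, 7000), (105, 8000), (107, 1000),
         (108, 500), (109, 600), (108, 400), (109, 500)]

totals = defaultdict(int)
counts = defaultdict(int)
for product_id, amount in sales:
    category = category_of[product_id]
    totals[category] += amount
    counts[category] += 1

# Mean over all sales rows per category. Python's round() uses banker's
# rounding, but none of these values is a .5 tie, so it matches SQL ROUND().
average_sales = {cat: round(totals[cat] / counts[cat]) for cat in totals}
print(average_sales)
```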
[Answer 1]:
Please try the following...
SELECT Category.name AS category,
       COUNT( * ) AS count_Product,
       SUM( views ) AS sum_views,
       ROUND( COALESCE( SUM( sumAmount ) / SUM( countAmounts ), 0 ) ) AS average_sales,
       SUM( whereGreater ) AS 'sum_views_where_sales_=>_1000',
       SUM( whereLesser ) AS 'sum_views_sales_<_1000'
FROM Category
JOIN Product ON Category.id = Product.category_id
JOIN Activity ON Product.id = Activity.product_id
LEFT JOIN ( SELECT product_id AS product_id,
                   SUM( amount ) AS sumAmount,
                   COUNT( * ) AS countAmounts
            FROM Sales
            GROUP BY product_id
          ) sumCountAmountFinder ON Product.id = sumCountAmountFinder.product_id
JOIN ( SELECT Activity.product_id AS product_id,
              SUM( CASE WHEN COALESCE( meanAmount, 0 ) >= 1000 THEN views ELSE 0 END ) AS whereGreater,
              SUM( CASE WHEN COALESCE( meanAmount, 0 ) < 1000 THEN views ELSE 0 END ) AS whereLesser
       FROM Activity
       LEFT JOIN ( SELECT product_id AS product_id,
                          SUM( amount ) / COUNT( * ) AS meanAmount
                   FROM Sales
                   GROUP BY product_id
                 ) AS meanAmountFinder ON Activity.product_id = meanAmountFinder.product_id
       GROUP BY Activity.product_id
     ) sumWhereFinder ON Product.id = sumWhereFinder.product_id
GROUP BY Category.name;
Assumptions
Every record in Category is associated with at least one record in Product.
Every record in Product has a corresponding record in Activity.
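Explanation
My statement first performs an INNER JOIN between Category and Product, which gives us the list of Products associated with each Category.
An INNER JOIN is then performed between Activity and that joined dataset, which attaches each value of the views field to its corresponding Product record.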
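The following subquery is then used to find the total amount for each product_id in Sales, along with the number of records for each product_id. A LEFT JOIN is then performed between Product and the subquery, which attaches each record from the subquery to its corresponding Product in the joined dataset. I used a LEFT JOIN rather than an INNER JOIN because not every record in Product has a corresponding record in Sales. A Product record should not be excluded just because it has no sales.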
SELECT product_id AS product_id,
SUM( amount ) AS sumAmount,
COUNT( * ) AS countAmounts
FROM Sales
GROUP BY product_id
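To see what this subquery contributes on its own, here is a sketch that runs it against an in-memory SQLite copy of the sample data and then LEFT JOINs it to Product. SQLite is a stand-in for MySQL here, and make_db is a hypothetical helper. Products 104, 106 and 110 have no sales, so they survive the LEFT JOIN with NULL sums (None in Python) instead of disappearing.

```python
import sqlite3


def make_db():
    """In-memory SQLite copy of the question's sample tables (stand-in for MySQL)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE category (id INTEGER, name TEXT);
        CREATE TABLE product  (id INTEGER, name TEXT, category_id INTEGER);
        CREATE TABLE activity (id INTEGER, product_id INTEGER, views INTEGER);
        CREATE TABLE sales    (id INTEGER, product_id INTEGER,
                               vendor_id INTEGER, amount INTEGER);
    """)
    con.executemany("INSERT INTO category VALUES (?, ?)", [
        (1, "furniture"), (2, "music"), (3, "kitchen"), (4, "adventure")])
    con.executemany("INSERT INTO product VALUES (?, ?, ?)", [
        (101, "couch", 1), (102, "chair", 1), (103, "drum", 2),
        (104, "flute", 2), (105, "pot", 3), (106, "pan", 3),
        (107, "kitchen sink", 3), (108, "unicorn saddle", 4),
        (109, "unicorn shoes", 4), (110, "horse shampoo", 4)])
    # Activity row i belongs to product 100 + i and has 1000 * i views.
    con.executemany("INSERT INTO activity VALUES (?, ?, ?)",
                    [(i, 100 + i, 1000 * i) for i in range(1, 11)])
    con.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
        (1, 101, 1, 1000), (2, 102, 1, 900), (3, 103, 1, 2000),
        (4, 105, 1, 3000), (5, 107, 1, 5000), (6, 101, 2, 600),
        (7, 103, 2, 7000), (8, 105, 2, 8000), (9, 107, 2, 1000),
        (10, 108, 1, 500), (11, 109, 1, 600), (12, 108, 2, 400),
        (13, 109, 2, 500)])
    return con


con = make_db()

# The sumCountAmountFinder subquery alone: one row per product that has sales.
per_product = con.execute("""
    SELECT product_id, SUM(amount) AS sumAmount, COUNT(*) AS countAmounts
    FROM sales
    GROUP BY product_id
    ORDER BY product_id
""").fetchall()

# LEFT JOIN keeps every product; those without sales get NULL (None) sums.
joined = con.execute("""
    SELECT p.id, f.sumAmount, f.countAmounts
    FROM product p
    LEFT JOIN (SELECT product_id, SUM(amount) AS sumAmount, COUNT(*) AS countAmounts
               FROM sales
               GROUP BY product_id) f ON p.id = f.product_id
    ORDER BY p.id
""").fetchall()
no_sales = [pid for pid, total, n in joined if total is None]

print(per_product)
print(no_sales)
```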
Another subquery (below) is then used to calculate the mean amount for each product_id in Sales.
SELECT product_id AS product_id,
SUM( amount ) / COUNT( * ) AS meanAmount
FROM Sales
GROUP BY product_id
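A LEFT JOIN is then performed between Activity and the results of that subquery. The mean amount for each product_id is compared to 1000. The product's views value is placed in the matching field, and 0 is placed in the other. If a product_id has no corresponding records in Sales, COALESCE treats its mean as 0, so its views go into the "< 1000" field and 0 goes into the other. The results of the enclosing subquery (below) are then joined to the dataset built so far.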
SELECT Activity.product_id AS product_id,
       SUM( CASE WHEN COALESCE( meanAmount, 0 ) >= 1000 THEN views ELSE 0 END ) AS whereGreater,
       SUM( CASE WHEN COALESCE( meanAmount, 0 ) < 1000 THEN views ELSE 0 END ) AS whereLesser
FROM Activity
LEFT JOIN ( SELECT product_id AS product_id,
                   SUM( amount ) / COUNT( * ) AS meanAmount
            FROM Sales
            GROUP BY product_id
          ) AS meanAmountFinder ON Activity.product_id = meanAmountFinder.product_id
GROUP BY Activity.product_id
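Here is a sketch of this subquery's per-product output on the sample data, under the same assumptions: SQLite in memory via Python's sqlite3, with a hypothetical make_db helper. One deviation from the answer's SQL: SQLite uses integer division for integers, so the mean is computed as SUM(amount) * 1.0 / COUNT(*). MySQL's / already returns a decimal.

Each product's views land in exactly one of the two columns. Products without sales (104, 106, 110) count as a mean of 0, so they fall into whereLesser.

```python
import sqlite3


def make_db():
    """In-memory SQLite copy of the question's sample tables (stand-in for MySQL)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE category (id INTEGER, name TEXT);
        CREATE TABLE product  (id INTEGER, name TEXT, category_id INTEGER);
        CREATE TABLE activity (id INTEGER, product_id INTEGER, views INTEGER);
        CREATE TABLE sales    (id INTEGER, product_id INTEGER,
                               vendor_id INTEGER, amount INTEGER);
    """)
    con.executemany("INSERT INTO category VALUES (?, ?)", [
        (1, "furniture"), (2, "music"), (3, "kitchen"), (4, "adventure")])
    con.executemany("INSERT INTO product VALUES (?, ?, ?)", [
        (101, "couch", 1), (102, "chair", 1), (103, "drum", 2),
        (104, "flute", 2), (105, "pot", 3), (106, "pan", 3),
        (107, "kitchen sink", 3), (108, "unicorn saddle", 4),
        (109, "unicorn shoes", 4), (110, "horse shampoo", 4)])
    # Activity row i belongs to product 100 + i and has 1000 * i views.
    con.executemany("INSERT INTO activity VALUES (?, ?, ?)",
                    [(i, 100 + i, 1000 * i) for i in range(1, 11)])
    con.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
        (1, 101, 1, 1000), (2, 102, 1, 900), (3, 103, 1, 2000),
        (4, 105, 1, 3000), (5, 107, 1, 5000), (6, 101, 2, 600),
        (7, 103, 2, 7000), (8, 105, 2, 8000), (9, 107, 2, 1000),
        (10, 108, 1, 500), (11, 109, 1, 600), (12, 108, 2, 400),
        (13, 109, 2, 500)])
    return con


con = make_db()

# sumWhereFinder on its own; "* 1.0" forces real division in SQLite.
rows = con.execute("""
    SELECT Activity.product_id AS product_id,
           SUM(CASE WHEN COALESCE(meanAmount, 0) >= 1000 THEN views ELSE 0 END) AS whereGreater,
           SUM(CASE WHEN COALESCE(meanAmount, 0) <  1000 THEN views ELSE 0 END) AS whereLesser
    FROM Activity
    LEFT JOIN (SELECT product_id, SUM(amount) * 1.0 / COUNT(*) AS meanAmount
               FROM Sales
               GROUP BY product_id) AS meanAmountFinder
           ON Activity.product_id = meanAmountFinder.product_id
    GROUP BY Activity.product_id
    ORDER BY Activity.product_id
""").fetchall()
print(rows)
```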
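With the final joined dataset in place, its records are grouped by Category.name. For each group, the value of Category.name and its corresponding aggregate values are calculated and returned.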
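Putting the pieces together, here is the full statement run against an in-memory SQLite copy of the sample data. It is a sketch adapted for SQLite rather than the answer's exact MySQL text:

- * 1.0 avoids integer division.
- The column aliases use double quotes.
- An ORDER BY is added for a stable row order.

make_db is a hypothetical helper. On this data the statement reports 4250 for kitchen and 4500 for music, which agrees with the comment on the question rather than with the desired-output table.

```python
import sqlite3


def make_db():
    """In-memory SQLite copy of the question's sample tables (stand-in for MySQL)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE category (id INTEGER, name TEXT);
        CREATE TABLE product  (id INTEGER, name TEXT, category_id INTEGER);
        CREATE TABLE activity (id INTEGER, product_id INTEGER, views INTEGER);
        CREATE TABLE sales    (id INTEGER, product_id INTEGER,
                               vendor_id INTEGER, amount INTEGER);
    """)
    con.executemany("INSERT INTO category VALUES (?, ?)", [
        (1, "furniture"), (2, "music"), (3, "kitchen"), (4, "adventure")])
    con.executemany("INSERT INTO product VALUES (?, ?, ?)", [
        (101, "couch", 1), (102, "chair", 1), (103, "drum", 2),
        (104, "flute", 2), (105, "pot", 3), (106, "pan", 3),
        (107, "kitchen sink", 3), (108, "unicorn saddle", 4),
        (109, "unicorn shoes", 4), (110, "horse shampoo", 4)])
    # Activity row i belongs to product 100 + i and has 1000 * i views.
    con.executemany("INSERT INTO activity VALUES (?, ?, ?)",
                    [(i, 100 + i, 1000 * i) for i in range(1, 11)])
    con.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
        (1, 101, 1, 1000), (2, 102, 1, 900), (3, 103, 1, 2000),
        (4, 105, 1, 3000), (5, 107, 1, 5000), (6, 101, 2, 600),
        (7, 103, 2, 7000), (8, 105, 2, 8000), (9, 107, 2, 1000),
        (10, 108, 1, 500), (11, 109, 1, 600), (12, 108, 2, 400),
        (13, 109, 2, 500)])
    return con


con = make_db()

rows = con.execute("""
    SELECT Category.name AS category,
           COUNT(*) AS count_Product,
           SUM(views) AS sum_views,
           ROUND(COALESCE(SUM(sumAmount) * 1.0 / SUM(countAmounts), 0)) AS average_sales,
           SUM(whereGreater) AS "sum_views_where_sales_=>_1000",
           SUM(whereLesser) AS "sum_views_sales_<_1000"
    FROM Category
    JOIN Product  ON Category.id = Product.category_id
    JOIN Activity ON Product.id = Activity.product_id
    LEFT JOIN (SELECT product_id, SUM(amount) AS sumAmount, COUNT(*) AS countAmounts
               FROM Sales
               GROUP BY product_id) sumCountAmountFinder
           ON Product.id = sumCountAmountFinder.product_id
    JOIN (SELECT Activity.product_id AS product_id,
                 SUM(CASE WHEN COALESCE(meanAmount, 0) >= 1000 THEN views ELSE 0 END) AS whereGreater,
                 SUM(CASE WHEN COALESCE(meanAmount, 0) <  1000 THEN views ELSE 0 END) AS whereLesser
          FROM Activity
          LEFT JOIN (SELECT product_id, SUM(amount) * 1.0 / COUNT(*) AS meanAmount
                     FROM Sales
                     GROUP BY product_id) AS meanAmountFinder
                 ON Activity.product_id = meanAmountFinder.product_id
          GROUP BY Activity.product_id) sumWhereFinder
         ON Product.id = sumWhereFinder.product_id
    GROUP BY Category.name
    ORDER BY Category.name
""").fetchall()
print(rows)
```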
Testing
My statement was tested against a sample database created with the following code...
CREATE TABLE Category
(
id INT,
name VARCHAR( 50 )
);
INSERT INTO Category ( id,
name )
VALUES ( 1, 'furniture' ),
( 2, 'music' ),
( 3, 'kitchen' ),
( 4, 'adventure' );
CREATE TABLE Product
(
id INT,
name VARCHAR( 50 ),
category_id INT
);
INSERT INTO Product ( id,
name,
category_id )
VALUES ( 101, 'couch', 1 ),
( 102, 'chair', 1 ),
( 103, 'drum', 2 ),
( 104, 'flute', 2 ),
( 105, 'pot', 3 ),
( 106, 'pan', 3 ),
( 107, 'kitchen sink', 3 ),
( 108, 'unicorn saddle', 4 ),
( 109, 'unicorn shoes', 4 ),
( 110, 'horse shampoo', 4 );
CREATE TABLE Activity
(
id INT,
product_id INT,
views INT
);
INSERT INTO Activity ( id,
product_id,
views )
VALUES ( 1, 101, 1000 ),
( 2, 102, 2000 ),
( 3, 103, 3000 ),
( 4, 104, 4000 ),
( 5, 105, 5000 ),
( 6, 106, 6000 ),
( 7, 107, 7000 ),
( 8, 108, 8000 ),
( 9, 109, 9000 ),
( 10, 110, 10000 );
CREATE TABLE Sales
(
id INT,
product_id INT,
vendor_id INT,
amount INT
);
INSERT INTO Sales ( id,
product_id,
vendor_id,
amount )
VALUES ( 1, 101, 1, 1000 ),
( 2, 102, 1, 900 ),
( 3, 103, 1, 2000 ),
( 4, 105, 1, 3000 ),
( 5, 107, 1, 5000 ),
( 6, 101, 2, 600 ),
( 7, 103, 2, 7000 ),
( 8, 105, 2, 8000 ),
( 9, 107, 2, 1000 ),
( 10, 108, 1, 500 ),
( 11, 109, 1, 600 ),
( 12, 108, 2, 400 ),
( 13, 109, 2, 500 );
If you have any questions or comments, please feel free to post a comment.
Further Reading
https://dev.mysql.com/doc/refman/5.7/en/case.html (on MySQL's CASE statement)
https://dev.mysql.com/doc/refman/5.7/en/comparison-operators.html#function_coalesce (on MySQL's COALESCE() function)
https://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html#function_count (on MySQL's COUNT() aggregate function)
https://www.w3schools.com/sql/sql_join.asp (on the various types of JOIN in SQL)
https://dev.mysql.com/doc/refman/5.7/en/mathematical-functions.html#function_round (on MySQL's ROUND() function)
https://dev.mysql.com/doc/refman/5.7/en/select.html (on MySQL's SELECT statement)
https://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html#function_sum (on MySQL's SUM() aggregate function)
Appendix
If either of my assumptions is incorrect, please try the following code...
SELECT Category.name AS category,
       COUNT( Product.id ) AS count_Product,
       COALESCE( SUM( views ), '' ) AS sum_views,
       COALESCE( ROUND( SUM( sumAmount ) / SUM( countAmounts ) ), '' ) AS average_sales,
       COALESCE( SUM( whereGreater ), '' ) AS 'sum_views_where_sales_=>_1000',
       COALESCE( SUM( whereLesser ), '' ) AS 'sum_views_sales_<_1000'
FROM Category
LEFT JOIN Product ON Category.id = Product.category_id
LEFT JOIN Activity ON Product.id = Activity.product_id
LEFT JOIN ( SELECT product_id AS product_id,
                   SUM( amount ) AS sumAmount,
                   COUNT( * ) AS countAmounts
            FROM Sales
            GROUP BY product_id
          ) sumCountAmountFinder ON Product.id = sumCountAmountFinder.product_id
LEFT JOIN ( SELECT Activity.product_id AS product_id,
                   SUM( CASE WHEN COALESCE( meanAmount, 0 ) >= 1000 THEN views ELSE 0 END ) AS whereGreater,
                   SUM( CASE WHEN COALESCE( meanAmount, 0 ) < 1000 THEN views ELSE 0 END ) AS whereLesser
            FROM Activity
            LEFT JOIN ( SELECT product_id AS product_id,
                               SUM( amount ) / COUNT( * ) AS meanAmount
                        FROM Sales
                        GROUP BY product_id
                      ) AS meanAmountFinder ON Activity.product_id = meanAmountFinder.product_id
            GROUP BY Activity.product_id
          ) sumWhereFinder ON Product.id = sumWhereFinder.product_id
GROUP BY Category.name;
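Here is why COUNT(Product.id) rather than COUNT(*) matters once Category is LEFT JOINed. A category without products still yields one NULL-extended row, which COUNT(*) counts and COUNT(Product.id) does not.

This sketch runs on an in-memory SQLite copy of the sample data, with a hypothetical 'garden' category (id 5) added to break the first assumption. make_db is also a hypothetical helper.

```python
import sqlite3


def make_db():
    """In-memory SQLite copy of the question's sample tables (stand-in for MySQL)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE category (id INTEGER, name TEXT);
        CREATE TABLE product  (id INTEGER, name TEXT, category_id INTEGER);
        CREATE TABLE activity (id INTEGER, product_id INTEGER, views INTEGER);
        CREATE TABLE sales    (id INTEGER, product_id INTEGER,
                               vendor_id INTEGER, amount INTEGER);
    """)
    con.executemany("INSERT INTO category VALUES (?, ?)", [
        (1, "furniture"), (2, "music"), (3, "kitchen"), (4, "adventure")])
    con.executemany("INSERT INTO product VALUES (?, ?, ?)", [
        (101, "couch", 1), (102, "chair", 1), (103, "drum", 2),
        (104, "flute", 2), (105, "pot", 3), (106, "pan", 3),
        (107, "kitchen sink", 3), (108, "unicorn saddle", 4),
        (109, "unicorn shoes", 4), (110, "horse shampoo", 4)])
    # Activity row i belongs to product 100 + i and has 1000 * i views.
    con.executemany("INSERT INTO activity VALUES (?, ?, ?)",
                    [(i, 100 + i, 1000 * i) for i in range(1, 11)])
    con.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
        (1, 101, 1, 1000), (2, 102, 1, 900), (3, 103, 1, 2000),
        (4, 105, 1, 3000), (5, 107, 1, 5000), (6, 101, 2, 600),
        (7, 103, 2, 7000), (8, 105, 2, 8000), (9, 107, 2, 1000),
        (10, 108, 1, 500), (11, 109, 1, 600), (12, 108, 2, 400),
        (13, 109, 2, 500)])
    return con


con = make_db()
# Hypothetical extra category with no products.
con.execute("INSERT INTO category VALUES (5, 'garden')")

rows = con.execute("""
    SELECT Category.name,
           COUNT(*)          AS count_star,      -- counts the NULL-extended row
           COUNT(Product.id) AS count_products   -- skips NULLs
    FROM Category
    LEFT JOIN Product ON Category.id = Product.category_id
    GROUP BY Category.name
    ORDER BY Category.name
""").fetchall()
print(rows)
```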
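[Discussion]:
Do you need left joins?
Thanks for asking. I use one between Activity and Sales. Given the assumptions I outline in my explanation, I don't use them anywhere else.
Wow... this is a great answer. I ended up combining it with another answer, plus a few tweaks, to get the job done. Thanks so much for your help!
[Answer 2]: Your reporting requirements are quite complex. You may have started this project thinking it was much simpler than it actually is.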
Here, you are reporting summaries of measurements that live in independent tables (views and sales).
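So you need to start with aggregate subqueries that never join the two detail measurement tables to each other. Here is one such query. It gives you views by category. http://sqlfiddle.com/#!9/02f4b6/31/0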
SELECT c.id category_id, SUM(a.views) views
FROM activity a
JOIN product p ON a.product_id = p.id
JOIN category c ON p.category_id = c.id
GROUP BY c.id
Here is another such query. It gives you sales by category. http://sqlfiddle.com/#!9/02f4b6/32/0
SELECT c.id category_id,
SUM(s.amount) total_sales,
AVG(s.amount) avg_sales
FROM sales s
JOIN product p ON s.product_id = p.id
JOIN category c ON p.category_id = c.id
GROUP BY c.id
Next, you need the number of products per category. Fortunately, each product belongs to only one category. http://sqlfiddle.com/#!9/02f4b6/42/0
SELECT c.id category_id,
COUNT(*) products
FROM product p
JOIN category c ON p.category_id = c.id
GROUP BY c.id
Now you need to join these pieces together. Start with the category table and LEFT JOIN the other three to it, like this. http://sqlfiddle.com/#!9/02f4b6/51/0
SELECT c.name, aggproducts.products,
       aggviews.views, aggsales.avg_sales,
       aggsales.total_sales
  FROM category c
  LEFT JOIN (
         SELECT c.id category_id, SUM(a.views) views
           FROM activity a
           JOIN product p ON a.product_id = p.id
           JOIN category c ON p.category_id = c.id
          GROUP BY c.id
       ) aggviews ON c.id = aggviews.category_id
  LEFT JOIN (
         SELECT c.id category_id,
                SUM(s.amount) total_sales,
                AVG(s.amount) avg_sales
           FROM sales s
           JOIN product p ON s.product_id = p.id
           JOIN category c ON p.category_id = c.id
          GROUP BY c.id
       ) aggsales ON c.id = aggsales.category_id
  LEFT JOIN (
         SELECT c.id category_id,
                COUNT(*) products
           FROM product p
           JOIN category c ON p.category_id = c.id
          GROUP BY c.id
       ) aggproducts ON c.id = aggproducts.category_id
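The trick is to build one aggregate subquery per measure, each returning zero or one row per category. If any aggregate subquery returns more than one row per category, the JOIN multiplies rows (a combinatorial explosion) and your sums start double-counting.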
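The combinatorial explosion is easy to reproduce on the sample data. This sketch uses SQLite in memory via Python's sqlite3 as a stand-in for MySQL; make_db is a hypothetical helper.

Joining activity and sales at the detail level repeats each product's views once per matching sales row. Aggregating views per category first gives at most one row per category and the correct totals.

```python
import sqlite3


def make_db():
    """In-memory SQLite copy of the question's sample tables (stand-in for MySQL)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE category (id INTEGER, name TEXT);
        CREATE TABLE product  (id INTEGER, name TEXT, category_id INTEGER);
        CREATE TABLE activity (id INTEGER, product_id INTEGER, views INTEGER);
        CREATE TABLE sales    (id INTEGER, product_id INTEGER,
                               vendor_id INTEGER, amount INTEGER);
    """)
    con.executemany("INSERT INTO category VALUES (?, ?)", [
        (1, "furniture"), (2, "music"), (3, "kitchen"), (4, "adventure")])
    con.executemany("INSERT INTO product VALUES (?, ?, ?)", [
        (101, "couch", 1), (102, "chair", 1), (103, "drum", 2),
        (104, "flute", 2), (105, "pot", 3), (106, "pan", 3),
        (107, "kitchen sink", 3), (108, "unicorn saddle", 4),
        (109, "unicorn shoes", 4), (110, "horse shampoo", 4)])
    # Activity row i belongs to product 100 + i and has 1000 * i views.
    con.executemany("INSERT INTO activity VALUES (?, ?, ?)",
                    [(i, 100 + i, 1000 * i) for i in range(1, 11)])
    con.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
        (1, 101, 1, 1000), (2, 102, 1, 900), (3, 103, 1, 2000),
        (4, 105, 1, 3000), (5, 107, 1, 5000), (6, 101, 2, 600),
        (7, 103, 2, 7000), (8, 105, 2, 8000), (9, 107, 2, 1000),
        (10, 108, 1, 500), (11, 109, 1, 600), (12, 108, 2, 400),
        (13, 109, 2, 500)])
    return con


con = make_db()

# Detail-level join: a product with two sales rows has its views counted twice.
fanned = con.execute("""
    SELECT c.name, SUM(a.views)
    FROM category c
    JOIN product  p ON p.category_id = c.id
    JOIN activity a ON a.product_id = p.id
    LEFT JOIN sales s ON s.product_id = p.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

# Aggregate first (at most one row per category), then LEFT JOIN to category.
safe = con.execute("""
    SELECT c.name, aggviews.views
    FROM category c
    LEFT JOIN (SELECT p.category_id, SUM(a.views) AS views
               FROM activity a
               JOIN product p ON a.product_id = p.id
               GROUP BY p.category_id) aggviews ON c.id = aggviews.category_id
    ORDER BY c.name
""").fetchall()

print(fanned)
print(safe)
```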
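Then you LEFT JOIN those aggregate subqueries to the category table. Don't use a plain JOIN: if any aggregate subquery is missing a particular category, a plain JOIN drops that category's row from the result.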
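Notice that you use these subqueries as if they were tables. This ability to build queries out of subqueries is exactly what puts the "structured" in Structured Query Language.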
Those are the basics. You'll need one more aggregate subquery to get the conditional sums. I'll leave that one to you.
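Here is one possible version of the remaining conditional-sum subquery, sketched under stated assumptions:

- "sales >= 1000" is read as a product's own average sale amount (the interpretation Answer 1 uses).
- Products with no sales count as 0.
- The per-measure subqueries join product.category_id directly instead of going through category again.

It runs on an in-memory SQLite copy of the sample data. make_db and the output column names are hypothetical. Each subquery still returns at most one row per category, so the LEFT JOINs cannot fan out.

```python
import sqlite3


def make_db():
    """In-memory SQLite copy of the question's sample tables (stand-in for MySQL)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE category (id INTEGER, name TEXT);
        CREATE TABLE product  (id INTEGER, name TEXT, category_id INTEGER);
        CREATE TABLE activity (id INTEGER, product_id INTEGER, views INTEGER);
        CREATE TABLE sales    (id INTEGER, product_id INTEGER,
                               vendor_id INTEGER, amount INTEGER);
    """)
    con.executemany("INSERT INTO category VALUES (?, ?)", [
        (1, "furniture"), (2, "music"), (3, "kitchen"), (4, "adventure")])
    con.executemany("INSERT INTO product VALUES (?, ?, ?)", [
        (101, "couch", 1), (102, "chair", 1), (103, "drum", 2),
        (104, "flute", 2), (105, "pot", 3), (106, "pan", 3),
        (107, "kitchen sink", 3), (108, "unicorn saddle", 4),
        (109, "unicorn shoes", 4), (110, "horse shampoo", 4)])
    # Activity row i belongs to product 100 + i and has 1000 * i views.
    con.executemany("INSERT INTO activity VALUES (?, ?, ?)",
                    [(i, 100 + i, 1000 * i) for i in range(1, 11)])
    con.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
        (1, 101, 1, 1000), (2, 102, 1, 900), (3, 103, 1, 2000),
        (4, 105, 1, 3000), (5, 107, 1, 5000), (6, 101, 2, 600),
        (7, 103, 2, 7000), (8, 105, 2, 8000), (9, 107, 2, 1000),
        (10, 108, 1, 500), (11, 109, 1, 600), (12, 108, 2, 400),
        (13, 109, 2, 500)])
    return con


con = make_db()

rows = con.execute("""
    SELECT c.name AS category,
           aggproducts.products,
           aggviews.views,
           ROUND(aggsales.avg_sales) AS avg_sales,
           COALESCE(aggcond.views_ge_1000, 0) AS sum_views_ge_1000,
           COALESCE(aggcond.views_lt_1000, 0) AS sum_views_lt_1000
    FROM category c
    LEFT JOIN (SELECT category_id, COUNT(*) AS products
               FROM product
               GROUP BY category_id) aggproducts ON c.id = aggproducts.category_id
    LEFT JOIN (SELECT p.category_id, SUM(a.views) AS views
               FROM activity a
               JOIN product p ON a.product_id = p.id
               GROUP BY p.category_id) aggviews ON c.id = aggviews.category_id
    LEFT JOIN (SELECT p.category_id, AVG(s.amount) AS avg_sales
               FROM sales s
               JOIN product p ON s.product_id = p.id
               GROUP BY p.category_id) aggsales ON c.id = aggsales.category_id
    -- Conditional sums: each product's views go into one bucket, chosen by
    -- that product's own average sale amount (no sales counts as 0).
    LEFT JOIN (SELECT p.category_id,
                      SUM(CASE WHEN COALESCE(ps.avg_amount, 0) >= 1000
                               THEN a.views ELSE 0 END) AS views_ge_1000,
                      SUM(CASE WHEN COALESCE(ps.avg_amount, 0) < 1000
                               THEN a.views ELSE 0 END) AS views_lt_1000
               FROM activity a
               JOIN product p ON a.product_id = p.id
               LEFT JOIN (SELECT product_id, AVG(amount) AS avg_amount
                          FROM sales
                          GROUP BY product_id) ps ON ps.product_id = a.product_id
               GROUP BY p.category_id) aggcond ON c.id = aggcond.category_id
    ORDER BY c.name
""").fetchall()
print(rows)
```

Under the per-product reading, kitchen's 12000 views come from the pot (average 5500) and the kitchen sink (average 3000). A different reading of the threshold, such as per-vendor or per-category, would need a different subquery.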
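[Discussion]:
You could improve this answer by ordering by category, as in the desired output, and by using aliases so the field names match the desired output. I'd also suggest applying ROUND() to the average sales field.
[Answer 3]: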
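There are quite a few mistakes in how you calculated the final output table you expect.
I tried to work out what you meant by your table, and this is the result: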
SELECT
    ( `cat`.`name` ) AS `category`,
    COUNT( `p`.`name` ) AS `productsInGroup`,
    SUM( `a`.`views` ) AS `viewsOnGroup`,
    SUM( `s`.`amount` ) / SUM( `salesCnt` ) AS `average_sales`,
    IF( SUM(`s`.`amount`) / SUM( `salesCnt` ) >= 1000, SUM( `a`.`views` ) - SUM( IF(`s`.`salesCnt` IS NULL, `a`.`views`, 0 ) ), 0 ) AS `sum_views_where_sales_>=_1000`,
    IF( SUM(`s`.`amount`) / SUM( `salesCnt` ) < 1000, SUM( `a`.`views` ) , SUM( IF(`s`.`salesCnt` IS NULL, `a`.`views`, 0 ) ) ) AS `sum_views_where_sales_<_1000`
FROM
    `product` AS `p`
    INNER JOIN
        `category` AS `cat`
        ON `cat`.`id` = `p`.`category_id`
    LEFT JOIN
        `activity` AS `a`
        ON `a`.`product_id` = `p`.`id`
    LEFT JOIN(
        SELECT
            `product_id`,
            COUNT( `product_id` ) AS `salesCnt`,
            SUM( `amount` ) AS `amount`
        FROM `sales`
        GROUP BY `product_id`
    ) AS `s`
        ON `s`.`product_id` = `a`.`product_id`
GROUP BY
    `category`;
Let me know if the result is correct, and I'll optimize the query by storing the intermediate calculations in variables.
http://sqlfiddle.com/#!9/02f4b6/144
[Discussion]: