sql: intersect in sql (join probability)


【Title】sql: intersect in sql (join probability) 【Posted】2020-06-29 19:19:12 【Question】:

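I'm struggling with a SQL problem and just can't figure it out. I need to count the number of people who like a given pair of items: one food and one drink.

I have a table with user_ids and a column indicating which type of food each user likes. IDs can repeat, because one person may like more than one food. I also have a table with user_ids and a column indicating which type of drink each user likes. Again, user IDs can repeat.

From these two tables I have to build a table that counts the people who like a given food, the people who like a given drink, and the people who like both. Here is an example of what I expect: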

foods        drinks   count_food   count_drink   count_combination
hamburger    coke     17           67            21
pizza        coke     45           67            8
chicken      coke     21           67            25

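So far I have created the foods, drinks, count_food and count_drink columns. I used a CROSS JOIN to build the foods and drinks columns, and COUNT to fill count_food and count_drink. However, I'm stuck on the count_combination column. I just don't know how to do it. Can this be done with a cross join?

Thanks for your help :)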

【Answer 1】:

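Since you provided the desired output in your question, I was able to reproduce your case with sample data.

To get the final result I went through all the steps you mentioned: counting the people who like each drink, counting the people who like each food, and then counting the people who like both a given food and a given drink.

Below are the sample data and the steps I took: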

#sample data for food
WITH food AS(
SELECT 1 AS user_ids, "hamburguer" AS foods UNION ALL
SELECT 1 AS user_ids, "hamburguer" AS foods UNION ALL
SELECT 2 AS user_ids, "hamburguer" AS foods UNION ALL
SELECT 2 AS user_ids, "pizza" AS foods UNION ALL
SELECT 2 AS user_ids, "pizza" AS foods UNION ALL
SELECT 3 AS user_ids, "chicken" AS foods 
),

#sample data for drink
drink AS(
SELECT 1 AS user_ids, "coke" AS drinks UNION ALL
SELECT 2 AS user_ids, "coke" AS drinks UNION ALL
SELECT 2 AS user_ids, "coke" AS drinks UNION ALL
SELECT 4 AS user_ids, "coke" AS drinks UNION ALL
SELECT 5 AS user_ids, "coke" AS drinks 
),
#count the rows for each type of food (repeated user/food rows are counted too)
count_foods AS (
SELECT COUNT(foods) AS count_foods, foods FROM food GROUP BY foods
),
#count the rows for each type of drink (repeated user/drink rows are counted too)
count_drinks AS(
SELECT COUNT(drinks) AS count_drinks, drinks FROM drink GROUP BY drinks
),
#making all the possible combinations between foods and drinks with CROSS JOIN
food_drink_only AS (
SELECT foods, drinks, count_foods, count_drinks FROM count_foods a CROSS JOIN count_drinks b
),
#people who like one food and a drink, for ex.: user_ids = 1 likes hamburguer and coke
like_both AS (
SELECT  COUNT(user_ids) AS count_both, foods, drinks FROM (SELECT DISTINCT user_ids, foods FROM food) 
INNER JOIN (SELECT DISTINCT user_ids, drinks FROM drink) USING(user_ids) GROUP BY  2,3
)
#Using LEFT JOIN with foods and drinks as join keys because all the combinations (foods, drinks) came from the CROSS JOIN
#and are in the left table
SELECT a.foods,a.drinks,a.count_foods,a.count_drinks, b.count_both FROM food_drink_only a
LEFT JOIN like_both b ON a.foods = b.foods AND a.drinks=b.drinks

And the output:

Row   foods        drinks   count_foods   count_drinks   count_both
1     hamburguer   coke     3             5              2
2     pizza        coke     2             5              1
3     chicken      coke     1             5              null
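
The null in the last row means no user matched that pair. If a 0 is preferred there, one option is to wrap the joined count in COALESCE. The sketch below is only an illustration: it loads the same sample data into an in-memory SQLite database through Python's sqlite3 module (an assumption made so the example runs anywhere, the query above is BigQuery), and the CTE name pairs is made up for this example.

```python
import sqlite3

# In-memory SQLite database holding the same sample rows as the answer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE food (user_ids INTEGER, foods TEXT);
CREATE TABLE drink (user_ids INTEGER, drinks TEXT);
INSERT INTO food VALUES (1,'hamburguer'),(1,'hamburguer'),(2,'hamburguer'),
                        (2,'pizza'),(2,'pizza'),(3,'chicken');
INSERT INTO drink VALUES (1,'coke'),(2,'coke'),(2,'coke'),(4,'coke'),(5,'coke');
""")

query = """
WITH pairs AS (              -- hypothetical name: every (food, drink) pair
  SELECT f.foods, d.drinks
  FROM (SELECT DISTINCT foods FROM food) AS f
  CROSS JOIN (SELECT DISTINCT drinks FROM drink) AS d
),
like_both AS (               -- users who like both items of a pair
  SELECT f.foods, d.drinks, COUNT(*) AS count_both
  FROM (SELECT DISTINCT user_ids, foods FROM food) AS f
  INNER JOIN (SELECT DISTINCT user_ids, drinks FROM drink) AS d USING (user_ids)
  GROUP BY f.foods, d.drinks
)
SELECT p.foods, p.drinks, COALESCE(b.count_both, 0) AS count_both
FROM pairs AS p
LEFT JOIN like_both AS b ON p.foods = b.foods AND p.drinks = b.drinks
ORDER BY p.foods
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With this change the chicken/coke pair reports 0 instead of null, while the other two pairs keep the counts 2 and 1.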

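First, note that the CROSS JOIN gives us every possible combination of foods and drinks, which is why a LEFT JOIN is used to build the final output. Then, note that the like_both table contains 2 users who like coke and hamburguer, 1 user who likes coke and pizza, and no row at all for coke and chicken (hence the null in the output). For this reason, the foods and drinks fields are used together as the join key between this table and the food_drink_only table, so each count lands on the correct food and drink combination.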
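
As a cross-check on the "intersect" idea in the title, the combination count for a pair is the size of the intersection of two user sets. The following sketch is not part of the original answer: it recomputes the table in plain Python from the same sample rows, and the helper name users_by_item is hypothetical. Because it works on sets of users, repeated rows collapse, so its per-food and per-drink counts are distinct-user counts and come out lower than the row counts of COUNT(foods) and COUNT(drinks) in the query above.

```python
from collections import defaultdict
from itertools import product

# Same sample rows as the answer: (user_id, item).
food_rows = [(1, "hamburguer"), (1, "hamburguer"), (2, "hamburguer"),
             (2, "pizza"), (2, "pizza"), (3, "chicken")]
drink_rows = [(1, "coke"), (2, "coke"), (2, "coke"), (4, "coke"), (5, "coke")]


def users_by_item(rows):
    """Map each item to the set of distinct users who like it (hypothetical helper)."""
    groups = defaultdict(set)
    for user_id, item in rows:
        groups[item].add(user_id)
    return groups


food_users = users_by_item(food_rows)
drink_users = users_by_item(drink_rows)

# One row per (food, drink) pair: distinct users per food, per drink,
# and the size of the intersection of the two user sets.
result = [
    (food, drink,
     len(food_users[food]),
     len(drink_users[drink]),
     len(food_users[food] & drink_users[drink]))
    for food, drink in product(sorted(food_users), sorted(drink_users))
]

for row in result:
    print(row)
```

The last column matches count_both from the query (2, 1, and 0 in place of null). In SQL, the same distinct-user behaviour for the first two counts could be obtained with COUNT(DISTINCT user_ids), if that is what "number of people" is meant to be.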
