sql: intersect in sql (join probability)
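【Title】: sql: intersect in sql (join probability) 【Posted】: 2020-06-29 19:19:12 【Question】: I'm struggling with a SQL problem and just can't figure it out. I need to count the number of people who like both items in a pair.
I have a table with user_ids and a column indicating which type of food each user likes. IDs can repeat, because one person may like more than one food. I also have a table with user_ids and a column indicating which type of drink each user likes. Again, user IDs can repeat.
From these two tables, I have to build a table that counts the people who like a given food, the people who like a given drink, and the people who like that pair. Here is an example of what I expect: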
foods      drinks  count_food  count_drink  count_combination
hamburger  coke    17          67           21
pizza      coke    45          67           8
chicken    coke    21          67           25
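So far I have created the foods, drinks, count_food and count_drink columns. I used a cross join to build the foods and drinks columns, and the count function to fill count_food and count_drink. However, I'm stuck on the count_combination column and just don't know how to do it. Can this be done with a cross join?
Thanks for your help :)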
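【Answer 1】: Since you provided the desired output in your question, I was able to recreate your case with sample data.
To reach the final result, I followed the steps you mentioned: count the people who like each food, count the people who like each drink, and then count the people who like both a given food and a given drink.
Below are the sample data and the steps I took: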
# Sample data for food (duplicate rows are included on purpose)
WITH food AS (
  SELECT 1 AS user_ids, "hamburguer" AS foods UNION ALL
  SELECT 1 AS user_ids, "hamburguer" AS foods UNION ALL
  SELECT 2 AS user_ids, "hamburguer" AS foods UNION ALL
  SELECT 2 AS user_ids, "pizza" AS foods UNION ALL
  SELECT 2 AS user_ids, "pizza" AS foods UNION ALL
  SELECT 3 AS user_ids, "chicken" AS foods
),
# Sample data for drink
drink AS (
  SELECT 1 AS user_ids, "coke" AS drinks UNION ALL
  SELECT 2 AS user_ids, "coke" AS drinks UNION ALL
  SELECT 2 AS user_ids, "coke" AS drinks UNION ALL
  SELECT 4 AS user_ids, "coke" AS drinks UNION ALL
  SELECT 5 AS user_ids, "coke" AS drinks
),
# Count the rows for each type of food.
# Note: COUNT(foods) counts every row, so a duplicated (user_ids, foods) row is
# counted twice; COUNT(DISTINCT user_ids) would count each person once.
count_foods AS (
  SELECT COUNT(foods) AS count_foods, foods
  FROM food
  GROUP BY foods
),
# Count the rows for each type of drink (same caveat about duplicates)
count_drinks AS (
  SELECT COUNT(drinks) AS count_drinks, drinks
  FROM drink
  GROUP BY drinks
),
# Build every possible (food, drink) combination with CROSS JOIN
food_drink_only AS (
  SELECT foods, drinks, count_foods, count_drinks
  FROM count_foods a
  CROSS JOIN count_drinks b
),
# People who like both a food and a drink, e.g. user_ids = 1 likes hamburguer and coke.
# DISTINCT removes duplicate rows first, so each user is counted once per pair.
like_both AS (
  SELECT COUNT(user_ids) AS count_both, foods, drinks
  FROM (SELECT DISTINCT user_ids, foods FROM food)
  INNER JOIN (SELECT DISTINCT user_ids, drinks FROM drink) USING (user_ids)
  GROUP BY foods, drinks
)
# LEFT JOIN on (foods, drinks): every combination comes from the CROSS JOIN and
# lives in the left table, so pairs that nobody likes are kept (with a null count_both)
SELECT a.foods, a.drinks, a.count_foods, a.count_drinks, b.count_both
FROM food_drink_only a
LEFT JOIN like_both b
  ON a.foods = b.foods AND a.drinks = b.drinks
And the output:
Row  foods       drinks  count_foods  count_drinks  count_both
1    hamburguer  coke    3            5             2
2    pizza       coke    2            5             1
3    chicken     coke    1            5             null
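First, note that CROSS JOIN gives us every possible combination of foods and drinks, which is why the final output is built with a LEFT JOIN starting from that table. Next, note that the like_both table contains 2 users who like coke and hamburguer and 1 user who likes coke and pizza, but no row at all for coke and chicken, because no user likes both; that missing row is why count_both is null for chicken. For this reason, the foods and drinks fields are used together as the join key between like_both and food_drink_only, so each count lands on the correct food and drink combination.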
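As a rough cross-check, the same pipeline can be sketched with Python's built-in sqlite3 module. Everything specific to this sketch is an assumption made for illustration and not part of the answer above: the switch to the SQLite dialect, the table setup, the CTE names food_counts and drink_counts, and the hypothetical helper count_combinations. It also differs from the query above in two deliberate ways: it uses COUNT(DISTINCT user_ids), so duplicated rows do not inflate the per-food and per-drink counts, and it wraps count_both in COALESCE, so a pair nobody likes shows 0 instead of null.

```python
import sqlite3

# Same sample rows as in the answer (duplicates included on purpose).
FOOD_ROWS = [(1, "hamburguer"), (1, "hamburguer"), (2, "hamburguer"),
             (2, "pizza"), (2, "pizza"), (3, "chicken")]
DRINK_ROWS = [(1, "coke"), (2, "coke"), (2, "coke"), (4, "coke"), (5, "coke")]

# Assumption: SQLite dialect; CTE names are hypothetical.
QUERY = """
WITH food_counts AS (
    -- distinct users per food, so duplicate rows are counted once
    SELECT foods, COUNT(DISTINCT user_ids) AS count_foods
    FROM food
    GROUP BY foods
),
drink_counts AS (
    -- distinct users per drink
    SELECT drinks, COUNT(DISTINCT user_ids) AS count_drinks
    FROM drink
    GROUP BY drinks
),
like_both AS (
    -- users appearing in both tables, one row per (user, food, drink)
    SELECT f.foods, d.drinks, COUNT(*) AS count_both
    FROM (SELECT DISTINCT user_ids, foods FROM food) AS f
    JOIN (SELECT DISTINCT user_ids, drinks FROM drink) AS d
      ON f.user_ids = d.user_ids
    GROUP BY f.foods, d.drinks
)
SELECT fc.foods, dc.drinks, fc.count_foods, dc.count_drinks,
       COALESCE(lb.count_both, 0) AS count_both  -- 0 when nobody likes the pair
FROM food_counts AS fc
CROSS JOIN drink_counts AS dc
LEFT JOIN like_both AS lb
  ON lb.foods = fc.foods AND lb.drinks = dc.drinks
ORDER BY fc.foods
"""


def count_combinations():
    """Load the sample rows into an in-memory database and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE food (user_ids INTEGER, foods TEXT)")
    conn.execute("CREATE TABLE drink (user_ids INTEGER, drinks TEXT)")
    conn.executemany("INSERT INTO food VALUES (?, ?)", FOOD_ROWS)
    conn.executemany("INSERT INTO drink VALUES (?, ?)", DRINK_ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in count_combinations():
        print(row)
```

With the sample rows, this counts 2 users for hamburguer, 1 for pizza, 1 for chicken and 4 for coke, and the pair counts come out as 2, 1 and 0 respectively. Whether row counts or distinct-user counts are the right choice depends on what the duplicated rows in the real tables mean.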