Find the combinations of two tables that do not exist in a third table
I have three tables:
Table A (> 1.000.000 rows)
+----+-----------+
| id | field_A_1 |
+----+-----------+
| 1 | testa1 |
| 2 | testa2 |
| 3 | testa3 |
+----+-----------+
Table B (~100 rows)
+----+-----------+
| id | field_B_1 |
+----+-----------+
| 1 | testb1 |
| 2 | testb2 |
| 3 | testb3 |
+----+-----------+
Table C (> 10.000.000 rows)
+----+---------------+---------------+
| id | field_A_1 | fk_id_table_B |
+----+---------------+---------------+
| 1 | testa1 | 1 |
| 2 | testa2 | 2 |
| 3 | testa3 | 3 |
+----+---------------+---------------+
I want to find all combinations of A and B that are not in table C. Unfortunately, table A field_A_1 / table C field_A_1 is a varchar.
For this example the result would be:
+-----------+---------------+
| field_A_1 | fk_id_table_B |
+-----------+---------------+
| testa1 | 2 |
| testa1 | 3 |
| testa2 | 1 |
| testa2 | 3 |
| testa3 | 1 |
| testa3 | 2 |
+-----------+---------------+
Results for the answers:
EXPLAIN
SELECT count(a.field_A_1),
b.id AS fk_id_table_B
FROM a
CROSS JOIN b
WHERE NOT EXISTS
(SELECT 1
FROM c
WHERE c.field_A_1=a.field_A_1
AND c.fk_id_table_B=b.id)
GROUP BY fk_id_table_B
+----+--------------------+-------+-------+----------------------------+----------------------------+---------+------------------+------------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+-------+----------------------------+----------------------------+---------+------------------+------------+----------------------------------------------------+
| 1 | PRIMARY | b | index | PRIMARY,b.id_foreign | b.id_foreign | 4 | NULL | ~100 | Using index; Using temporary; Using filesort |
| 1 | PRIMARY | a | ALL | NULL | NULL | NULL | NULL | >1.000.000 | Using where; Using join buffer (Block Nested Loop) |
| 2 | DEPENDENT SUBQUERY | c | ref | IDX_TABLE_C_B_ID_FIELD_A_1 | IDX_TABLE_C_B_ID_FIELD_A_1 | 36 | b.id,a.field_A_1 | 4 | Using index |
+----+--------------------+-------+-------+----------------------------+----------------------------+---------+------------------+------------+----------------------------------------------------+
Runtime unknown: I killed the query after one minute because the "Sending data" stage was taking too long.
EXPLAIN
SELECT count(t1.field_A_1), t1.bid
FROM
(
SELECT field_A_1, b.id as bid
FROM TableA as a, TableB as b
) AS t1
LEFT JOIN TableC AS c ON t1.field_A_1 = c.field_A_1 AND t1.bid = c.fk_id_table_B
WHERE c.field_A_1 IS NULL AND c.fk_id_table_B is null
GROUP BY t1.bid
+----+-------------+------------+-------+----------------------------+----------------------------+---------+---------------------+------------+--------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+----------------------------+----------------------------+---------+---------------------+------------+--------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 100million | Using temporary; Using filesort |
| 1 | PRIMARY | c | ref | IDX_TABLE_C_B_ID_FIELD_A_1 | IDX_TABLE_C_B_ID_FIELD_A_1 | 36 | t1.bid,t1.field_A_1 | 4 | Using where; Not exists; Using index |
| 2 | DERIVED | b | index | NULL | b.id_foreign | 4 | NULL | ~100 | Using index |
| 2 | DERIVED | a | ALL | NULL | | | | | |
+----+-------------+------------+-------+----------------------------+----------------------------+---------+---------------------+------------+--------------------------------------+
Runtime unknown: I killed the query after one minute because the "Sending data" stage was taking too long.
Answer
You can do it like this:
SELECT t1.*
FROM
(
SELECT field_A_1, b.id as bid
FROM TableA as a, TableB as b
) AS t1
LEFT JOIN TableC AS c ON t1.field_A_1 = c.field_A_1 AND t1.bid = c.fk_id_table_B
WHERE c.field_A_1 IS NULL AND c.fk_id_table_B is null;
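The cross join between tables a and b, FROM TableA as a, TableB as b, gives you all possible combinations of the two tables.
Then, by using a LEFT JOIN with table c together with an IS NULL predicate, you keep only the combinations that do not exist in table c, because those missing combinations have NULL values for both join columns.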
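The LEFT JOIN / IS NULL pattern described above can be tried on the sample data from the question. The following is a minimal sketch, assuming an in-memory SQLite database (via Python's `sqlite3` module) instead of the asker's MySQL server. The column types and the `ORDER BY` clause are assumptions added only to make the output deterministic.

```python
import sqlite3

# Hypothetical in-memory copy of the three sample tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (id INTEGER PRIMARY KEY, field_A_1 TEXT);
    CREATE TABLE TableB (id INTEGER PRIMARY KEY, field_B_1 TEXT);
    CREATE TABLE TableC (id INTEGER PRIMARY KEY,
                         field_A_1 TEXT,
                         fk_id_table_B INTEGER);
    INSERT INTO TableA VALUES (1, 'testa1'), (2, 'testa2'), (3, 'testa3');
    INSERT INTO TableB VALUES (1, 'testb1'), (2, 'testb2'), (3, 'testb3');
    INSERT INTO TableC VALUES (1, 'testa1', 1), (2, 'testa2', 2), (3, 'testa3', 3);
""")

# Cross join A x B, then keep only the rows with no partner in C.
anti_join_sql = """
    SELECT t1.field_A_1, t1.bid
    FROM (
        SELECT a.field_A_1, b.id AS bid
        FROM TableA AS a CROSS JOIN TableB AS b
    ) AS t1
    LEFT JOIN TableC AS c
           ON t1.field_A_1 = c.field_A_1 AND t1.bid = c.fk_id_table_B
    WHERE c.field_A_1 IS NULL AND c.fk_id_table_B IS NULL
    ORDER BY t1.bid, t1.field_A_1  -- added for a stable row order
"""

missing = conn.execute(anti_join_sql).fetchall()
for row in missing:
    print(row)
```

On the three-row sample this yields the six missing pairs. On the real tables the cross join still produces roughly 100 million candidate rows, so the sketch illustrates correctness, not performance.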
Result:
| field_A_1 | bid |
|-----------|-----|
| testa2 | 1 |
| testa3 | 1 |
| testa1 | 2 |
| testa3 | 2 |
| testa1 | 3 |
| testa2 | 3 |
Another answer
All combinations of A and B are a CROSS JOIN; filter them with NOT EXISTS:
select a.field_A_1, b.id as fk_id_table_B
from a cross join b
where not exists (select 1 from c where c.field_A_1=a.field_A_1
and c.fk_id_table_B=b.id)
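As a quick check of the NOT EXISTS variant, here is a small sketch that runs it against the sample rows in an in-memory SQLite database (an assumption, since the question uses MySQL). Note that the join is written `a CROSS JOIN b`, with no comma before `CROSS JOIN`. The index column order is inferred from the index name in the EXPLAIN output, and the `ORDER BY` is added only for a stable result.

```python
import sqlite3

# Hypothetical in-memory copy of tables a, b and c from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, field_A_1 TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, field_B_1 TEXT);
    CREATE TABLE c (id INTEGER PRIMARY KEY,
                    field_A_1 TEXT,
                    fk_id_table_B INTEGER);
    -- Assumed column order, guessed from the index name in the EXPLAIN output.
    CREATE INDEX IDX_TABLE_C_B_ID_FIELD_A_1 ON c (fk_id_table_B, field_A_1);
    INSERT INTO a VALUES (1, 'testa1'), (2, 'testa2'), (3, 'testa3');
    INSERT INTO b VALUES (1, 'testb1'), (2, 'testb2'), (3, 'testb3');
    INSERT INTO c VALUES (1, 'testa1', 1), (2, 'testa2', 2), (3, 'testa3', 3);
""")

# Every (a, b) pair for which no matching row exists in c.
not_exists_sql = """
    SELECT a.field_A_1, b.id AS fk_id_table_B
    FROM a CROSS JOIN b
    WHERE NOT EXISTS (
        SELECT 1
        FROM c
        WHERE c.field_A_1 = a.field_A_1
          AND c.fk_id_table_B = b.id
    )
    ORDER BY a.field_A_1, b.id  -- added for a stable row order
"""

rows = conn.execute(not_exists_sql).fetchall()
for row in rows:
    print(row)
```

The rows come back in the same order as the expected result table in the question. Both answers describe the same anti-join, so which one runs faster depends on the optimizer and the indexes available.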