Finding combinations of two tables that do not exist in a third table

I have three tables:

Table A (> 1,000,000 rows)

+----+-----------+
| id | field_A_1 |
+----+-----------+
|  1 | testa1    |
|  2 | testa2    |
|  3 | testa3    |
+----+-----------+

Table B (~100 rows)

+----+-----------+
| id | field_B_1 |
+----+-----------+
|  1 | testb1    |
|  2 | testb2    |
|  3 | testb3    |
+----+-----------+

Table C (> 10,000,000 rows)

+----+---------------+---------------+
| id | field_A_1     | fk_id_table_B |
+----+---------------+---------------+
|  1 |        testa1 |             1 |
|  2 |        testa2 |             2 |
|  3 |        testa3 |             3 |
+----+---------------+---------------+

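I want to find all combinations of A and B that are not in table C. Unfortunately, the column that links them (field_A_1 in table A and field_A_1 in table C) is a varchar.

For this example, the result would be: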

+-----------+---------------+
| field_A_1 | fk_id_table_B |
+-----------+---------------+
| testa1    |             2 |
| testa1    |             3 |
| testa2    |             1 |
| testa2    |             3 |
| testa3    |             1 |
| testa3    |             2 |
+-----------+---------------+
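
Put another way, the wanted rows are the set difference between the Cartesian product of A and B and the pairs already stored in C. A minimal Python sketch on the three sample rows above; the variable names and the helper missing_combinations are made up for illustration and are not part of the original question:

```python
from itertools import product

# Sample rows from the tables above (only the columns that matter here).
table_a = ["testa1", "testa2", "testa3"]                  # A.field_A_1
table_b = [1, 2, 3]                                       # B.id
table_c = {("testa1", 1), ("testa2", 2), ("testa3", 3)}   # (C.field_A_1, C.fk_id_table_B)


def missing_combinations(a_values, b_ids, c_pairs):
    """Return every (field_A_1, b_id) pair of A x B that is absent from C."""
    return sorted(pair for pair in product(a_values, b_ids) if pair not in c_pairs)


if __name__ == "__main__":
    for field_a_1, b_id in missing_combinations(table_a, table_b, table_c):
        print(field_a_1, b_id)
```

This only illustrates the definition. At the real table sizes the product has on the order of 100 million pairs, which is why the question is about doing this efficiently inside the database.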

Results of trying the answers:

EXPLAIN
SELECT count(a.field_A_1),
       b.id AS fk_id_table_B
FROM a
CROSS JOIN b
WHERE NOT EXISTS
    (SELECT 1
     FROM c
     WHERE c.field_A_1=a.field_A_1
       AND fk_id_table_B=b.id)
GROUP BY fk_id_table_B

+----+--------------------+-------+-------+----------------------------+----------------------------+---------+------------------+------------+----------------------------------------------------+
| id |    select_type     | table | type  |       possible_keys        |            key             | key_len |       ref        |    rows    |                       Extra                        |
+----+--------------------+-------+-------+----------------------------+----------------------------+---------+------------------+------------+----------------------------------------------------+
|  1 | PRIMARY            | b     | index | PRIMARY,b.id_foreign       | b.id_foreign               | 4       | NULL             | ~100       | Using index; Using temporary; Using filesort       |
|  1 | PRIMARY            | a     | ALL   | NULL                       | NULL                       | NULL    | NULL             | >1.000.000 | Using where; Using join buffer (Block Nested Loop) |
|  2 | DEPENDENT SUBQUERY | c     | ref   | IDX_TABLE_C_B_ID_FIELD_A_1 | IDX_TABLE_C_B_ID_FIELD_A_1 | 36      | b.id,a.field_A_1 | 4          | Using index                                        |
+----+--------------------+-------+-------+----------------------------+----------------------------+---------+------------------+------------+----------------------------------------------------+

Runtime unknown. I killed the query after one minute; the "Sending data" stage was taking too long.

EXPLAIN
SELECT count(t1.field_A_1), t1.bid
FROM
(
  SELECT field_A_1, b.id as bid
  FROM TableA as a, TableB as b 
) AS t1
LEFT JOIN TableC AS c ON t1.field_A_1 = c.field_A_1 AND t1.bid = c.fk_id_table_B
WHERE c.field_A_1 IS NULL AND c.fk_id_table_B is null
GROUP BY t1.bid

+----+-------------+------------+-------+----------------------------+----------------------------+---------+---------------------+------------+--------------------------------------+
| id | select_type |   table    | type  |       possible_keys        |            key             | key_len |         ref         |    rows    |                Extra                 |
+----+-------------+------------+-------+----------------------------+----------------------------+---------+---------------------+------------+--------------------------------------+
|  1 | PRIMARY     | <derived2> | ALL   | NULL                       | NULL                       | NULL    | NULL                | 100million | Using temporary; Using filesort      |
|  1 | PRIMARY     | c          | ref   | IDX_TABLE_C_B_ID_FIELD_A_1 | IDX_TABLE_C_B_ID_FIELD_A_1 | 36      | t1.bid,t1.field_A_1 | 4          | Using where; Not exists; Using index |
|  2 | DERIVED     | b          | index | NULL                       | b.id_foreign               | 4       | NULL                | ~100       | Using index                          |
|  2 | DERIVED     | a          | ALL   | NULL                       |                            |         |                     |            |                                      |
+----+-------------+------------+-------+----------------------------+----------------------------+---------+---------------------+------------+--------------------------------------+

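Runtime unknown. I killed this query after one minute as well; the "Sending data" stage was taking too long.

Answer

You can do it like this: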

SELECT t1.*
FROM
(
  SELECT field_A_1, b.id as bid
  FROM TableA as a, TableB as b 
) AS t1
LEFT JOIN TableC AS c ON t1.field_A_1 = c.field_A_1 AND t1.bid = c.fk_id_table_B
WHERE c.field_A_1 IS NULL AND c.fk_id_table_B is null;

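The cross join between tables a and b (FROM TableA as a, TableB as b) gives you all possible combinations of rows from the two tables.

Then, with a LEFT JOIN to table c and an IS NULL predicate, you keep only the combinations that do not exist in table c, because for those combinations both join columns from c come back as NULL.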


Result:

| field_A_1 | bid |
|-----------|-----|
|    testa2 |   1 |
|    testa3 |   1 |
|    testa1 |   2 |
|    testa3 |   2 |
|    testa1 |   3 |
|    testa2 |   3 |
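
To try the LEFT JOIN / IS NULL approach on the sample rows without a MySQL server, the sketch below runs the same query through Python's built-in sqlite3 module. Using SQLite as a stand-in is an assumption of this sketch, and so are the column types in the CREATE TABLE statements; the ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (id INTEGER PRIMARY KEY, field_A_1 VARCHAR(32));
    CREATE TABLE TableB (id INTEGER PRIMARY KEY, field_B_1 VARCHAR(32));
    CREATE TABLE TableC (id INTEGER PRIMARY KEY, field_A_1 VARCHAR(32),
                         fk_id_table_B INTEGER REFERENCES TableB(id));
    INSERT INTO TableA VALUES (1, 'testa1'), (2, 'testa2'), (3, 'testa3');
    INSERT INTO TableB VALUES (1, 'testb1'), (2, 'testb2'), (3, 'testb3');
    INSERT INTO TableC VALUES (1, 'testa1', 1), (2, 'testa2', 2), (3, 'testa3', 3);
""")

# Cross join A and B, left join C, then keep the rows where no C row matched.
left_join_rows = conn.execute("""
    SELECT t1.field_A_1, t1.bid
    FROM (SELECT a.field_A_1, b.id AS bid
          FROM TableA AS a CROSS JOIN TableB AS b) AS t1
    LEFT JOIN TableC AS c
           ON t1.field_A_1 = c.field_A_1 AND t1.bid = c.fk_id_table_B
    WHERE c.field_A_1 IS NULL AND c.fk_id_table_B IS NULL
    ORDER BY t1.bid, t1.field_A_1
""").fetchall()

for row in left_join_rows:
    print(row)
```

On the sample data this yields the six missing pairs. It says nothing about how the query behaves at the table sizes given in the question.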

Another answer

All combinations of A and B are a CROSS JOIN; filter them with NOT EXISTS:

select a.field_A_1, b.id as fk_id_table_B
  from a cross join b
where not exists (select 1 from c where c.field_A_1=a.field_A_1 
                                    and fk_id_table_B=b.id)
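
The NOT EXISTS variant can be checked the same way. This is again a sketch run through Python's sqlite3 module rather than MySQL. The column types are assumed, and the index is a guess at the composite index named in the EXPLAIN output, with the column order inferred from its ref column.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, field_A_1 VARCHAR(32));
    CREATE TABLE b (id INTEGER PRIMARY KEY, field_B_1 VARCHAR(32));
    CREATE TABLE c (id INTEGER PRIMARY KEY, field_A_1 VARCHAR(32),
                    fk_id_table_B INTEGER REFERENCES b(id));
    -- Assumed shape of the composite index on c seen in the EXPLAIN output.
    CREATE INDEX idx_c_b_id_field_a_1 ON c (fk_id_table_B, field_A_1);
    INSERT INTO a VALUES (1, 'testa1'), (2, 'testa2'), (3, 'testa3');
    INSERT INTO b VALUES (1, 'testb1'), (2, 'testb2'), (3, 'testb3');
    INSERT INTO c VALUES (1, 'testa1', 1), (2, 'testa2', 2), (3, 'testa3', 3);
""")

# Every (a, b) pair for which no matching row exists in c.
not_exists_rows = conn.execute("""
    SELECT a.field_A_1, b.id AS fk_id_table_B
    FROM a CROSS JOIN b
    WHERE NOT EXISTS (SELECT 1
                      FROM c
                      WHERE c.field_A_1 = a.field_A_1
                        AND c.fk_id_table_B = b.id)
    ORDER BY a.field_A_1, b.id
""").fetchall()

for row in not_exists_rows:
    print(row)
```

With the ORDER BY on field_A_1 and then id, the rows come back in the same order as the expected result table in the question.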
