Join two datasets based on a flag and id

Posted 2023-02-24

【Title】: Join two datasets based on a flag and id 【Posted】: 2021-12-22 19:58:02 【Question】:

I am trying to join two datasets based on a flag and an id, i.e.

proc sql;
create table demo as
select a.*,b.b1,b.2
from table1 a
left join table2 on
(a.flag=b.flag and a.id=b.id) or (a.flag ne b.flag and a.id=b.id)
end;

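This code seems to run in a loop and never produces output. Where there is a matching flag value, I want to pick up the attributes for that flag; where there is not, I want to pick up the attributes at the id level, so that we do not end up with blank values.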

【Comments】:

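Please consider providing a Minimal Reproducible Example.

Your ON condition is just A.ID=B.ID. In other SQL implementations that use three-valued logic, a missing (i.e. null) FLAG value in either dataset might make the expression something other than plain A.ID=B.ID. But SAS only uses two-valued logic: two values are either the same or not the same, even when one or both of them are missing.

I see you forgot the "b" in the line "left join table2 on". Rewrite it as: left join table2 b on

【Answer 1】: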

This join condition cannot be optimized. Using or in a join is not good practice. If you check your log, you will see:

NOTE: The execution of this query involves performing one or more Cartesian product joins 
that can not be optimized.

Instead, convert your query to a union:

proc sql;
    create table demo as
        select a.*,
               b.b1,
               b.b2
        from table1 as a
        left join 
             table2 as b
        on a.flag=b.flag and a.id=b.id

        UNION

        select a.*,
               b.b1,
               b.b2
        from table1 as a
        left join 
             table2 as b
        on a.flag ne b.flag and a.id=b.id
    ;
quit;
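
For comparison, the simplification raised in the comments can be sketched as follows. Because SAS evaluates comparisons with two-valued logic, the part of the question's ON condition that tests flag ("a.flag=b.flag" or "a.flag ne b.flag") is always true, so the whole condition reduces to a.id=b.id. This is only a minimal sketch under that reading, not part of the original answer; the table and column names are taken from the question, and b2 is assumed to be the column the question wrote as "b.2".

```sas
proc sql;
    create table demo as
        select a.*,
               b.b1,
               b.b2   /* assumed to be the column written as "b.2" in the question */
        from table1 as a
        left join
             table2 as b
        /* (flag equal) or (flag not equal) is always true in SAS,
           so only the id comparison decides whether rows match */
        on a.id = b.id
    ;
quit;
```

Note that this is not identical to the UNION version: UNION removes duplicate rows, and each of its two left joins also keeps a table1 row with missing b1 and b2 when that particular branch finds no match, so the UNION can return extra rows with blank values.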
