Compare JSON Values and identify the differences - Snowflake SQL

Posted: 2020-12-03 18:16:53

Question:

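I am trying to compare two sets of JSON values for each transaction and extract specific values from them. I want to extract the following values: cCode, dCode, hcps and mod. Could you please guide me on the Snowflake SQL syntax for this? The first JSON is produced by the coder, and the second JSON is produced by the auditor.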

1st JSON
[
  {
    "cCode": "7832",
    "Date": "08/26/2020",
    "ID": "511",
    "description": "holos",
    "dgoses": [
      {
        "description": "disease",
        "dCode": "Y564",
        "CodeAllId": "8921",
        "messages": [
          ""
        ],
        "sequenceNumber": 1
      },
      {
        "description": "acute pain",
        "dCode": "U3321",
        "CodeAllId": "33213",
        "messages": [
          ""
        ],
        "sequenceNumber": 2
      },
      {
        "description": "height",
        "dCode": "U1111",
        "CodeAllId": "33278",
        "messages": [
          ""
        ],
        "sequenceNumber": 3
      },
      {
        "description": "PIDEMIA ",
        "dCode": "H8811",
        "CodeAllId": "90000",
        "messages": [
          ""
        ],
        "sequenceNumber": 4
      }
    ],
    "familyPlan": "",
    "hcpc": 5,
    "id": "",
    "isEPS": false,
    "mod": "67",
    "originalUnitAmount": "8888",
    "type": "CHARGE",
    "unitAmount": "9000",
    "vId": "90001"
  },
  {
    "cCode": "900114",
    "Date": "08/26/2020",
    "ID": "523",
    "description": "heart valve",
    "dgoses": [
      {
        "description": "Fever",
        "dCode": "J8923",
        "CodeAllId": "892138",
        "messages": [
          ""
        ],
        "sequenceNumber": 1
      }
    ],
    "familyPlan": "",
    "hcpc": 1,
    "id": "",
    "mod": "26",
    "originalUnitAmount": "19039",
    "type": "CHARGE",
    "unitAmount": "1039",
    "vId": "5113"
  }
]

2nd JSON
[
  {
    "cCode": "78832",
    "Date": "08/26/2020",
    "ID": "511",
    "description": "holos",
    "dgoses": [
      {
        "description": "disease",
        "dCode": "Y564",
        "CodeAllId": "8921",
        "messages": [
          ""
        ],
        "sequenceNumber": 1
      },
      {
        "description": "acute pain",
        "dCode": "U3321",
        "CodeAllId": "33213",
        "messages": [
          ""
        ],
        "sequenceNumber": 2
      },
      {
        "description": "height",
        "dCode": "U41111",
        "CodeAllId": "33278",
        "messages": [
          ""
        ],
        "sequenceNumber": 3
      },
      {
        "description": "PIDEMIA ",
        "dCode": "H8811",
        "CodeAllId": "90000",
        "messages": [
          ""
        ],
        "sequenceNumber": 4
      }
    ],
    "familyPlan": "",
    "hcpc": 8,
    "id": "",
    "isEPS": false,
    "mod": "67",
    "originalUnitAmount": "8888",
    "type": "CHARGE",
    "unitAmount": "9000",
    "vId": "90001"
  },
  {
    "cCode": "900114",
    "Date": "08/26/2020",
    "ID": "523",
    "description": "heart valve",
    "dgoses": [
      {
        "description": "Fever",
        "dCode": "J8923",
        "CodeAllId": "892138",
        "messages": [
          ""
        ],
        "sequenceNumber": 1
      }
    ],
    "familyPlan": "",
    "hcpc": 1,
    "id": "",
    "mod": "126",
    "originalUnitAmount": "19039",
    "type": "CHARGE",
    "unitAmount": "1039",
    "vId": "5113"
  }
]

I am looking for a result like the following:

(one result row, shown as column / value pairs)

Column          Value
Billid          7111
ctextid         89321
cCode-Coder     7832,900114
cCode-Auditor   78832,900114
deletedccode    7832
added-ccode     78832
dCode-Coder     Y564,U3321,U1111,H8811,J8923
dCode-Auditor   Y564,U3321,U41111,H8811,J8923
deleted-dcode   U1111
added-dcode     U41111
hcpc-coder      5,1
hcpc-auditor    8,1
deletedhcpc     5
addedhcpc       8
mod-coder       67,26
mod-auditor     67,126
deletedmod      26
addedmod        126

Can anyone help me?

SQL I tried:

with cte4 as
  
(
  select info:dCode as dtcode
  from cte3, lateral flatten( input => saveinfo:dgoses )
)
select  dCode from cte4, lateral flatten( input => dtcode )

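This fails immediately with an error.

I have already tried a SQL Server version of the code, but I need to know how to map its JSON functions to their Snowflake SQL equivalents. Could you please help here?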

 with I as
 (
     select *,
         dense_rank() over (order by Billid, Ctextid) as tid,
         dense_rank() over (partition by Billid, Ctextid order by Created) as n
     from ##input1
 ),
 D as
 (
     select I.*, mk.[key] as mk, m.*, dk.[key] as dk, d.*
     from I
     cross apply openjson(info) mk
     cross apply openjson(mk.value) with
     (
         cCode nvarchar(max) '$.cCode',
         dgoses nvarchar(max) '$.dgoses' as json
     ) m
     cross apply openjson(dgoses) dk
     cross apply openjson(dk.value) with
     (
         dCode nvarchar(max) '$.dCode'
     ) d
 ),
 C as
 (
     select * from D where n = 1
 ),
 A as
 (
     select * from D where n = 2
 )
 select
     Billid,
     Codedby,
     Ctextid,

 (
     select string_agg(cCode, ',') within group (order by mk)
     from 
     (
         select distinct cCode, mk
         from C
         where tid = t.tid
     ) d
 ) as cCodeCoder,
 (
     select string_agg(cCode, ',') within group (order by mk)
     from 
     (
         select distinct cCode, mk
         from A
         where tid = t.tid
     ) d
 ) as cCodeAuditor,
 (
     select string_agg(cCode, ',')
     from
     (
         select cCode
         from C
         where tid = t.tid
         except
         select cCode
         from A 
         where tid = t.tid
     ) d
 ) as deletedcCode,
 (
     select string_agg(cCode, ',')
     from
     (
         select cCode
         from A
         where tid = t.tid
         except
         select cCode
         from C
         where tid = t.tid
     ) d
 ) as addedcCode,

 (
     select string_agg(dCode, ',') within group (order by mk, dk)
     from 
     (
         select distinct dCode, mk, dk
         from C
         where tid = t.tid
     ) d
 ) as dCodeCoder,
 (
     select string_agg(dCode, ',') within group (order by mk, dk)
     from 
     (
         select distinct dCode, mk, dk
         from A
         where tid = t.tid
     ) d
 ) as dCodeAuditor,
 (
     select string_agg(dCode, ',')
     from
     (
         select dCode
         from C
         where tid = t.tid
         except
         select dCode
         from A 
         where tid = t.tid
     ) d
 ) as deleteddCode,
 (
     select string_agg(dCode, ',')
     from
     (
         select dCode
         from A
         where tid = t.tid
         except
         select dCode
         from C
         where tid = t.tid
     ) d
 ) as addeddCode

 from I as t
 where n = 1

Thanks, Arun

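Comments:

What is your table structure? Are they in the same row in two different columns? Your second JSON has extra double quotes. That will not parse as a JSON variant, so is this an artifact of copying?

@GregPavlik: Sorry, there are no double quotes in the second JSON. I have edited the question accordingly. The JSONs are in two different rows. The table structure is Billid (int), Ctextid (int), info (JSON), created (timestamp), createdby (varchar). The first JSON is created by the coder at the earlier created time, and the second JSON is created by the auditor at a later created time. The second JSON (by the auditor) is the correct and final one of the two. Every Billid and Ctextid pair has 2 JSONs. The JSON is stored in the info column.

Any input here would be helpful!

Answer 1: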

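I'm not entirely sure how you need the data, but you're trying to get "cCode, dCode, hcps and mod" (assuming hcps is actually hcpc). The catch is that cCode, hcpc and mod are all at the same level of the JSON, and dCode is not. It is nested one level below the other attributes, in a one-to-many relationship. This can be flattened either into two tables with a 1:MANY relationship, or into a single table that repeats the cCode, hcpc and mod values. This example shows the second option: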

-- I created a table named FOO and added your JSON as a variant
create temp table foo(v variant);

with
JSON(C_CODE, DGOSES, HCPC, "MOD") as
(
select   "VALUE":cCode::int         as C_CODE
        ,"VALUE":dgoses             as DGOSES
        ,"VALUE":hcpc::int          as HCPC
        ,"VALUE":mod::int           as "MOD"
from foo, lateral flatten(v)
)
select C_CODE, HCPC, "MOD", "VALUE":dCode::string as D_CODE
from JSON, lateral flatten(DGOSES);

This produces a table like this:

C_CODE    HCPC    MOD    D_CODE
  7832       5     67    Y564
  7832       5     67    U3321
  7832       5     67    U1111 
  7832       5     67    H8811
900114       1     26    J8923
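
The first option mentioned in this answer (two tables in a 1:MANY relationship) is not shown there. A minimal sketch of how it might look follows, assuming the same FOO table with a single row whose VARIANT column V holds the JSON array. The table names CHARGES and DIAGNOSES and the CHARGE_IDX key are hypothetical, introduced only for illustration.

```sql
-- Assumption: FOO(V variant) holds exactly one row containing the JSON array.
-- CHARGES, DIAGNOSES and CHARGE_IDX are hypothetical names, not from the thread.

-- Parent table: one row per element of the top-level array.
-- The array position (INDEX) serves as the key linking parent and child.
create or replace temporary table charges as
select f.index                 as charge_idx
      ,f.value:cCode::string   as c_code
      ,f.value:hcpc::int       as hcpc
      ,f.value:mod::string     as mod_code
from foo
    ,lateral flatten(input => foo.v) f;

-- Child table: one row per entry of the nested "dgoses" array,
-- carrying the position of the charge it belongs to.
create or replace temporary table diagnoses as
select f.index                       as charge_idx
      ,d.value:sequenceNumber::int   as seq
      ,d.value:dCode::string         as d_code
from foo
    ,lateral flatten(input => foo.v) f
    ,lateral flatten(input => f.value:dgoses) d;

-- Joining the two reproduces the single repeated-values table.
select c.c_code, c.hcpc, c.mod_code, d.d_code
from charges c
join diagnoses d on d.charge_idx = c.charge_idx
order by c.charge_idx, d.seq;
```

If FOO held more than one row, the array position alone would no longer be unique, so a row identifier (for example Billid and Ctextid from the asker's table) would need to be part of the key.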

Comments:

Thank you very much, Greg! This is very helpful. I was just wondering whether we could get a result like 7832,7832,7832,7832,900114 / 5,5,5,5,1 / 67,67,67,67,26.
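
For comma-separated lists like these, LISTAGG is the usual Snowflake counterpart of SQL Server's string_agg, and MINUS (or EXCEPT) can stand in for the set difference used in the question. The following is only a rough sketch for cCode, not a tested solution: the table name BILL_AUDIT is hypothetical, its columns follow the structure described in the question's comments, and INFO is assumed to be a VARIANT (a string column would need PARSE_JSON first).

```sql
-- Hypothetical table: bill_audit(Billid int, Ctextid int, info variant,
--                                created timestamp, createdby varchar)
with ranked as (
    -- n = 1 is the coder's (earlier) row, n = 2 the auditor's (later) row
    select Billid, Ctextid, info
          ,row_number() over (partition by Billid, Ctextid order by created) as n
    from bill_audit
),
charges as (
    -- one row per element of the top-level JSON array
    select r.Billid, r.Ctextid, r.n
          ,c.index               as charge_idx
          ,c.value:cCode::string as c_code
    from ranked r
        ,lateral flatten(input => r.info) c
),
lists as (
    -- comma-separated cCode list per version, in array order
    select Billid, Ctextid, n
          ,listagg(c_code, ',') within group (order by charge_idx) as c_codes
    from charges
    group by Billid, Ctextid, n
),
deleted as (
    -- codes in the coder's version that are absent from the auditor's
    select Billid, Ctextid, listagg(c_code, ',') as deleted_ccode
    from (
        select Billid, Ctextid, c_code from charges where n = 1
        minus
        select Billid, Ctextid, c_code from charges where n = 2
    )
    group by Billid, Ctextid
),
added as (
    -- codes in the auditor's version that are absent from the coder's
    select Billid, Ctextid, listagg(c_code, ',') as added_ccode
    from (
        select Billid, Ctextid, c_code from charges where n = 2
        minus
        select Billid, Ctextid, c_code from charges where n = 1
    )
    group by Billid, Ctextid
)
select co.Billid, co.Ctextid
      ,co.c_codes as ccode_coder
      ,au.c_codes as ccode_auditor
      ,d.deleted_ccode
      ,a.added_ccode
from lists co
join lists au
  on au.Billid = co.Billid and au.Ctextid = co.Ctextid and au.n = 2
left join deleted d on d.Billid = co.Billid and d.Ctextid = co.Ctextid
left join added   a on a.Billid = co.Billid and a.Ctextid = co.Ctextid
where co.n = 1;
```

The same pattern should extend to hcpc and mod by adding them to the CHARGES step, and to dCode by adding a second LATERAL FLATTEN over `c.value:dgoses`. To repeat the parent values once per diagnosis, as in the lists asked for here, the aggregation would run over that doubly flattened row set instead.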
