Redshift - Extract root keys from the JSON with Python UDF

Posted: 2019-08-25 13:30:17

【Question】:

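In Redshift, we can extract values from JSON with the json_extract_path_text function. But that requires knowing the key names in advance; when we don't know which keys a column contains, it is hard to get the list of keys.

I need a scalar or Python UDF to make this easier.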

Source data:

  {
    "_id": "5d628b01132beadd7e2ede3e",
    "index": 0,
    "guid": "a2a351a1-3cca-40e1-8b2b-1197e76373fb",
    "isActive": true,
    "balance": "$3,771.66",
    "picture": "http://placehold.it/32x32",
    "age": 28,
    "eyeColor": "blue",
    "name": "Araceli Lang",
    "gender": "female",
    "company": "OLYMPIX",
    "email": "aracelilang@olympix.com",
    "phone": "+1 (817) 552-3696",
    "address": "817 Concord Street, Zeba, Alabama, 5127",
    "about": "Enim nulla sit ea qui exercitation aute do cupidatat mollit incididunt deserunt aute in. Culpa anim eu cillum esse ipsum veniam amet veniam enim nostrud eu et. Enim aute ea duis enim in consectetur nulla amet fugiat id nisi non aliquip. Proident fugiat culpa aute minim dolor esse reprehenderit",
    "registered": "2018-12-27T10:15:14 -06:-30",
    "latitude": 20.920064,
    "longitude": 62.561981,
    "tags": [
        "excepteur",
        "magna",
        "eiusmod",
        "esse",
        "aute",
        "occaecat",
        "consectetur"
    ],
    "friends": [
        {
            "id": 0,
            "name": "Schneider Combs"
        },
        {
            "id": 1,
            "name": "Roseann Buckner"
        },
        {
            "id": 2,
            "name": "Eaton Reid"
        }
    ],
    "greeting": "Hello, Araceli Lang! You have 10 unread messages.",
    "favoriteFruit": "banana"
  }

【Comments】:

【Answer 1】:

Here is the simple Python UDF I used to solve this.

create or replace function json_list_root_keys (j varchar(max))
        returns varchar(max)
        stable as $$
          import json
          # NULL or empty input: nothing to parse.
          if not j:
            return None
          try:
            js = json.loads(j)
          except ValueError:
            # The input is not valid JSON.
            return None
          # Only a non-empty JSON object has root keys.
          if not isinstance(js, dict) or len(js) == 0:
            return None
          return json.dumps(list(js.keys()))
        $$ language plpythonu;

create table j_test (j_col varchar(max));

insert into j_test values ('{"_id":"5d628b01132beadd7e2ede3e","index":0,"guid":"a2a351a1-3cca-40e1-8b2b-1197e76373fb","isActive":true,"balance":"$3,771.66","picture":"http://placehold.it/32x32","age":28,"eyeColor":"blue","name":"AraceliLang","gender":"female","company":"OLYMPIX","email":"aracelilang@olympix.com","phone":"+1(817)552-3696","address":"817ConcordStreet,Zeba,Alabama,5127","about":"Enimnullasiteaquiexercitationautedocupidatatmollitincididuntdeseruntautein.Culpaanimeucillumesseipsumveniamametveniamenimnostrudeuet.Enimauteeaduiseniminconsecteturnullaametfugiatidnisinonaliquip.Proidentfugiatculpaauteminimdoloressereprehenderit.","registered":"2018-12-27T10:15:14-06:-30","latitude":20.920064,"longitude":62.561981,"tags":["excepteur","magna","eiusmod","esse","aute","occaecat","consectetur"],"friends":[{"id":0,"name":"SchneiderCombs"},{"id":1,"name":"RoseannBuckner"},{"id":2,"name":"EatonReid"}],"greeting":"Hello,AraceliLang!Youhave10unreadmessages.","favoriteFruit":"banana"}');

select json_list_root_keys(j_col) from j_test ;

Output:

["guid", "index", "favoriteFruit", "latitude", "company", "email", "picture", "tags", "registered", "eyeColor", "phone", "address", "friends", "isActive", "about", "balance", "name", "gender", "age", "greeting", "longitude", "_id"]
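To try the same logic outside Redshift, here is a minimal sketch in plain Python. The function name list_root_keys and the shortened sample document are made up for illustration, and the keys are sorted here only so the result is predictable; the UDF above returns them in whatever order Python's dict gives.

```python
import json


def list_root_keys(raw):
    """Return a JSON array (as text) of the top-level keys, or None."""
    # Empty or missing input: nothing to parse.
    if not raw:
        return None
    try:
        parsed = json.loads(raw)
    except ValueError:
        # The input is not valid JSON.
        return None
    # Only a non-empty JSON object has root keys.
    if not isinstance(parsed, dict) or not parsed:
        return None
    # Sorting is an addition for this sketch, to get a stable order.
    return json.dumps(sorted(parsed))


# Shortened, hypothetical sample; nested keys such as "id" are not reported.
sample = '{"_id": "5d62", "index": 0, "tags": ["a", "b"], "friends": [{"id": 0}]}'
print(list_root_keys(sample))
```

Only the root-level keys come back; keys inside nested objects or arrays are ignored, and a JSON array or scalar at the top level gives None.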

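Note:

Make sure the data does not contain any \r or \n characters. Otherwise the function won't return any value.

Also, if anyone is interested in writing a scalar UDF for this, please share it.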
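A likely reason for this, shown as a small plain-Python sketch: json.loads rejects raw control characters inside a string value by default, which raises ValueError, and the UDF turns that into NULL. Passing strict=False is one possible workaround; I am assuming it behaves the same way inside a Redshift UDF, which I have not verified. The helper name root_keys and the sample string are hypothetical.

```python
import json


def root_keys(raw, strict=True):
    """Return the sorted top-level keys of a JSON object, or None."""
    try:
        parsed = json.loads(raw, strict=strict)
    except ValueError:
        # Raised for invalid JSON, including raw control characters in strings.
        return None
    return sorted(parsed) if isinstance(parsed, dict) else None


# The \n below is a real newline character inside the JSON string value.
raw = '{"note": "line one\nline two"}'

print(root_keys(raw))                # default strict parsing rejects it
print(root_keys(raw, strict=False))  # lenient parsing accepts it
```

Line breaks between JSON tokens are ordinary whitespace and parse fine either way; only the ones inside string values cause the failure.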

【Discussion】:
