雪花 JSON 到表格

Posted

技术标签:

【中文标题】雪花 JSON 到表格【英文标题】:Snowflake JSON to tabular 【发布时间】:2021-07-15 16:55:21 【问题描述】:

我正在阅读 Snowflake 的文档,但还没有找到解决方案,所以我来找你。我在 Snowflake 中有一个表,其中包含一个存储 JSON 数据的变量列。您知道将变体列上的查询结果动态转换为表格格式的方法吗?

例如我有一个类似的查询

select json_data from database.schema.table limit 2

这会返回类似的东西

JSON_DATA
"EventName": "Test", "EventValue": 100
"EventName": "Test", "EventValue": 200

有没有办法将其作为表格返回而无需引用键?我知道我能做到

select
  json_data['EventName'] EventName, 
  json_data['EventValue'] EventValue
from
  database.schema.table

但我正在寻找更有活力的东西,比如

select * from table(json_to_table(select json_data from database.schema.table)) limit 2

这可能会返回

EventName EventValue
Test 100
Test 200

我正在寻找任何内部解决方案(例如我可能错过的存储过程、udf、雪花函数......除了外部函数之外的任何东西)

【问题讨论】:

问题出在 Snowflake 中,您必须提前定义结果集的列。存储过程不会返回表。用户定义的表函数需要返回列的列表。目前可用的最接近的东西是一个存储过程,用于读取 JSON_DATA 列中的 JSON 属性并在其之上构建一个视图以将 JSON 属性转换为列。它不是动态的,但它确实避免了在视图中手动构建列列表。 【参考方案1】:

虽然目前无法创建动态列列表,但如注释中所述,您可以运行存储过程来构建(和重建)视图。这将避免手动输入和维护一长串列。

在底部创建SP后,可以这样使用:

create or replace table MY_TABLE(JSON_DATA variant);
insert into MY_TABLE select parse_json('"EventName": "Test", "EventValue": 100');
insert into MY_TABLE select parse_json('"EventName": "Test", "EventValue": 200');

call create_view_over_json('MY_TABLE', 'JSON_DATA', 'MY_VIEW');

select * from MY_VIEW;

这是创建视图的存储过程。请注意,如果表非常大,将需要 Snowflake 的 TYPEOF() 函数相当长的时间来确定列类型。如果已知它是一致的,您可以将其指向一个示例表或创建一个限制为 1000 的表。

create or replace procedure create_view_over_json (TABLE_NAME varchar, COL_NAME varchar, VIEW_NAME varchar)
returns varchar
language javascript
as
$$
/****************************************************************************************************************
*                                                                                                               *
* CREATE_VIEW_OVER_JSON - Craig Warman, Alan Eldridge and Greg Pavlik Snowflake Computing, 2019, 2020, 2021     *
*                                                                                                               *
* This stored procedure creates a view on a table that contains JSON data in a column.                          *
* of type VARIANT.  It can be used for easily generating views that enable access to                            *
* this data for BI tools without the need for manual view creation based on the underlying                      *
* JSON document structure.                                                                                      *
*                                                                                                               *
* Parameters:                                                                                                   *
* TABLE_NAME    - Name of table that contains the semi-structured data.                                         *
* COL_NAME      - Name of VARIANT column in the aforementioned table.                                           *
* VIEW_NAME     - Name of view to be created by this stored procedure.                                          *
*                                                                                                               *
* Usage Example:                                                                                                *
* call create_view_over_json('db.schema.semistruct_data', 'variant_col', 'db.schema.semistruct_data_vw');       *
*                                                                                                               *
* Important notes:                                                                                              *
*   - This is the "basic" version of a more sophisticated procedure. Its primary purpose                        *
*     is to illustrate the view generation concept.                                                             *
*   - This version of the procedure does not support:                                                           *
*         - Column case preservation (all view column names will be case-insensitive).                          *
*         - JSON document attributes that are SQL reserved words (like TYPE or NUMBER).                         *
*         - "Exploding" arrays into separate view columns - instead, arrays are simply                          *
*           materialized as view columns of type ARRAY.                                                         *
*   - Execution of this procedure may take an extended period of time for very                                  *
*     large datasets, or for datasets with a wide variety of document attributes                                *
*     (since the view will have a large number of columns).                                                     *
*                                                                                                               *
* Attribution:                                                                                                  *
* I leveraged code developed by Alan Eldridge as the basis for this stored procedure.                           *
*                                                                                                               *
****************************************************************************************************************/

var currentActivity;

try

    currentActivity   = "building the query for column types";
    var elementQuery  = GetElementQuery(TABLE_NAME, COL_NAME);
    
    currentActivity   = "running the query to get column names";
    var elementRS     = GetResultSet(elementQuery);

    currentActivity   = "building the column list";
    var colList       = GetColumnList(elementRS);

    currentActivity   = "building the view's DDL";
    var viewDDL       = GetViewDDL(VIEW_NAME, colList, TABLE_NAME);

    currentActivity   = "creating the view";
    return ExecuteSingleValueQuery("status", viewDDL);

catch(err)
    return "ERROR: Encountered an error while " + currentActivity + ".\n" + err.message;


/****************************************************************************************************************
*                                                                                                               *
*   End of main function. Helper functions below.                                                               *
*                                                                                                               *
****************************************************************************************************************/

function GetElementQuery(tableName, columnName)

// Build a query that returns a list of elements which will be used to build the column list for the CREATE VIEW statement

sql = 
`

SELECT DISTINCT regexp_replace(regexp_replace(f.path,'\\\\[(.+)\\\\]'),'(\\\\w+)','\"\\\\1\"')                      AS path_name,       -- This generates paths with levels enclosed by double quotes (ex: "path"."to"."element").  It also strips any bracket-enclosed array element references (like "[0]")
                DECODE (substr(typeof(f.value),1,1),'A','ARRAY','B','BOOLEAN','I','FLOAT','D','FLOAT','STRING')     AS attribute_type,  -- This generates column datatypes of ARRAY, BOOLEAN, FLOAT, and STRING only
                REGEXP_REPLACE(REGEXP_REPLACE(f.path, '\\\\[(.+)\\\\]'),'[^a-zA-Z0-9]','_')                         AS alias_name       -- This generates column aliases based on the path
FROM
        @~TABLE_NAME~@,
        LATERAL FLATTEN(@~COL_NAME~@, RECURSIVE=>true) f
WHERE   TYPEOF(f.value) != 'OBJECT'
        AND NOT contains(f.path, '[');         -- This prevents traversal down into arrays

`;

    sql = sql.replace(/@~TABLE_NAME~@/g, tableName);
    sql = sql.replace(/@~COL_NAME~@/g, columnName);

    return sql;


function GetColumnList(elementRS)

    /*  
        Add elements and datatypes to the column list
        They will look something like this when added:
            col_name:"name"."first"::STRING as name_first,
            col_name:"name"."last"::STRING as name_last
    */

    var col_list = "";

    while (elementRS.next()) 
        if (col_list != "") 
            col_list += ", \n";
        
        col_list += COL_NAME + ":" + elementRS.getColumnValue("PATH_NAME");         // Start with the element path name
        col_list += "::"           + elementRS.getColumnValue("ATTRIBUTE_TYPE");    // Add the datatype
        col_list += " as "         + elementRS.getColumnValue("ALIAS_NAME");        // And finally the element alias
    
    return col_list;


function GetViewDDL(viewName, columnList, tableName)

sql = 
`
create or replace view @~VIEW_NAME~@ as
select 
    @~COLUMN_LIST~@
from @~TABLE_NAME~@;
`;
    sql = sql.replace(/@~VIEW_NAME~@/g, viewName);
    sql = sql.replace(/@~COLUMN_LIST~@/g, columnList);
    sql = sql.replace(/@~TABLE_NAME~@/g, tableName);

    return sql;


/****************************************************************************************************************
*                                                                                                               *
*   Library functions                                                                                           *
*                                                                                                               *
****************************************************************************************************************/

function ExecuteSingleValueQuery(columnName, queryString) 
    var out;
    cmd1 = sqlText: queryString;
    stmt = snowflake.createStatement(cmd1);
    var rs;
    try
        rs = stmt.execute();
        rs.next();
        return rs.getColumnValue(columnName);
    
    catch(err) 
        throw err;
    
    return out;


function GetResultSet(sql)

    try
        cmd1 = sqlText: sql;
        stmt = snowflake.createStatement(cmd1);
        var rs;
        rs = stmt.execute();
        return rs;
    
    catch(err) 
        throw err;
     

$$;

【讨论】:

以上是关于雪花 JSON 到表格的主要内容,如果未能解决你的问题,请参考以下文章

雪花表中json数据的解析字段将多行插入到新的雪花表中

雪花 - 无法将暂存区域中的 JSON 拆分文件复制到表中

将雪花中的数据卸载到 Postgres 中?

雪花合并对象/json

在雪花中展平可变长度 JSON

从 JSON 复制雪花 - ON_ERROR = CONTINUE - 奇怪的问题