Turning denormalized table into a nested structure

Posted: 2021-06-25 14:18:44

Question:

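Fairly junior data analyst here.

I'm trying to turn denormalized event data from GA4 into a more BI-friendly nested format. Original GA4 data schema: GA4 schema

The starting point is event-level data, but since I'm trying to build an in-depth user dashboard based on user_pseudo_id, I'd like to create three layers of abstraction: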

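- A user level, with device info, overall statistics, and the sessions as nested repeated records
- A session level, with geo data, session length, number of pages visited, and all events of the session as nested repeated records
- An event level, with timestamp, event type, and event-specific info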

My code so far:

...events_joined_with_transactions AS (
    SELECT
        ue.*,
        t.transaction_id,
        t.currency,
        t.shipping,
        t.tax,
        t.revenue,
        t.unique_items,
        t.total_items,
        t.items
    FROM user_events AS ue
    LEFT JOIN transactions AS t
    ON ue.event_name = "purchase"
    AND ue.user_pseudo_id = t.user_pseudo_id
    AND t.timestamp = ue.timestamp
),

sessions AS (
    SELECT
        user_pseudo_id,
        session_id,
        source_medium,
        campaign_name,
        ARRAY_AGG(
            STRUCT(
                date, 
                timestamp,
                event_name,
                event_specific_info
            )
        ) AS events
    FROM events_joined_with_transactions
    GROUP BY 1, 2, 3, 4
),

users AS (
    SELECT
        user_pseudo_id,
        SUM(IF(event_name != "user_engagement", 1, 0)) AS total_events,
        SUM(IF(event_name = "session_start", 1, 0)) AS sessions,
        SUM(IF(event_name = "page_view", 1, 0)) AS view_page,
        SUM(IF(event_name = "view_item", 1, 0)) AS view_item,
        SUM(IF(event_name = "add_to_cart", 1, 0)) AS add_to_cart,
        SUM(IF(event_name = "remove_from_cart", 1, 0)) AS remove_from_cart,
        SUM(IF(event_name = "add_payment_info", 1, 0)) AS add_payment_info,
        SUM(IF(event_name = "add_shipping_info", 1, 0)) AS add_shipping_info,
        SUM(IF(event_name = "begin_checkout", 1, 0)) AS begin_checkout,
        SUM(IF(event_name = "purchase", 1, 0)) AS transactions,
        SUM(shipping) AS total_shipping,
        SUM(tax) AS total_tax,
        SUM(revenue) AS total_revenue,
        SUM(total_items) AS total_items,
    FROM events_joined_with_transactions
    GROUP BY 1
),

final AS (
    SELECT
        u.user_pseudo_id,
        u.total_events,
        u.sessions,
        u.view_page,
        u.view_item,
        u.add_to_cart,
        u.remove_from_cart,
        u.add_payment_info,
        u.add_shipping_info,
        u.begin_checkout,
        u.transactions,
        u.total_shipping,
        u.total_tax,
        u.total_revenue,
        u.total_items,
        ARRAY_AGG(
            session_id,
            source_medium,
            campaign_name,
            events
        ) AS sessions
    FROM users u
    LEFT JOIN sessions s
    USING(user_pseudo_id)
)

SELECT *
FROM final

However, I get the following error message:

The argument to ARRAY_AGG must not be an array type but was ARRAY<...>

Why is this array type invalid?

Answer 1:

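Based on your code, the events field in the sessions table is already an array type, so it cannot be used as the argument of the final ARRAY_AGG.

From the official doc: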

Supported Argument Types
All data types except ARRAY.
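
One possible workaround, offered as a minimal sketch rather than something taken from the original query: BigQuery does not allow an array of arrays, but it does accept an array of STRUCTs that contain arrays. Wrapping the session columns in a STRUCT before aggregating may therefore be enough. The inline `sessions` rows, their values, and the reduced column set below are hypothetical stand-ins for the asker's CTE. The `final` CTE in the question would presumably also need a GROUP BY over its user-level columns.

```sql
-- Hypothetical inline rows standing in for the asker's `sessions` CTE.
WITH sessions AS (
    SELECT
        'u1' AS user_pseudo_id,
        's1' AS session_id,
        'google / organic' AS source_medium,
        [STRUCT('page_view' AS event_name, 1624630724 AS timestamp)] AS events
    UNION ALL
    SELECT
        'u1',
        's2',
        '(direct) / (none)',
        [STRUCT('purchase' AS event_name, 1624634324 AS timestamp)]
)

SELECT
    user_pseudo_id,
    -- ARRAY_AGG(events) is rejected because `events` is itself an ARRAY.
    -- A STRUCT that holds an array is a valid element type, so each
    -- session row is wrapped in a STRUCT before being aggregated.
    ARRAY_AGG(
        STRUCT(session_id, source_medium, events)
    ) AS sessions
FROM sessions
GROUP BY user_pseudo_id
```

Under these assumptions the result is one row per user_pseudo_id, with `sessions` as a repeated record and `events` as a repeated record nested inside each session.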
