How to convert legacy SQL BigQuery to standard SQL?
Posted: 2018-09-27 12:48:31
Question: I have been trying to convert legacy SQL BigQuery code to standard SQL, but I keep getting a lot of errors.
Here is the original legacy SQL:
SELECT t.page_path,
t.second_page_path,
t.third_page_path,
t.fourth_page_path,
CONCAT(t.page_path,IF(t.second_page_path IS NULL,"","-"),
IFNULL(t.second_page_path,""),IF(t.third_page_path IS NULL,"","-"),
IFNULL(t.third_page_path,""),IF(t.fourth_page_path IS NULL,"","-"),
IFNULL(t.fourth_page_path,"")) AS full_page_journey,
count(sessionId) AS total_sessions
FROM (
SELECT
CONCAT(fullVisitorId,"-",STRING(visitStartTime)) AS sessionId,
hits.hitNumber,
hits.page.pagePath AS page_path,
LEAD(hits.page.pagePath) OVER (PARTITION BY fullVisitorId, visitStartTime ORDER BY hits.hitNumber) AS second_page_path,
LEAD(hits.page.pagePath,2) OVER (PARTITION BY fullVisitorId, visitStartTime ORDER BY hits.hitNumber) AS third_page_path,
LEAD(hits.page.pagePath,3) OVER (PARTITION BY fullVisitorId, visitStartTime ORDER BY hits.hitNumber) AS fourth_page_path
FROM
TABLE_DATE_RANGE( [xxxxxxx:xxxxxxx.ga_sessions_],
TIMESTAMP('2017-01-01'), TIMESTAMP('2017-01-02') )
WHERE
hits.type="PAGE"
) t
WHERE t.hits.hitNumber=1
GROUP BY t.page_path,
t.second_page_path,
t.third_page_path,
t.fourth_page_path,
full_page_journey
ORDER BY total_sessions DESC
Update (edited): this is what I have been able to do so far:
SELECT t.page_path,
t.second_page_path,
t.third_page_path,
t.fourth_page_path,
CONCAT(t.page_path,IF(t.second_page_path IS NULL,"","-"),
IFNULL(t.second_page_path,""),IF(t.third_page_path IS NULL,"","-"),
IFNULL(t.third_page_path,""),IF(t.fourth_page_path IS NULL,"","-"),
IFNULL(t.fourth_page_path,"")) AS full_page_journey,
count(sessionId) AS total_sessions
FROM (
SELECT
CONCAT(fullVisitorId,"-",cast(visitStartTime as string)) AS sessionId,
hits.hitNumber,
hits.page.pagePath AS page_path,
LEAD(hits.page.pagePath) OVER (PARTITION BY fullVisitorId, visitStartTime ORDER BY hits.hitNumber) AS second_page_path,
LEAD(hits.page.pagePath,2) OVER (PARTITION BY fullVisitorId, visitStartTime ORDER BY hits.hitNumber) AS third_page_path,
LEAD(hits.page.pagePath,3) OVER (PARTITION BY fullVisitorId, visitStartTime ORDER BY hits.hitNumber) AS fourth_page_path
FROM
`xxxxxxxxxxx.xxxxxxx.ga_sessions_*`,
UNNEST(hits) AS hits
WHERE
_TABLE_SUFFIX BETWEEN
FORMAT_DATE('%Y%m%d', DATE_ADD(CURRENT_DATE(), INTERVAL -16 DAY))AND
FORMAT_DATE('%Y%m%d', DATE_ADD(CURRENT_DATE(), INTERVAL -1 DAY))AND
hits.type = 'PAGE' ) AS t
WHERE t.hits.hitNumber = 1
GROUP BY t.page_path,
t.second_page_path,
t.third_page_path,
t.fourth_page_path,
full_page_journey
ORDER BY total_sessions DESC
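It would be great if someone could help find the syntax errors.
Some of the errors I get include:
Cannot access field hitNumber on a value with type ARRAY
I have read that the "_TABLE_SUFFIX" problem is related to the wildcard.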
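Answer 1: As a starting point, DATE_ADD expects a date but you are giving it a timestamp, and _TABLE_SUFFIX expects a string but you are giving it a date (sort of).
Try wrapping your existing syntax with CURRENT_DATE() and FORMAT_DATE: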
FORMAT_DATE('%Y%m%d', DATE_ADD(CURRENT_DATE(), INTERVAL -16 DAY))
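To make the type requirement concrete, here is a minimal sketch that reads no table and can be run on its own. It shows that FORMAT_DATE returns a STRING in YYYYMMDD form, which is what a comparison against _TABLE_SUFFIX needs when the tables are named ga_sessions_YYYYMMDD. The column aliases are made up for illustration, and DATE_SUB is used only as an equivalent spelling of DATE_ADD with a negative interval.

```sql
-- Standard SQL. No table is read, so the query is self-contained.
SELECT
  -- Rolling window, as in the expression above: 16 days ago through yesterday.
  FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 16 DAY)) AS start_suffix,
  FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))  AS end_suffix,
  -- Fixed window matching the legacy TABLE_DATE_RANGE call in the question.
  FORMAT_DATE('%Y%m%d', DATE '2017-01-01') AS fixed_start_suffix,  -- '20170101'
  FORMAT_DATE('%Y%m%d', DATE '2017-01-02') AS fixed_end_suffix;    -- '20170102'
```

If the goal is to reproduce the fixed date range of the legacy query rather than a rolling window, the filter can also be written with plain string literals: _TABLE_SUFFIX BETWEEN '20170101' AND '20170102'.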
This question may be useful for the hitNumber error:
query-hits-and-custom-dimensions-in-the-bigquery
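The hitNumber error arises because, in the Google Analytics export, hits is a repeated field (an array of structs), so a field such as hitNumber can only be read after the array has been flattened with UNNEST. Below is a minimal sketch with inline sample data. The `sessions` CTE, its values, and the alias `h` are invented for illustration and stand in for the real ga_sessions_ tables.

```sql
-- Standard SQL. Inline data only; `sessions` is a hypothetical stand-in table.
WITH sessions AS (
  SELECT
    '123' AS fullVisitorId,
    1500000000 AS visitStartTime,
    [
      STRUCT(1 AS hitNumber, 'PAGE' AS type, STRUCT('/home' AS pagePath) AS page),
      STRUCT(2 AS hitNumber, 'PAGE' AS type, STRUCT('/pricing' AS pagePath) AS page)
    ] AS hits
)
-- Selecting hits.hitNumber directly from `sessions` fails, because hits is an
-- ARRAY. The comma join with UNNEST produces one row per hit instead.
SELECT
  fullVisitorId,
  h.hitNumber,
  h.page.pagePath AS page_path
FROM sessions, UNNEST(hits) AS h
WHERE h.type = 'PAGE'
ORDER BY h.hitNumber;
```

Giving the unnested rows an alias other than hits (here `h`) is optional, but it makes it harder to confuse the array column with a single hit.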
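Try using a CTE instead of a subquery, since it makes things clearer and easier to debug. Note that the CTE exposes the hit number as a plain column named hitNumber, so the outer query filters on hitNumber rather than t.hits.hitNumber.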
WITH CTE AS
(SELECT
CONCAT(fullVisitorId,"-",cast(visitStartTime as string)) AS sessionId,
hits.hitNumber as hitNumber,
hits.page.pagePath AS page_path,
LEAD(hits.page.pagePath) OVER (PARTITION BY fullVisitorId, visitStartTime
ORDER BY hits.hitNumber) AS second_page_path,
LEAD(hits.page.pagePath,2) OVER (PARTITION BY fullVisitorId, visitStartTime
ORDER BY hits.hitNumber) AS third_page_path,
LEAD(hits.page.pagePath,3) OVER (PARTITION BY fullVisitorId,
visitStartTime ORDER BY hits.hitNumber) AS fourth_page_path
FROM
`xxxxxxxxxxx.xxxxxxx.ga_sessions_*`,
UNNEST(hits) AS hits
WHERE
_TABLE_SUFFIX BETWEEN
FORMAT_DATE('%Y%m%d', DATE_ADD(CURRENT_DATE(), INTERVAL -16 DAY)) AND
FORMAT_DATE('%Y%m%d', DATE_ADD(CURRENT_DATE(), INTERVAL -1 DAY))
AND hits.type = 'PAGE' )
SELECT page_path,
second_page_path,
third_page_path,
fourth_page_path,
CONCAT(page_path,IF(second_page_path IS NULL,"","-"),
IFNULL(second_page_path,""),IF(third_page_path IS NULL,"","-"),
IFNULL(third_page_path,""),IF(fourth_page_path IS NULL,"","-"),
IFNULL(fourth_page_path,"")) AS full_page_journey,
count(sessionId) AS total_sessions
FROM CTE
WHERE hitNumber = 1
GROUP BY page_path,
second_page_path,
third_page_path,
fourth_page_path,
full_page_journey
ORDER BY total_sessions DESC
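To check the logic without querying real data, the same pattern can be run against a few inline rows. This is only a sketch: the `sample_hits` CTE and its values are invented, the rows are already flattened (so UNNEST and the _TABLE_SUFFIX filter are left out), and only three steps of the journey are tracked to keep it short.

```sql
-- Standard SQL. `sample_hits` is hypothetical data: two sessions, one row per hit.
WITH sample_hits AS (
  SELECT 'v1' AS fullVisitorId, 100 AS visitStartTime, 1 AS hitNumber, '/home' AS pagePath UNION ALL
  SELECT 'v1', 100, 2, '/pricing' UNION ALL
  SELECT 'v1', 100, 3, '/signup' UNION ALL
  SELECT 'v2', 200, 1, '/home' UNION ALL
  SELECT 'v2', 200, 2, '/pricing'
),
paths AS (
  SELECT
    CONCAT(fullVisitorId, '-', CAST(visitStartTime AS STRING)) AS sessionId,
    hitNumber,
    pagePath AS page_path,
    -- LEAD looks ahead within the same session, ordered by hit number.
    LEAD(pagePath) OVER (PARTITION BY fullVisitorId, visitStartTime
                         ORDER BY hitNumber) AS second_page_path,
    LEAD(pagePath, 2) OVER (PARTITION BY fullVisitorId, visitStartTime
                            ORDER BY hitNumber) AS third_page_path
  FROM sample_hits
)
SELECT
  CONCAT(page_path,
         IF(second_page_path IS NULL, '', '-'), IFNULL(second_page_path, ''),
         IF(third_page_path IS NULL, '', '-'), IFNULL(third_page_path, '')) AS full_page_journey,
  COUNT(sessionId) AS total_sessions
FROM paths
WHERE hitNumber = 1          -- keep only the row that starts each session
GROUP BY full_page_journey
ORDER BY total_sessions DESC;
```

With this sample data the result has two rows, '/home-/pricing-/signup' and '/home-/pricing', each with total_sessions = 1.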