When were Google's three big data papers published?
You probably mean these three:
2003: "The Google File System"
2004: "MapReduce: Simplified Data Processing on Large Clusters"
2006: "Bigtable: A Distributed Storage System for Structured Data"
Left join to populate data from 2 tables in Google BigQuery
Posted: 2016-05-17 19:13:45
Question: Below are two tables, RawDebug and CarrierDetails. In RawDebug, if DebugData matches VER%, then ActualDebugData is Verizon. If DebugData is a number, we first have to replace other characters such as (?, ") with ' ', and then look up the CarrierDetails table to pick the network where Mcc = substr("310410",0,3) and Mnc = substr("310410",4,2). That network is then populated into ActualDebugData.
Table RawDebug:
HardwareId  DebugData  ActualDebugData
123         VER%       Verizon
456         310410?    Bell
Table CarrierDetails:
Mcc  Mnc  Network
310  410  Bell
What I tried:
SELECT
  HardwareId, DebugReason, DebugData,
  CASE
    WHEN lower(DebugData) LIKE 'ver%' THEN 'Verizon'
    WHEN REGEXP_MATCH(DebugData,'\\d+') THEN c.Network
    ELSE REGEXP_REPLACE(DebugData,'\\?',' ')
  END
  AS ActualDebugData
FROM (
  SELECT
    HardwareId, DebugReason, DebugData,
    INTEGER(SUBSTR(DebugData,0,3)) AS d1,
    INTEGER(SUBSTR(REGEXP_REPLACE(DebugData,'^[a-zA-Z0-9]',' '),4,LENGTH(DebugData)-1)) as d2
  FROM TABLE_DATE_RANGE([bigdata:RawDebug.T],TIMESTAMP('2016-05-15'),TIMESTAMP('2016-05-15'))
  WHERE DebugReason = 50013
) AS d
LEFT JOIN (
  SELECT
    Network, Mcc, Mnc
  FROM [bigdata:RawDebug.CarrierDetails]
) AS c
ON c.Mcc = d.d1 and c.Mnc = d.d2
LIMIT 400
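Answer 1: Keep in mind that an answer is usually only as good as the question! Hope this helps, but judging by your question history, this may not be the end of it :o)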
SELECT
  HardwareId, DebugReason, DebugData,
  CASE
    WHEN LOWER(DebugData) LIKE 'ver%' THEN 'Verizon'
    WHEN REGEXP_MATCH(DebugData,'\\d+') THEN c.Network
    ELSE REGEXP_REPLACE(DebugData,'\\?',' ')
  END AS ActualDebugData
FROM (
  SELECT
    HardwareId, DebugReason, DebugData,
    INTEGER(SUBSTR (DebugData, 1, 3)) AS d1,
    INTEGER(SUBSTR (DebugData, 4, 3)) AS d2
  FROM //TABLE_DATE_RANGE([bigdata:RawDebug.T],TIMESTAMP('2016-05-15'),TIMESTAMP('2016-05-15'))
    (SELECT 123 AS HardwareId, 'VER%' AS DebugData, 'Verizon' AS ActualDebugData, 50013 AS DebugReason), // sample data
    (SELECT 456 AS HardwareId, '310410?' AS DebugData, 'Bell' AS ActualDebugData, 50013 AS DebugReason) // sample data
  WHERE DebugReason = 50013
) AS d
LEFT JOIN (
  SELECT
    Network, Mcc, Mnc
  FROM //[bigdata:RawDebug.CarrierDetails]
    (SELECT 310 AS Mcc, 410 AS Mnc, 'Bell' AS Network) // sample data
) AS c
ON c.Mcc = d.d1 AND c.Mnc = d.d2
LIMIT 400
Output:
HardwareId  DebugReason  DebugData  ActualDebugData
123         50013        VER%       Verizon
456         50013        310410?    Bell