Creating a UDF in BigQuery to match array inputs
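【Title】: Creating a udf in bigquery to match array inputs 【Posted】: 2020-06-18 21:47:02 【Question】: I am trying to create a UDF that matches arrays in BigQuery. Essentially, if any value of array x is also present in array y, I want the result to be true.
For example, match_rows([4,5,6], [5,6,7]) should return true.
I have written it out, but I keep getting a syntax error, and I am not familiar enough with debugging this kind of thing, so I am hoping someone can shed some light on what is going on.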
The exact error: No matching signature for function match_rows for argument types: ARRAY<INT64>, ARRAY<INT64>. Supported signature: match_rows(ARRAY<FLOAT64>, ARRAY<FLOAT64>) at [44:1]
CREATE TEMP FUNCTION match_rows(arr1 ARRAY<FLOAT64>, arr2 ARRAY<FLOAT64>)
RETURNS BOOL
LANGUAGE js AS
"""
function findCommonElements2(arr1, arr2) {
  // Create an empty object
  let obj = {};
  // Loop through the first array
  for (let i = 0; i < arr1.length; i++) {
    // Check if element from first array
    // already exist in object or not
    if (!obj[arr1[i]]) {
      // If it doesn't exist assign the
      // properties equals to the
      // elements in the array
      const element = arr1[i];
      obj[element] = true;
    }
  }
  // Loop through the second array
  for (let j = 0; j < arr2.length; j++) {
    // Check elements from second array exist
    // in the created object or not
    if (obj[arr2[j]]) {
      return true;
    }
  }
  return false;
}
""";
WITH input AS (
SELECT STRUCT([5,6,7] as row, 'column2' as value) AS test
)
SELECT
match_rows([4,4,6],[4,7,8]),
match_rows(test.row, test.row)
FROM input
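【Answer 1】: You are passing ARRAY<INT64> to your function instead of ARRAY<FLOAT64>, hence the No matching signature error. To resolve it, write one of the values in each array as a float, with the following syntax: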
WITH input AS (
#notice that the first element is a float and so the whole array is ARRAY<FLOAT64>
SELECT STRUCT([5.0,6,7] as row, 'column2' as value) AS test
)
SELECT
#same as above: the first element of each array is written explicitly as a float
match_rows([4.0,4,6],[4.0,7,8]),
match_rows(test.row, test.row)
FROM input
However, I tested your function and found that the syntax of your JavaScript UDF does not follow the documentation. The syntax should be as follows:
CREATE TEMP FUNCTION multiplyInputs(x FLOAT64, y FLOAT64)
RETURNS FLOAT64
LANGUAGE js AS """
//write directly your transformations here
return x*y;
""";
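Note that there is no need to declare function findCommonElements2(arr1, arr2). You can go straight to the function body, because the function's name is already defined by the CREATE TEMP FUNCTION statement. As written, your body only declares an inner function and never calls it, so the UDF itself returns nothing.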
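To see why the wrapper matters, here is a rough local sketch in plain JavaScript. It assumes that BigQuery treats the text between the triple quotes as the body of a function whose parameters are the UDF arguments; `new Function` is only a stand-in for that behavior, and the names `wrappedBody`, `directBody`, `wrapped` and `direct` are hypothetical.

```javascript
// Assumption: BigQuery runs the UDF text as a function body with the
// declared arguments in scope. `new Function` imitates that locally.

// Body that only declares an inner function and never calls it.
const wrappedBody = `
  function findCommonElements2(arr1, arr2) {
    return arr1.some((x) => arr2.includes(x));
  }
`;

// Body that does the work directly and returns a value.
const directBody = `
  return arr1.some((x) => arr2.includes(x));
`;

const wrapped = new Function("arr1", "arr2", wrappedBody);
const direct = new Function("arr1", "arr2", directBody);

console.log(wrapped([4, 5, 6], [5, 6, 7])); // undefined: nothing is returned
console.log(direct([4, 5, 6], [5, 6, 7]));  // true
```

In BigQuery, an undefined result would most likely surface as NULL rather than as a boolean, which is why the wrapped version cannot give the expected answer even once the type error is fixed.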
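In addition, I found that your function does not return the desired output. For that reason, I wrote a simpler JavaScript UDF that returns what you expect. Here are the syntax and a test: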
CREATE TEMP FUNCTION match_rows(arr1 ARRAY<FLOAT64>, arr2 ARRAY<FLOAT64>)
RETURNS BOOL
LANGUAGE js AS
"""
  // array of common elements between the two arrays
  var common_el = [];
  for (var i = 0; i < arr1.length; i++) {
    for (var j = 0; j < arr2.length; j++) {
      // strict comparison; a single = would assign instead of compare
      if (arr1[i] === arr2[j]) {
        // add to the common_el array when the element is present in both arrays
        common_el.push(arr1[i]);
      }
    }
  }
  // if the array of common elements has at least one element return true, otherwise false
  return common_el.length > 0;
""";
WITH input AS (
SELECT STRUCT([5.0,6,7] as row, 'column2' as value) AS test
)
SELECT
match_rows([4.0,4,6],[4.0,5,7]) as check_1,
match_rows(test.row, test.row) as check_2
FROM input
And the output:
Row  check_1  check_2
1    true     true
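The overlap check can also be tried outside BigQuery before it is pasted into a UDF. The following is a minimal sketch, not the answer's original code: it swaps the nested loops for a `Set` lookup (closer to the object-based idea in the question) and stops at the first shared value. The name `matchRows` is hypothetical, and the sketch assumes the runtime supports `Set` and `Array.prototype.some`.

```javascript
// Hypothetical helper mirroring the UDF body; runs in Node.js or a browser.
function matchRows(arr1, arr2) {
  // Store the first array's values for constant-time lookups.
  const seen = new Set(arr1);
  // Stop at the first value of arr2 that also occurs in arr1.
  return arr2.some((value) => seen.has(value));
}

console.log(matchRows([4, 4, 6], [4, 5, 7])); // true (4 is shared)
console.log(matchRows([5, 6, 7], [5, 6, 7])); // true
console.log(matchRows([1, 2, 3], [7, 8, 9])); // false
```

If the runtime assumption holds in BigQuery, the two statements inside `matchRows` could serve directly as the body between the triple quotes, since `arr1` and `arr2` are already in scope there.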
【Answer 2】: Below is for BigQuery Standard SQL
#standardSQL
CREATE TEMP FUNCTION match_rows(arr1 ANY TYPE, arr2 ANY TYPE) AS (
(SELECT COUNT(1) FROM UNNEST(arr1) el JOIN UNNEST(arr2) el USING(el)) > 0
);
SELECT match_rows([4,5,6], [5,6,7])