Using haversine() snowflake function to dynamically map data from closest locations

Posted: 2020-06-12 19:27:48

Question:

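We are trying to find a way to use the haversine() function to map certain locations to certain data. The idea is that we have a set of store locations, and separately we have data for various cities. For each store, we want to identify the closest city that we have data for, and then join that city's data to the store. An example is below. Obviously, we could write a Python script that finds the closest city and adds it as a column to a table, but we would like to do this with a query/view so that we don't have to re-run a mapping script whenever a new store or city is added. The only thing I can think of is a correlated subquery in a column, which I don't believe Snowflake supports. Is there another way to do this?

Thanks, J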

Stores Table:

City          State               Store_Num        Lat                     Lon
Buckhead      GA                     1             33.8399734          -84.4701434
Villanova     PA                     2             40.0369415          -75.365584
Boulder       CO                     3             40.0294202          -105.3101889



Data Table:

Date            Value                    City                        Lat                          Lon
1/1/20          10                      Atlanta                33.7678357          -84.4908155
1/2/20          15                      Atlanta                33.7678357          -84.4908155        
1/3/20          13                      Atlanta                33.7678357          -84.4908155
1/1/20          11                      Denver                 39.7645183          -104.9955382
1/2/20          12                      Denver                 39.7645183          -104.9955382
1/3/20          14                      Denver                 39.7645183          -104.9955382
1/1/20          20                      Philadelphia           40.0026763          -75.258455          
1/2/20          25                      Philadelphia           40.0026763          -75.258455                          
1/3/20          22                      Philadelphia           40.0026763          -75.258455          
1/1/20          5                       Atlantic City          39.376672            -74.4879282        
1/2/20          7                       Atlantic City          39.376672            -74.4879282        
1/3/20          10                      Atlantic City          39.376672            -74.4879282


Desired Outcome:

Date          Store_Num        Data_City            Data Value         Data_Distance
1/1/20        1                Atlanta                 10               8,248
1/2/20        1                Atlanta                 15               8,248     
1/3/20        1                Atlanta                 13               8,248
1/1/20        3                Denver                  11               39,864
1/2/20        3                Denver                  12               39,864
1/3/20        3                Denver                  14               39,864
1/1/20        2                Philadelphia            20               9,889
1/2/20        2                Philadelphia            25               9,889     
1/3/20        2                Philadelphia            22               9,889

Answer 1:

I don't know where Atlantic City went in your output, but if your data is small, you can use the following query:

WITH stores (City,State,Store_Num,Lat,Lon) AS (
SELECT * FROM VALUES
('Buckhead','GA',1,33.8399734,-84.4701434),
('Villanova','PA',2,40.0369415,-75.365584),
('Boulder','CO',3,40.0294202,-105.3101889)
)
, data_table (Date,Value,City,Lat,Lon)
AS (
SELECT * FROM VALUES
('1/1/20',10,'Atlanta',33.7678357,-84.4908155),
('1/2/20',15,'Atlanta',33.7678357,-84.4908155),      
('1/3/20',13,'Atlanta',33.7678357,-84.4908155),
('1/1/20',11,'Denver',39.7645183,-104.9955382),
('1/2/20',12,'Denver',39.7645183,-104.9955382),
('1/3/20',14,'Denver',39.7645183,-104.9955382),
('1/1/20',20,'Philadelphia',40.0026763,-75.258455),          
('1/2/20',25,'Philadelphia',40.0026763,-75.258455),                          
('1/3/20',22,'Philadelphia',40.0026763,-75.258455),         
('1/1/20',5,'Atlantic City',39.376672,-74.4879282),        
('1/2/20',7,'Atlantic City',39.376672,-74.4879282),        
('1/3/20',10,'Atlantic City',39.376672,-74.4879282)
)
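-- Cross join every data row with every store, then keep, for each
-- (city, date) pair, only the row whose store is closest to that city.
SELECT d.date, s.Store_Num, d.City, d.value,
       haversine(s.lat, s.lon, d.lat, d.lon) AS distance
FROM data_table d, stores s
QUALIFY row_number() OVER (PARTITION BY d.city, d.date ORDER BY haversine(s.lat, s.lon, d.lat, d.lon)) = 1;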

For details on QUALIFY, see the following documentation:

https://docs.snowflake.com/en/sql-reference/constructs/qualify.html

The output is:

+--------+-----------+---------------+-------+---------------+
| DATE   | STORE_NUM | CITY          | VALUE | DISTANCE      |
+--------+-----------+---------------+-------+---------------+
| 1/1/20 | 1         | Atlanta       | 10    | 8.245620101   |
| 1/2/20 | 1         | Atlanta       | 15    | 8.245620101   |
| 1/3/20 | 1         | Atlanta       | 13    | 8.245620101   |
| 1/1/20 | 2         | Atlantic City | 5     | 105.009087658 |
| 1/2/20 | 2         | Atlantic City | 7     | 105.009087658 |
| 1/3/20 | 2         | Atlantic City | 10    | 105.009087658 |
| 1/1/20 | 3         | Denver        | 11    | 39.851626235  |
| 1/2/20 | 3         | Denver        | 12    | 39.851626235  |
| 1/3/20 | 3         | Denver        | 14    | 39.851626235  |
| 1/1/20 | 2         | Philadelphia  | 20    | 9.886319193   |
| 1/2/20 | 2         | Philadelphia  | 25    | 9.886319193   |
| 1/3/20 | 2         | Philadelphia  | 22    | 9.886319193   |
+--------+-----------+---------------+-------+---------------+

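Comments:

Thanks!! Not quite all the way there, but you got very, very close. Atlantic City isn't in my output because we need the closest CITY FOR EACH STORE, not the closest STORE FOR EACH CITY. All I did was change the partition from d.city to s.store_num, and that solved it.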
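
The change described in this comment (partitioning by s.store_num instead of d.city) can be checked outside Snowflake. The following is only a rough sketch in Python using the standard-library sqlite3 module, not the query the commenter actually ran. SQLite has no QUALIFY clause and no built-in haversine(), so the sketch makes two assumptions: the filter is written as a subquery on a row number, and haversine is a hand-written Python function registered with SQLite, using an assumed Earth radius of 6371 km. It needs an SQLite build with window functions (3.25 or later).

```python
import math
import sqlite3

# Assumption: mean Earth radius in km. The thread does not state which
# constant Snowflake's HAVERSINE uses.
EARTH_RADIUS_KM = 6371.0


def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Sample rows copied from the question.
STORES = [
    ("Buckhead", "GA", 1, 33.8399734, -84.4701434),
    ("Villanova", "PA", 2, 40.0369415, -75.365584),
    ("Boulder", "CO", 3, 40.0294202, -105.3101889),
]
CITY_COORDS = {
    "Atlanta": (33.7678357, -84.4908155),
    "Denver": (39.7645183, -104.9955382),
    "Philadelphia": (40.0026763, -75.258455),
    "Atlantic City": (39.376672, -74.4879282),
}
CITY_VALUES = {
    "Atlanta": [10, 15, 13],
    "Denver": [11, 12, 14],
    "Philadelphia": [20, 25, 22],
    "Atlantic City": [5, 7, 10],
}
DATES = ["1/1/20", "1/2/20", "1/3/20"]
DATA = [
    (date, value, city, *CITY_COORDS[city])
    for city, values in CITY_VALUES.items()
    for date, value in zip(DATES, values)
]

# Same shape as the answer's query, but partitioned by store rather than
# by city. The outer WHERE rn = 1 stands in for Snowflake's QUALIFY.
QUERY = """
SELECT date, store_num, city, value, distance
FROM (
    SELECT d.date AS date, s.store_num AS store_num,
           d.city AS city, d.value AS value,
           haversine(s.lat, s.lon, d.lat, d.lon) AS distance,
           ROW_NUMBER() OVER (
               PARTITION BY s.store_num, d.date
               ORDER BY haversine(s.lat, s.lon, d.lat, d.lon)
           ) AS rn
    FROM data_table d CROSS JOIN stores s
)
WHERE rn = 1
ORDER BY store_num, date
"""


def nearest_city_per_store():
    """Return one row per (store, date) holding the closest city's data."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("haversine", 4, haversine)
    conn.execute("CREATE TABLE stores "
                 "(city TEXT, state TEXT, store_num INTEGER, lat REAL, lon REAL)")
    conn.execute("CREATE TABLE data_table "
                 "(date TEXT, value INTEGER, city TEXT, lat REAL, lon REAL)")
    conn.executemany("INSERT INTO stores VALUES (?, ?, ?, ?, ?)", STORES)
    conn.executemany("INSERT INTO data_table VALUES (?, ?, ?, ?, ?)", DATA)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in nearest_city_per_store():
        print(row)
```

With the partition keyed on the store, each store keeps only its closest city, so Atlantic City drops out because no store has it as its nearest city. In Snowflake itself, the equivalent change should only require replacing d.city with s.store_num in the PARTITION BY of the answer's QUALIFY clause, as the commenter describes.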
