Using the haversine() Snowflake function to dynamically map data from the closest locations
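【Posted】: 2020-06-12 19:27:48
【Question】: We are trying to find a way to use the haversine() function to map certain locations to certain data. The idea is that we have some store locations, and we have data for various cities. We want to determine the closest city with data for each store, and then join that city's data to the store. An example is below. Obviously, we could write a Python script that finds the closest city and adds it as a column to a table, but we would like to do this through a query/view so that we don't need to re-run a mapping script whenever a new store or city is added. The only approach I can think of is a correlated subquery in the column list, which I don't believe Snowflake supports. Is there another way to do this?
Thanks, J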
Stores Table:
City       State  Store_Num  Lat         Lon
Buckhead   GA     1          33.8399734  -84.4701434
Villanova  PA     2          40.0369415  -75.365584
Boulder    CO     3          40.0294202  -105.3101889
Data Table:
Date    Value  City           Lat         Lon
1/1/20  10     Atlanta        33.7678357  -84.4908155
1/2/20  15     Atlanta        33.7678357  -84.4908155
1/3/20  13     Atlanta        33.7678357  -84.4908155
1/1/20  11     Denver         39.7645183  -104.9955382
1/2/20  12     Denver         39.7645183  -104.9955382
1/3/20  14     Denver         39.7645183  -104.9955382
1/1/20  20     Philadelphia   40.0026763  -75.258455
1/2/20  25     Philadelphia   40.0026763  -75.258455
1/3/20  22     Philadelphia   40.0026763  -75.258455
1/1/20  5      Atlantic City  39.376672   -74.4879282
1/2/20  7      Atlantic City  39.376672   -74.4879282
1/3/20  10     Atlantic City  39.376672   -74.4879282
Desired Outcome:
Date    Store_Num  Data_City     Data_Value  Data_Distance
1/1/20  1          Atlanta       10          8,248
1/2/20  1          Atlanta       15          8,248
1/3/20  1          Atlanta       13          8,248
1/1/20  3          Denver        11          39,864
1/2/20  3          Denver        12          39,864
1/3/20  3          Denver        14          39,864
1/1/20  2          Philadelphia  20          9,889
1/2/20  2          Philadelphia  25          9,889
1/3/20  2          Philadelphia  22          9,889
【Answer 1】: I don't know where Atlantic City went in your output, but if your data set is small, you can use the following query:
-- Sample data from the question, inlined as CTEs.
WITH stores (City, State, Store_Num, Lat, Lon) AS (
    SELECT * FROM VALUES
        ('Buckhead', 'GA', 1, 33.8399734, -84.4701434),
        ('Villanova', 'PA', 2, 40.0369415, -75.365584),
        ('Boulder', 'CO', 3, 40.0294202, -105.3101889)
),
data_table (Date, Value, City, Lat, Lon) AS (
    SELECT * FROM VALUES
        ('1/1/20', 10, 'Atlanta', 33.7678357, -84.4908155),
        ('1/2/20', 15, 'Atlanta', 33.7678357, -84.4908155),
        ('1/3/20', 13, 'Atlanta', 33.7678357, -84.4908155),
        ('1/1/20', 11, 'Denver', 39.7645183, -104.9955382),
        ('1/2/20', 12, 'Denver', 39.7645183, -104.9955382),
        ('1/3/20', 14, 'Denver', 39.7645183, -104.9955382),
        ('1/1/20', 20, 'Philadelphia', 40.0026763, -75.258455),
        ('1/2/20', 25, 'Philadelphia', 40.0026763, -75.258455),
        ('1/3/20', 22, 'Philadelphia', 40.0026763, -75.258455),
        ('1/1/20', 5, 'Atlantic City', 39.376672, -74.4879282),
        ('1/2/20', 7, 'Atlantic City', 39.376672, -74.4879282),
        ('1/3/20', 10, 'Atlantic City', 39.376672, -74.4879282)
)
-- Pair every data row with every store, then keep only the closest
-- store for each (city, date) combination.
SELECT d.date, s.Store_Num, d.City, d.value,
       haversine(s.lat, s.lon, d.lat, d.lon) AS distance
FROM data_table d
CROSS JOIN stores s
QUALIFY row_number() OVER (
            PARTITION BY d.city, d.date
            ORDER BY haversine(s.lat, s.lon, d.lat, d.lon)
        ) = 1;
For details on QUALIFY, see the following documentation:
https://docs.snowflake.com/en/sql-reference/constructs/qualify.html
The output is:
+--------+-----------+---------------+-------+---------------+
| DATE | STORE_NUM | CITY | VALUE | DISTANCE |
+--------+-----------+---------------+-------+---------------+
| 1/1/20 | 1 | Atlanta | 10 | 8.245620101 |
| 1/2/20 | 1 | Atlanta | 15 | 8.245620101 |
| 1/3/20 | 1 | Atlanta | 13 | 8.245620101 |
| 1/1/20 | 2 | Atlantic City | 5 | 105.009087658 |
| 1/2/20 | 2 | Atlantic City | 7 | 105.009087658 |
| 1/3/20 | 2 | Atlantic City | 10 | 105.009087658 |
| 1/1/20 | 3 | Denver | 11 | 39.851626235 |
| 1/2/20 | 3 | Denver | 12 | 39.851626235 |
| 1/3/20 | 3 | Denver | 14 | 39.851626235 |
| 1/1/20 | 2 | Philadelphia | 20 | 9.886319193 |
| 1/2/20 | 2 | Philadelphia | 25 | 9.886319193 |
| 1/3/20 | 2 | Philadelphia | 22 | 9.886319193 |
+--------+-----------+---------------+-------+---------------+
【Comments】:
Thanks!! Not all the way there, but you are very, very close. Atlantic City isn't in my output because we need the closest CITY FOR EACH STORE rather than the closest STORE FOR EACH CITY. All I did was change the partition from d.city to s.store_num, and that solved it.
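The change described in the comment above (partitioning by store instead of by city) can be sketched as follows. This is one reading of that fix, not the asker's exact final query: it reuses the answer's sample data and CTE names, keeps d.date in the partition so each store gets one row per date, and the output aliases data_city, data_value, and data_distance are assumptions chosen to mirror the headers of the desired outcome.

```sql
-- Sketch only: same sample data as the answer, inlined so the query runs on its own.
WITH stores (city, state, store_num, lat, lon) AS (
    SELECT * FROM VALUES
        ('Buckhead', 'GA', 1, 33.8399734, -84.4701434),
        ('Villanova', 'PA', 2, 40.0369415, -75.365584),
        ('Boulder', 'CO', 3, 40.0294202, -105.3101889)
),
data_table (date, value, city, lat, lon) AS (
    SELECT * FROM VALUES
        ('1/1/20', 10, 'Atlanta', 33.7678357, -84.4908155),
        ('1/2/20', 15, 'Atlanta', 33.7678357, -84.4908155),
        ('1/3/20', 13, 'Atlanta', 33.7678357, -84.4908155),
        ('1/1/20', 11, 'Denver', 39.7645183, -104.9955382),
        ('1/2/20', 12, 'Denver', 39.7645183, -104.9955382),
        ('1/3/20', 14, 'Denver', 39.7645183, -104.9955382),
        ('1/1/20', 20, 'Philadelphia', 40.0026763, -75.258455),
        ('1/2/20', 25, 'Philadelphia', 40.0026763, -75.258455),
        ('1/3/20', 22, 'Philadelphia', 40.0026763, -75.258455),
        ('1/1/20', 5, 'Atlantic City', 39.376672, -74.4879282),
        ('1/2/20', 7, 'Atlantic City', 39.376672, -74.4879282),
        ('1/3/20', 10, 'Atlantic City', 39.376672, -74.4879282)
)
SELECT d.date,
       s.store_num,
       d.city  AS data_city,      -- assumed alias, mirrors the desired outcome
       d.value AS data_value,     -- assumed alias
       HAVERSINE(s.lat, s.lon, d.lat, d.lon) AS data_distance  -- kilometres
FROM data_table d
CROSS JOIN stores s
-- Keep one row per (store, date): the data row whose city is closest to that store.
QUALIFY ROW_NUMBER() OVER (
            PARTITION BY s.store_num, d.date
            ORDER BY HAVERSINE(s.lat, s.lon, d.lat, d.lon)
        ) = 1
ORDER BY s.store_num, d.date;
```

With the sample data this should pair store 1 with Atlanta, store 2 with Philadelphia, and store 3 with Denver, so Atlantic City drops out as in the desired outcome. Two caveats: HAVERSINE returns kilometres, so the distances will look like those in the answer's output rather than the apparently metre-scale figures in the question; and because the partition includes the date, a store falls back to the next-closest city on any date for which its closest city has no row.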