Creating a dataframe of many loops within loops for haversine geolocation
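【Title】: Creating a dataframe of many loops within loops for haversine geolocation 【Posted】: 2021-08-19 00:03:48 【Question】: I have a df with 3 people ("members"), and I want to measure how far each of them is from 3 locations. The end result should be a df that ranks the 3 locations from nearest to farthest for each of the 3 people. I have the geographic coordinates of all 3 people and all 3 locations. This is what I have tried so far, but I can't figure out how to finish the loop so that the frames are concatenated into a master frame. Please help!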
df = []
df_2 = []
for m in range(len(members)):
    df_temp_member = pd.DataFrame({'member_id': members.iloc[[m]]['member_id']
                                   })
    for s in range(len(locations)):
        dist = haversine(lon1 = members.iloc[[m]]['longitude']
                         ,lat1 = members.iloc[[m]]['latitude']
                         ,lon2 = locations.iloc[[s]]['Longitude']
                         ,lat2 = locations.iloc[[s]]['Latitude'])
        df_temp = pd.DataFrame({'location_name': locations.iloc[[s]]['location_name'],
                                'Distance': dist,
                                })
        df.append(df_temp)
    df = pd.concat(df)
    df = df.sort_values(by='Distance', ascending=True, na_position='first').reset_index(drop = True)
    df_temp_1 = pd.DataFrame({'location_1': df.iloc[[0]]['location_name'],
                              'Distance_1': df.iloc[[0]]['Distance'],
                              })
    df_temp_2 = pd.DataFrame({'location_2': df.iloc[[1]]['location_name'].reset_index(drop = True),
                              'Distance_2': df.iloc[[1]]['Distance'].reset_index(drop = True),
                              })
    df_temp_3 = pd.DataFrame({'location_3': df.iloc[[2]]['location_name'].reset_index(drop = True),
                              'Distance_3': df.iloc[[2]]['Distance'].reset_index(drop = True),
                              })
    frames = [df_temp_1, df_temp_2, df_temp_3]
    df_2 = pd.concat(frames, axis = 1)
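【Comments】:
Please include samples of members and locations that can be used for testing, as well as the expected output for the provided input data. See "MRE - Minimal, Reproducible, Example" and "How to make good reproducible pandas examples" for more information.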
【Answer 1】:
Sample data*:
>>> members
      Name   Longitude   Latitude
0   Sherie   16.196499  44.040776
1    Cathi  107.000799  -7.018167
2  Grissel  118.152148  24.722747
>>> locations
       Location   Longitude   Latitude
0     Quarteira   -8.098960  37.102928
1       Weishan  100.307174  25.227212
2  Šuto Orizare   21.429841  41.992429
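To make the answer reproducible, the two frames printed above can be rebuilt from literals. This is only a sketch: the values are copied from the printed output, so they carry six decimal places and may differ slightly from the original Mockaroo data.

```python
import pandas as pd

# Values copied from the printed sample frames (rounded to 6 decimals).
members = pd.DataFrame({
    "Name": ["Sherie", "Cathi", "Grissel"],
    "Longitude": [16.196499, 107.000799, 118.152148],
    "Latitude": [44.040776, -7.018167, 24.722747],
})

locations = pd.DataFrame({
    "Location": ["Quarteira", "Weishan", "Šuto Orizare"],
    "Longitude": [-8.098960, 100.307174, 21.429841],
    "Latitude": [37.102928, 25.227212, 41.992429],
})

print(members)
print(locations)
```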
Haversine function**, adapted to take a Series:
import numpy as np

def haversine_series(sr):
    # sr is one row holding both coordinate pairs, in degrees
    lon1, lat1, lon2, lat2 = sr[["Longitude1", "Latitude1", "Longitude2", "Latitude2"]]
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2.0)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2.0)**2
    c = 2 * np.arcsin(np.sqrt(a))
    km = 6371 * c  # 6371 km: mean Earth radius
    return km
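The same formula can also be written to take the four coordinates directly, which lets NumPy evaluate it on scalars, arrays, or whole columns at once instead of row by row. The sketch below is an alternative, not part of the original answer; the name `haversine_km` and the `radius_km` parameter are illustrative assumptions.

```python
import numpy as np

def haversine_km(lon1, lat1, lon2, lat2, radius_km=6371.0):
    """Great-circle distance in km; inputs are degrees (scalars or arrays)."""
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2.0) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2.0) ** 2
    return 2.0 * radius_km * np.arcsin(np.sqrt(a))

# Sherie -> Šuto Orizare, using the sample coordinates from the answer.
d = haversine_km(16.196499, 44.040776, 21.429841, 41.992429)
print(d)
```

With the sample coordinates, the printed value should land close to the 482.47 km that the answer reports for Sherie and Šuto Orizare.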
Cross-join the members and locations dataframes and compute the distances (how="cross" requires pandas 1.2 or later):
distances = members.merge(locations, how="cross", suffixes=('1', '2'))
distances["Distance"] = distances.apply(haversine_series, axis="columns")
>>> distances
      Name  Longitude1  Latitude1      Location  Longitude2  Latitude2      Distance
0   Sherie   16.196499  44.040776     Quarteira   -8.098960  37.102928   2182.362810
1   Sherie   16.196499  44.040776       Weishan  100.307174  25.227212   7640.729330
2   Sherie   16.196499  44.040776  Šuto Orizare   21.429841  41.992429    482.470815
3    Cathi  107.000799  -7.018167     Quarteira   -8.098960  37.102928  12695.443489
4    Cathi  107.000799  -7.018167       Weishan  100.307174  25.227212   3657.950305
5    Cathi  107.000799  -7.018167  Šuto Orizare   21.429841  41.992429  10165.429008
6  Grissel  118.152148  24.722747     Quarteira   -8.098960  37.102928  11135.298789
7  Grissel  118.152148  24.722747       Weishan  100.307174  25.227212   1798.285195
8  Grissel  118.152148  24.722747  Šuto Orizare   21.429841  41.992429   8719.611566
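For larger inputs, the cross join plus row-wise apply can be replaced by NumPy broadcasting, which yields a members-by-locations distance matrix in one call. This is a sketch under assumptions: the frames are rebuilt from the printed sample data, and `haversine_km` and `matrix` are illustrative names that do not appear in the original answer.

```python
import numpy as np
import pandas as pd

members = pd.DataFrame({
    "Name": ["Sherie", "Cathi", "Grissel"],
    "Longitude": [16.196499, 107.000799, 118.152148],
    "Latitude": [44.040776, -7.018167, 24.722747],
})
locations = pd.DataFrame({
    "Location": ["Quarteira", "Weishan", "Šuto Orizare"],
    "Longitude": [-8.098960, 100.307174, 21.429841],
    "Latitude": [37.102928, 25.227212, 41.992429],
})

def haversine_km(lon1, lat1, lon2, lat2, radius_km=6371.0):
    """Great-circle distance in km; inputs are degrees and broadcast together."""
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    a = (np.sin((lat2 - lat1) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2)
    return 2.0 * radius_km * np.arcsin(np.sqrt(a))

# Member coordinates as a column vector (n_members, 1) against location
# coordinates as a flat vector (n_locations,) broadcast to a full matrix.
m_lon = members["Longitude"].to_numpy()[:, None]
m_lat = members["Latitude"].to_numpy()[:, None]
l_lon = locations["Longitude"].to_numpy()
l_lat = locations["Latitude"].to_numpy()

matrix = pd.DataFrame(
    haversine_km(m_lon, m_lat, l_lon, l_lat),
    index=members["Name"],
    columns=locations["Location"],
)
print(matrix.round(1))
```

Each row of `matrix` holds one member's distances to every location, so `matrix.idxmin(axis="columns")` would give the nearest location per member.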
Ranking (for each location, the members are ranked from nearest to farthest):
>>> distances.pivot(index="Location", columns="Name", values="Distance") \
...     .rank(axis="columns").astype(int)
Name          Cathi  Grissel  Sherie
Location
Quarteira         3        2       1
Weishan           2        1       3
Šuto Orizare      3        2       1
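The table above ranks members within each location, whereas the question asks for the locations ranked per member, laid out as location_1, Distance_1, location_2, and so on. One possible way to get that shape is sketched below. It is not part of the original answer: the distances are copied from the answer's output table, and the names `ranked`, `rank` and `wide` are illustrative.

```python
import pandas as pd

# Long-format distances (km) copied from the answer's output table.
distances = pd.DataFrame({
    "Name": ["Sherie"] * 3 + ["Cathi"] * 3 + ["Grissel"] * 3,
    "Location": ["Quarteira", "Weishan", "Šuto Orizare"] * 3,
    "Distance": [2182.362810, 7640.729330, 482.470815,
                 12695.443489, 3657.950305, 10165.429008,
                 11135.298789, 1798.285195, 8719.611566],
})

# Number each member's locations from nearest (1) to farthest.
ranked = distances.sort_values(["Name", "Distance"]).copy()
ranked["rank"] = ranked.groupby("Name").cumcount() + 1

# One row per member; columns become (field, rank) pairs.
wide = ranked.set_index(["Name", "rank"])[["Location", "Distance"]].unstack("rank")

# Flatten to location_1, Distance_1, ... (naming taken from the question).
wide.columns = [
    f"{'location' if field == 'Location' else 'Distance'}_{rank}"
    for field, rank in wide.columns
]
n = int(ranked["rank"].max())
ordered = [name for k in range(1, n + 1)
           for name in (f"location_{k}", f"Distance_{k}")]
wide = wide[ordered].reset_index()
print(wide)
```

Sorting once and numbering with `groupby(...).cumcount()` avoids the nested loops from the question and scales to any number of members and locations.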
Credits:
* Data generated with Mockaroo
** From https://***.com/a/25767765/15239951
【Discussion】: