Creating a dataframe of two loops using haversine geolocation
Posted: 2021-08-19 01:33:08
Question: I have a df of 3 people ("members") and I want to measure the distance from each of these people to 3 locations. The end result would be a df that ranks the 3 locations from nearest to farthest for each of the 3 people. Below is what I am working with and the result I am after:
In[154]: members
Out[154]:
   member_id  latitude  longitude
0          1    7.1899    52.2080
1          2   -5.9209    37.4827
2          3   83.1072    54.8490
In[155]: locations
Out[155]:
   location  latitude  longitude
0   theater   36.8381    -2.4597
1       bar   41.6561    -0.8773
2  car_shop   37.2829    -5.9209
In[156]: results
Out[156]:
   member location_1  distance_1 location_2  distance_2 location_3  distance_3
0       1    theater           9        bar          15   car_shop          17
1       2   car_shop          13        bar          25    theater          35
2       3        bar          16    theater          25   car_shop          41
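Here is what I have tried so far, but I don't know how to finish the loop so that the frames are concatenated into a main frame. Please help!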
import pandas as pd
# haversine is assumed to be the asker's own helper (not shown) taking lon1, lat1, lon2, lat2
df = []
df_2 = []
for m in range(len(members)):
    df_temp_member = pd.DataFrame({'member_id': members.iloc[[m]]['member_id']
                                   })
    for s in range(len(locations)):
        dist = haversine(lon1 = members.iloc[[m]]['longitude']
                         ,lat1 = members.iloc[[m]]['latitude']
                         ,lon2 = locations.iloc[[s]]['longitude']
                         ,lat2 = locations.iloc[[s]]['latitude'])
        df_temp = pd.DataFrame({'location': locations.iloc[[s]]['location'],
                                'Distance': dist,
                                })
        df.append(df_temp)
    df = pd.concat(df)
    df = df.sort_values(by='Distance', ascending=True, na_position='first').reset_index(drop = True)
    df_temp_1 = pd.DataFrame({'location_1': df.iloc[[0]]['location'],
                              'Distance_1': df.iloc[[0]]['Distance'],
                              })
    df_temp_2 = pd.DataFrame({'location_2': df.iloc[[1]]['location'].reset_index(drop = True),
                              'Distance_2': df.iloc[[1]]['Distance'].reset_index(drop = True),
                              })
    df_temp_3 = pd.DataFrame({'location_3': df.iloc[[2]]['location'].reset_index(drop = True),
                              'Distance_3': df.iloc[[2]]['Distance'].reset_index(drop = True),
                              })
    frames = [df_temp_1, df_temp_2, df_temp_3]
    df_2 = pd.concat(frames, axis = 1)
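Comments:
How did you get those distance values? Unless I have really messed up my calculations, the theater is about 6,400 km (about 4,000 miles) from member 1.
The results table contains made-up numbers; what I am after is more of a loop solution.
Answer 1:
A cross merge associates every row in members with every row in locations. haversine_vector then computes the distances, sort_values orders them from nearest to farthest, pivot_table converts from long to wide format, and finally the MultiIndex is collapsed: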
import pandas as pd
from haversine import haversine_vector, Unit
# Cross Merge To Associate Every Row in Members with Every Other in Locations
df3 = pd.merge(members, locations, how='cross')
# Calculate the haversine distance
df3['distance'] = haversine_vector(df3.filter(like='_x'),
                                   df3.filter(like='_y'),
                                   Unit.KILOMETERS)
# Use a Pivot Table to go from long to wide format
df3 = (
    df3.pivot_table(index='member_id',
                    columns=(
                        # Create Groups based on Sorted Distance
                        df3.sort_values('distance', ascending=True)
                            .groupby('member_id').cumcount() + 1
                    ),
                    values=['location', 'distance'],
                    aggfunc='first')
        .sort_index(level=[1, 0], axis=1, ascending=(True, False))
)
# Collapse MultiIndex
df3.columns = df3.columns.map(lambda x: '_'.join(map(str, x)))
df3 = df3.reset_index()
df3:
   member_id location_1   distance_1 location_2   distance_2 location_3   distance_3
0          1    theater  6416.753469        bar  6460.611645   car_shop  6725.829125
1          2    theater  6308.847913        bar  6566.958894   car_shop  6579.375371
2          3        bar  4974.492016   car_shop  5516.266902    theater  5523.801936
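For reference, the distances in this table are great-circle distances given by the haversine formula. A sketch of the formula, assuming latitudes \varphi and longitudes \lambda are in radians and R is a mean Earth radius of roughly 6371 km (the exact constant used by the haversine package is not stated in this thread):

```latex
d = 2R \arcsin\!\left(
      \sqrt{\sin^{2}\!\left(\frac{\varphi_2 - \varphi_1}{2}\right)
      + \cos\varphi_1 \,\cos\varphi_2 \,
        \sin^{2}\!\left(\frac{\lambda_2 - \lambda_1}{2}\right)}
    \right)
```

Plugging in member 1 and the theater gives roughly 6,400 km, in line with the first row of the table.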
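The key here is the cross merge, which puts each member/location pair on a single row so the distance can be computed row by row: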
df3 = pd.merge(members, locations, how='cross')
df3:
   member_id  latitude_x  longitude_x  location  latitude_y  longitude_y
0          1      7.1899      52.2080   theater     36.8381      -2.4597
1          1      7.1899      52.2080       bar     41.6561      -0.8773
2          1      7.1899      52.2080  car_shop     37.2829      -5.9209
3          2     -5.9209      37.4827   theater     36.8381      -2.4597
4          2     -5.9209      37.4827       bar     41.6561      -0.8773
5          2     -5.9209      37.4827  car_shop     37.2829      -5.9209
6          3     83.1072      54.8490   theater     36.8381      -2.4597
7          3     83.1072      54.8490       bar     41.6561      -0.8773
8          3     83.1072      54.8490  car_shop     37.2829      -5.9209
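Since the question asks for a loop-based approach, here is a minimal sketch of the same ranking with two explicit loops, using only the standard library. The names `haversine_km` and `EARTH_RADIUS_KM` and the list-of-dicts layout are hypothetical, chosen for illustration. The radius of 6371 km is an assumed mean value, so distances may differ slightly from the library's output.

```python
from math import asin, cos, radians, sin, sqrt

# Assumed mean Earth radius; the haversine package may use a slightly different constant.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Sample data copied from the question.
members = [
    {"member_id": 1, "latitude": 7.1899, "longitude": 52.2080},
    {"member_id": 2, "latitude": -5.9209, "longitude": 37.4827},
    {"member_id": 3, "latitude": 83.1072, "longitude": 54.8490},
]
locations = [
    {"location": "theater", "latitude": 36.8381, "longitude": -2.4597},
    {"location": "bar", "latitude": 41.6561, "longitude": -0.8773},
    {"location": "car_shop", "latitude": 37.2829, "longitude": -5.9209},
]

rows = []
for m in members:  # outer loop: one output row per member
    dists = []
    for loc in locations:  # inner loop: distance to every location
        d = haversine_km(m["latitude"], m["longitude"],
                         loc["latitude"], loc["longitude"])
        dists.append((d, loc["location"]))
    dists.sort()  # nearest first
    row = {"member_id": m["member_id"]}
    for rank, (d, name) in enumerate(dists, start=1):
        row[f"location_{rank}"] = name
        row[f"distance_{rank}"] = d
    rows.append(row)

for row in rows:
    print(row)
```

With pandas installed, `pd.DataFrame(rows)` should turn the list into the wide frame shown earlier. The cross-merge version is likely to scale better, because the nested loops make one Python-level call per member/location pair.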