Creating a dataframe of many loops within loops for haversine geolocation


【Title】: Creating a dataframe of many loops within loops for haversine geolocation 【Posted】: 2021-08-19 00:03:48 【Question】:

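I have a df of 3 people ("members"), and I want to measure each person's distance from 3 locations. The end result should be a df that, for each of the 3 people, ranks the 3 locations from nearest to farthest. I have the geographic coordinates of all 3 people and all 3 locations. This is what I have tried so far, but I don't know how to finish the loop so that the per-member frames are concatenated into a main frame. Please help!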

df = []
df_2 = []

for m in range(len(members)):

    df_temp_member = pd.DataFrame({'member_id': members.iloc[[m]]['member_id']
                                   })

    for s in range(len(locations)):
        dist = haversine(lon1 = members.iloc[[m]]['longitude']
                        ,lat1 = members.iloc[[m]]['latitude']
                        ,lon2 = locations.iloc[[s]]['Longitude']
                        ,lat2 = locations.iloc[[s]]['Latitude'])

        df_temp = pd.DataFrame({'location_name': locations.iloc[[s]]['location_name'],
                                'Distance': dist,
                                })

        df.append(df_temp)

    df = pd.concat(df)
    df = df.sort_values(by='Distance', ascending=True, na_position='first').reset_index(drop = True)

    df_temp_1 = pd.DataFrame({'location_1': df.iloc[[0]]['location_name'],
                              'Distance_1': df.iloc[[0]]['Distance'],
                               })

    df_temp_2 = pd.DataFrame({'location_2': df.iloc[[1]]['location_name'].reset_index(drop = True),
                              'Distance_2': df.iloc[[1]]['Distance'].reset_index(drop = True),
                               })

    df_temp_3 = pd.DataFrame({'location_3': df.iloc[[2]]['location_name'].reset_index(drop = True),
                              'Distance_3': df.iloc[[2]]['Distance'].reset_index(drop = True),
                               })
    
    frames = [df_temp_1, df_temp_2, df_temp_3]

    df_2 = pd.concat(frames, axis = 1)

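【Comments】:

Please include samples of members and locations that can be used for testing, along with the expected output for that input. See MRE - Minimal, Reproducible, Example and How to make good reproducible pandas examples for more information. 【Answer 1】: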

Sample data*:

>>> members
      Name   Longitude   Latitude
0   Sherie   16.196499  44.040776
1    Cathi  107.000799  -7.018167
2  Grissel  118.152148  24.722747

>>> locations
       Location   Longitude   Latitude
0     Quarteira   -8.098960  37.102928
1       Weishan  100.307174  25.227212
2  Šuto Orizare   21.429841  41.992429

Haversine function**, modified to take a Series:

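import numpy as np
import pandas as pd

def haversine_series(sr):
    # sr is one row holding both coordinate pairs, in decimal degrees
    lon1, lat1, lon2, lat2 = sr[["Longitude1", "Latitude1", "Longitude2", "Latitude2"]]
    # the trigonometric functions below expect radians
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2.0)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2.0)**2
    c = 2 * np.arcsin(np.sqrt(a))
    km = 6371 * c  # 6371 km is the mean radius of the Earth
    return km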
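
As a quick sanity check of the function above, here is a minimal sketch that calls it on one hand-built Series. The two points are an assumption chosen for illustration: both lie on the equator, 90 degrees of longitude apart, so the result should be a quarter of a great circle, 6371 * pi / 2 km.

```python
import numpy as np
import pandas as pd

def haversine_series(sr):
    # Same function as in the answer, repeated so this block runs on its own.
    lon1, lat1, lon2, lat2 = sr[["Longitude1", "Latitude1", "Longitude2", "Latitude2"]]
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2.0)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2.0)**2
    return 6371 * 2 * np.arcsin(np.sqrt(a))

# Illustrative row (not from the sample data): two points on the equator.
row = pd.Series({"Longitude1": 0.0, "Latitude1": 0.0,
                 "Longitude2": 90.0, "Latitude2": 0.0})

quarter_circle_km = haversine_series(row)
expected_km = 6371 * np.pi / 2
print(quarter_circle_km, expected_km)
```

The column labels must match the suffixed names produced by the cross merge, otherwise the lookup inside the function raises a KeyError.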

Cross-join the members and locations dataframes and compute the distances:

distances = members.merge(locations, how="cross", suffixes=('1', '2'))
distances["Distance"] = distances.apply(haversine_series, axis="columns")
>>> distances
      Name  Longitude1  Latitude1      Location  Longitude2  Latitude2      Distance
0   Sherie   16.196499  44.040776     Quarteira   -8.098960  37.102928   2182.362810
1   Sherie   16.196499  44.040776       Weishan  100.307174  25.227212   7640.729330
2   Sherie   16.196499  44.040776  Šuto Orizare   21.429841  41.992429    482.470815
3    Cathi  107.000799  -7.018167     Quarteira   -8.098960  37.102928  12695.443489
4    Cathi  107.000799  -7.018167       Weishan  100.307174  25.227212   3657.950305
5    Cathi  107.000799  -7.018167  Šuto Orizare   21.429841  41.992429  10165.429008
6  Grissel  118.152148  24.722747     Quarteira   -8.098960  37.102928  11135.298789
7  Grissel  118.152148  24.722747       Weishan  100.307174  25.227212   1798.285195
8  Grissel  118.152148  24.722747  Šuto Orizare   21.429841  41.992429   8719.611566
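
The row-wise apply above calls the Python function once per pair, which can become slow for large inputs. Since the formula only uses NumPy functions, it can also be evaluated on whole columns at once. The following is a sketch under that assumption; `haversine_km` is a hypothetical helper name, not part of the original answer, and `how="cross"` requires pandas 1.2 or later.

```python
import numpy as np
import pandas as pd

def haversine_km(lon1, lat1, lon2, lat2, radius_km=6371.0):
    """Great-circle distance in km; accepts scalars, arrays, or Series (degrees)."""
    lon1, lat1, lon2, lat2 = (
        np.radians(np.asarray(v, dtype=float)) for v in (lon1, lat1, lon2, lat2)
    )
    a = (np.sin((lat2 - lat1) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2)
    return 2 * radius_km * np.arcsin(np.sqrt(a))

# Sample data copied from the answer.
members = pd.DataFrame({
    "Name": ["Sherie", "Cathi", "Grissel"],
    "Longitude": [16.196499, 107.000799, 118.152148],
    "Latitude": [44.040776, -7.018167, 24.722747],
})
locations = pd.DataFrame({
    "Location": ["Quarteira", "Weishan", "Šuto Orizare"],
    "Longitude": [-8.098960, 100.307174, 21.429841],
    "Latitude": [37.102928, 25.227212, 41.992429],
})

# Every member paired with every location, then one vectorized call.
distances = members.merge(locations, how="cross", suffixes=("1", "2"))
distances["Distance"] = haversine_km(
    distances["Longitude1"], distances["Latitude1"],
    distances["Longitude2"], distances["Latitude2"],
)
print(distances[["Name", "Location", "Distance"]])
```

The result should match the Distance column shown above, because the arithmetic is the same and only the way it is applied differs.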

Ranking:

>>> distances.pivot(index="Location", columns="Name", values="Distance") \
             .rank(axis="columns").astype(int)
Name          Cathi  Grissel  Sherie
Location
Quarteira         3        2       1
Weishan           2        1       3
Šuto Orizare      3        2       1
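
Note that the pivot above ranks the members within each location. The question asks for the opposite direction: for each member, the locations ordered from nearest to farthest, laid out as location_1, Distance_1 and so on. One possible sketch of that layout uses a per-member rank followed by a pivot. The `Rank` column, the `wide` variable, and the `Location_n` / `Distance_n` column names are assumptions made for illustration; the distances are copied from the table in the answer.

```python
import pandas as pd

# Distances in km copied from the answer's table (one row per member/location pair).
distances = pd.DataFrame({
    "Name": ["Sherie"] * 3 + ["Cathi"] * 3 + ["Grissel"] * 3,
    "Location": ["Quarteira", "Weishan", "Šuto Orizare"] * 3,
    "Distance": [2182.362810, 7640.729330, 482.470815,
                 12695.443489, 3657.950305, 10165.429008,
                 11135.298789, 1798.285195, 8719.611566],
})

# Rank each member's locations: 1 = nearest. method="first" breaks ties by row order.
distances["Rank"] = (
    distances.groupby("Name")["Distance"].rank(method="first").astype(int)
)

# One row per member; pivoting on two value columns gives (field, rank) column pairs.
wide = distances.pivot(index="Name", columns="Rank", values=["Location", "Distance"])
wide.columns = [f"{field}_{rank}" for field, rank in wide.columns]

# Interleave the columns: Location_1, Distance_1, Location_2, Distance_2, ...
wide = wide[[f"{field}_{rank}" for rank in (1, 2, 3)
             for field in ("Location", "Distance")]]
print(wide)
```

This replaces the nested loops and the manual concatenation of per-member frames with a single groupby and pivot.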

Credits

* Data generated with Mockaroo

** From https://***.com/a/25767765/15239951
