Calculate distance of successive row AND group by column

Posted: 2021-09-08 14:02:09

Question:

Using the following formula, I get the DataFrame shown below:

    df['dist'] = haversine(df.LAT.shift(), df.LONG.shift(),df.loc[1:, 'LAT'], df.loc[1:, 'LONG'])

The haversine function is defined here: https://***.com/a/40453439/15492238
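For reference, this is the standard haversine formula. The haversine function in the answer below computes exactly this, with latitudes φ and longitudes λ converted to radians and R = earth_radius (6371 km by default in that version):

```latex
a = \sin^2\!\left(\frac{\varphi_2-\varphi_1}{2}\right)
  + \cos\varphi_1\,\cos\varphi_2\,\sin^2\!\left(\frac{\lambda_2-\lambda_1}{2}\right),
\qquad
d = 2R\,\arcsin\!\sqrt{a}
```

Because d only depends on one pair of points, grouping is purely a question of which row is treated as the "previous" point.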

    Group  ID        LAT       LONG         dist
        1   1  74.166061  30.512811          NaN
        1   2  72.249672  33.427724   232.549785
        1   3  67.499828  37.937264   554.905446
        1   4  84.253715  69.328767  1981.896491
        2   5  72.104828  33.823462  1513.397997
        2   6  63.989462  51.918173  1164.481327
        2   7  80.209112  33.530778  1887.256899
        2   8  68.954132  35.981256  1252.531365
        2   9  83.378214  40.619652  1606.340727
        2  10  68.778571   6.607066  1793.921854

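I want to rewrite the same formula but have it grouped by the Group column, so that distances are only computed between successive rows of the same group (the first row of each group should be NaN).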

Expected output:

    Group  ID        LAT       LONG         dist
        1   1  74.166061  30.512811          NaN
        1   2  72.249672  33.427724   232.549785
        1   3  67.499828  37.937264   554.905446
        1   4  84.253715  69.328767  1981.896491
        2   5  72.104828  33.823462          NaN
        2   6  63.989462  51.918173  1164.481327
        2   7  80.209112  33.530778  1887.256899
        2   8  68.954132  35.981256  1252.531365
        2   9  83.378214  40.619652  1606.340727
        2  10  68.778571   6.607066  1793.921854
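A plain shift() looks across group boundaries. That is why, in the first table, the ID 5 row gets a distance (1513.397997) measured from the ID 4 point in group 1. Grouping before shifting restarts the lookup in each group. Below is a minimal sketch on the first six rows of the data above (only Group and LAT, to keep it short):

```python
import pandas as pd

# First six rows of the question's data; LAT alone is enough to show the boundary
df = pd.DataFrame({
    "Group": [1, 1, 1, 1, 2, 2],
    "LAT": [74.166061, 72.249672, 67.499828, 84.253715, 72.104828, 63.989462],
})

# Plain shift: row 4 (first row of group 2) receives group 1's last latitude
plain = df["LAT"].shift()

# Grouped shift: the previous row is looked up only inside the same group,
# so the first row of every group becomes NaN
grouped = df.groupby("Group")["LAT"].shift()

print(pd.DataFrame({"plain": plain, "grouped": grouped}))
```

Feeding the grouped shift into the haversine function instead of the plain one is all that the expected output requires.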

Answer 1:

With your function slightly changed to return a DataFrame, groupby and apply can do the job:

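>>> import numpy as np
>>> import pandas as pd
>>> def haversine(lat1, lon1, lat2, lon2, to_radians=True, earth_radius=6371):
...     # Element-wise great-circle distance in km (earth_radius is in km)
...     if to_radians:
...         lat1, lon1, lat2, lon2 = np.radians([lat1, lon1, lat2, lon2])
...     a = np.sin((lat2-lat1)/2.0)**2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2-lon1)/2.0)**2
...     # Wrapped in a DataFrame so that groupby().apply() stacks the per-group results
...     return pd.DataFrame(earth_radius * 2 * np.arcsin(np.sqrt(a)))

>>> # shift() is evaluated inside each group, so every group's first row gets NaN.
>>> # The result comes back in group order, so this assumes df is already sorted by Group.
>>> df['dist'] = (df.groupby(["Group"])
...                 .apply(lambda x: haversine(x['LAT'],
...                                            x['LONG'],
...                                            x['LAT'].shift(),
...                                            x['LONG'].shift()))
...                 .to_numpy().ravel())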
>>> df
   Group  ID        LAT       LONG         dist
0      1   1  74.166061  30.512811          NaN
1      1   2  72.249672  33.427724   232.695882
2      1   3  67.499828  37.937264   555.254059
3      1   4  84.253715  69.328767  1983.141596
4      2   5  72.104828  33.823462          NaN
5      2   6  63.989462  51.918173  1165.212900
6      2   7  80.209112  33.530778  1888.442548
7      2   8  68.954132  35.981256  1253.318254
8      2   9  83.378214  40.619652  1607.349894
9      2  10  68.778571   6.607066  1795.048866
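An alternative sketch that skips apply entirely: shift LAT and LONG within each group, then call a plain vectorized haversine once on whole columns. This is a variation of my own, not the answer's code. It assumes the same earth_radius of 6371 km as the answer and rebuilds the sample data from the question:

```python
import numpy as np
import pandas as pd


def haversine(lat1, lon1, lat2, lon2, earth_radius=6371.0):
    """Element-wise great-circle distance in km between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2)
    return earth_radius * 2 * np.arcsin(np.sqrt(a))


df = pd.DataFrame({
    "Group": [1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    "ID": list(range(1, 11)),
    "LAT": [74.166061, 72.249672, 67.499828, 84.253715, 72.104828,
            63.989462, 80.209112, 68.954132, 83.378214, 68.778571],
    "LONG": [30.512811, 33.427724, 37.937264, 69.328767, 33.823462,
             51.918173, 33.530778, 35.981256, 40.619652, 6.607066],
})

# Previous point within the same group; NaN for the first row of each group
prev = df.groupby("Group")[["LAT", "LONG"]].shift()

# One vectorized call; the result keeps df's index, so rows stay aligned
# even if df is not sorted by Group
df["dist"] = haversine(prev["LAT"], prev["LONG"], df["LAT"], df["LONG"])
print(df)
```

Because the shifted frame keeps the original index, the assignment aligns by index rather than by position. This avoids the sorted-by-Group assumption and the per-group Python call that apply makes.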
