Calculate distance of successive rows and group by column
【Title】: Calculate distance of successive row AND group by column 【Posted】: 2021-09-08 14:02:09 【Question】: I am using the following formula, which gives the dataframe below:
df['dist'] = haversine(df.LAT.shift(), df.LONG.shift(),df.loc[1:, 'LAT'], df.loc[1:, 'LONG'])
The haversine function is defined here: https://***.com/a/40453439/15492238
Group ID LAT LONG dist
1 1 74.166061 30.512811 NaN
1 2 72.249672 33.427724 232.549785
1 3 67.499828 37.937264 554.905446
1 4 84.253715 69.328767 1981.896491
2 5 72.104828 33.823462 1513.397997
2 6 63.989462 51.918173 1164.481327
2 7 80.209112 33.530778 1887.256899
2 8 68.954132 35.981256 1252.531365
2 9 83.378214 40.619652 1606.340727
2 10 68.778571 6.607066 1793.921854
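I want to rewrite the same formula so that it is applied per group of the Group column, meaning the first row of each group should have no distance.
Expected output: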
Group ID LAT LONG dist
1 1 74.166061 30.512811 NaN
1 2 72.249672 33.427724 232.549785
1 3 67.499828 37.937264 554.905446
1 4 84.253715 69.328767 1981.896491
2 5 72.104828 33.823462 NaN
2 6 63.989462 51.918173 1164.481327
2 7 80.209112 33.530778 1887.256899
2 8 68.954132 35.981256 1252.531365
2 9 83.378214 40.619652 1606.340727
2 10 68.778571 6.607066 1793.921854
【Answer 1】: With your function slightly changed to return a DataFrame, groupby and apply can do the job:
>>> import numpy as np
>>> import pandas as pd
>>> def haversine(lat1, lon1, lat2, lon2, to_radians=True, earth_radius=6371):
...     if to_radians:
...         lat1, lon1, lat2, lon2 = np.radians([lat1, lon1, lat2, lon2])
...     a = np.sin((lat2-lat1)/2.0)**2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2-lon1)/2.0)**2
...     return pd.DataFrame(earth_radius * 2 * np.arcsin(np.sqrt(a)))
>>> df['dist'] = (df.groupby(["Group"])
...                 .apply(lambda x: haversine(x['LAT'],
...                                            x['LONG'],
...                                            x['LAT'].shift(),
...                                            x['LONG'].shift())).values)
>>> df
Group ID LAT LONG dist
0 1 1 74.166061 30.512811 NaN
1 1 2 72.249672 33.427724 232.695882
2 1 3 67.499828 37.937264 555.254059
3 1 4 84.253715 69.328767 1983.141596
4 2 5 72.104828 33.823462 NaN
5 2 6 63.989462 51.918173 1165.212900
6 2 7 80.209112 33.530778 1888.442548
7 2 8 68.954132 35.981256 1253.318254
8 2 9 83.378214 40.619652 1607.349894
9 2 10 68.778571 6.607066 1795.048866
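As a possible alternative to apply, the previous coordinates can be shifted within each group first and then passed to a vectorized haversine. The following is only a minimal sketch under some assumptions: the haversine here returns a plain Series rather than a DataFrame (a variation on the function above, not the linked original), earth_radius=6371 is kept from the answer, and the sample frame is rebuilt by hand from the question's data.

```python
import numpy as np
import pandas as pd


def haversine(lat1, lon1, lat2, lon2, earth_radius=6371):
    """Great-circle distance in km between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2)
    return earth_radius * 2 * np.arcsin(np.sqrt(a))


# Sample data copied from the question
df = pd.DataFrame({
    "Group": [1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    "ID": range(1, 11),
    "LAT": [74.166061, 72.249672, 67.499828, 84.253715, 72.104828,
            63.989462, 80.209112, 68.954132, 83.378214, 68.778571],
    "LONG": [30.512811, 33.427724, 37.937264, 69.328767, 33.823462,
             51.918173, 33.530778, 35.981256, 40.619652, 6.607066],
})

# Shift LAT/LONG inside each group, so the first row of every group
# has NaN as its "previous" point and therefore a NaN distance.
prev = df.groupby("Group")[["LAT", "LONG"]].shift()
df["dist"] = haversine(prev["LAT"], prev["LONG"], df["LAT"], df["LONG"])
```

Because the shift happens per group, no distance is computed across the boundary between group 1 and group 2, and the resulting dist column should agree with the output shown above.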