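Python: Computing the distance between two point coordinates using two columns
[Posted]: 2021-07-04 17:28:10
[Question]: I want to compute the distance between two coordinates. I know I can compute the haversine distance between two points. However, I would like to know whether there is a simpler way than writing a loop that applies the formula across the entire column (my loop also raises an error).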
Here is some sample data:
import random
import pandas as pd

# Random values for the duration from one point to another
random_values = random.sample(range(2,20), 8)
random_values
# Creating arrays for the coordinates
lat_coor = [11.923855, 11.923862, 11.923851, 11.923847, 11.923865, 11.923841, 11.923860, 11.923846]
lon_coor = [57.723843, 57.723831, 57.723839, 57.723831, 57.723827, 57.723831, 57.723835, 57.723827]
df = pd.DataFrame({
    'duration': random_values,
    'latitude': lat_coor,
    'longitude': lon_coor
})
df
duration latitude longitude
0 5 11.923855 57.723843
1 2 11.923862 57.723831
2 10 11.923851 57.723839
3 19 11.923847 57.723831
4 16 11.923865 57.723827
5 4 11.923841 57.723831
6 13 11.923860 57.723835
7 3 11.923846 57.723827
To compute the distance, this is what I tried:
import math

# Looping over each row to compute the Haversine distance between two points
# Earth's radius (in m)
R = 6373.0 * 1000
lat = df["latitude"]
lon = df["longitude"]
for i in lat:
    lat1 = lat[i]
    lat2 = lat[i+1]
    for j in lon:
        lon1 = lon[i]
        lon2 = lon[i+1]
        dlon = lon2 - lon1
        dlat = lat2 - lat1
        # Haversine formula
        a = math.sin(dlat / 2)**2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2)**2
        c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
        distance = R * c
        print(distance) # in m
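However, this raises an error.
The two points used for each distance should be taken from the same columns, that is, from consecutive rows.
First distance value:
11.923855 57.723843 (point1/observation1)
11.923862 57.723831 (point2/observation2)
Second distance value:
11.923862 57.723831 (point1/observation2)
11.923851 57.723839 (point2/observation3)
Third distance value:
11.923851 57.723839 (point1/observation3)
11.923847 57.723831 (point2/observation4)
... (and so on)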
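[Comments]:
You need two points (lat1, lon1) and (lat2, lon2) to compute a distance. Each row of the data frame has only one point. Can you specify which other point the distance should be computed against?
I have edited the question to make this part clearer. They should come from the same column.
[Answer 1]: OK, first you can create a data frame that combines each measurement with the previous one: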
df2 = pd.concat([df.add_suffix('_pre').shift(), df], axis=1)
df2
This outputs:
duration_pre latitude_pre longitude_pre duration latitude longitude
0 NaN NaN NaN 5 11.923855 57.723843
1 5.0 11.923855 57.723843 2 11.923862 57.723831
2 2.0 11.923862 57.723831 10 11.923851 57.723839
…
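Then create a haversine function and apply it to the rows:
import math

def haversine(lat1, lon1, lat2, lon2):
    # Note: math.sin and math.cos expect radians, while the columns hold degrees;
    # the outputs shown below come from passing the values unconverted
    R = 6373.0 * 1000  # Earth's radius in m
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = math.sin(dlat / 2)**2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2)**2
    return R * 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))

df2.apply(lambda x: haversine(x['latitude_pre'], x['longitude_pre'], x['latitude'], x['longitude']), axis=1)
This computes the distance from each row to the previous row (so the first row is NaN):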
0 NaN
1 75.754755
2 81.120210
3 48.123604
…
And if you want to add the result as a new column of the original data frame in a single line:
df['distance'] = pd.concat([df.add_suffix('_pre').shift(), df], axis=1).apply(lambda x: haversine(x['latitude_pre'], x['longitude_pre'], x['latitude'], x['longitude']), axis=1)
Output:
duration latitude longitude distance
0 5 11.923855 57.723843 NaN
1 2 11.923862 57.723831 75.754755
2 10 11.923851 57.723839 81.120210
3 19 11.923847 57.723831 48.123604
4 16 11.923865 57.723827 116.515304
5 4 11.923841 57.723831 154.307571
6 13 11.923860 57.723835 122.794838
7 3 11.923846 57.723827 98.115312
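The `math` trigonometric functions work in radians, so if the latitude and longitude columns hold decimal degrees they need converting before the haversine formula is applied. The following is a minimal vectorized sketch under that assumption, using NumPy instead of a row-wise `apply`. The names `haversine_m` and `EARTH_RADIUS_M` are illustrative and not part of the original answer, and the mean Earth radius of 6371 km is an assumed constant.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres; inputs are decimal degrees (scalars or arrays)."""
    # Convert every input from degrees to radians before using sin/cos.
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))


lat = np.array([11.923855, 11.923862, 11.923851, 11.923847])
lon = np.array([57.723843, 57.723831, 57.723839, 57.723831])

# Distance from each point to the previous one; the first point has no predecessor.
step_m = np.full(lat.shape, np.nan)
step_m[1:] = haversine_m(lat[:-1], lon[:-1], lat[1:], lon[1:])
print(step_m)
```

With a data frame, the same function should also accept shifted columns directly, for example `haversine_m(df['latitude'].shift(), df['longitude'].shift(), df['latitude'], df['longitude'])`, which leaves NaN in the first row.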
[Answer 2]: As I understand it, you want the pairwise haversine distances between all points in df. Here is how to do it.
Be careful with this approach, because it quickly generates a lot of columns.
Setup
import random
import pandas as pd

random_values = random.sample(range(2,20), 8)
random_values
# Creating arrays for the coordinates
lat_coor = [11.923855, 11.923862, 11.923851, 11.923847, 11.923865, 11.923841, 11.923860, 11.923846]
lon_coor = [57.723843, 57.723831, 57.723839, 57.723831, 57.723827, 57.723831, 57.723835, 57.723827]
df = pd.DataFrame({
    'duration': random_values,
    'latitude': lat_coor,
    'longitude': lon_coor
})
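Get radians
import math

df['lat_rad'] = df.latitude.apply(math.radians)
# The post as written used df.latitude here as well, which is why long_rad equals lat_rad in the output below
df['long_rad'] = df.longitude.apply(math.radians)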
Compute pairwise distances
from sklearn.metrics.pairwise import haversine_distances

for idx_from, from_point in df.iterrows():
    for idx_to, to_point in df.iterrows():
        column_name = f"Distance_to_point_{idx_from}"
        haversine_matrix = haversine_distances([[from_point.lat_rad, from_point.long_rad], [to_point.lat_rad, to_point.long_rad]])
        # haversine_distances returns distances on the unit sphere; scale by Earth's radius (m) and convert to km
        point_distance = haversine_matrix[0][1] * 6371000 / 1000
        df.loc[idx_to, column_name] = point_distance

df
duration latitude longitude lat_rad long_rad Distance_to_point_0 Distance_to_point_1 Distance_to_point_2 Distance_to_point_3 Distance_to_point_4 Distance_to_point_5 Distance_to_point_6 Distance_to_point_7
0 3 11.923855 57.723843 0.20811052928038845 0.20811052928038845 0.0 0.0010889626934743966 0.0006222644021223135 0.001244528808978787 0.0015556609862946524 0.002177925427923575 0.000777830496776312 0.0014000949117650525
1 13 11.923862 57.723831 0.2081106514534361 0.2081106514534361 0.0010889626934743966 0.0 0.0017112270955967099 0.002333491502453183 0.0004666982928202561 0.00326688812139797 0.00031113219669808446 0.0024890576052394482
2 14 11.923851 57.723839 0.2081104594672184 0.2081104594672184 0.0006222644021223135 0.0017112270955967099 0.0 0.0006222644068564735 0.002177925388416966 0.0015556610258012616 0.0014000948988986254 0.0007778305096427389
3 4 11.923847 57.723831 0.20811038965404832 0.20811038965404832 0.001244528808978787 0.002333491502453183 0.0006222644068564735 0.0 0.0028001897952734385 0.0009333966189447881 0.002022359305755099 0.0001555661027862654
4 5 11.923865 57.723827 0.20811070381331365 0.20811070381331365 0.0015556609862946524 0.0004666982928202561 0.002177925388416966 0.0028001897952734385 0.0 0.003733586414218225 0.0007778304895183407 0.002955755898059704
5 7 11.923841 57.723831 0.20811028493429318 0.20811028493429318 0.002177925427923575 0.00326688812139797 0.0015556610258012616 0.0009333966189447881 0.003733586414218225 0.0 0.002955755924699886 0.0007778305161585227
6 9 11.92386 57.723835 0.20811061654685106 0.20811061654685106 0.000777830496776312 0.00031113219669808446 0.0014000948988986254 0.002022359305755099 0.0007778304895183407 0.002955755924699886 0.0 0.002177925408541364
7 8 11.923846 57.723827 0.20811037220075576 0.20811037220075576 0.0014000949117650525 0.0024890576052394482 0.0007778305096427389 0.0001555661027862654 0.002955755898059704 0.0007778305161585227 0.002177925408541364 0.0
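The nested `iterrows` loops call the distance routine once per pair of points. The whole matrix can also be built in a single step. `haversine_distances` itself accepts an array of [lat, lon] rows in radians and returns the full matrix in one call. The sketch below shows the same idea with NumPy broadcasting only, to keep the dependencies small. `pairwise_haversine_km` and `EARTH_RADIUS_KM` are hypothetical names introduced for illustration, and the inputs are assumed to be decimal degrees.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # same radius as in the answer above, expressed in km


def pairwise_haversine_km(lat_deg, lon_deg):
    """Return an (n, n) matrix of great-circle distances in km between all points."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    # Broadcasting a column vector against a row vector gives all pairwise differences.
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


lat_coor = [11.923855, 11.923862, 11.923851]
lon_coor = [57.723843, 57.723831, 57.723839]
dist_km = pairwise_haversine_km(lat_coor, lon_coor)
print(dist_km)
```

Entry `dist_km[i, j]` is the distance between point i and point j, so the matrix is symmetric with zeros on the diagonal. Keeping it as a separate array avoids adding one column per point to the data frame.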
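[Answer 3]: You are confusing the index with the values themselves, so you hit a KeyError: there is no lat[i] in your example (e.g. lat[11.923855]). Once i is fixed to be an index, your code would run past the last row of lat and lon because of your [i+1]. Since you want to compare each row with the previous one, how about starting at index 1 and looking back by 1? Then you never go out of range. This edited version of your code does not crash:
# R, lat and lon are defined as in the question
# lat and lon share the same index, so a single loop over i covers both columns
for i in range(1, len(lat)):
    lat1 = lat[i - 1]
    lat2 = lat[i]
    lon1 = lon[i - 1]
    lon2 = lon[i]
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    # Haversine formula
    # (math.sin and math.cos expect radians; convert with math.radians first if the columns are in degrees)
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    distance = R * c
    print(distance) # in m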
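Index arithmetic can be avoided altogether by pairing each point with its successor using `zip`, which rules out both the KeyError and the out-of-range access. The following is a small standard-library sketch of that idea. It assumes the coordinates are decimal degrees and converts them with `math.radians`. The variable names `points`, `distances`, `phi1`, `phi2`, `dphi` and `dlmb` are illustrative.

```python
import math

R = 6373.0 * 1000  # Earth's radius in metres, as in the question

lat = [11.923855, 11.923862, 11.923851, 11.923847]
lon = [57.723843, 57.723831, 57.723839, 57.723831]

points = list(zip(lat, lon))
distances = []
# zip(points, points[1:]) yields (previous, current) pairs and stops at the last point.
for (lat1, lon1), (lat2, lon2) in zip(points, points[1:]):
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    # Haversine formula
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    distances.append(R * c)  # in m

print(distances)
```

For n points this produces n - 1 distances, one per consecutive pair.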