How to use Python to calculate the distance between lat/long points from an imported csv?
【Posted】: 2020-11-17 19:32:44
【Question】: I'm trying to import a .csv containing four columns of location data (lat/long), calculate the distance between the points, write the distance to a new column, loop the function over the next set of coordinates, and write the output dataframe to a new .csv. I wrote the code below, and I run into an error once these steps have run.
Sample data:
lat1 lon1 lat2 lon2
33.58144 -57.73018 32.44873 -99.46281
25.46212 -46.62017 34.64971 -96.70271
39.97521 -80.27027 68.69710 -83.27182
42.74529 -73.73028 36.17318 -28.18201
Code:
import pandas as pd
import numpy as np

input_file = "input.csv"
output_file = "output.csv"

df = pd.read_csv(input_file)  # Dataframe specification
df = df.convert_objects(convert_numeric = True)

def dist_from_coordinates(lat1, lon1, lat2, lon2):
    R = 6371  # Earth radius in km

    # conversion to radians
    d_lat = np.radians(lat2-lat1)
    d_lon = np.radians(lon2-lon1)
    r_lat1 = np.radians(lat1)
    r_lat2 = np.radians(lat2)

    # haversine formula
    a = np.sin(d_lat/2.) **2 + np.cos(r_lat1) * np.cos(r_lat2) * np.sin(d_lon/2.)**2
    haversine = 2 * R * np.arcsin(np.sqrt(a))
    return haversine

new_column = []  # empty column for distance
for index,row in df.iterrows():
    lat1 = row['lat1']  # first row of location.lat column here
    lon1 = row['lon1']  # first row of location.long column here
    lat2 = row['lat2']  # second row of location.lat column here
    lon2 = row['lon2']  # second row of location.long column here
    value = dist_from_coordinates(lat1, lon1, lat2, lon2)  # get the distance
    new_column.append(value)  # append the empty list with distance values

# 4 is the index where you want to place your column. Column index starts with 0.
# "Distance" is the header and new_column are the values in the column.
df.insert(4,"Distance",new_column)

with open(output_file,'ab') as f:
    df.to_csv(f,index = False)  # creates the output.csv
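Output:
After these steps, output.csv should be a separate file containing the original 4 columns plus a 5th column, the distance. A for loop does the work: the approach shown here reads each row, calculates the distance, appends it to an initially empty list that becomes the new "Distance" column, and finally creates output.csv.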
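The row-by-row loop described above can also be expressed as a single vectorized call, since the NumPy functions in the haversine formula accept whole columns. The following is a minimal sketch under that assumption; the function name `haversine_km`, the constant `EARTH_RADIUS_KM`, and the inline two-row DataFrame (taken from the sample data) are illustrative and not part of the original code.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # same radius the question uses

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km. Inputs are degrees, scalars or array-likes."""
    lat1, lon1, lat2, lon2 = (
        np.radians(np.asarray(v, dtype=float)) for v in (lat1, lon1, lat2, lon2)
    )
    d_lat = lat2 - lat1
    d_lon = lon2 - lon1
    a = np.sin(d_lat / 2.0) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(d_lon / 2.0) ** 2
    return 2.0 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

# Two rows from the sample data, built inline so the sketch runs on its own.
df = pd.DataFrame({
    "lat1": [33.58144, 25.46212],
    "lon1": [-57.73018, -46.62017],
    "lat2": [32.44873, 34.64971],
    "lon2": [-99.46281, -96.70271],
})

# One call computes every row; no iterrows() and no intermediate list.
df["Distance"] = haversine_km(df["lat1"], df["lon1"], df["lat2"], df["lon2"])
```

Assigning to `df["Distance"]` appends the column at the end, which for a four-column frame is the same position that `df.insert(4, ...)` would use.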
Error:
FutureWarning: convert_objects is deprecated. To re-infer data dtypes for object columns, use DataFrame.infer_objects()
For all other conversions use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
after removing the cwd from sys.path.
TypeError Traceback (most recent call last)
<ipython-input-8-ce103283fa0d> in <module>
33
34 with open(output_file,'ab') as f:
---> 35 df.to_csv(f,index = False) #creates the output.csv
~/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py in to_csv(self, path_or_buf, sep, na_rep, float_format, columns, header, index, index_label, mode, encoding, compression, quoting, quotechar, line_terminator, chunksize, tupleize_cols, date_format, doublequote, escapechar, decimal)
3018 doublequote=doublequote,
3019 escapechar=escapechar, decimal=decimal)
-> 3020 formatter.save()
3021
3022 if path_or_buf is None:
~/anaconda3/lib/python3.7/site-packages/pandas/io/formats/csvs.py in save(self)
170 self.writer = UnicodeWriter(f, **writer_kwargs)
171
--> 172 self._save()
173
174 finally:
~/anaconda3/lib/python3.7/site-packages/pandas/io/formats/csvs.py in _save(self)
272 def _save(self):
273
--> 274 self._save_header()
275
276 nrows = len(self.data_index)
~/anaconda3/lib/python3.7/site-packages/pandas/io/formats/csvs.py in _save_header(self)
240 if not has_mi_columns or has_aliases:
241 encoded_labels += list(write_cols)
--> 242 writer.writerow(encoded_labels)
243 else:
244 # write out the mi
TypeError: a bytes-like object is required, not 'str'
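The last frame of the traceback is a plain `csv` writer call, so the same TypeError can be reproduced without pandas. The sketch below uses only the standard library and a temporary file (the path and the variable `binary_error` are illustrative): writing text rows to a handle opened in binary mode fails, while a text-mode handle works.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.csv")
binary_error = None

# Binary append mode: the csv writer passes str to a file that only accepts bytes.
try:
    with open(path, "ab") as f:
        csv.writer(f).writerow(["lat1", "lon1"])
except TypeError as exc:
    binary_error = str(exc)

# Text mode: newline="" follows the csv module's documented recommendation.
with open(path, "w", newline="") as f:
    csv.writer(f).writerow(["lat1", "lon1"])

print(binary_error)
```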
Similar question:
Link to Similar Problem
【Comments】:
Maybe I'm misunderstanding, but why not just use df = pd.read_csv(input_file, delim_whitespace=True) and drop the following line, the one containing convert_objects?
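A minimal sketch of this suggestion, assuming the file really is whitespace-separated as the sample data suggests. It reads from an in-memory string instead of input.csv so it runs on its own, and it uses `sep=r"\s+"`, which behaves like `delim_whitespace=True` and avoids the deprecation warning that recent pandas releases emit for that flag.

```python
import io

import pandas as pd

# Stand-in for input.csv, using two rows of the sample data.
raw = """lat1 lon1 lat2 lon2
33.58144 -57.73018 32.44873 -99.46281
25.46212 -46.62017 34.64971 -96.70271
"""

# A regex separator splits on any run of whitespace.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# The columns are parsed as floats directly, so no convert_objects step is needed.
print(df.dtypes)
```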
【Answer 1】:
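You should apply the following corrections:
Instead of df = df.convert_objects(convert_numeric = True)
use df[:] = df[:].apply(pd.to_numeric, errors='coerce')
Also, with open(output_file,'ab') as f:
opens the file in binary (append) mode, but to_csv writes text (str), which a binary handle rejects; that is the TypeError at the end of the traceback. Use with open(output_file,'w') as f: instead.
Then it should work.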
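Both corrections can be tried together in a small sketch. The DataFrame below is hypothetical: it holds strings, with one deliberately bad cell, to show what `errors='coerce'` does, and it writes to a temporary path rather than output.csv. It also uses the variant `df = df.apply(...)` instead of assigning through `df[:]`, which is an assumption about style rather than the answer's exact wording.

```python
import os
import tempfile

import pandas as pd

# Hypothetical input: string cells, one of which is not a number.
df = pd.DataFrame({
    "lat1": ["33.58144", "25.46212"],
    "lon1": ["-57.73018", "oops"],
    "lat2": ["32.44873", "34.64971"],
    "lon2": ["-99.46281", "-96.70271"],
})

# Replacement for convert_objects: unparseable cells become NaN.
df = df.apply(pd.to_numeric, errors="coerce")

# Text mode instead of 'ab'; newline="" leaves line endings to the csv writer.
out_path = os.path.join(tempfile.mkdtemp(), "output.csv")
with open(out_path, "w", newline="") as f:
    df.to_csv(f, index=False)

print(pd.read_csv(out_path))
```

Passing the path directly, as in `df.to_csv(out_path, index=False)`, also works and lets pandas open the file itself.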