How to use Python to calculate the distance between lat/long points from imported csv?

Posted: 2020-11-17 19:32:44

Question:

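I am trying to import a .csv that contains four columns of location data (lat/long), compute the distance between each pair of points, write the distance to a new column, loop the function over the next set of coordinates, and write the resulting data frame to a new .csv. I wrote the code below, but it raises errors when I run it.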

Sample data:

lat1       lon1        lat2       lon2
33.58144   -57.73018   32.44873   -99.46281
25.46212   -46.62017   34.64971   -96.70271
39.97521   -80.27027   68.69710   -83.27182
42.74529   -73.73028   36.17318   -28.18201

Code:

import pandas as pd
import numpy as np
input_file = "input.csv"
output_file = "output.csv"
df = pd.read_csv(input_file)                       #Dataframe specification
df = df.convert_objects(convert_numeric = True)

def dist_from_coordinates(lat1, lon1, lat2, lon2):
  R = 6371  # Earth radius in km

  #conversion to radians
  d_lat = np.radians(lat2-lat1)
  d_lon = np.radians(lon2-lon1)

  r_lat1 = np.radians(lat1)
  r_lat2 = np.radians(lat2)

  #haversine formula
  a = np.sin(d_lat/2.) **2 + np.cos(r_lat1) * np.cos(r_lat2) * np.sin(d_lon/2.)**2

  haversine = 2 * R * np.arcsin(np.sqrt(a))

  return haversine

new_column = []                    #empty column for distance
for index,row in df.iterrows():
  lat1 = row['lat1'] #first row of location.lat column here
  lon1 = row['lon1'] #first row of location.long column here
  lat2 = row['lat2'] #second row of location.lat column here
  lon2 = row['lon2'] #second row of location.long column here
  value = dist_from_coordinates(lat1, lon1, lat2, lon2)  #get the distance
  new_column.append(value)   #append the empty list with distance values

df.insert(4,"Distance",new_column)  #4 is the index where you want to place your column. Column index starts with 0. "Distance" is the header and new_column are the values in the column.

with open(output_file,'ab') as f:
  df.to_csv(f,index = False)       #creates the output.csv

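Output:

After the operation, output.csv should be a separate file containing the original four columns plus a fifth column, Distance. The approach shown here uses a for loop: it reads each row, calculates the distance, appends it to an initially empty list that becomes the new "Distance" column, and finally creates output.csv.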

Error:

FutureWarning: convert_objects is deprecated.  To re-infer data dtypes for object columns, use DataFrame.infer_objects()
For all other conversions use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
  after removing the cwd from sys.path.
TypeError                                 Traceback (most recent call last)
<ipython-input-8-ce103283fa0d> in <module>
     33 
     34 with open(output_file,'ab') as f:
---> 35   df.to_csv(f,index = False)       #creates the output.csv

~/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py in to_csv(self, path_or_buf, sep, na_rep, float_format, columns, header, index, index_label, mode, encoding, compression, quoting, quotechar, line_terminator, chunksize, tupleize_cols, date_format, doublequote, escapechar, decimal)
   3018                                  doublequote=doublequote,
   3019                                  escapechar=escapechar, decimal=decimal)
-> 3020         formatter.save()
   3021 
   3022         if path_or_buf is None:

~/anaconda3/lib/python3.7/site-packages/pandas/io/formats/csvs.py in save(self)
    170                 self.writer = UnicodeWriter(f, **writer_kwargs)
    171 
--> 172             self._save()
    173 
    174         finally:

~/anaconda3/lib/python3.7/site-packages/pandas/io/formats/csvs.py in _save(self)
    272     def _save(self):
    273 
--> 274         self._save_header()
    275 
    276         nrows = len(self.data_index)

~/anaconda3/lib/python3.7/site-packages/pandas/io/formats/csvs.py in _save_header(self)
    240         if not has_mi_columns or has_aliases:
    241             encoded_labels += list(write_cols)
--> 242             writer.writerow(encoded_labels)
    243         else:
    244             # write out the mi

TypeError: a bytes-like object is required, not 'str'

Similar question:

Link to Similar Problem

Comments:

Maybe I am misunderstanding, but why not just use df = pd.read_csv(input_file, delim_whitespace=True) and drop the following line that contains convert_objects?

Answer 1:

You should apply the following corrections:

Replace df = df.convert_objects(convert_numeric = True) with df[:] = df[:].apply(pd.to_numeric, errors='coerce')

Also, with open(output_file,'ab') as f: opens the file in binary mode, while to_csv writes text (str), which is what triggers the TypeError. Use with open(output_file,'w') as f: instead.
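The mode mismatch can be reproduced without pandas. The sketch below is only an illustration: in-memory buffers (io.BytesIO and io.StringIO) stand in for files opened with 'ab' and 'w', and the header string is a made-up example of what a CSV writer would emit.

```python
import io

# Hypothetical header line, similar to what a CSV writer emits first.
header = "lat1,lon1,lat2,lon2,Distance\n"

# A binary buffer stands in for open(output_file, 'ab').
binary_file = io.BytesIO()
try:
    binary_file.write(header)  # writing str to a binary stream fails
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# A text buffer stands in for open(output_file, 'w').
text_file = io.StringIO()
text_file.write(header)  # writing str to a text stream succeeds
print(text_file.getvalue(), end="")
```

A binary stream accepts only bytes-like objects, so the first write fails in the same way as the header write at the bottom of the traceback.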

Then it should work.
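Putting the corrections together, a minimal sketch might look like the following. Several things here are assumptions rather than part of the original post: the sample rows are inlined through io.StringIO so the code runs without input.csv, the data is assumed to be whitespace-delimited as the sample suggests (hence sep=r"\s+"), the output goes to a temporary directory, and the row loop is swapped for a vectorized call, which works because the NumPy functions in the question's haversine function also accept whole columns.

```python
import io
import os
import tempfile

import numpy as np
import pandas as pd

# Sample rows from the question, inlined so the sketch runs on its own.
# Assumption: the real file is whitespace-delimited like this sample.
SAMPLE = """lat1 lon1 lat2 lon2
33.58144 -57.73018 32.44873 -99.46281
25.46212 -46.62017 34.64971 -96.70271
39.97521 -80.27027 68.69710 -83.27182
42.74529 -73.73028 36.17318 -28.18201
"""


def dist_from_coordinates(lat1, lon1, lat2, lon2):
    """Haversine distance in km; accepts scalars or whole columns."""
    R = 6371  # Earth radius in km
    d_lat = np.radians(lat2 - lat1)
    d_lon = np.radians(lon2 - lon1)
    r_lat1 = np.radians(lat1)
    r_lat2 = np.radians(lat2)
    a = (np.sin(d_lat / 2.0) ** 2
         + np.cos(r_lat1) * np.cos(r_lat2) * np.sin(d_lon / 2.0) ** 2)
    return 2 * R * np.arcsin(np.sqrt(a))


df = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+")

# Stand-in for the deprecated convert_objects: unparseable cells become NaN.
df = df.apply(pd.to_numeric, errors="coerce")

# Vectorized call over the four columns, so no iterrows loop is needed.
df["Distance"] = dist_from_coordinates(df["lat1"], df["lon1"],
                                       df["lat2"], df["lon2"])

# Passing a path lets pandas open the file in text mode itself.
# A temporary directory is used here so nothing existing is overwritten.
output_file = os.path.join(tempfile.mkdtemp(), "output.csv")
df.to_csv(output_file, index=False)
```

Assigning df["Distance"] appends the column at the end, which matches df.insert(4, ...) for a four-column frame. With a real file, replace io.StringIO(SAMPLE) with the input path and output_file with the desired output path.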
