How to filter out positional data based on distance from a known reference trajectory?

【Posted】:2020-03-19 10:55:11 【Question】:

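I have a dataset of 87,288 points that I need to filter. The fields to filter on are the X and Y positions, i.e. longitude and latitude. The plotted data looks like this:

The problem is that I only need the data along a certain path, which is known in advance. Something like this:

I already know how to filter data in a Pandas DataFrame, but since the path is not linear, I need an efficient strategy for removing the noisy data with reasonable precision (the dataset is so large that selecting points by hand is not an option).

Here is some sample data. The only columns that matter are latitude and longitude, i.e. Y and X.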

Sesion,Tiempo,Latitud,Longitud,PM2.5,Modo,Hora,DiaSemana
M-O-AM-07OCT19-DMR,2019-10-01 09:48:17.625,3.3659550000000005,-76.5288288,13.0,OUTDOOR,AM,1
M-O-AM-07OCT19-DMR,2019-10-07 08:18:03.555,3.3661757000000003,-76.5289441,12.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:04.596,3.3661757000000003,-76.5289441,11.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:05.572,3.3661767,-76.5289375,11.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:06.614,3.3661790999999996,-76.5289188,11.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:07.581,3.3661814,-76.5289024,11.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:08.588,3.3661847999999996,-76.52889820000001,11.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:09.570,3.3661922,-76.52890450000001,11.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:10.579,3.3661922,-76.52890450000001,11.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:11.577,3.3662135,-76.52893370000001,12.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:12.611,3.3662227999999996,-76.5289516,12.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:13.561,3.3662227999999996,-76.5289516,11.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:14.631,3.3662346,-76.5289927,11.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:15.554,3.3662421,-76.52901440000001,10.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:16.623,3.3662523999999996,-76.5290363,10.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:17.593,3.3662523999999996,-76.5290363,10.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:18.617,3.3662523999999996,-76.5290363,10.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:19.608,3.3662523999999996,-76.5290363,10.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:20.605,3.3662523999999996,-76.5290363,10.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:21.594,3.3662523999999996,-76.5290363,10.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:22.608,3.3662523999999996,-76.5290363,10.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:23.620,3.3662523999999996,-76.5290363,10.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:24.611,3.3662523999999996,-76.5290363,10.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:25.622,3.3662523999999996,-76.5290363,10.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:26.590,3.3662523999999996,-76.5290363,10.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:27.619,3.3662523999999996,-76.5290363,10.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:28.595,3.3662523999999996,-76.5290363,10.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:29.628,3.3662523999999996,-76.5290363,10.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:30.621,3.3662523999999996,-76.5290363,10.0,OUTDOOR,AM,0

I tried picking a few points along the route and then filtering out the remaining points using a fixed minimum distance, like this:

import pandas as pd
import random
import matplotlib.pyplot as plt
import seaborn as sns
from cycler import cycler
import numpy as np
from salem import get_demo_file, DataLevels, GoogleVisibleMap, Map
import geopy.distance

def get_dist(coords_1 , coords_2):
    return geopy.distance.distance(coords_1, coords_2).meters

dists=[
    (-76.5297163,3.3665631),
    (-76.5307019,3.3656924),
    (-76.5314718,3.3646900),
    (-76.5319956,3.3638394),
    (-76.5316622,3.3621781),
    (-76.5311999,3.3611796),
    (-76.5308636,3.3599338),
    (-76.5306335,3.3585191),
    (-76.5304758,3.3577502),
    (-76.5303957,3.3561101),
    (-76.5302998,3.3543178),
    (-76.5302220,3.3531897),
    (-76.5302369,3.3515283),
    (-76.5303363,3.3502667),
    (-76.5305351,3.3485951),
    (-76.5306779,3.3475220),
    (-76.5308545,3.3456382),
    (-76.5307738,3.3446934),
    (-76.530618,3.3430422)
]
df = pd.read_csv('movil.csv')


for index, row in df.iterrows():
    if index%1000 ==0:
        print(index)
    mind=None
    for i in dists:
        if mind:
            d=get_dist((row['Latitud'],row['Longitud']),(i[1],i[0]))
            if d<mind:
                mind=d
        else:
            mind=get_dist((row['Latitud'],row['Longitud']),(i[1],i[0]))
    if mind>125:
        df.drop(index, inplace=True)

print(df)

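With this approach I managed to do some cleaning, but I feel that a lot of useful data is being filtered out.

【Comments】:

Is the red path known in advance, or are you trying to infer the "main" path from the frequency/density of the data points there?

Hi, the red path is known in advance. @尼古拉斯M

Could you include some sample data in the question, along with the code you have written so far to solve this? It is asking a lot of other SO users to generate realistic sample data just to answer the question.

I have added some sample data and my current approach to the problem. Hope it helps. @尼古拉斯M

【Answer 1】: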

Let's start with some sample data. Note that latitude and longitude are kept in degrees for generating and plotting the data, but in radians for the calculations.

import numpy
import pandas

def add_radians(df):
    # For every "*_deg" column, add a copy in radians named without the "_deg" suffix.
    return df.assign(**{colname[:-len("_deg")]: numpy.radians(col)
                        for colname, col in df.items()})

n_ref = 26
ref_traj = pandas.DataFrame({"lat_deg": -76 + numpy.linspace(-1, 1, n_ref),
                             "lon_deg":   3 + numpy.linspace(-1, 1, n_ref)**2,
                            }).pipe(add_radians)

n = 500
traj = pandas.DataFrame({"lat_deg": -76 + numpy.cumsum(numpy.random.choice([-1, 1], size=n) * 0.05),
                         "lon_deg":   3 + numpy.cumsum(numpy.random.choice([-1, 1], size=n) * 0.05),
                        }).pipe(add_radians)

ax = traj.plot.scatter(x="lat_deg", y="lon_deg")
ax = ref_traj.plot.scatter(x="lat_deg", y="lon_deg", color="red", ax=ax)

Next, we can define a vectorized function that returns the distance between two points. It should work for 1-D or 2-D arrays.

def distance(lat1, lon1, lat2, lon2):
    # Great-circle (haversine) distance in kilometers; all inputs are in radians.
    # TODO: check that shape of lat1, lon1, lat2, lon2 are all compatible.
    R = 6371  # Radius of Earth in kilometers

    def hav(theta):
        # Haversine function: hav(theta) = sin^2(theta / 2)
        return numpy.sin(theta / 2)**2

    d_lat = lat2 - lat1
    d_lon = lon2 - lon1
    a = hav(d_lat) + numpy.cos(lat1) * numpy.cos(lat2) * hav(d_lon)
    return 2 * R * numpy.arcsin(numpy.sqrt(a))
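
For reference, the quantity this function is meant to compute is the standard haversine great-circle distance. The symbols below are my own notation rather than the answer's: φ is latitude and λ is longitude (both in radians), and R is the Earth's radius.

```latex
a = \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right)
  + \cos\varphi_1 \,\cos\varphi_2 \,\sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right),
\qquad
d = 2R \arcsin\!\sqrt{a}
```

For separations of a few kilometers, as in the question's data, \arcsin\sqrt{a} \approx \sqrt{a}, so d \approx 2R\sqrt{a} is a very close approximation; the difference only becomes noticeable at continental distances.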

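Then we can find the minimum distance from each trajectory point to any point of the reference trajectory. This is computationally expensive at O(N*M), but we can vectorize it by broadcasting the reference points and the trajectory points into 2-D arrays.

def min_distance(ref_lat, ref_lon, lat, lon):
    # Work on plain arrays: recent pandas versions no longer allow
    # multi-dimensional indexing such as series[:, numpy.newaxis].
    ref_lat, ref_lon = numpy.asarray(ref_lat), numpy.asarray(ref_lon)
    lat, lon = numpy.asarray(lat), numpy.asarray(lon)
    shape = (lat.shape[0], ref_lat.shape[0])  # (N trajectory points, M reference points)

    def broadcasted(a):
        return numpy.broadcast_to(a, shape=shape)

    d = distance(lat1=broadcasted(ref_lat),
                 lon1=broadcasted(ref_lon),
                 lat2=broadcasted(lat[:, numpy.newaxis]),
                 lon2=broadcasted(lon[:, numpy.newaxis]))
    # Row i holds the distances from trajectory point i to every reference point.
    return numpy.amin(d, axis=-1)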
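
The broadcast above materializes a full N×M array. With the question's 87,288 points and, say, 5,000 reference points, that is roughly 3.5 GB per float64 intermediate. A possible workaround, sketched here under my own assumptions, is to process the trajectory in chunks. The names `haversine_km`, `min_distance_chunked` and `chunk_size` are hypothetical and not part of the answer; inputs are assumed to be in radians.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; inputs in radians, shapes must broadcast."""
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    # Clip guards against tiny floating-point overshoots outside [0, 1].
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def min_distance_chunked(ref_lat, ref_lon, lat, lon, chunk_size=10_000):
    """Minimum distance (km) from each point to any reference point.

    Only a (chunk_size, M) block of distances is held in memory at a time.
    """
    ref_lat = np.asarray(ref_lat, dtype=float)[np.newaxis, :]  # shape (1, M)
    ref_lon = np.asarray(ref_lon, dtype=float)[np.newaxis, :]
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)

    out = np.empty(lat.shape[0])
    for start in range(0, lat.shape[0], chunk_size):
        stop = start + chunk_size
        d = haversine_km(ref_lat, ref_lon,
                         lat[start:stop, np.newaxis],   # shape (chunk, 1)
                         lon[start:stop, np.newaxis])
        out[start:stop] = d.min(axis=1)
    return out


# Usage with three waypoints and two rows taken from the question's data.
ref_deg = np.array([[3.3665631, -76.5297163],
                    [3.3656924, -76.5307019],
                    [3.3646900, -76.5314718]])
pts_deg = np.array([[3.3661757, -76.5289441],
                    [3.3662524, -76.5290363]])
ref_rad, pts_rad = np.radians(ref_deg), np.radians(pts_deg)
d_km = min_distance_chunked(ref_rad[:, 0], ref_rad[:, 1],
                            pts_rad[:, 0], pts_rad[:, 1])
keep = d_km < 0.125  # the question's 125 m threshold, expressed in km
```

The chunk size trades memory for loop overhead; for a reference path with only a few dozen waypoints the unchunked version is already cheap.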

Next, we can choose a tolerance and flag the points whose minimum distance to the reference trajectory is below it.

d = min_distance(ref_traj['lat'], ref_traj['lon'], traj['lat'], traj['lon'])
tolerance = 10  # in kilometers
near_ref = d < tolerance

Finally, we can use the boolean near_ref mask to filter the traj data frame:

ax = ref_traj.plot.scatter(x="lat_deg", y="lon_deg", color="red")
traj[near_ref].plot.scatter(x="lat_deg", y="lon_deg", color="blue", ax=ax)
traj[~near_ref].plot.scatter(x="lat_deg", y="lon_deg", color="gray", ax=ax)
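
One possible reason the asker sees useful data being dropped is that distance is measured to the waypoints only, so a point that lies on the path but midway between two widely spaced waypoints can still exceed the threshold. A variation that is not part of the answer above is to measure distance to the line segments joining consecutive waypoints. The sketch below assumes the area is small enough that a local equirectangular projection is adequate; `to_local_xy` and `min_distance_to_path` are hypothetical names.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def to_local_xy(lat_deg, lon_deg, lat0_deg, lon0_deg):
    """Project degrees to local (x, y) in km around (lat0, lon0).

    Equirectangular approximation: an assumption that only holds for small areas.
    """
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    lat0, lon0 = np.radians(lat0_deg), np.radians(lon0_deg)
    x = EARTH_RADIUS_KM * (lon - lon0) * np.cos(lat0)
    y = EARTH_RADIUS_KM * (lat - lat0)
    return np.stack([x, y], axis=-1)


def min_distance_to_path(points_xy, path_xy):
    """Minimum distance from each point (N, 2) to a polyline with vertices (M, 2)."""
    points_xy = np.asarray(points_xy, dtype=float)
    path_xy = np.asarray(path_xy, dtype=float)
    a = path_xy[:-1]                              # segment starts, (M-1, 2)
    ab = path_xy[1:] - a                          # segment vectors, (M-1, 2)
    ap = points_xy[:, None, :] - a[None, :, :]    # (N, M-1, 2)
    seg_len2 = np.sum(ab * ab, axis=-1)           # squared lengths, (M-1,)
    # Position of the orthogonal projection along each segment, clamped to
    # [0, 1] so the closest point never leaves the segment. Zero-length
    # segments divide by 1 and fall back to their start point.
    t = np.sum(ap * ab, axis=-1) / np.where(seg_len2 > 0, seg_len2, 1.0)
    t = np.clip(t, 0.0, 1.0)
    closest = a[None, :, :] + t[..., None] * ab[None, :, :]
    d = np.linalg.norm(points_xy[:, None, :] - closest, axis=-1)
    return d.min(axis=1)


# Usage with three waypoints and two rows taken from the question's data.
path_deg = np.array([[3.3665631, -76.5297163],
                     [3.3656924, -76.5307019],
                     [3.3646900, -76.5314718]])
pts_deg = np.array([[3.3661757, -76.5289441],
                    [3.3662524, -76.5290363]])
lat0, lon0 = path_deg[0]
path_xy = to_local_xy(path_deg[:, 0], path_deg[:, 1], lat0, lon0)
pts_xy = to_local_xy(pts_deg[:, 0], pts_deg[:, 1], lat0, lon0)
near_path = min_distance_to_path(pts_xy, path_xy) < 0.125  # 125 m in km
```

This still builds an N×(M-1) array, so for a long reference path it may need the same kind of chunking as the point-to-point version.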
