如何计算用户在某个车站上下车，地图匹配方法

Posted 2021-01-09

tags:

篇首语：本文由小常识网(cha138.com)小编为大家整理，主要介绍了如何计算用户在某个车站上下车，地图匹配方法相关的知识，希望对你有一定的参考价值。

具体需求
基于电信的位置融合数据，分析用户是否通过火车出行，以及火车出行的相关信息，如乘车车次、上车站、下车站等。

数据描述
1、用户定位数据
手机号码
业务开始时间
城市编码
经纬度
基站标识
网格号
业务类型
事件类型
数据源
天分区
2、火车停靠数据
车次
车站序号
车站名称
行驶时间
到站时间
行驶里程
经纬度

Python代码如下：
输入经纬度和编码级别计算对应的geohash编码

from math import log10

__base32 = ‘0123456789bcdefghjkmnpqrstuvwxyz‘
__decodemap = { }
for i in range(len(__base32)):
    __decodemap[__base32[i]] = i
del i

def decode_exactly(geohash):
    lat_interval, lon_interval = (-90.0, 90.0), (-180.0, 180.0)
    lat_err, lon_err = 90.0, 180.0
    is_even = True
    for c in geohash:
        cd = __decodemap[c]
        for mask in [16, 8, 4, 2, 1]:
            if is_even: # adds longitude info
                lon_err /= 2
                if cd & mask:
                    lon_interval = ((lon_interval[0]+lon_interval[1])/2, lon_interval[1])
                else:
                    lon_interval = (lon_interval[0], (lon_interval[0]+lon_interval[1])/2)
            else:      # adds latitude info
                lat_err /= 2
                if cd & mask:
                    lat_interval = ((lat_interval[0]+lat_interval[1])/2, lat_interval[1])
                else:
                    lat_interval = (lat_interval[0], (lat_interval[0]+lat_interval[1])/2)
            is_even = not is_even
    lat = (lat_interval[0] + lat_interval[1]) / 2
    lon = (lon_interval[0] + lon_interval[1]) / 2
    return lat, lon, lat_err, lon_err

def decode(geohash):

    lat, lon, lat_err, lon_err = decode_exactly(geohash)
    # Format to the number of decimals that are known
    lats = "%.*f" % (max(1, int(round(-log10(lat_err)))) - 1, lat)
    lons = "%.*f" % (max(1, int(round(-log10(lon_err)))) - 1, lon)
    if ‘.‘ in lats: lats = lats.rstrip(‘0‘)
    if ‘.‘ in lons: lons = lons.rstrip(‘0‘)
    return lats, lons

def encode(latitude, longitude, precision=12):
    """
    Encode a position given in float arguments latitude, longitude to
    a geohash which will have the character count precision.
    """
    lat_interval, lon_interval = (-90.0, 90.0), (-180.0, 180.0)
    geohash = []
    bits = [ 16, 8, 4, 2, 1 ]
    bit = 0
    ch = 0
    even = True
    while len(geohash) < precision:
        if even:
            mid = (lon_interval[0] + lon_interval[1]) / 2
            if longitude > mid:
                ch |= bits[bit]
                lon_interval = (mid, lon_interval[1])
            else:
                lon_interval = (lon_interval[0], mid)
        else:
            mid = (lat_interval[0] + lat_interval[1]) / 2
            if latitude > mid:
                ch |= bits[bit]
                lat_interval = (mid, lat_interval[1])
            else:
                lat_interval = (lat_interval[0], mid)
        even = not even
        if bit < 4:
            bit += 1
        else:
            geohash += __base32[ch]
            bit = 0
            ch = 0
    return ‘‘.join(geohash)

上面的decode是解码算法 encode是编码算法

Google S2地图索引算法
编码过程和解码过程如下：

hilbert_map = {
    ‘a‘: {(0, 0): (0, ‘d‘), (0, 1): (1, ‘a‘), (1, 0): (3, ‘b‘), (1, 1): (2, ‘a‘)},
    ‘b‘: {(0, 0): (2, ‘b‘), (0, 1): (1, ‘b‘), (1, 0): (3, ‘a‘), (1, 1): (0, ‘c‘)},
    ‘c‘: {(0, 0): (2, ‘c‘), (0, 1): (3, ‘d‘), (1, 0): (1, ‘c‘), (1, 1): (0, ‘b‘)},
    ‘d‘: {(0, 0): (0, ‘a‘), (0, 1): (3, ‘c‘), (1, 0): (1, ‘d‘), (1, 1): (2, ‘d‘)},
}

un_hilbert_map = {
    ‘a‘: { 0: (0, 0,‘d‘), 1: (0, 1,‘a‘), 3: (1, 0,‘b‘),  2: (1, 1,‘a‘)},
    ‘b‘: { 2: (0, 0,‘b‘), 1: (0, 1,‘b‘), 3: (1, 0,‘a‘),  0: (1, 1,‘c‘)},
    ‘c‘: { 2: (0, 0,‘c‘), 3: (0, 1,‘d‘), 1: (1, 0,‘c‘),  0: (1, 1,‘b‘)},
    ‘d‘: { 0: (0, 0,‘a‘), 3: (0, 1,‘c‘), 1: (1, 0,‘d‘),  2: (1, 1,‘d‘)}
}
#编码def point_to_hilbert(lng,lat, order=16):
    print (‘hilbert‘)
    lng_range = [-180.0, 180.0]
    lat_range = [-90.0, 90.0]

    current_square = ‘a‘
    position = 0

    for i in range(order - 1, -1, -1):
        position <<= 2

        lng_mid = (lng_range[0]+lng_range[1])/2
        lat_mid = (lat_range[0]+lat_range[1])/2

        if lng >= lng_mid :
            quad_x = 1
            lng_range[0] = lng_mid
        else:
            quad_x = 0
            lng_range[1] = lng_mid

        if lat >= lat_mid :
            quad_y = 1
            lat_range[0] = lat_mid
        else:
            quad_y = 0
            lat_range[1] = lat_mid
        quad_position,current_square = hilbert_map[current_square][(quad_x, quad_y)]
        position |= quad_position
    return position
#解码def hilbert_to_point( d , order=16):
    print (‘hilbert‘)
    lng_range = [-180.0, 180.0]
    lat_range = [-90.0, 90.0]

    current_square = ‘a‘
    lng=lat=lng_mid=lat_mid=0
    for i in range(order - 1, -1, -1):

        lng_mid = ( lng_range[0] + lng_range[1] ) / 2
        lat_mid = ( lat_range[0] + lat_range[1] ) / 2

        mask = 3 << (2*i)
        quad_position = (d & mask) >> (2*i)

        quad_x, quad_y, current_square= un_hilbert_map[current_square][quad_position]

        if quad_x:
            lng_range[0] = lng_mid
        else:
            lng_range[1] = lng_mid

        if quad_y:
            lat_range[0] = lat_mid
        else:
            lat_range[1] = lat_mid

    lat = lat_range[0]
    lng = lng_range[0]

    return lng,lat
if __name__ == ‘__main__‘:
    d = point_to_hilbert(-50.555443,77.776655,36)
    print (d)
    lng,lat = hilbert_to_point(d,36)
    print (lng,lat)

获取百度路径规划算法结果的代码如下：
对应的百度地图API
参考代码中的链接

import logging
import sys
import traceback
from time import sleep
import json
import requests
from bs4 import BeautifulSoup
import pymysql
import os.path
import time

def getroadinfo(par):
    linearray = par.split(‘#&;‘)
    startlon = linearray[0]
    startlat = linearray[1]
    endlon = linearray[2]
    endlat = linearray[3]
    strcount = linearray[4]
    #http: // api.map.baidu.com / direction / v1?mode = driving & origin = 上地五街 & destination = 北京大学 & origin_region = 北京 & destination_region = 北京 & output = json & ak = 您的ak
    origin = str(startlat) + "," + str(startlon)
    dest = str(endlat) + "," + str(endlon)
    url = "http://api.map.baidu.com/direction/v1";
    Param = "mode=driving&origin=" + origin + "&destination=" + dest + "&origin_region=武汉&destination_region=武汉&output=json&ak=" + "";
    strurl = url + "?" + Param;
    #请求百度接口返回数据
    try:
        web = requests.get(strurl, timeout=30)
        data = json.loads(web.text)
        if ‘result‘ in data:
            if ‘routes‘ in data[‘result‘]:
                routes = data[‘result‘][‘routes‘]
                listtext = []
                writestr=""
                for route in routes:
                    if ‘steps‘ in route:
                        steps = route[‘steps‘]
                        for step in steps:
                            # area = 0
                            # if ‘area‘ in step:
                            #     area = step[‘area‘]
                            # direction = 0
                            # if ‘direction‘ in step:
                            #     direction = step[‘direction‘]
                            # distance = 0
                            # if ‘distance‘ in step:
                            #     distance = step[‘distance‘]
                            # duration = 0
                            # if ‘duration‘ in step:
                            #     duration = step[‘duration‘]
                            # instructions = ""
                            # if ‘instructions‘ in step:
                            #     instructions = step[‘instructions‘]
                            path = ""
                            if ‘path‘ in step:
                                path = step[‘path‘]
                            # turn = 0
                            # if ‘turn‘ in step:
                            #     turn = step[‘turn‘]
                            # type = 0
                            # if ‘type‘ in step:
                            #     type = step[‘type‘]
                            # stepOriginInstruction = ""
                            # if ‘stepOriginInstruction‘ in step:
                            #     stepOriginInstruction = step[‘stepOriginInstruction‘]
                            # stepDestinationInstruction = ""
                            # if ‘stepDestinationInstruction‘ in step:
                            #     stepDestinationInstruction = step[‘stepDestinationInstruction‘]
                            # traffic_condition = 0
                            # if ‘traffic_condition‘ in step:
                            #     traffic_condition = step[‘traffic_condition‘]
                            # currenttime = time.strftime(‘%Y-%m-%d %H:%M:%S‘,time.localtime(time.time()))
                            # writestr = origin + "#&;" + dest + "#&;" + str(area) +"#&;"+ str(direction) +"#&;"+ str(distance) +"#&;"+ str(duration) +"#&;"+ instructions +"#&;"+ path +"#&;"+ str(turn) +"#&;"+ str(type) +"#&;"+ stepOriginInstruction +"#&;"+ stepDestinationInstruction +"#&;"+ str(traffic_condition) +"#&;"+ currenttime + "\n"
                            writestr = writestr + path + ";"
                print(time.strftime(‘%Y-%m-%d %H:%M:%S‘, time.localtime(time.time())))
                print(writestr)
                writestr = writestr + "\n"
                listtext.append(writestr)
                yushu = int(strcount)%20
                text = "E:/roadinfo/roadinfo"+str(yushu)+".txt"
                file_object = open(text, ‘a‘)
                lock(file_object, LOCK_EX)
                file_object.writelines(listtext)
                unlock(file_object)
                file_object.close()
    except:
        logging.error(‘解析线路信息失败‘)
        traceback.print_exc()

计算步骤
为了计算所有用户可能乘坐的火车班次和上下车车站具体的火车站，具体计算过程如下：
方案1

1、通过火车班次数据中的第三列和第七列

计算出所有的火车运行时速

2、对所有的用户定位经纬度进行geohash编码或者S2编码存成另外一个列
3、参考geohash编码级别对应的定位精度表如下：
Geohash 字符串长度纬度经度纬度误差经度误差 km误差
1 2 3 ±23 ±23 ±2500
2 5 5 ±2.8 ±5.6 ±630
3 7 8 ±0.70 ±0.70 ±78
4 10 10 ±0.087 ±0.18 ±20
5 12 13 ±0.022 ±0.022 ±2.4
6 15 15 ±0.0027 ±0.0055 ±0.61
7 17 18 ±0.00068 ±0.00068 ±0.076
8 20 20 ±0.000085 ±0.00017 ±0.019

如果用户在某一个小时的时间间隔内 geohash的前三位有变动，我们可以认为该用户在高速移动中
4、筛选出所有geohash在一个小时内前三位有变动的点的用户数据，
5、对所有的火车站进行geohash第五级编码
6、如果某个用户被过滤后的数据的geohash编码和某个车站的geohash编码的前五位相同，则判断该用户的定位时间是否在该geohash的车站的到站时间前后半个小时之内，如果在半个小时之内，则判断该用户前后的车站是否在某一个火车形成序列中，如果是则输出。

方案2
1、因为考虑到用户的数据并不是以一定的频率匀速上传，中间有间断的可能，所以只能采用路网匹配的算法，即判断某个用户某个时间的数据点是否在道路网上：
2、参考百度地图API，获取百度地图的铁路网路径规划信息，具体爬虫程序参考我上面的获取路径规划的程序。以火车时刻表的每个班车每个时间段的起点和终点（车站）作为输入获取道路网数据。
3、对所有铁路道路网点进行geohash第五级编码也就是定位精度为2.4公里
4、对所有用户数据的经纬度进行geohash第五级编码
5、对用户时间进行格式转化为小时和分钟两个列
6、对道路网数据进行时间补充操作例如：

A B C

如上图A点经过B点到达C点总共时间一个小时，A到B路程为四分之一，已知A，C时间则B的时间点为A时间加上一刻钟。
之后将所有道路网时间转化为小时和分钟两个列
7、对用户数据表和道路网表的时间数据列和 geohash列添加索引
注意是前缀索引，可以另外加两个列是geohash的前缀，通过前缀匹配会加速。

8、使用铁路网数据的geohash列对用户定位数据进行过滤，去除所有不在火车道路网附近的用户数据
9、对每一组用户数据，和道路网的数据进行 geohash列的join
10、判断时间是否和对应的时间点吻合，吻合即输出

以上是关于如何计算用户在某个车站上下车，地图匹配方法的主要内容，如果未能解决你的问题，请参考以下文章