如何计算与栅格单元相交的矢量几何图形的数量?

Posted

技术标签:

【中文标题】如何计算与栅格单元相交的矢量几何图形的数量?【英文标题】:How do I count the number of vector geometry that intersect a raster cell? 【发布时间】:2021-08-17 18:41:24 【问题描述】:

或者:寻找一种更快、更准确的方法将小型 OpenStreetMap 提取物栅格化为人口加权栅格。

我想将一个小的 .pbf 文件转换为 GeoTiff,这样可以更轻松地进行进一步的空间分析。出于这个问题的目的,我将限制处理多边形几何的要求,因为我已经找到了a solution that works very well for lines。它工作得很好,我正在考虑将我所有的多边形转换成线。

举一个我想转换的数据类型的例子:

wget https://download.geofabrik.de/europe/liechtenstein-latest.osm.pbf
osmium tags-filter liechtenstein-latest.osm.pbf landuse=grass -o liechtenstein_grass.pbf
ogr2ogr -t_srs EPSG:3857 liechtenstein_grass.sqlite -dsco SPATIALITE=YES -nln multipolygons -nlt POLYGON -skipfailures liechtenstein_grass.pbf

我在这里找到了一个区域统计脚本,我们或许可以构建它来解决这个问题:http://pcjericks.github.io/py-gdalogr-cookbook/raster_layers.html#calculate-zonal-statistics

上面的脚本需要一个矢量图层和一个栅格层,通过裁剪栅格并对其进行一些统计来迭代矢量特征。

我想计算与每个光栅像素相交的矢量特征的数量,而不是普通的区域统计。我有一个全局光栅网格 Int32,每个像素都有一个唯一值。

qgis_process run native:creategrid -- TYPE=2 EXTENT="-20037760, -8399416, 20037760, 18454624 [EPSG:3857]" HSPACING=1912 VSPACING=1912 HOVERLAY=0 VOVERLAY=0 CRS="EPSG:3857" OUTPUT="grid.gpkg"

sqlite3 land.gpkg
SELECT load_extension("mod_spatialite");
alter table output add column ogcfod int;
update output set ogcfod = fid;

gdal_rasterize -l output -a ogcfod -tap -tr 1912.0 1912.0 -a_nodata 0.0 -ot Int32 -of GTiff -co COMPRESS=DEFLATE -co PREDICTOR=2 grid.gpkg grid.tif -te -20037760 -8399416 20037760 18454624

所以我在想,如果我们仍然可以迭代矢量特征(因为矢量特征的数量要少得多,而且栅格网格中有 88m+ 区域),它的性能可能会好得多。

我们想要一个脚本脚本,它采用矢量图层和光栅层,迭代矢量特征,查找特征覆盖的所有像素的值,然后将一个添加到字典中:px_id: qty

但是,当尝试使此脚本正常工作时,它一直给我相同的几何图形...它只会一遍又一遍地向我显示 1 个像素 ID

import sys
import gdal
import numpy
import ogr
import osr
from rich import inspect, print


def zonal_stats(feat, lyr, raster):
    # Get raster georeference info
    transform = raster.GetGeoTransform()
    xOrigin = transform[0]
    yOrigin = transform[3]
    pixelWidth = transform[1]
    pixelHeight = transform[5]

    # Reproject vector geometry to same projection as raster
    sourceSR = lyr.GetSpatialRef()
    targetSR = osr.SpatialReference()
    targetSR.ImportFromWkt(raster.GetProjectionRef())
    coordTrans = osr.CoordinateTransformation(sourceSR, targetSR)
    feat = lyr.GetNextFeature()
    geom = feat.GetGeometryRef()
    geom.Transform(coordTrans)

    # Get extent of feat
    geom = feat.GetGeometryRef()
    if geom.GetGeometryName() == "MULTIPOLYGON":
        count = 0
        pointsX = []
        pointsY = []
        for polygon in geom:
            geomInner = geom.GetGeometryRef(count)
            ring = geomInner.GetGeometryRef(0)
            numpoints = ring.GetPointCount()
            for p in range(numpoints):
                lon, lat, z = ring.GetPoint(p)
                pointsX.append(lon)
                pointsY.append(lat)
            count += 1
    elif geom.GetGeometryName() == "POLYGON":
        ring = geom.GetGeometryRef(0)
        numpoints = ring.GetPointCount()
        pointsX = []
        pointsY = []
        for p in range(numpoints):
            lon, lat, z = ring.GetPoint(p)
            pointsX.append(lon)
            pointsY.append(lat)

    else:
        sys.exit("ERROR: Geometry needs to be either Polygon or Multipolygon")

    xmin = min(pointsX)
    xmax = max(pointsX)
    ymin = min(pointsY)
    ymax = max(pointsY)

    print(xmin, xmax)
    print(ymin, ymax)

    # Specify offset and rows and columns to read
    xoff = int((xmin - xOrigin) / pixelWidth)
    yoff = int((yOrigin - ymax) / pixelWidth)
    xcount = int((xmax - xmin) / pixelWidth) + 1
    ycount = int((ymax - ymin) / pixelWidth) + 1

    # Create memory target raster
    target_ds = gdal.GetDriverByName("MEM").Create(
        "", xcount, ycount, 1, gdal.GDT_Int32
    )
    target_ds.SetGeoTransform(
        (
            xmin,
            pixelWidth,
            0,
            ymax,
            0,
            pixelHeight,
        )
    )

    # Create for target raster the same projection as for the value raster
    raster_srs = osr.SpatialReference()
    raster_srs.ImportFromWkt(raster.GetProjectionRef())
    target_ds.SetProjection(raster_srs.ExportToWkt())

    # Rasterize zone polygon to raster
    gdal.RasterizeLayer(target_ds, [1], lyr, burn_values=[1])

    # Read raster as arrays
    banddataraster = raster.GetRasterBand(1)
    dataraster = banddataraster.ReadAsArray(xoff, yoff, xcount, ycount)

    bandmask = target_ds.GetRasterBand(1)
    datamask = bandmask.ReadAsArray(0, 0, xcount, ycount)

    print(dataraster)

    # Mask zone of raster
    # zoneraster = numpy.ma.masked_array(dataraster, numpy.logical_not(datamask))

    # print(zoneraster)

    # exit()


def loop_zonal_stats(lyr, raster):
    featList = range(lyr.GetFeatureCount())
    statDict = 

    for FID in featList:
        feat = lyr.GetFeature(FID)
        meanValue = zonal_stats(feat, lyr, raster)
        statDict[FID] = meanValue
    return statDict


def main(input_zonal_raster, input_value_polygon):
    raster = gdal.Open(input_zonal_raster)
    shp = ogr.Open(input_value_polygon)
    lyr = shp.GetLayer()

    return loop_zonal_stats(lyr, raster)


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print(
            "[ ERROR ] you must supply two arguments: input-zone-raster-name.tif input-value-shapefile-name.shp "
        )
        sys.exit(1)

    main(sys.argv[1], sys.argv[2])

先前的研究:

https://gis.stackexchange.com/questions/177738/count-overlapping-polygons-including-duplicates https://***.com/a/47443399/697964 如果 gdal_rasterize 可以计算所有与每个像素相交的多边形(而不是固定值),这可能会满足我的需求。 https://github.com/rory/osm-summary-heatmap/blob/main/Makefile https://old.reddit.com/r/gis/comments/4n2q5v/count_overlapping_polygons_qgis/ heatmapkerneldensity 不能很好地工作,或者我没有正确使用它,但它似乎关闭
qgis_process run qgis:heatmapkerneldensityestimation -- INPUT="basen.json.geojson" RADIUS=2868 RADIUS_FIELD=None PIXEL_SIZE=1912 WEIGHT_FIELD=None KERNEL=4 DECAY=0 OUTPUT_VALUE=0 OUTPUT="basen.tif

【问题讨论】:

这似乎很可疑:“def zonal_stats(feat, lyr, raster): (...) feat = lyr.GetNextFeature() ” 也许它不会导致您的代码出现问题,或者甚至确实如此,但无论如何可能不是最清晰的方法..先有一个参数,然后再有一个同名的局部变量?我试图跟踪你的循环代码是如何运行的,也许有一个简单的错误。 嗯,是的,这很奇怪。 for FID in featList: if FID: 似乎已修复它。我想有时 FID 是无?奇怪 酷!这是工作。这比我想象的要容易得多。我会尽快发布答案。感谢蚂蚁的帮助! 【参考方案1】:

这似乎有效。输出为 CSV columns=["px​​_id", "qty"]

"""
python rasterlayerzonalvectorcounts.py grid.tif liechtenstein_grass.sqlite

MIT License
Based on https://github.com/pcjericks/py-gdalogr-cookbook/blob/master/raster_layers.rst#calculate-zonal-statistics
"""

import sys
import osr
import os
import ogr
import numpy
import gdal
import pandas
from joblib import Parallel, delayed
from collections import Counter


def zonal_stats(FID, lyr, raster):
    feat = lyr.GetFeature(FID)

    # Get raster georeference info
    transform = raster.GetGeoTransform()
    xOrigin = transform[0]
    yOrigin = transform[3]
    pixelWidth = transform[1]
    pixelHeight = transform[5]

    # Reproject vector geometry to same projection as raster
    sourceSR = lyr.GetSpatialRef()
    targetSR = osr.SpatialReference()
    targetSR.ImportFromWkt(raster.GetProjectionRef())
    coordTrans = osr.CoordinateTransformation(sourceSR, targetSR)
    geom = feat.GetGeometryRef()
    geom.Transform(coordTrans)

    # Get extent of feat
    geom = feat.GetGeometryRef()
    xmin, xmax, ymin, ymax = geom.GetEnvelope()

    # Specify offset and rows and columns to read
    xoff = int((xmin - xOrigin) / pixelWidth)
    yoff = int((yOrigin - ymax) / pixelWidth)
    xcount = int((xmax - xmin) / pixelWidth) + 1
    ycount = int((ymax - ymin) / pixelWidth) + 1

    # Create memory target raster
    target_ds = gdal.GetDriverByName("MEM").Create(
        "", xcount, ycount, 1, gdal.GDT_Int32
    )
    target_ds.SetGeoTransform(
        (
            xmin,
            pixelWidth,
            0,
            ymax,
            0,
            pixelHeight,
        )
    )

    # Create for target raster the same projection as for the value raster
    raster_srs = osr.SpatialReference()
    raster_srs.ImportFromWkt(raster.GetProjectionRef())
    target_ds.SetProjection(raster_srs.ExportToWkt())

    # Rasterize zone polygon to raster
    gdal.RasterizeLayer(target_ds, [1], lyr, burn_values=[1])

    # Read raster as arrays
    banddataraster = raster.GetRasterBand(1)
    dataraster = banddataraster.ReadAsArray(xoff, yoff, xcount, ycount)

    bandmask = target_ds.GetRasterBand(1)
    datamask = bandmask.ReadAsArray(0, 0, xcount, ycount)

    # Mask zone of raster
    zoneraster = numpy.ma.masked_array(dataraster, numpy.logical_not(datamask))

    return numpy.array(zoneraster).tolist()


def loop_zonal_stats(input_value_polygon, input_zonal_raster):
    shp = ogr.Open(input_value_polygon)
    lyr = shp.GetLayer()
    print("Processing", lyr.GetFeatureCount(), "features")
    featList = range(lyr.GetFeatureCount())

    def processFID(input_value_polygon, input_zonal_raster, FID):
        shp = ogr.Open(input_value_polygon)
        raster = gdal.Open(input_zonal_raster)
        lyr = shp.GetLayer()
        if FID:
            px_ids = zonal_stats(FID, lyr, raster)
            # print(px_ids)
            px_ids = [item for sublist in px_ids for item in sublist]
            return px_ids

    return Parallel(n_jobs=8)(
        delayed(processFID)(input_value_polygon, input_zonal_raster, FID)
        for FID in featList
    )


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print(
            "[ ERROR ] you must supply two arguments: input-zone-raster-name.tif input-value-shapefile-name.shp "
        )
        sys.exit(1)

    input_zonal_raster = sys.argv[1]
    input_value_polygon = sys.argv[2]

    counts = list(
        filter(None, loop_zonal_stats(input_value_polygon, input_zonal_raster))
    )
    counts = Counter([item for sublist in counts for item in sublist])

    pandas.DataFrame.from_dict(data=counts, orient="index").to_csv(
        os.path.splitext(input_value_polygon)[0] + ".csv", header=False
    )

这将创建一个与源区域 GeoTiff 具有相同网格系统的输出 GeoTiff

我想知道使用What is the purpose of meshgrid in Python / NumPy?是否可以加快速度

"""
python rasterlayerzonalvectorcounts.py grid.tif liechtenstein_grass.sqlite

MIT License
Based on https://github.com/pcjericks/py-gdalogr-cookbook/blob/master/raster_layers.rst#calculate-zonal-statistics
"""

import sys
import osr
import os
import ogr
import numpy
import gdal
import pandas
from joblib import Parallel, delayed
from collections import Counter
from rich import print, inspect


def zonal_stats(FID, lyr, raster):
    feat = lyr.GetFeature(FID)

    # Get raster georeference info
    transform = raster.GetGeoTransform()
    xOrigin = transform[0]
    yOrigin = transform[3]
    pixelWidth = transform[1]
    pixelHeight = transform[5]

    # Get extent of feat
    geom = feat.GetGeometryRef()
    xmin, xmax, ymin, ymax = geom.GetEnvelope()

    # Specify offset and rows and columns to read
    xoff = int((xmin - xOrigin) / pixelWidth)
    yoff = int((yOrigin - ymax) / pixelWidth)
    xcount = int((xmax - xmin) / pixelWidth) + 1
    ycount = int((ymax - ymin) / pixelWidth) + 1

    feat_arr = []
    # if xcount != 1 or ycount != 1:
    # print(xoff, yoff, xcount, ycount)
    for x in range(xcount):
        for y in range(ycount):
            # print(xoff + x, yoff + y)
            feat_arr.append((xoff + x, yoff + y))

    return feat_arr


def loop_zonal_stats(input_value_polygon, input_zonal_raster):
    shp = ogr.Open(input_value_polygon)
    lyr = shp.GetLayer()
    print("Processing", lyr.GetFeatureCount(), "features")
    featList = range(lyr.GetFeatureCount())

    def processFID(input_value_polygon, input_zonal_raster, FID):
        shp = ogr.Open(input_value_polygon)
        raster = gdal.Open(input_zonal_raster)
        lyr = shp.GetLayer()
        if FID:
            px_ids = zonal_stats(FID, lyr, raster)
            return px_ids

    return Parallel(n_jobs=1)(
        delayed(processFID)(input_value_polygon, input_zonal_raster, FID)
        for FID in featList
    )


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print(
            "[ ERROR ] you must supply two arguments: input-zone-raster-name.tif input-value-shapefile-name.shp "
        )
        sys.exit(1)

    input_zonal_raster = sys.argv[1]
    input_value_polygon = sys.argv[2]

    counts = list(
        filter(None, loop_zonal_stats(input_value_polygon, input_zonal_raster))
    )
    counts = Counter([item for sublist in counts for item in sublist])

    raster_srs = osr.SpatialReference()
    raster_srs.ImportFromWkt(gdal.Open(input_zonal_raster).GetProjectionRef())

    raster_arr = numpy.empty((14045, 20960), numpy.int32)
    for px in counts.items():
        # print(px)
        raster_arr[px[0][1]][px[0][0]] = px[1]

    target_ds = gdal.GetDriverByName("GTiff").Create(
        os.path.splitext(input_value_polygon)[0] + ".tif",
        20960,
        14045,
        1,
        gdal.GDT_Int32,
        options=["COMPRESS=LZW"],
    )
    target_ds.SetGeoTransform(gdal.Open(input_zonal_raster).GetGeoTransform())
    target_ds.SetProjection(raster_srs.ExportToWkt())
    target_ds.GetRasterBand(1).WriteArray(raster_arr)
    target_ds.GetRasterBand(1).SetNoDataValue(0)
    target_ds.GetRasterBand(1).GetStatistics(0, 1)
    target_ds.FlushCache()
    target_ds = None

【讨论】:

以上是关于如何计算与栅格单元相交的矢量几何图形的数量?的主要内容,如果未能解决你的问题,请参考以下文章

有人说矢量数据的实质还是栅格数据,怎么理解这句话???

栅格数据结构和矢量数据结构相比较各有啥特点

矢量数据和栅格数据的区别与联系

PIE-Basic 栅格矢量化

比较栅格数据结构和矢量数据结构的优点和缺点?

如何使用arcgis将栅格图转为矢量图,