SQL Server geography: Reduce size (decimal precision) of WKT text

【Posted】: 2020-07-17 11:04:15 【Question】:

For my agricultural app, a stored procedure retrieves paddock/field boundaries, stored as the SQL Server geography data type, for display on the user's mobile device.

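SQL Server's .ToString() and .STAsText() functions render each vertex as a lat/long pair, to 15 decimal places. According to this answer, 15 decimal places pins a location down to within the width of an atom! To the nearest metre would be good enough for me.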
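As a rough check on how many decimal places "to the nearest metre" needs, the arithmetic can be sketched in T-SQL. This assumes one degree of latitude is about 111,320 m (an approximation; the true figure varies slightly with latitude, and a degree of longitude shrinks away from the equator). The column names are illustrative.

```sql
-- Approximate ground distance represented by the last decimal place of a lat/long value.
-- Assumption: 1 degree of latitude is roughly 111,320 metres.
SELECT d.places AS decimal_places,
       111320.0 / POWER(CAST(10 AS float), d.places) AS approx_metres
FROM (VALUES (4), (5), (6), (15)) AS d(places);
```

Under that assumption, 4 decimal places corresponds to roughly 11 m and 5 decimal places to roughly 1.1 m, which is why 4 or 5 places is a reasonable target here.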

The resulting over-precise payload is very large, and too slow to use on large farms.

From my SQL Server geography data, I would like to produce WKT formatted to 4 or 5 decimal places. I could not find any built-in method, but my best leads are:

PostGIS and Google Cloud BigQuery have an ST_SNAPTOGRID function, which would be perfect. A regular expression might also be useful, e.g. this answer, but SQL Server does not seem to have a regex replace.

I would think this is a common problem: is there a simple solution?

【Comments on the question】:

Which version of SQL Server are you using?

@iamdave - I'm using Microsoft SQL Azure 12.0.2000.8, which has a compatibility level of 140 (= SQL Server 2017).

【Answer 1】:

Edit

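I believe I may have misunderstood your question: you want to transmit the WKT rather than the binary representation of the polygon? If that is the case, my answer below still shows you how to strip off some decimal places (without rounding). Just don't wrap the stuff(...) FOR XML expression in STGeomFromText and you have your modified WKT.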


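When working with the geography data type, it can be handy to maintain a very detailed "master" version, from which you can generate and persist less detailed versions according to your requirements.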

A simple way to generate these reduced-complexity polygons is the helpfully named Reduce function, which I think will actually help you in this situation.
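A minimal sketch of what Reduce does to the vertex count, using a small made-up polygon rather than real field data. The tolerance of 10 is passed in the units of the spatial reference (understood here to be metres for SRID 4326); the second vertex sits about a metre off the straight edge, so it is a candidate for removal.

```sql
-- Hypothetical sample polygon (counter-clockwise ring, lon lat order).
DECLARE @g geography = geography::STGeomFromText(
    'POLYGON((-121.974 37.365, -121.972 37.36501, -121.970 37.365, -121.970 37.370, -121.974 37.370, -121.974 37.365))',
    4326);

-- Compare vertex counts before and after simplification, and look at the simplified WKT.
SELECT @g.STNumPoints()            AS original_points,
       @g.Reduce(10).STNumPoints() AS reduced_points,
       @g.Reduce(10).STAsText()    AS reduced_wkt;
```

Note that Reduce removes vertices; it does not shorten the decimal representation of the vertices that remain, so the two techniques can be combined.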

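If you do want to go down the route of reducing the number of decimal places, you will either have to write a custom CLR function or enter the wonderful world of SQL Server string manipulation!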

SQL query

declare @DecimalPlaces int = 4; -- Specify the desired number of lat/long decimals

with g as(
    select p.g  -- Original polygon, for comparison purposes
          ,geography::STGeomFromText('POLYGON(('    -- stripped apart and then recreated polygon from text, using a custom string split function.  You won't be able to use the built in STRING_SPLIT here as it doesn't guarantee sort order.
            + stuff((select ', ' + left(s.item,charindex('.',s.item,0) + @DecimalPlaces) + substring(s.item,charindex(' ',s.item,0),charindex('.',s.item,charindex(' ',s.item,0)) - charindex(' ',s.item,0) + 1 + @DecimalPlaces)
                     from dbo.fn_StringSplitMax(replace(replace(p.g.STAsText(),'POLYGON ((',''),'))',''),', ',null) as s
                     for xml path(''), type).value('.', 'NVARCHAR(MAX)')    -- STUFF and FOR XML mimics GROUP_CONCAT functionality seen in other SQL languages, to recombine shortened Points back into a Polygon string
                   ,1,2,''
                   )
            + '))', 4326).MakeValid() as x  -- Remember to make the polygon valid again, as you have been messing with the Point data
    from(values(geography::STGeomFromText('POLYGON((-121.973669 37.365336,-121.97367 37.365336,-121.973642 37.365309,-121.973415 37.365309,-121.973189 37.365309,-121.973002 37.365912,-121.972815 37.366515,-121.972796 37.366532,-121.972776 37.366549,-121.972627 37.366424,-121.972478 37.366299,-121.972422 37.366299,-121.972366 37.366299,-121.972298 37.366356,-121.97223 37.366412,-121.97215 37.366505,-121.97207 37.366598,-121.971908 37.366794,-121.971489 37.367353,-121.971396 37.367484,-121.971285 37.36769,-121.971173 37.367897,-121.971121 37.368072,-121.971068 37.368248,-121.971028 37.36847,-121.970987 37.368692,-121.970987 37.368779,-121.970987 37.368866,-121.970949 37.368923,-121.970912 37.36898,-121.970935 37.36898,-121.970958 37.36898,-121.970975 37.368933,-121.970993 37.368887,-121.971067 37.368807,-121.97114 37.368726,-121.971124 37.368705,-121.971108 37.368685,-121.971136 37.368698,-121.971163 37.368712,-121.97134 37.368531,-121.971516 37.368351,-121.971697 37.368186,-121.971878 37.368021,-121.972085 37.367846,-121.972293 37.36767,-121.972331 37.367629,-121.972369 37.367588,-121.972125 37.367763,-121.97188 37.367938,-121.971612 37.36815,-121.971345 37.368362,-121.971321 37.36835,-121.971297 37.368338,-121.971323 37.368298,-121.97135 37.368259,-121.971569 37.368062,-121.971788 37.367865,-121.971977 37.367716,-121.972166 37.367567,-121.972345 37.367442,-121.972524 37.367317,-121.972605 37.367272,-121.972687 37.367227,-121.972728 37.367227,-121.972769 37.367227,-121.972769 37.367259,-121.972769 37.367291,-121.972612 37.367416,-121.972454 37.367542,-121.972488 37.367558,-121.972521 37.367575,-121.972404 37.367674,-121.972286 37.367773,-121.972194 37.367851,-121.972101 37.367928,-121.972046 37.36799,-121.971991 37.368052,-121.972008 37.368052,-121.972025 37.368052,-121.972143 37.367959,-121.972261 37.367866,-121.972296 37.367866,-121.972276 37.36794,-121.972221 37.36798,-121.972094 37.368097,-121.971966 37.368214,-121.971956 37.368324,-121.971945 
37.368433,-121.971907 37.368753,-121.971868 37.369073,-121.97184 37.369578,-121.971812 37.370083,-121.971798 37.370212,-121.971783 37.370342,-121.971542 37.370486,-121.971904 37.370324,-121.972085 37.37028,-121.972266 37.370236,-121.972559 37.370196,-121.972852 37.370155,-121.973019 37.370155,-121.973186 37.370155,-121.973232 37.370136,-121.973279 37.370116,-121.973307 37.370058,-121.973336 37.370001,-121.973363 37.369836,-121.973391 37.369671,-121.973419 37.369227,-121.973446 37.368784,-121.973429 37.368413,-121.973413 37.368041,-121.973361 37.367714,-121.973308 37.367387,-121.973285 37.367339,-121.973262 37.36729,-121.973126 37.3673,-121.972989 37.36731,-121.973066 37.36728,-121.973144 37.367251,-121.973269 37.367237,-121.973393 37.367223,-121.973443 37.367158,-121.973493 37.367093,-121.973518 37.36702,-121.973543 37.366947,-121.973582 37.366618,-121.973622 37.366288,-121.97366 37.365826,-121.973698 37.365363,-121.973669 37.365336))', 4326))) as p(g)
)
-- select various versions of the polygons into the same column for overlay comparison in SSMS
select 'Original' as l
      ,g
from g
union all
select 'Short' as l
      ,x
from g
union all
select 'Original Reduced' as l
      ,g.Reduce(10)
from g
union all
select 'Short Reduced' as l
      ,x.Reduce(10)
from g;

Output

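Worth noting here is the difference in length of the geog binary representation (a simple count of the characters as displayed). As I mentioned above, simply using the Reduce function may well be enough for your needs, so you will need to test the various approaches to see how best to reduce your data transfer.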

+------------------+--------------------+------+
|        l         |         g          |  Len |
+------------------+--------------------+------+
| Original         | 0xE6100000010484...| 4290 |
| Short            | 0xE6100000010471...| 3840 |
| Original Reduced | 0xE6100000010418...|  834 |
| Short Reduced    | 0xE610000001041E...| 1184 |
+------------------+--------------------+------+
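The Len column above is a count of displayed characters. As a sketch of how the sizes could be measured directly instead, the query below compares the byte length of the binary form with the character length of the WKT, before and after Reduce. The sample polygon is made up, and casting geography to varbinary(max) to get its serialized size is an assumption about what is worth measuring, not part of the original answer.

```sql
-- Hypothetical sample polygon (counter-clockwise ring, lon lat order).
DECLARE @g geography = geography::STGeomFromText(
    'POLYGON((-121.974 37.365, -121.972 37.36501, -121.970 37.365, -121.970 37.370, -121.974 37.370, -121.974 37.365))',
    4326);

SELECT DATALENGTH(CAST(@g AS varbinary(max)))            AS binary_bytes,
       LEN(@g.STAsText())                                AS wkt_chars,
       DATALENGTH(CAST(@g.Reduce(10) AS varbinary(max))) AS reduced_binary_bytes,
       LEN(@g.Reduce(10).STAsText())                     AS reduced_wkt_chars;
```

Since the question is about the WKT payload, the wkt_chars figures are the ones most relevant to what the mobile device receives.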

Visual comparison (image not included)

String split function

As polygon data can be very large, you will need a string splitter that can handle more than 4k or 8k characters. In my case I tend to favour the XML-based approach:

create function [dbo].[fn_StringSplitMax]
(
    @str nvarchar(max) = ' '                -- String to split.
    ,@delimiter as nvarchar(max) = ','      -- Delimiting value to split on.
    ,@num as int = null                     -- Which value to return.
)
returns table
as
return
    with s as
    (       -- Convert the string to an XML value, replacing the delimiter with XML tags
        select convert(xml,'<x>' + replace((select @str for xml path('')),@delimiter,'</x><x>') + '</x>').query('.') as s
    )
    select rn
          ,item     -- Select the values from the generated XML value by CROSS APPLYing to the XML nodes
    from(select row_number() over (order by (select null)) as rn
              ,n.x.value('.','nvarchar(max)') as item
        from s
              cross apply s.nodes('x') as n(x)
        ) a
    where rn = @num
        or @num is null;

【Comments】:

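Yes, I am looking to transmit WKT.

@Merenzo As per the edit at the top of my answer, have you tried the solution in my answer?

Yes @iamdave, it works perfectly for POLYGON, and I'm sure it can be adapted to handle MULTIPOLYGON.

@Merenzo Good news! If this is the answer to your question, please mark it as such, for the benefit of other users who come across this problem.

【Answer 2】: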

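Stepping through @iamdave's excellent answer and using the same approach, it looks like we only need to split on the period. I think we can ignore all the brackets and commas, and ignore the POLYGON prefix (which means it will also work for other GEOGRAPHY types such as MULTIPOLYGON).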

That is, each time we find a period, grab only the next 4 characters after it, then throw away any digits that follow (until we hit a non-digit).
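To make that rule concrete, here is a minimal sketch applied to a single fragment, i.e. the text between two periods in the WKT. The fragment value and variable names are illustrative.

```sql
-- '973669 37' is what sits between the periods in '-121.973669 37.365336'.
DECLARE @fragment nvarchar(50) = N'973669 37';
DECLARE @decimalPlaces int = 4;

-- Keep the first 4 digits, then resume from the first non-digit character (the space).
SELECT LEFT(@fragment, @decimalPlaces)
       + SUBSTRING(@fragment, PATINDEX('%[^0-9]%', @fragment), LEN(@fragment)) AS shortened;
-- returns '9736 37'
```

Joining the shortened fragments back together with periods then gives '-121.9736 37.3653' for that vertex.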

This works for me (using @iamdave's test data):

DECLARE @wkt NVARCHAR(MAX), @wktShort NVARCHAR(MAX);
DECLARE @decimalPlaces int = 4;
SET @wkt  = 'POLYGON((-121.973669 37.365336,-121.97367 37.365336,-121.973642 37.365309,-121.973415 37.365309,-121.973189 37.365309,-121.973002 37.365912,-121.972815 37.366515,-121.972796 37.366532,-121.972776 37.366549,-121.972627 37.366424,-121.972478 37.366299,-121.972422 37.366299,-121.972366 37.366299,-121.972298 37.366356,-121.97223 37.366412,-121.97215 37.366505,-121.97207 37.366598,-121.971908 37.366794,-121.971489 37.367353,-121.971396 37.367484,-121.971285 37.36769,-121.971173 37.367897,-121.971121 37.368072,-121.971068 37.368248,-121.971028 37.36847,-121.970987 37.368692,-121.970987 37.368779,-121.970987 37.368866,-121.970949 37.368923,-121.970912 37.36898,-121.970935 37.36898,-121.970958 37.36898,-121.970975 37.368933,-121.970993 37.368887,-121.971067 37.368807,-121.97114 37.368726,-121.971124 37.368705,-121.971108 37.368685,-121.971136 37.368698,-121.971163 37.368712,-121.97134 37.368531,-121.971516 37.368351,-121.971697 37.368186,-121.971878 37.368021,-121.972085 37.367846,-121.972293 37.36767,-121.972331 37.367629,-121.972369 37.367588,-121.972125 37.367763,-121.97188 37.367938,-121.971612 37.36815,-121.971345 37.368362,-121.971321 37.36835,-121.971297 37.368338,-121.971323 37.368298,-121.97135 37.368259,-121.971569 37.368062,-121.971788 37.367865,-121.971977 37.367716,-121.972166 37.367567,-121.972345 37.367442,-121.972524 37.367317,-121.972605 37.367272,-121.972687 37.367227,-121.972728 37.367227,-121.972769 37.367227,-121.972769 37.367259,-121.972769 37.367291,-121.972612 37.367416,-121.972454 37.367542,-121.972488 37.367558,-121.972521 37.367575,-121.972404 37.367674,-121.972286 37.367773,-121.972194 37.367851,-121.972101 37.367928,-121.972046 37.36799,-121.971991 37.368052,-121.972008 37.368052,-121.972025 37.368052,-121.972143 37.367959,-121.972261 37.367866,-121.972296 37.367866,-121.972276 37.36794,-121.972221 37.36798,-121.972094 37.368097,-121.971966 37.368214,-121.971956 37.368324,-121.971945 37.368433,-121.971907 37.368753,-121.971868 
37.369073,-121.97184 37.369578,-121.971812 37.370083,-121.971798 37.370212,-121.971783 37.370342,-121.971542 37.370486,-121.971904 37.370324,-121.972085 37.37028,-121.972266 37.370236,-121.972559 37.370196,-121.972852 37.370155,-121.973019 37.370155,-121.973186 37.370155,-121.973232 37.370136,-121.973279 37.370116,-121.973307 37.370058,-121.973336 37.370001,-121.973363 37.369836,-121.973391 37.369671,-121.973419 37.369227,-121.973446 37.368784,-121.973429 37.368413,-121.973413 37.368041,-121.973361 37.367714,-121.973308 37.367387,-121.973285 37.367339,-121.973262 37.36729,-121.973126 37.3673,-121.972989 37.36731,-121.973066 37.36728,-121.973144 37.367251,-121.973269 37.367237,-121.973393 37.367223,-121.973443 37.367158,-121.973493 37.367093,-121.973518 37.36702,-121.973543 37.366947,-121.973582 37.366618,-121.973622 37.366288,-121.97366 37.365826,-121.973698 37.365363,-121.973669 37.365336))';

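-- Split on '.', so that every fragment after the first starts with the decimal digits of one number.
-- indx is the position of the first non-digit in the fragment, i.e. (number of leading digits + 1).
-- Then recombine the fragments in their original order, skipping the unwanted digits.
-- The third argument of STRING_SPLIT (enable_ordinal) needs Azure SQL Database or SQL Server 2022+.
-- On versions without it, drop that argument and the WITHIN GROUP clause, accepting that the order is then not guaranteed.
WITH points AS (
    SELECT value, ordinal, PATINDEX('%[^0-9]%', value) AS indx
    FROM STRING_SPLIT(@wkt, '.', 1)
)
SELECT @wktShort = STRING_AGG(
           -- Fragments with no more than @decimalPlaces leading digits (including the 'POLYGON((...' prefix) are kept whole.
           IIF(indx <= @decimalPlaces + 1, value, LEFT(value, @decimalPlaces) + SUBSTRING(value, indx, LEN(value))),
           '.') WITHIN GROUP (ORDER BY ordinal)
FROM points;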

Comparing the original with the shortened version, we can see that every number has been truncated to 4dp:

SELECT @wkt AS Text UNION ALL SELECT @wktShort;
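Both answers truncate the digits rather than round them. If rounding is preferred, one possible alternative (a sketch, not taken from either answer) is to rebuild the WKT from the vertices themselves, converting each coordinate to a fixed-scale decimal. It assumes a POLYGON with a single ring and no holes, SQL Server 2017+ for STRING_AGG, and that sys.all_objects holds at least as many rows as the polygon has points. The sample polygon is made up.

```sql
DECLARE @g geography = geography::STGeomFromText(
    'POLYGON((-121.973669 37.365336, -121.970987 37.365309, -121.970912 37.36898, -121.973446 37.368784, -121.973669 37.365336))',
    4326);
DECLARE @n int = @g.STNumPoints();

WITH numbers AS (
    -- Tally of 1..@n, used to visit each vertex in order.
    SELECT TOP (@n) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS i
    FROM sys.all_objects
    ORDER BY i
)
SELECT 'POLYGON(('
       + STRING_AGG(
             -- CONVERT to decimal(…, 4) rounds each coordinate to 4 decimal places.
             CAST(CONCAT(CONVERT(decimal(9, 4), @g.STPointN(i).Long), ' ',
                         CONVERT(decimal(8, 4), @g.STPointN(i).Lat)) AS nvarchar(max)),
             ', ') WITHIN GROUP (ORDER BY i)
       + '))' AS rounded_wkt
FROM numbers;
```

Rounding can make neighbouring vertices coincide, so if the result is ever parsed back into a geography it may need MakeValid, as in the first answer.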
