SQL Server geography: Reduce size (decimal precision) of WKT text
【Posted】: 2020-07-17 11:04:15
【Question】:
For my agricultural app, a stored procedure retrieves paddock/field boundaries, stored as the SQL Server geography data type, for display on the user's mobile device.
SQL Server's .ToString() and .STAsText() functions render each vertex as a lat/long pair to 15 decimal places. According to this answer, 15 decimal places defines a location to within the width of an atom! To the nearest metre would be sufficient for me.
The resulting over-precise payload is very large and too slow to use on large farms.
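From my SQL Server geography data, I would like to produce WKT formatted to 4 or 5 decimal places. I could not find any built-in method, but my best leads are:
PostGIS and Google Cloud "BigQuery" have an ST_SNAPTOGRID function, which would be perfect.
Regular expressions might be useful, e.g. this answer, but SQL Server does not seem to have a regex replace.
I assume this is a common problem: is there a simple solution?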
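For a sense of what each decimal place means on the ground, the following is a minimal sketch (not part of the original question) that measures how far a point moves when its latitude changes by one unit in the 3rd to 6th decimal place. It assumes SRID 4326 and uses the first vertex of the sample polygon that appears later in this thread. One degree of latitude is roughly 111 km, so a step in the 5th decimal place should come out at around one metre.

```sql
-- Assumption: SRID 4326 (WGS 84), for which STDistance returns metres.
DECLARE @lat float = 37.365336, @lon float = -121.973669;
DECLARE @base geography = geography::Point(@lat, @lon, 4326);

-- For each number of decimal places, nudge the latitude by 10^-places degrees
-- and measure the resulting distance from the base point.
SELECT d.places AS decimal_places,
       @base.STDistance(geography::Point(@lat + POWER(10e0, -d.places), @lon, 4326)) AS metres_per_last_digit
FROM (VALUES (3), (4), (5), (6)) AS d(places);
```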
【Comments】:
What version of SQL Server are you using?
@iamdave - I am using Microsoft SQL Azure 12.0.2000.8, which has a compatibility level of 140 (= SQL Server 2017).
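【Answer 1】:
Edit
I believe I may have misunderstood your question: you want to transmit WKT rather than the binary representation of the polygon? If so, my answer below still shows you how to strip off some of the decimal places (without rounding). Just don't wrap the stuff(...) FOR XML expression in STGeomFromText, and you have your modified WKT.
When working with the geography data type, it can be handy to maintain a very detailed "master" version, from which you can generate and persist less detailed versions as your requirements dictate.
An easy way to generate these reduced-complexity polygons is the helpfully named Reduce function, which I think will actually help you in this situation.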
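As a minimal sketch of what Reduce does (not taken from the original answer), the snippet below simplifies a short line built from a few of the sample vertices and compares the vertex count and WKT length before and after. The tolerance of 10 is an assumption chosen to match the query further down; for SRID 4326 it is interpreted in metres. As far as I know, Reduce only drops vertices and does not shorten the decimals of the vertices it keeps.

```sql
-- Illustrative geometry: a LINESTRING avoids any ring-orientation concerns.
DECLARE @line geography = geography::STGeomFromText(
    'LINESTRING(-121.973669 37.365336, -121.973642 37.365309, -121.973415 37.365309, -121.973189 37.365309, -121.973002 37.365912)',
    4326);

-- Compare the original and the simplified shape.
SELECT @line.STNumPoints()              AS points_before,
       @line.Reduce(10).STNumPoints()   AS points_after,
       LEN(@line.STAsText())            AS wkt_chars_before,
       LEN(@line.Reduce(10).STAsText()) AS wkt_chars_after;
```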
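If you do want to go down the route of reducing the number of decimal places, you will either have to write a custom CLR function or enter the wonderful world of SQL Server string manipulation!
SQL query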
declare @DecimalPlaces int = 4; -- Specify the desired number of lat/long decimals
with g as(
select p.g -- Original polygon, for comparison purposes
      -- Polygon stripped apart and then recreated from text, using a custom string split function.
      -- You won't be able to use the built in STRING_SPLIT here as it doesn't guarantee sort order.
      ,geography::STGeomFromText('POLYGON(('
          + stuff((select ', ' + left(c.lon, charindex('.', c.lon + '.') + @DecimalPlaces) -- Truncate, not round. LEFT returns the whole ordinate when it has fewer decimals than requested
                               + ' ' + left(c.lat, charindex('.', c.lat + '.') + @DecimalPlaces)
                   from dbo.fn_StringSplitMax(replace(replace(p.g.STAsText(),'POLYGON ((',''),'))',''),', ',null) as s
                   cross apply (select left(s.item, charindex(' ', s.item) - 1) as lon -- Text before the first space
                                      ,substring(s.item, charindex(' ', s.item) + 1, len(s.item)) as lat -- Text after the first space
                               ) as c
                   order by s.rn -- Keep the points in their original order
                   for xml path(''), type).value('.', 'NVARCHAR(MAX)') -- STUFF and FOR XML mimic the GROUP_CONCAT functionality seen in other SQL languages, to recombine the shortened Points back into a Polygon string
                  ,1,2,''
                 )
          + '))', 4326).MakeValid() as x -- Remember to make the polygon valid again, as you have been messing with the Point data
from(values(geography::STGeomFromText('POLYGON((-121.973669 37.365336,-121.97367 37.365336,-121.973642 37.365309,-121.973415 37.365309,-121.973189 37.365309,-121.973002 37.365912,-121.972815 37.366515,-121.972796 37.366532,-121.972776 37.366549,-121.972627 37.366424,-121.972478 37.366299,-121.972422 37.366299,-121.972366 37.366299,-121.972298 37.366356,-121.97223 37.366412,-121.97215 37.366505,-121.97207 37.366598,-121.971908 37.366794,-121.971489 37.367353,-121.971396 37.367484,-121.971285 37.36769,-121.971173 37.367897,-121.971121 37.368072,-121.971068 37.368248,-121.971028 37.36847,-121.970987 37.368692,-121.970987 37.368779,-121.970987 37.368866,-121.970949 37.368923,-121.970912 37.36898,-121.970935 37.36898,-121.970958 37.36898,-121.970975 37.368933,-121.970993 37.368887,-121.971067 37.368807,-121.97114 37.368726,-121.971124 37.368705,-121.971108 37.368685,-121.971136 37.368698,-121.971163 37.368712,-121.97134 37.368531,-121.971516 37.368351,-121.971697 37.368186,-121.971878 37.368021,-121.972085 37.367846,-121.972293 37.36767,-121.972331 37.367629,-121.972369 37.367588,-121.972125 37.367763,-121.97188 37.367938,-121.971612 37.36815,-121.971345 37.368362,-121.971321 37.36835,-121.971297 37.368338,-121.971323 37.368298,-121.97135 37.368259,-121.971569 37.368062,-121.971788 37.367865,-121.971977 37.367716,-121.972166 37.367567,-121.972345 37.367442,-121.972524 37.367317,-121.972605 37.367272,-121.972687 37.367227,-121.972728 37.367227,-121.972769 37.367227,-121.972769 37.367259,-121.972769 37.367291,-121.972612 37.367416,-121.972454 37.367542,-121.972488 37.367558,-121.972521 37.367575,-121.972404 37.367674,-121.972286 37.367773,-121.972194 37.367851,-121.972101 37.367928,-121.972046 37.36799,-121.971991 37.368052,-121.972008 37.368052,-121.972025 37.368052,-121.972143 37.367959,-121.972261 37.367866,-121.972296 37.367866,-121.972276 37.36794,-121.972221 37.36798,-121.972094 37.368097,-121.971966 37.368214,-121.971956 37.368324,-121.971945 37.368433,-121.971907 
37.368753,-121.971868 37.369073,-121.97184 37.369578,-121.971812 37.370083,-121.971798 37.370212,-121.971783 37.370342,-121.971542 37.370486,-121.971904 37.370324,-121.972085 37.37028,-121.972266 37.370236,-121.972559 37.370196,-121.972852 37.370155,-121.973019 37.370155,-121.973186 37.370155,-121.973232 37.370136,-121.973279 37.370116,-121.973307 37.370058,-121.973336 37.370001,-121.973363 37.369836,-121.973391 37.369671,-121.973419 37.369227,-121.973446 37.368784,-121.973429 37.368413,-121.973413 37.368041,-121.973361 37.367714,-121.973308 37.367387,-121.973285 37.367339,-121.973262 37.36729,-121.973126 37.3673,-121.972989 37.36731,-121.973066 37.36728,-121.973144 37.367251,-121.973269 37.367237,-121.973393 37.367223,-121.973443 37.367158,-121.973493 37.367093,-121.973518 37.36702,-121.973543 37.366947,-121.973582 37.366618,-121.973622 37.366288,-121.97366 37.365826,-121.973698 37.365363,-121.973669 37.365336))', 4326))) as p(g)
)
-- select various versions of the polygons into the same column for overlay comparison in SSMS
select 'Original' as l
,g
from g
union all
select 'Short' as l
,x
from g
union all
select 'Original Reduced' as l
,g.Reduce(10)
from g
union all
select 'Short Reduced' as l
,x.Reduce(10)
from g;
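Output
Worth noting here is the difference in length of the geography binary representations (a simple count of the characters as displayed). As I mentioned above, simply using the Reduce function may be enough for your needs, so you will want to test the various approaches to see how best to reduce your data transfer.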
+------------------+--------------------+------+
| l | g | Len |
+------------------+--------------------+------+
| Original | 0xE6100000010484...| 4290 |
| Short | 0xE6100000010471...| 3840 |
| Original Reduced | 0xE6100000010418...| 834 |
| Short Reduced | 0xE610000001041E...| 1184 |
+------------------+--------------------+------+
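To test payload sizes on your own data, a rough sketch along these lines may help (this is an illustration, not the query that produced the table above). It reports the WKT character count next to the byte length of the WKB form and of the value cast to varbinary. Bear in mind that WKB stores every ordinate as an 8-byte double, so trimming decimals mainly benefits the text form.

```sql
-- Illustrative geometry built from three of the sample vertices.
DECLARE @g geography = geography::STGeomFromText(
    'LINESTRING(-121.973669 37.365336, -121.973642 37.365309, -121.973415 37.365309)',
    4326);

SELECT LEN(@g.STAsText())                     AS wkt_chars,    -- characters in the text form
       DATALENGTH(@g.STAsBinary())            AS wkb_bytes,    -- bytes in the OGC well-known binary form
       DATALENGTH(CAST(@g AS varbinary(max))) AS native_bytes; -- bytes in SQL Server's own serialisation
```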
Visual comparison: [image not available]
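String split function
As polygon data can be very large, you will need a string splitter that can handle more than 4k or 8k characters. In my case, I tend to go for an XML-based approach: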
create function [dbo].[fn_StringSplitMax]
(
@str nvarchar(max) = ' ' -- String to split.
,@delimiter as nvarchar(max) = ',' -- Delimiting value to split on.
,@num as int = null -- Which value to return.
)
returns table
as
return
with s as
( -- Convert the string to an XML value, replacing the delimiter with XML tags
select convert(xml,'<x>' + replace((select @str for xml path('')),@delimiter,'</x><x>') + '</x>').query('.') as s
)
select rn
,item -- Select the values from the generated XML value by CROSS APPLYing to the XML nodes
from(select row_number() over (order by (select null)) as rn
,n.x.value('.','nvarchar(max)') as item
from s
cross apply s.nodes('x') as n(x)
) a
where rn = @num
or @num is null;
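A short usage sketch for the splitter, assuming dbo.fn_StringSplitMax has been created as above (the input string is just three of the sample points):

```sql
-- Return every item together with its position.
SELECT rn, item
FROM dbo.fn_StringSplitMax(N'-121.973669 37.365336, -121.97367 37.365336, -121.973642 37.365309', N', ', NULL);

-- Passing a number instead of NULL returns only the item at that position.
SELECT item
FROM dbo.fn_StringSplitMax(N'-121.973669 37.365336, -121.97367 37.365336, -121.973642 37.365309', N', ', 2);
```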
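【Comments】:
Yes, I am looking to transmit WKT.
@Merenzo Per the edit at the top of my answer, have you tried the solution in my answer?
Yes @iamdave, it works perfectly for POLYGON, and I am sure it could be adapted to handle MULTIPOLYGON.
@Merenzo Good news! If this is the answer to your question, please mark it as such, for the benefit of other users who come across this question.
【Answer 2】:
Stepping through @iamdave's excellent answer, and using the same approach, it looks like we only need to split on the period. I think we can ignore all the brackets and commas, and ignore the POLYGON prefix (which means it will work for other GEOGRAPHY types, e.g. MULTIPOLYGON).
That is, every time we find a period, grab just the 4 characters after it, then throw away any digits that follow (until we hit a non-digit).
This worked for me (using @iamdave's test data):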
DECLARE @wkt NVARCHAR(MAX), @wktShort NVARCHAR(MAX);
DECLARE @decimalPlaces int = 4;
SET @wkt = 'POLYGON((-121.973669 37.365336,-121.97367 37.365336,-121.973642 37.365309,-121.973415 37.365309,-121.973189 37.365309,-121.973002 37.365912,-121.972815 37.366515,-121.972796 37.366532,-121.972776 37.366549,-121.972627 37.366424,-121.972478 37.366299,-121.972422 37.366299,-121.972366 37.366299,-121.972298 37.366356,-121.97223 37.366412,-121.97215 37.366505,-121.97207 37.366598,-121.971908 37.366794,-121.971489 37.367353,-121.971396 37.367484,-121.971285 37.36769,-121.971173 37.367897,-121.971121 37.368072,-121.971068 37.368248,-121.971028 37.36847,-121.970987 37.368692,-121.970987 37.368779,-121.970987 37.368866,-121.970949 37.368923,-121.970912 37.36898,-121.970935 37.36898,-121.970958 37.36898,-121.970975 37.368933,-121.970993 37.368887,-121.971067 37.368807,-121.97114 37.368726,-121.971124 37.368705,-121.971108 37.368685,-121.971136 37.368698,-121.971163 37.368712,-121.97134 37.368531,-121.971516 37.368351,-121.971697 37.368186,-121.971878 37.368021,-121.972085 37.367846,-121.972293 37.36767,-121.972331 37.367629,-121.972369 37.367588,-121.972125 37.367763,-121.97188 37.367938,-121.971612 37.36815,-121.971345 37.368362,-121.971321 37.36835,-121.971297 37.368338,-121.971323 37.368298,-121.97135 37.368259,-121.971569 37.368062,-121.971788 37.367865,-121.971977 37.367716,-121.972166 37.367567,-121.972345 37.367442,-121.972524 37.367317,-121.972605 37.367272,-121.972687 37.367227,-121.972728 37.367227,-121.972769 37.367227,-121.972769 37.367259,-121.972769 37.367291,-121.972612 37.367416,-121.972454 37.367542,-121.972488 37.367558,-121.972521 37.367575,-121.972404 37.367674,-121.972286 37.367773,-121.972194 37.367851,-121.972101 37.367928,-121.972046 37.36799,-121.971991 37.368052,-121.972008 37.368052,-121.972025 37.368052,-121.972143 37.367959,-121.972261 37.367866,-121.972296 37.367866,-121.972276 37.36794,-121.972221 37.36798,-121.972094 37.368097,-121.971966 37.368214,-121.971956 37.368324,-121.971945 37.368433,-121.971907 37.368753,-121.971868 
37.369073,-121.97184 37.369578,-121.971812 37.370083,-121.971798 37.370212,-121.971783 37.370342,-121.971542 37.370486,-121.971904 37.370324,-121.972085 37.37028,-121.972266 37.370236,-121.972559 37.370196,-121.972852 37.370155,-121.973019 37.370155,-121.973186 37.370155,-121.973232 37.370136,-121.973279 37.370116,-121.973307 37.370058,-121.973336 37.370001,-121.973363 37.369836,-121.973391 37.369671,-121.973419 37.369227,-121.973446 37.368784,-121.973429 37.368413,-121.973413 37.368041,-121.973361 37.367714,-121.973308 37.367387,-121.973285 37.367339,-121.973262 37.36729,-121.973126 37.3673,-121.972989 37.36731,-121.973066 37.36728,-121.973144 37.367251,-121.973269 37.367237,-121.973393 37.367223,-121.973443 37.367158,-121.973493 37.367093,-121.973518 37.36702,-121.973543 37.366947,-121.973582 37.366618,-121.973622 37.366288,-121.97366 37.365826,-121.973698 37.365363,-121.973669 37.365336))';
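-- Split on '.', then find the index of the first non-digit in each fragment and keep at most N leading decimals.
-- Then recombine the fragments, skipping the unwanted digits.
-- Note: STRING_SPLIT does not guarantee the order of its output rows, and STRING_AGG without WITHIN GROUP does not guarantee an order either.
WITH points AS (
    SELECT value, PATINDEX('%[^0-9]%', value) AS indx
    FROM STRING_SPLIT(@wkt, '.')
)
-- Keep min(indx - 1, N) digits, so the first fragment (which starts with a letter) keeps none
-- and a number with fewer than N decimals keeps all of them.
SELECT @wktShort = STRING_AGG(LEFT(value, IIF(indx - 1 < @decimalPlaces, indx - 1, @decimalPlaces)) + SUBSTRING(value, indx, LEN(value)), '.')
FROM points;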
Comparing the original with the shortened version, we can see that each number has been truncated to 4dp:
SELECT @wkt AS Text UNION ALL SELECT @wktShort;
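To apply the same idea per row from a stored procedure, one option is to wrap it in an inline table-valued function. The sketch below is a variant, not the code from the answer: the function name, the table dbo.Paddock and its Boundary column are hypothetical. It relies on the ordinal output of STRING_SPLIT (available in Azure SQL Database and SQL Server 2022 onwards, as far as I know) so that the fragments are recombined in their original order.

```sql
-- Hypothetical helper; the name and shape are assumptions for illustration only.
CREATE OR ALTER FUNCTION dbo.fn_TruncateWktDecimals
(
    @wkt           nvarchar(max), -- WKT text, e.g. from geography.STAsText()
    @decimalPlaces int            -- number of decimals to keep (truncated, not rounded)
)
RETURNS TABLE
AS
RETURN
    WITH fragments AS (
        SELECT s.value,
               s.ordinal,
               -- Position of the first non-digit; the appended space guarantees a match.
               PATINDEX('%[^0-9]%', s.value + N' ') AS indx
        FROM STRING_SPLIT(@wkt, N'.', 1) AS s -- third argument enables the ordinal column
    )
    SELECT STRING_AGG(
               -- Keep at most @decimalPlaces leading digits, then everything from the first non-digit on.
               LEFT(value, IIF(indx - 1 < @decimalPlaces, indx - 1, @decimalPlaces))
               + SUBSTRING(value, indx, LEN(value)),
               N'.') WITHIN GROUP (ORDER BY ordinal) AS wkt_short
    FROM fragments;

-- Usage sketch, to be run as a separate batch (dbo.Paddock and Boundary are hypothetical):
-- SELECT p.PaddockId, w.wkt_short
-- FROM dbo.Paddock AS p
-- CROSS APPLY dbo.fn_TruncateWktDecimals(p.Boundary.STAsText(), 4) AS w;
```

Truncating text this way can leave consecutive duplicate points or slightly self-intersecting rings, so if the shortened WKT is ever parsed back into geography it is probably worth calling MakeValid(), as the first answer does.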