Parsing an address: T-SQL code to mimic these 8 lines of Perl?

Posted 2023-03-24


[Posted]: 2021-02-04 21:22:40 [Question]:

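I'm getting an unparsed address from another program, and I need to store it as its components in the receiving system. I need some help! I'll wash your cat. Whatever.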

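The good news is that I can rely on the line breaks. I can count on a comma plus a space after the city, and I can count on a two-letter state or province abbreviation followed by a space. So (without golfing it) I quickly wrote this in Perl to provide some working code.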

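The point is that if we split the input on \n, I only want the second line/element (Address 1), the last line/element (Country), and the second-to-last element (City, ST zip). Then I need to split that element into its components. My Perl code below works, but how do I recreate it in T-SQL?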

$_ = "Company\n".
    "Address 1\n".
    "Address 2 (opt)\n".
    "Address 3 (opt)\n".
    "City, ST zip\n".   
    "Country";

# also works for "City, PV zip zip\n"

@add = split('\n');

$address = $add[1]; # who cares about addy2 and addy3
$country = pop(@add);
$ctz = pop(@add);
if ($ctz =~ /(.*), (..) (.*)/) {
    # Yes a $ctz line like "City of Angels, II, MO 65423" would break it
    $city = $1;
    $state = $2;
    $zip = $3;
} else {
    $city = $state = $zip = '';
}


print "Address: $address\n".
    "City: $city\n".
    "State Code: $state\n".
    "Zip: $zip\n".
    "Country: $country\n";
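Before porting, it can help to have a reference implementation that produces expected results to test the T-SQL against. Below is a minimal Python sketch of the same steps, assuming `\n` line endings. The name `parse_address` is hypothetical. The regex is the Perl one unchanged, so its greedy `(.*)` makes the last ", XX " in the line win.

```python
import re

# Same pattern as the Perl: the greedy (.*) makes the LAST ", XX " match win.
CTZ = re.compile(r"(.*), (..) (.*)")


def parse_address(raw: str) -> dict:
    # Perl's split drops trailing empty fields; str.split does not, so strip first.
    lines = raw.rstrip("\n").split("\n")
    address = lines[1]        # second line: Address 1
    country = lines.pop()     # last line
    ctz = lines.pop()         # second-to-last line: "City, ST zip"
    m = CTZ.search(ctz)       # unanchored search, like Perl's =~
    if m:
        city, state, zip_code = m.groups()
    else:
        city = state = zip_code = ""
    return {"address": address, "city": city, "state": state,
            "zip": zip_code, "country": country}


sample = "Company\nAddress 1\nAddress 2 (opt)\nAddress 3 (opt)\nCity, ST zip\nCountry"
print(parse_address(sample))
```

For "City of Angels, II, MO 65423" the greedy group yields the city "City of Angels, II", the state "MO", and the zip "65423".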

[Comments]:

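Actually, that regex parses City of Angels, II, MO 65423 just fine. Oh right: greedy. OK, I'm sure there are some weird city names that will trip it up. But simply saying "take the last line, split the second-to-last line, then take the second line" in T-SQL is driving me crazy. The integrator had about 50 lines of code, and it failed. I paid someone on Fiverr, but his code didn't work either, and it isn't worth arguing with him over $20. Anyone?

Have you tried regular expressions? -- I found a lot of examples here. I'm sure there is more documentation out there. If you have, what specifically is giving you trouble? (I didn't follow the description in the last comment about "the last line", then "the second-to-last line", then "the second line"...) Can you show the exact lines? (Or, better yet, your SQL attempt?)

[Answer 1]: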

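Start with a string splitter borrowed heavily from Jeff Moden's, modified to handle multi-character delimiters. It returns the delimited items in order, along with an index column: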

CREATE FUNCTION [dbo].[DelimitedSplit8K]
--===== Define I/O parameters
        (@pString VARCHAR(8000), @pDelimiter VARCHAR(16))
--WARNING!!! DO NOT USE MAX DATA-TYPES HERE!  IT WILL KILL PERFORMANCE!
RETURNS TABLE WITH SCHEMABINDING AS
 RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...
     -- enough to cover VARCHAR(8000)
  WITH E1(N) AS (
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
                ),                          --10E+1 or 10 rows
       E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
       E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
 cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
                     -- for both a performance gain and prevention of accidental "overruns"
                 SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
                ),
cteStart(N1) AS (--==== This returns N+LEN(@pDelimiter) (the starting position of each "element") just once for each delimiter
                 SELECT 1 UNION ALL
                 SELECT t.N+ Len( @pDelimiter ) FROM cteTally t WHERE SUBSTRING(@pString,t.N, Len( @pDelimiter ) ) = @pDelimiter
                ),
cteLen(N1,L1) AS(--==== Return start and length (for use in substring)
                 SELECT s.N1,
                        ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1 ,8000)
                   FROM cteStart s
                )
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
 SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
        Item       = SUBSTRING(@pString, l.N1, l.L1)
   FROM cteLen l;
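To see what the splitter returns without a SQL Server at hand, here is a minimal Python sketch that mirrors the function's CTE steps. `cteStart` finds where each element begins, and `cteLen` finds where each element ends. The name `delimited_split` is hypothetical, and the sketch assumes Python 3.9+ for the built-in generic type hints.

```python
def delimited_split(s: str, delimiter: str) -> list[tuple[int, str]]:
    """Mimic dbo.DelimitedSplit8K: return (ItemNumber, Item) pairs in order.

    Note: Python len() counts trailing spaces, while T-SQL LEN() does not,
    so a delimiter ending in a space would behave differently.
    """
    dlen = len(delimiter)
    # cteStart: the first element starts at 0; every other element starts
    # right after an occurrence of the delimiter.
    starts = [0] + [n + dlen for n in range(len(s)) if s[n:n + dlen] == delimiter]
    items = []
    for number, start in enumerate(starts, start=1):
        # cteLen: the element runs to the next delimiter, or to the end of the string.
        end = s.find(delimiter, start)
        items.append((number, s[start:] if end == -1 else s[start:end]))
    return items


print(delimited_split("Company\r\nAddress 1\r\nCity, ST zip\r\nCountry", "\r\n"))
```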

Then turn it loose on your data:

declare @Newline as VarChar(2) = Char(13) + Char(10); -- CR+LF; adjust to match your newlines (e.g. Char(10) alone).
declare @Sample as VarChar(1024) =
  'Company' + @Newline +
  'Address 1' + @Newline +
  'Address 2 (opt)' + @Newline +
  'Address 3 (opt)' + @Newline +
  'City, ST zip' + @Newline +
  'Country';

select *
  from dbo.DelimitedSplit8K( @Sample, @Newline );

The remaining exercise is to figure out how you want to handle the optional items.

DBfiddle for the curious.
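One way to handle the optional items is to index the numbered rows from both ends. ItemNumber 1 is the company and 2 is Address 1. The highest ItemNumber is the country, the one before it is City/ST/zip, and anything in between is optional. In T-SQL that comparison could be made against `MAX(ItemNumber) OVER ()`. The Python sketch below only shows the index arithmetic on the splitter's (ItemNumber, Item) pairs. It uses a hypothetical `pick_lines` helper and assumes at least four lines.

```python
def pick_lines(items: list[tuple[int, str]]) -> dict:
    """Hypothetical helper: map (ItemNumber, Item) rows to address parts.

    Assumes at least four lines: Company, Address 1, City/ST/zip, Country.
    """
    by_number = dict(items)
    last = max(by_number)
    return {
        "company": by_number[1],
        "address1": by_number[2],
        # Everything strictly between Address 1 and City/ST/zip is optional.
        "optional": [by_number[n] for n in range(3, last - 1)],
        "ctz": by_number[last - 1],
        "country": by_number[last],
    }
```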


[Answer 2]:

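Sorry, I thought you wanted the code in Perl, but you asked for T-SQL code.

Leaving the code here for any strangers who may be interested.

See whether the following snippet fits your task: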

use strict;
use warnings;
use feature 'say';

use Data::Dumper;

my @data = <DATA>;
my $address;

chomp @data;

$address->{company} = $data[0];
push @{ $address->{street} }, @data[1..$#data-2];
@{$address}{qw/city state zip/} = split '[, ]+', $data[-2];
$address->{country} = $data[-1];

say Dumper($address);

say '--- Print the address ' . '-' x 25;

my @fields = keys %$address;

for my $field ( @fields ) {
    say ucfirst $field . ": " .
            (
                ref $address->{$field} eq 'ARRAY'
                ? join "\n\t", @{ $address->{$field} }
                : $address->{$field}
            );
}


__DATA__
Company
Address 1
Address 2 (opt)
Address 3 (opt)
City, ST zip   
Country

Output

$VAR1 = {
          'city' => 'City',
          'country' => 'Country',
          'company' => 'Company',
          'state' => 'ST',
          'zip' => 'zip',
          'street' => [
                        'Address 1',
                        'Address 2 (opt)',
                        'Address 3 (opt)'
                      ]
        };

--- Print the address -------------------------
Street: Address 1
        Address 2 (opt)
        Address 3 (opt)
City: City
State: ST
Zip: zip
Country: Country
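The `split '[, ]+'` above also splits on the spaces inside a multi-word city such as "City of Angels" and inside a two-part postal code such as "zip zip", which shifts the city, state, and zip fields. Below is a Python sketch of the same slicing idea that takes a different approach to the City/ST/zip line. It splits that line at its last ", " and then at the first space. `parse_lines` is a hypothetical name, and the sketch assumes the city line contains ", ".

```python
def parse_lines(lines: list[str]) -> dict:
    """Hypothetical helper: same slices as the Perl answer, safer City/ST/zip split."""
    lines = [line.rstrip() for line in lines]    # like chomp, plus trailing blanks
    city, _, rest = lines[-2].rpartition(", ")   # split at the LAST ", "
    state, _, zip_code = rest.partition(" ")     # state, then everything after it
    return {
        "company": lines[0],
        "street": lines[1:-2],    # Address 1 plus any optional address lines
        "city": city,
        "state": state,
        "zip": zip_code,
        "country": lines[-1],
    }
```

Slicing `lines[1:-2]` keeps every street line however many optional lines there are, as `@data[1..$#data-2]` does in the Perl.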


[Answer 3]:

This may be a bit ugly, but I taught myself T-SQL today. :)

declare @string varchar(2000), @ctz varchar(100), @delim varchar(1), @idx integer;

set @delim = CHAR(10);  -- What we get from BC
set @string = 'Company'+@delim+'Address1'+@delim+'Address2'+@delim+'City, ST Zip'+@delim+'Country'; -- What we get from BC
--set @string = 'Company'+@delim+'Address1'+@delim+'City, ST Zip'+@delim+'Country'; -- What we get from BC
--set @string = 'Company'+@delim+'Address1'+@delim+'Address2'+@delim+'Address3'+@delim+'City, PR zip zip'+@delim+'Country'; -- What we get from BC

-- Start from the bottom
select @idx = LEN(@string) - CHARINDEX(@delim,REVERSE(@string)) + 1;    -- last occurrence of our delim
select SUBSTRING(@string,@idx+1,2000) as country;

select @string = SUBSTRING(@string,1,@idx-1);   -- shorten our string, dropping the delim

select @idx = LEN(@string) - CHARINDEX(@delim,REVERSE(@string)) + 1;
select @ctz = SUBSTRING(@string,@idx+1,2000); -- deal with this later
-- select @ctz as ctz;

select @string = SUBSTRING(@string,1,@idx-1);  -- shorten it again, dropping the delim

-- Now start at the top to remove company

select @idx = CHARINDEX(@delim, @string); -- first occurrence of delim
select @string = SUBSTRING(@string,@idx+1,2000); -- just remove everything up to that point (Company)

select @idx = CHARINDEX(@delim, @string); -- first occurrence, at the end of address1

if @idx = 0
    select @string as address1;
else 
    BEGIN
    select SUBSTRING(@string,1,@idx-1) as address1;
    select @string = SUBSTRING(@string,@idx+1,2000); -- keep shortening
    select replace(@string, @delim, ',') as address2; -- if there is anything else
    END

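select @idx = PATINDEX('%, [A-Z][A-Z] %',@ctz); -- a LIKE-style pattern (not a full regexp) to find ", ST "
if @idx > 0
    BEGIN
    select SUBSTRING(@ctz,1,@idx-1) as city;
    select SUBSTRING(@ctz,@idx+2,2) as st;
    select SUBSTRING(@ctz,@idx+1+1+2+1,100) as zip; -- index+comma+space+state+space
    END
else
    select '' as city, '' as st, '' as zip; -- no ", ST " found: blanks, like the Perl fallback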
GO
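This index arithmetic is easy to get off by one. The Python sketch below mimics the same steps, using 1-based helpers in place of CHARINDEX and SUBSTRING and reproducing the REVERSE trick for the last delimiter. The helper names are hypothetical. `re.search` stands in for PATINDEX, and its `[A-Z]` class matches only uppercase, unlike a case-insensitive SQL collation. PATINDEX returns the position of the comma, so the zip starts five characters later: comma, space, two-letter state, space.

```python
import re


def charindex(needle: str, haystack: str, start: int = 1) -> int:
    """Mimic T-SQL CHARINDEX: 1-based position, 0 when not found."""
    return haystack.find(needle, start - 1) + 1


def substring(s: str, start: int, length: int) -> str:
    """Mimic T-SQL SUBSTRING for start >= 1 and length >= 0."""
    return s[start - 1:start - 1 + length]


def last_delim_pos(s: str, delim: str) -> int:
    """REVERSE trick: 1-based position of the last single-character delim.

    (Python len counts trailing spaces; T-SQL LEN does not.)
    """
    return len(s) - charindex(delim, s[::-1]) + 1


def parse_bc_address(s: str, delim: str = "\n") -> dict:
    idx = last_delim_pos(s, delim)                   # start from the bottom: Country
    country = substring(s, idx + 1, 2000)
    s = substring(s, 1, idx - 1)
    idx = last_delim_pos(s, delim)                   # next up: the City, ST zip line
    ctz = substring(s, idx + 1, 2000)
    s = substring(s, 1, idx - 1)
    s = substring(s, charindex(delim, s) + 1, 2000)  # drop Company from the top
    idx = charindex(delim, s)                        # end of Address 1, if more lines follow
    address1 = s if idx == 0 else substring(s, 1, idx - 1)
    address2 = "" if idx == 0 else substring(s, idx + 1, 2000).replace(delim, ",")
    m = re.search(r", [A-Z][A-Z] ", ctz)             # stands in for PATINDEX
    if m is None:
        city = st = zip_code = ""
    else:
        pos = m.start() + 1                          # 1-based position of the comma
        city = substring(ctz, 1, pos - 1)
        st = substring(ctz, pos + 2, 2)
        zip_code = substring(ctz, pos + 5, 100)      # skip comma, space, state, space
    return {"country": country, "address1": address1, "address2": address2,
            "city": city, "st": st, "zip": zip_code}
```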

