Finding table names in SSIS .dtsx packages

Posted: 2021-12-14 17:19:33

Question:

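I am trying to scan SSIS .dtsx packages for table names. Yes, I know that I should use [xml] and a tool that parses the SQL language. That does not seem to be possible at this time. PowerShell can understand [xml], but SQL parsers are typically expensive, and using ANTLR is an investment that is not acceptable right now. I am open to suggestions, but I am not asking for a tool recommendation.

There are two (2) problems.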

1) `&.;` does not appear to be recognized as separate from the table name capture item
2) TABLE5 does not appear to be found

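Yes, I also know that schema names should not be hard-coded into the source code. It makes it difficult or impossible for the DBA to manage the database. That is simply how it is done here.

How can I make the regex omit &.*; from the capture and recognize dbo.TABLE5?

Here is the code I am using to scan the .dtsx files.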

PS C:\src\sql> Get-Content .\Find-FromJoinSql.ps1
Get-ChildItem -File -Filter '*.dtsx' |
    ForEach-Object {
        $Filename = $_.Name
        Select-String -Pattern '(FROM|JOIN)(\s|&.*;)+(\S+)(\s|&.*;)+' -Path $_ -AllMatches |
        ForEach-Object {
            if ($_.Matches.Groups.captures[3].value -match 'dbo') {
                "$Filename === $($_.Matches.Groups.captures[3].value)"
            }
        }
    }

Here is a small sample of the kind of text found in the .dtsx files.

PS C:\src\sql> Get-Content .\sls_test.dtsx
USE ADATABASE;
SELECT * FROM dbo.TABLE1 WHERE F1 = 3;
SELECT * FROM dbo.TABLE2 T2
    FULL OUTER JOIN dbo.TABLEJ TJ
        ON T2.KEY = TJ.KEY;
SELECT * FROM dbo.TABLE3 T3
    INNER JOIN ADATABASE2.dbo.TABLEK
TK ON
T3.user_id = TK.user_id

SELECT * FROM dbo.TABLE4 T4 FULL OUTER JOIN dbo.TABLE5 T5
    ON T4.F1 = T5.F1;
EXIT

Running the script against this data produces:

PS C:\src\sql> .\Find-FromJoinSql.ps1
sls_test.dtsx === dbo.TABLE1
sls_test.dtsx === dbo.TABLE2
sls_test.dtsx === dbo.TABLEJ
sls_test.dtsx === dbo.TABLE3
sls_test.dtsx === ADATABASE2.dbo.TABLEK
TK
sls_test.dtsx === dbo.TABLE4

PS C:\src\sql> $PSVersionTable.PSVersion.ToString()
7.1.5

Comments:

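Another thing you might run into is the expression language within the package itself. There could be static text in there, but there could also be an Execute SQL Task (get all the tables) whose result set is then shredded in a Foreach enumerator. Your parsing would find the single stored table reference, but run time would yield N extra tables (which might not even be static design-time tables). Or people use external configuration to make things happen, and so on. If it were me, I would grab BimlExpress, reverse engineer the packages to Biml, and then search that. It is a less complex file. The challenges of dynamic content and expressions remain, but there is less text to search.

@billinkc, it is absolutely true that SSIS has a variety of ways to process things that are not statically specified in the package file. I have seen many times that the SQL in a package is not even used because another mechanism is active. I am just going for the low-hanging fruit, to give a team a good start on understanding what a set of packages references. I cannot turn this into Collibra or Erwin.

Seeing line-break entities there is odd. XML allows newline characters in the clear, and in fact the spec requires deserializers to normalize all newline sequences they encounter to line feeds. For entities like these, I would only expect the carriage-return entity to appear at line ends. Anyway... have you tried replacing your &.*; pattern with &[^;]+;? PowerShell gives & and $ a special meaning in match replacements, so it may be necessary to escape & as \&.

Answer 1: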

It is indeed odd that some character entities (the line breaks) in these files have not been replaced.
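To illustrate how entity text ends up inside the capture, here is a minimal sketch that runs the question's pattern and a pattern treating `&...;` as a separator against one line. The sample string is hypothetical, and the `&#xD;&#xA;` entities in it are an assumption, since the exact entity text in the real packages is not shown here.

```powershell
# Hypothetical sample line; the &#xD;&#xA; entities are an assumption.
$line = 'INNER JOIN ADATABASE2.dbo.TABLEK&#xD;&#xA;TK ON&#xD;&#xA;T3.user_id = TK.user_id'

# Pattern from the question: \S+ is greedy and runs through the entity text,
# because '&', '#' and ';' are all non-whitespace characters.
$oldPattern = '(FROM|JOIN)(\s|&.*;)+(\S+)(\s|&.*;)+'
$old = [regex]::Match($line, $oldPattern).Groups[3].Value

# Separator-aware pattern: the capture stops at whitespace or at '&'.
$newPattern = '(?im)(?:FROM|JOIN)(?:\s|&[^;]+;)+([^\s&]+)(?:\s|&[^;]+;)*'
$new = [regex]::Match($line, $newPattern).Groups[1].Value

"old: $old"
"new: $new"
```

With this sample, the first capture still carries the entity text and the following alias, while the second stops at the table name.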

Changing the regex pattern slightly captures the dbo.table names, as shown below.

Using Get-Content

$regex = [regex] '(?im)(?:FROM|JOIN)(?:\s|&[^;]+;)+([^\s&]+)(?:\s|&[^;]+;)*'
Get-ChildItem -Path D:\Test -File -Filter '*.dtsx' |
    ForEach-Object {
        $match = $regex.Match((Get-Content -Path $_.FullName -Raw))
        while ($match.Success) {
            "$($_.Name) === $($match.Groups[1].Value)"
            $match = $match.NextMatch()
        }
    }

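Using Select-String

As for why Select-String -AllMatches skipped your TABLE5, from the docs: "When Select-String finds more than one match in a line of text, it still emits only one MatchInfo object for the line, but the Matches property of the object contains all the matches."

This means you need another loop to pull every match out of each MatchInfo object's Matches property and into your output: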

$pattern = '(?:FROM|JOIN)(?:\s|&[^;]+;)+([^\s&]+)(?:\s|&[^;]+;)*'
Get-ChildItem -Path 'D:\Test' -File -Filter '*.dtsx' |
    ForEach-Object {
        $Filename = $_.Name
        Select-String -Pattern $pattern -Path $_.FullName -AllMatches |
        ForEach-Object {
            # loop again, because each $MatchInfo object may contain multiple
            # $Matches objects if more matches were found in the same line
            foreach ($match in $_.Matches) {
                if ($match.Groups[1].value -match 'dbo') {
                    "$Filename === $($match.Groups[1].value)"
                }
            }
        }
    }

Output:

sls_test.dtsx === dbo.TABLE1
sls_test.dtsx === dbo.TABLE2
sls_test.dtsx === dbo.TABLEJ
sls_test.dtsx === dbo.TABLE3
sls_test.dtsx === ADATABASE2.dbo.TABLEK
sls_test.dtsx === dbo.TABLE4
sls_test.dtsx === dbo.TABLE5
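
The one-MatchInfo-per-line behaviour can be checked without any files. This is a minimal sketch that pipes a single line (taken from the sample data) into Select-String; the variable names are illustrative only.

```powershell
$pattern = '(?:FROM|JOIN)(?:\s|&[^;]+;)+([^\s&]+)(?:\s|&[^;]+;)*'
$line    = 'SELECT * FROM dbo.TABLE4 T4 FULL OUTER JOIN dbo.TABLE5 T5'

# One input line yields one MatchInfo object, even with -AllMatches.
$info = @($line | Select-String -Pattern $pattern -AllMatches)

"MatchInfo objects: $($info.Count)"
"Matches on the line: $($info[0].Matches.Count)"

# Walk every match on the line and take capture group 1 from each one.
$tables = foreach ($m in $info[0].Matches) { $m.Groups[1].Value }
$tables
```

Indexing the groups once per MatchInfo only ever reaches the first match on a line, which is consistent with dbo.TABLE5 going missing in the question's output.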

Regex details:

(?im)              Use case-insensitive matching and have '^' and '$' match at linebreaks
(?:                Match the regular expression below
                   Match either the regular expression below (attempting the next alternative only if this one fails)
      FROM         Match the characters “FROM” literally
   |               Or match regular expression number 2 below (the entire group fails if this one fails to match)
      JOIN         Match the characters “JOIN” literally
)                 
(?:                Match the regular expression below
                   Match either the regular expression below (attempting the next alternative only if this one fails)
      \s           Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
   |               Or match regular expression number 2 below (the entire group fails if this one fails to match)
      &            Match the character “&” literally
      [^;]         Match any character that is NOT a “;”
         +         Between one and unlimited times, as many times as possible, giving back as needed (greedy)
      ;            Match the character “;” literally
)+                 Between one and unlimited times, as many times as possible, giving back as needed (greedy)
(                  Match the regular expression below and capture its match into backreference number 1
   [^\s&]          Match a single character NOT present in the list below
                   A whitespace character (spaces, tabs, line breaks, etc.)
                   The character “&”
      +            Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)                 
(?:                Match the regular expression below
                   Match either the regular expression below (attempting the next alternative only if this one fails)
      \s           Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
   |               Or match regular expression number 2 below (the entire group fails if this one fails to match)
      &            Match the character “&” literally
      [^;]         Match any character that is NOT a “;”
         +         Between one and unlimited times, as many times as possible, giving back as needed (greedy)
      ;            Match the character “;” literally
)*                 Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
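
The same breakdown can be kept next to the pattern itself. The sketch below is one possible way to do that, using the .NET inline `x` option (ignore pattern whitespace) so the regex can carry comments; the here-string layout and the two-line sample text are assumptions for illustration.

```powershell
# (?x) lets the pattern span several lines and carry '#' comments.
$regex = [regex] @'
(?imx)                  # ignore case, multiline, ignore pattern whitespace
(?:FROM|JOIN)           # keyword that introduces a table reference
(?:\s|&[^;]+;)+         # one or more separators: whitespace or an entity
([^\s&]+)               # group 1: the name, up to whitespace or an ampersand
(?:\s|&[^;]+;)*         # optional trailing separators
'@

# Two lines from the sample data, joined with a newline.
$sql = "SELECT * FROM dbo.TABLE2 T2`n    FULL OUTER JOIN dbo.TABLEJ TJ"

$names = foreach ($m in $regex.Matches($sql)) { $m.Groups[1].Value }
$names
```

Because whitespace in the pattern is ignored in this mode, a literal space would have to be written as `\s` or `[ ]`, which this pattern already does.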
