How to get column-level dependencies in a view

【Title】: How to get column-level dependencies in a view 【Posted】: 2018-01-21 08:58:09 【Question】:

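I've done some research on this but haven't found a solution yet. What I want to get is the column-level dependencies in a view. So, let's say we have a table like this: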

create table TEST(
    first_name varchar(10),
    last_name varchar(10),
    street varchar(10),
    number int
)

And a view like this:

create view vTEST
as
    select
        first_name + ' ' + last_name as [name],
        street + ' ' + cast(number as varchar(max)) as [address]
    from dbo.TEST

I'd like to get a result like this:

column_name depends_on_column_name depends_on_table_name
----------- --------------------- --------------------
name        first_name            dbo.TEST
name        last_name             dbo.TEST
address     street                dbo.TEST
address     number                dbo.TEST

I tried the sys.dm_sql_referenced_entities function, but referencing_minor_id is always 0 for views.

select
    referencing_minor_id,
    referenced_schema_name + '.' + referenced_entity_name as depends_on_table_name,
    referenced_minor_name as depends_on_column_name
from sys.dm_sql_referenced_entities('dbo.vTEST', 'OBJECT')

referencing_minor_id depends_on_table_name depends_on_column_name
-------------------- --------------------- ----------------------
0                    dbo.TEST              NULL
0                    dbo.TEST              first_name
0                    dbo.TEST              last_name
0                    dbo.TEST              street
0                    dbo.TEST              number

The same is true for sys.sql_expression_dependencies and for the deprecated sys.sql_dependencies.

So am I missing something, or is this impossible?

There are some related questions (Find the real column name of an alias used in a view?), but as I said, I haven't found a working solution yet.

Edit 1: I tried using the DAC to check whether this information is stored somewhere in the System Base Tables, but found nothing.

【Question comments】:

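mssqltips.com/sqlservertip/2999/… WITH SCHEMABINDING can link dependencies, but I'm not sure whether that lets you produce a result like this.

I don't think there is a practical pure T-SQL solution. You may find some useful information in this question: parsing TSQL.

DBA Stack Exchange has a similar question that uses sys.sql_dependencies and sys.sql_expression_dependencies. Unfortunately, the former is currently in maintenance mode and the latter doesn't cut it. dba.stackexchange.com/questions/77813

Inspired by the comment here, you could also try running sp_helptext on VIEW_COLUMN_USAGE in the information schema. For me VIEW_COLUMN_USAGE also uses sys.sql_dependencies, but I'm still stuck on SQL Server 2008, so I don't know whether this applies to newer versions.

【Answer 1】: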

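Unfortunately, SQL Server does not explicitly store the mapping between source table columns and view columns. I suspect the main reason is simply the potential complexity of views (expression columns, functions called on those columns, nested queries, and so on).

The only ways I can think of to determine the mapping between view columns and source columns are to parse the query associated with the view or to parse the view's execution plan.

The approach I outline here focuses on the second option and relies on the fact that SQL Server avoids generating output lists for columns the query does not need.

The first step is to get the list of dependent tables and their associated columns required by the view. This can be done with the standard system tables in SQL Server.

Next, we enumerate all of the view's columns via a cursor.

For each view column, we create a temporary wrapper stored procedure that selects only the single column in question from the view. Because only a single column is requested, SQL Server retrieves only the information needed to output that one view column.

The newly created procedure runs the query in format-only mode and therefore causes no actual I/O operations on the database, but it does generate an estimated execution plan when executed. After the query plan is generated, we query the output lists from the execution plan. Since we know which view column was selected, we can now associate the output list with the view column in question. We can further refine the association by only associating columns that form part of our original dependency list, which eliminates expression outputs from the result set.

Note that with this approach, if the view needs to join different tables together to generate the output, all columns required to generate the output are returned, even if they are not used directly in the column expression, because they are still required.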
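To see by hand the pruning this approach relies on, one can request the estimated plan for a single view column. This is only a sketch against the dbo.vTEST view from the question; GO is the batch separator understood by SSMS and sqlcmd rather than T-SQL itself, and SET SHOWPLAN_XML has to be the only statement in its batch.

```sql
-- Return the estimated plan as XML instead of executing the statements that follow.
SET SHOWPLAN_XML ON;
GO

-- Request a single view column. The OutputList elements in the returned plan
-- should then reference only the source columns needed for [name].
SELECT [name] FROM dbo.vTEST;
GO

SET SHOWPLAN_XML OFF;
GO
```

Comparing the plan for [name] with the plan for [address] shows which source columns each view column pulls in.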

The following stored procedure demonstrates the approach described above:

CREATE PROCEDURE ViewGetColumnDependencies
(
    @viewName   NVARCHAR(50)
)
AS
BEGIN

    CREATE TABLE #_suppress_output
    (
        result NVARCHAR(500) NULL
    );


    DECLARE @viewTableColumnMapping TABLE
    (
        [ViewName]                  NVARCHAR(50),
        [SourceObject]              NVARCHAR(50),
        [SourceObjectColumnName]    NVARCHAR(50),
        [ViewAliasColumn]           NVARCHAR(50)
    )


    -- Get list of dependent tables and their associated columns required for the view.
    INSERT INTO @viewTableColumnMapping
    (
        [ViewName]                  
        ,[SourceObject]             
        ,[SourceObjectColumnName]               
    )
    SELECT          v.[name] AS [ViewName]
                    ,'[' + OBJECT_NAME(d.referenced_major_id) + ']' AS [SourceObject]
                    ,c.[name] AS [SourceObjectColumnName]
    FROM            sys.views v
    LEFT OUTER JOIN sys.sql_dependencies d ON d.object_id = v.object_id
    LEFT OUTER JOIN sys.columns c ON c.object_id = d.referenced_major_id AND c.column_id = d.referenced_minor_id
    WHERE           v.[name] = @viewName;


    DECLARE @aliasColumn NVARCHAR(50);

    -- Next, we enumerate all of the view's columns via a cursor.
    DECLARE ViewColumnNameCursor CURSOR FOR
    SELECT              aliases.name AS [AliasName]
    FROM                sys.views v
    LEFT OUTER JOIN     sys.columns AS aliases  on v.object_id = aliases.object_id -- c.column_id=aliases.column_id AND aliases.object_id = object_id('vTEST')
    WHERE   v.name = @viewName;

    OPEN ViewColumnNameCursor  

    FETCH NEXT FROM ViewColumnNameCursor   
    INTO @aliasColumn  

    DECLARE @tql_create_proc NVARCHAR(MAX);
    DECLARE @queryPlan XML;

    WHILE @@FETCH_STATUS = 0  
    BEGIN 

        /*
        For each view column, we create a temporary wrapper stored procedure that 
        only selects the single column in question from view. The stored procedure 
        will run the query in format only mode and will therefore not cause any 
        actual I/O operations on the database, but it will generate an estimated 
        execution plan when executed.
        */
         SET @tql_create_proc = 'CREATE PROCEDURE ___WrapView
                                AS
                                    SET FMTONLY ON;
                                    SELECT CONVERT(NVARCHAR(MAX), [' + @aliasColumn + ']) FROM [' + @viewName + '];
                                    SET FMTONLY OFF;';

        EXEC (@tql_create_proc);

        -- Execute the procedure to generate a query plan. The insert into the temp table is only done to
        -- suppress the empty result set from being displayed as part of the output.
        INSERT INTO #_suppress_output
        EXEC ___WrapView;

        -- Get the query plan for the wrapper procedure that was just executed.
        SELECT  @queryPlan =   [qp].[query_plan]  
        FROM    [sys].[dm_exec_procedure_stats] AS [ps]
                JOIN [sys].[dm_exec_query_stats] AS [qs] ON [ps].[plan_handle] = [qs].[plan_handle]
                CROSS APPLY [sys].[dm_exec_query_plan]([qs].[plan_handle]) AS [qp]
        WHERE   [ps].[database_id] = DB_ID() AND  OBJECT_NAME([ps].[object_id], [ps].[database_id])  = '___WrapView'

        -- Drop the wrapper procedure
        DROP PROCEDURE ___WrapView

        /*
        After the query plan is generated, we query the output lists from the execution plan. 
        Since we know which view column was selected we can now associate the output list to 
        view column in question. We can further refine the association by only associating 
        columns that form part of our original dependency list, this will eliminate expression 
        outputs from the result set. 
        */
        ;WITH QueryPlanOutputList AS
        (
          SELECT    T.X.value('local-name(.)', 'NVARCHAR(max)') as Structure,
                    T.X.value('./@Table[1]', 'NVARCHAR(50)') as [SourceTable],
                    T.X.value('./@Column[1]', 'NVARCHAR(50)') as [SourceColumnName],
                    T.X.query('*') as SubNodes

          FROM @queryPlan.nodes('*') as T(X)
          UNION ALL 
          SELECT QueryPlanOutputList.structure + N'/' + T.X.value('local-name(.)', 'nvarchar(max)'),
                 T.X.value('./@Table[1]', 'NVARCHAR(50)') as [SourceTable],
                 T.X.value('./@Column[1]', 'NVARCHAR(50)') as [SourceColumnName],
                 T.X.query('*')
          FROM QueryPlanOutputList
          CROSS APPLY QueryPlanOutputList.SubNodes.nodes('*') as T(X)
        )
        UPDATE @viewTableColumnMapping
        SET     ViewAliasColumn = @aliasColumn
        FROM    @viewTableColumnMapping CM
        INNER JOIN  
                (
                    SELECT DISTINCT  QueryPlanOutputList.Structure
                                    ,QueryPlanOutputList.[SourceTable]
                                    ,QueryPlanOutputList.[SourceColumnName]
                    FROM    QueryPlanOutputList
                    WHERE   QueryPlanOutputList.Structure like '%/OutputList/ColumnReference'
                ) SourceColumns ON CM.[SourceObject] = SourceColumns.[SourceTable] AND CM.SourceObjectColumnName = SourceColumns.SourceColumnName

        FETCH NEXT FROM ViewColumnNameCursor   
        INTO @aliasColumn 
    END

    CLOSE ViewColumnNameCursor;
    DEALLOCATE ViewColumnNameCursor; 

    DROP TABLE #_suppress_output

    SELECT *
    FROM    @viewTableColumnMapping
    ORDER BY [ViewAliasColumn]

END

The stored procedure can now be executed as follows:

EXEC dbo.ViewGetColumnDependencies @viewName = 'vTEST'

【Comments】:

【Answer 2】:

I was playing with this but didn't have time to take it further. Maybe this will help:

-- Returns all table columns called in the view and the objects they pull from

SELECT
     v.[name] AS ViewName
    ,d.[referencing_id] AS ViewObjectID 
    ,c.[name] AS ColumnNames
    ,OBJECT_NAME(d.referenced_id) AS ReferencedTableName
    ,d.referenced_id AS TableObjectIDsReferenced
FROM 
sys.views v 
INNER JOIN sys.sql_expression_dependencies d ON d.referencing_id = v.[object_id]
INNER JOIN sys.objects o ON d.referencing_id = o.[object_id]
INNER JOIN sys.columns c ON d.referenced_id = c.[object_id]
WHERE v.[name] = 'vTEST'

-- Returns all output columns in the view

SELECT 
     OBJECT_NAME([object_id]) AS ViewName
    ,[object_id] AS ViewObjectID
    ,[name] AS OutputColumnName
FROM sys.columns
WHERE OBJECT_ID('vTEST') = [object_id]

-- Get the view definition

SELECT 
    VIEW_DEFINITION
FROM INFORMATION_SCHEMA.VIEWS
WHERE TABLE_NAME = 'vTEST'

【Comments】:

【Answer 3】:

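Here is a solution based on the query plan. It has some advantages:

    Almost any select query can be processed.
    No schema binding is required.

Disadvantages:

    It has not been tested properly.
    It may break suddenly if Microsoft changes the XML query plan.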

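The core idea is that every column expression in the XML query plan is defined in a "DefinedValue" node. The first child of "DefinedValue" is a reference to the output column and the second is the expression. The expression is computed from input columns and constant values. As mentioned above, this is based only on empirical observation and needs proper testing.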
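As a rough illustration of that structure, the sketch below shreds a hand-written plan fragment. The XML literal is a hypothetical, heavily trimmed stand-in for a real plan (real plans nest more operators and carry more attributes), so treat the shape as an assumption rather than as what SQL Server emits verbatim.

```sql
-- Hypothetical, trimmed plan fragment: one DefinedValue whose first child names
-- the output column and whose ScalarOperator references two input columns.
declare @plan xml = N'
<ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
  <DefinedValue>
    <ColumnReference Column="Expr1006" />
    <ScalarOperator>
      <Identifier><ColumnReference Table="[TEST]" Column="first_name" /></Identifier>
      <Identifier><ColumnReference Table="[TEST]" Column="last_name" /></Identifier>
    </ScalarOperator>
  </DefinedValue>
</ShowPlanXML>';

-- Pair each DefinedValue output column with every column referenced in its expression.
with xmlnamespaces (default 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
select
    dv.node.value('(./ColumnReference/@Column)[1]', 'nvarchar(128)') as output_column
  , src.node.value('@Column', 'nvarchar(128)')                       as input_column
from @plan.nodes('//DefinedValue') as dv(node)
  cross apply dv.node.nodes('./ScalarOperator//ColumnReference') as src(node);
```

Each result row pairs the output column of a "DefinedValue" with one input column used by its expression, which is the relationship the recursive query in this answer follows through the plan.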

Here is an example call:

exec dbo.GetColumnDependencies 'select * from dbo.vTEST'

target_column_name | source_column_name        | const_value
---------------------------------------------------
address            | Expr1007                  | NULL
name               | Expr1006                  | NULL
Expr1006           | NULL                      | ' '
Expr1006           | [testdb].[dbo].first_name | NULL
Expr1006           | [testdb].[dbo].last_name  | NULL
Expr1007           | NULL                      | ' '
Expr1007           | [testdb].[dbo].number     | NULL
Expr1007           | [testdb].[dbo].street     | NULL

Here is the code. First, get the XML query plan.

declare @select_query as varchar(4000) = 'select * from dbo.vTEST' -- IT'S YOUR QUERY HERE.
declare @select_into_query    as varchar(4000) = 'select top (1) * into #foo from (' + @select_query + ') as src'
      , @xml_plan             as xml           = null
      , @xml_generation_tries as tinyint       = 10
;
while (@xml_plan is null and @xml_generation_tries > 0) -- There is no guarantee that the plan will be cached.
begin 
  execute (@select_into_query);
  select @xml_plan = pln.query_plan
    from sys.dm_exec_query_stats as qry
      cross apply sys.dm_exec_sql_text(qry.sql_handle) as txt
      cross apply sys.dm_exec_query_plan(qry.plan_handle) as pln
    where txt.text = @select_into_query
  ;
  set @xml_generation_tries = @xml_generation_tries - 1; -- Count the attempt so the loop terminates.
end
if (@xml_plan is null
) begin
    raiserror(N'Can''t extract XML query plan from cache.' ,15 ,0);
    return;
  end
;

Next comes the main query. Its largest part is a recursive common table expression for column extraction.

with xmlnamespaces(default 'http://schemas.microsoft.com/sqlserver/2004/07/showplan'
                  ,'http://schemas.microsoft.com/sqlserver/2004/07/showplan' as shp -- Used in .query() for predictive namespace using. 
)
    , cte_column_dependencies as
    (

The seed of the recursion is a query that extracts the columns of the #foo table, which stores 1 row of the select query of interest.

select
    (select foo_col.info.query('./ColumnReference') for xml raw('shp:root') ,type) -- Because .value() can't extract an attribute from the root node.
      as target_column_info
  , (select foo_col.info.query('./ScalarOperator/Identifier/ColumnReference') for xml raw('shp:root') ,type)
      as source_column_info
  , cast(null as xml) as const_info
  , 1 as iteration_no
from @xml_plan.nodes('//Update/SetPredicate/ScalarOperator/ScalarExpressionList/ScalarOperator/MultipleAssign/Assign')
        as foo_col(info)
where foo_col.info.exist('./ColumnReference[@Table="[#foo]"]') = 1

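The recursive part searches for "DefinedValue" nodes with dependent columns and extracts all "ColumnReference" and "Const" child nodes used in the column expression. It is overcomplicated by the XML-to-SQL conversions.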

union all    
select
    (select internal_col.info.query('.') for xml raw('shp:root') ,type)
  , source_info.column_info
  , source_info.const_info
  , prev_dependencies.iteration_no + 1
from @xml_plan.nodes('//DefinedValue/ColumnReference') as internal_col(info)
  inner join cte_column_dependencies as prev_dependencies -- Filters by depended columns.
        on prev_dependencies.source_column_info.value('(//ColumnReference/@Column)[1]' ,'nvarchar(4000)') = internal_col.info.value('(./@Column)[1]' ,'nvarchar(4000)')
        and exists (select prev_dependencies.source_column_info.value('(.//@Schema)[1]'   ,'nvarchar(4000)') intersect select internal_col.info.value('(./@Schema)[1]'   ,'nvarchar(4000)'))
        and exists (select prev_dependencies.source_column_info.value('(.//@Database)[1]' ,'nvarchar(4000)') intersect select internal_col.info.value('(./@Database)[1]' ,'nvarchar(4000)'))
        and exists (select prev_dependencies.source_column_info.value('(.//@Server)[1]'   ,'nvarchar(4000)') intersect select internal_col.info.value('(./@Server)[1]'   ,'nvarchar(4000)'))
  cross apply ( -- Because each result row can hold either a column or a constant, not both.
            select (select source_col.info.query('.') for xml raw('shp:root') ,type) as column_info
                 , null                                                              as const_info
              from internal_col.info.nodes('..//ColumnReference') as source_col(info)
            union all
            select null                                                         as column_info
                 , (select const.info.query('.') for xml raw('shp:root') ,type) as const_info
              from internal_col.info.nodes('..//Const') as const(info)
        ) as source_info
where source_info.column_info is null
    or (
        -- Exclude the node itself, which '..//ColumnReference' also selects among its sources. This could probably be checked more simply in XQuery.
            source_info.column_info.value('(//@Column)[1]' ,'nvarchar(4000)') <> internal_col.info.value('(./@Column)[1]' ,'nvarchar(4000)')
        and (select source_info.column_info.value('(//@Schema)[1]'   ,'nvarchar(4000)') intersect select internal_col.info.value('(./@Schema)[1]'   ,'nvarchar(4000)')) is null
        and (select source_info.column_info.value('(//@Database)[1]' ,'nvarchar(4000)') intersect select internal_col.info.value('(./@Database)[1]' ,'nvarchar(4000)')) is null
        and (select source_info.column_info.value('(//@Server)[1]'   ,'nvarchar(4000)') intersect select internal_col.info.value('(./@Server)[1]'   ,'nvarchar(4000)')) is null
      )
)

Finally, a select statement converts the XML into human-readable text.

select
  --  col_dep.target_column_info
  --, col_dep.source_column_info
  --, col_dep.const_info
    coalesce(col_dep.target_column_info.value('(.//shp:ColumnReference/@Server)[1]'   ,'nvarchar(4000)') + '.' ,'')
  + coalesce(col_dep.target_column_info.value('(.//shp:ColumnReference/@Database)[1]' ,'nvarchar(4000)') + '.' ,'')
  + coalesce(col_dep.target_column_info.value('(.//shp:ColumnReference/@Schema)[1]'   ,'nvarchar(4000)') + '.' ,'')
  + col_dep.target_column_info.value('(.//shp:ColumnReference/@Column)[1]' ,'nvarchar(4000)')
    as target_column_name
  , coalesce(col_dep.source_column_info.value('(.//shp:ColumnReference/@Server)[1]'   ,'nvarchar(4000)') + '.' ,'')
  + coalesce(col_dep.source_column_info.value('(.//shp:ColumnReference/@Database)[1]' ,'nvarchar(4000)') + '.' ,'')
  + coalesce(col_dep.source_column_info.value('(.//shp:ColumnReference/@Schema)[1]'   ,'nvarchar(4000)') + '.' ,'')
  + col_dep.source_column_info.value('(.//shp:ColumnReference/@Column)[1]' ,'nvarchar(4000)')
    as source_column_name
  , col_dep.const_info.value('(/shp:root/shp:Const/@ConstValue)[1]' ,'nvarchar(4000)')
    as const_value
from cte_column_dependencies as col_dep
order by col_dep.iteration_no ,target_column_name ,source_column_name
option (maxrecursion 512) -- Guards against infinite recursion.

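【Comments】:

【Answer 4】:

This solution only partially answers your question. It does not work for columns that are expressions.

You can use sys.dm_exec_describe_first_result_set to get the column information: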

@include_browse_information

If set to 1, each query is analyzed as if it had a FOR BROWSE option on the query. Additional key columns and source table information are returned.

CREATE TABLE txu(id INT, first_name VARCHAR(10), last_name VARCHAR(10));
CREATE TABLE txd(id INT, id_fk INT, address VARCHAR(100));

CREATE VIEW v_txu
AS
SELECT t.id AS PK_id,
       t.first_name  AS name,
       d.address,
       t.first_name + t.last_name AS name_full
FROM txu t
JOIN txd d
  ON t.id = d.id_fk

Main query:

SELECT name, source_database, source_schema,
      source_table, source_column 
FROM sys.dm_exec_describe_first_result_set(N'SELECT * FROM v_txu', null, 1) ;  

Output:

+-----------+--------------------+---------------+--------------+---------------+
|   name    |   source_database  | source_schema | source_table | source_column |
+-----------+--------------------+---------------+--------------+---------------+
| PK_id     | fiddle_0f9d47226c4 | dbo           | txu          | id            |
| name      | fiddle_0f9d47226c4 | dbo           | txu          | first_name    |
| address   | fiddle_0f9d47226c4 | dbo           | txd          | address       |
| name_full | null               | null          | null         | null          |
+-----------+--------------------+---------------+--------------+---------------+

DBFiddleDemo

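【Comments】:

1- It does not work properly on databases transferred between versions, for example from 2008 to 2019. 2- Views based on unions and aliases are also a problem.

【Answer 5】: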

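Everything you need is mentioned in the view's definition.

So we can extract this information with the following steps:

    Assign the view definition to a string variable.

    Split it on the comma (,).

    Split each aliased expression on the plus (+) operator by using CROSS APPLY with XML.

    Use the system tables to get accurate information, such as the original table.

Demo: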

Create PROC psp_GetLevelDependsView (@sViewName varchar(200))
AS
BEGIN

    Declare @stringToSplit nvarchar(1000),
            @name NVARCHAR(255),
            @dependsTableName NVARCHAR(50),
            @pos INT

    Declare @returnList TABLE ([Name] [nvarchar] (500))

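    SELECT TOP 1 @dependsTableName= table_schema + '.'+  TABLE_NAME
    FROM    INFORMATION_SCHEMA.VIEW_COLUMN_USAGE
    WHERE   VIEW_NAME = @sViewName -- Restrict to the requested view; otherwise an arbitrary view's table is returned.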

    select @stringToSplit = definition
    from sys.objects     o
    join sys.sql_modules m on m.object_id = o.object_id
    where o.object_id = object_id( @sViewName)
     and o.type = 'V'

     WHILE CHARINDEX(',', @stringToSplit) > 0
     BEGIN
        SELECT @pos  = CHARINDEX(',', @stringToSplit)  
        SELECT @name = SUBSTRING(@stringToSplit, 1, @pos-1)

        INSERT INTO @returnList 
        SELECT @name

        SELECT @stringToSplit = SUBSTRING(@stringToSplit, @pos+1, LEN(@stringToSplit)-@pos)
     END

     INSERT INTO @returnList
     SELECT @stringToSplit

    select COLUMN_NAME  ,  b.Name as Expression
    Into #Temp
    FROM INFORMATION_SCHEMA.COLUMNS a , @returnList b
    WHERE TABLE_NAME= @sViewName
    And (b.Name) like '%' + ( COLUMN_NAME) + '%'

    SELECT A.COLUMN_NAME as column_name,  
         Split.a.value('.', 'VARCHAR(100)') AS depends_on_column_name ,   @dependsTableName as depends_on_table_name
         Into #temp2
     FROM  
     (
         SELECT COLUMN_NAME,  
             CAST ('<M>' + REPLACE(Expression, '+', '</M><M>') + '</M>' AS XML) AS Data  
         FROM  #Temp
     ) AS A CROSS APPLY Data.nodes ('/M') AS Split(a); 

    SELECT b.column_name , a.COLUMN_NAME as depends_on_column_name , b.depends_on_table_name
    FROM INFORMATION_SCHEMA.VIEW_COLUMN_USAGE a , #temp2 b
    WHERE VIEW_NAME= @sViewName
    and b.depends_on_column_name  like '%' + a.COLUMN_NAME + '%'

     drop table #Temp
     drop table #Temp2

 END

Test:

exec psp_GetLevelDependsView 'vTest'

Result:

column_name depends_on_column_name depends_on_table_name
----------- --------------------- --------------------
name        first_name            dbo.TEST
name        last_name             dbo.TEST
address     street                dbo.TEST
address     number                dbo.TEST

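【Comments】:

Thanks for the answer, though I'd prefer to avoid parsing the view; it can easily get too complicated. I was hoping this information is stored somewhere in the system tables.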
