Azure DataFactory chain activities

Posted: 2017-02-08 23:35:42

[Question]:

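I am new to DataFactory and am having trouble understanding how to properly create a pipeline that executes a stored procedure before it executes a copy activity.

The stored procedure is simply a TRUNCATE of the target table, which is used as the output dataset of the second activity.

According to the DataFactory documentation, to execute the stored procedure first, I should specify the "output" of the procedure as the "input" of the second activity.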

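However, the stored procedure has no real "output". To get it to "work", I cloned the output of the second activity, changed its name, and set it to external=false so that it would get past the provisioning errors, but this is obviously a total kludge.

It does not make sense to me why an output even needs to be defined, at least when the stored procedure only performs a TRUNCATE operation.

But when I try to use the output of the stored procedure as an additional input, I receive an error about duplicate table names.

How can I get the TRUNCATE stored procedure activity to execute successfully (and complete) before the copy activity runs?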

Here is the pipeline code:

{
    "name": "Traffic CRM - System User Stage",
    "properties": {
        "description": "Move System User to Stage",
        "activities": [
            {
                "type": "SqlServerStoredProcedure",
                "typeProperties": {
                    "storedProcedureName": "dbo.usp_Truncate_Traffic_Crm_SystemUser",
                    "storedProcedureParameters": {}
                },
                "outputs": [
                    {
                        "name": "Smart App - usp Truncate System User"
                    }
                ],
                "policy": {
                    "timeout": "01:00:00",
                    "concurrency": 1,
                    "retry": 3
                },
                "scheduler": {
                    "frequency": "Day",
                    "interval": 1
                },
                "name": "Smart App - SystemUser Truncate"
            },
            {
                "type": "Copy",
                "typeProperties": {
                    "source": {
                        "type": "SqlSource",
                        "sqlReaderQuery": "select * from [dbo].[Traffic_Crm_SystemUser]"
                    },
                    "sink": {
                        "type": "SqlSink",
                        "writeBatchSize": 0,
                        "writeBatchTimeout": "00:00:00"
                    },
                    "translator": {
                        "type": "TabularTranslator",
                        "columnMappings": "All columns mapped here"
                    }
                },
                "inputs": [
                    {
                        "name": "Traffic CRM - SytemUser Stage"
                    }
                ],
                "outputs": [
                    {
                        "name": "Smart App - System User Stage Production"
                    }
                ],
                "policy": {
                    "timeout": "1.00:00:00",
                    "concurrency": 1,
                    "executionPriorityOrder": "NewestFirst",
                    "style": "StartOfInterval",
                    "retry": 3,
                    "longRetry": 0,
                    "longRetryInterval": "00:00:00"
                },
                "scheduler": {
                    "frequency": "Day",
                    "interval": 1
                },
                "name": "Activity-0-[dbo]_[Traffic_Crm_SystemUser]->[dbo]_[Traffic_Crm_SystemUser]"
            }
        ],
        "start": "2017-01-19T14:30:57.309Z",
        "end": "2099-12-31T05:00:00Z",
        "isPaused": false,
        "hubName": "stagingdatafactory1_hub",
        "pipelineMode": "Scheduled"
    }
}


[Answer 1]:

The output dataset of your SP activity, i.e. "name": "Smart App - usp Truncate System User", should be the input of the next activity. If you are unsure what to put in a dataset, just create a dummy dataset like the one below:

{
    "name": "DummySPDS",
    "properties": {
        "published": false,
        "type": "SqlServerTable",
        "linkedServiceName": "SQLServerLS",
        "typeProperties": {
            "tableName": "dummyTable"
        },
        "availability": {
            "frequency": "Hour",
            "interval": 1
        },
        "external": true
    }
}

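The thread does not show the definition of the stored procedure activity's output dataset, "Smart App - usp Truncate System User". The following is only a sketch of what such a placeholder might look like in Data Factory version 1 dataset JSON. It assumes the same linked service as the dummy dataset above (SQLServerLS), and the table name is made up for illustration. The availability is set to one day to match the activity's scheduler, and external is false because the dataset is produced by an activity in the pipeline.

```json
{
    "name": "Smart App - usp Truncate System User",
    "properties": {
        "type": "SqlServerTable",
        "linkedServiceName": "SQLServerLS",
        "typeProperties": {
            "tableName": "hypotheticalPlaceholderTable"
        },
        "availability": {
            "frequency": "Day",
            "interval": 1
        },
        "external": false
    }
}
```

The stored procedure does not have to write to this table. The dataset exists so that the activity has a schedule and so that a downstream activity has something to depend on.
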
Here is the complete pipeline code:

{
    "name": "Traffic CRM - System User Stage",
    "properties": {
        "description": "Move System User to Stage",
        "activities": [
            {
                "type": "SqlServerStoredProcedure",
                "typeProperties": {
                    "storedProcedureName": "dbo.usp_Truncate_Traffic_Crm_SystemUser",
                    "storedProcedureParameters": {}
                },
                "inputs": [
                    {
                        "name": "DummySPDS"
                    }
                ],
                "outputs": [
                    {
                        "name": "Smart App - usp Truncate System User"
                    }
                ],
                "policy": {
                    "timeout": "01:00:00",
                    "concurrency": 1,
                    "retry": 3
                },
                "scheduler": {
                    "frequency": "Day",
                    "interval": 1
                },
                "name": "Smart App - SystemUser Truncate"
            },
            {
                "type": "Copy",
                "typeProperties": {
                    "source": {
                        "type": "SqlSource",
                        "sqlReaderQuery": "select * from [dbo].[Traffic_Crm_SystemUser]"
                    },
                    "sink": {
                        "type": "SqlSink",
                        "writeBatchSize": 0,
                        "writeBatchTimeout": "00:00:00"
                    },
                    "translator": {
                        "type": "TabularTranslator",
                        "columnMappings": "All columns mapped here"
                    }
                },
                "inputs": [
                    {
                        "name": "Smart App - usp Truncate System User"
                    }
                ],
                "outputs": [
                    {
                        "name": "Smart App - System User Stage Production"
                    }
                ],
                "policy": {
                    "timeout": "1.00:00:00",
                    "concurrency": 1,
                    "executionPriorityOrder": "NewestFirst",
                    "style": "StartOfInterval",
                    "retry": 3,
                    "longRetry": 0,
                    "longRetryInterval": "00:00:00"
                },
                "scheduler": {
                    "frequency": "Day",
                    "interval": 1
                },
                "name": "Activity-0-[dbo]_[Traffic_Crm_SystemUser]->[dbo]_[Traffic_Crm_SystemUser]"
            }
        ],
        "start": "2017-01-19T14:30:57.309Z",
        "end": "2099-12-31T05:00:00Z",
        "isPaused": false,
        "hubName": "stagingdatafactory1_hub",
        "pipelineMode": "Scheduled"
    }
}

[Comments]:

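I added a dummy dataset as described; however, the second activity lost the mappings needed for the copy activity. I then tried adding a second item to inputs, but received a duplicate object key referenced table name error, even though my dummy dataset does not contain the same table name. This is the article that led me to add a second name object to the inputs: ***.com/questions/35970079/…

I have provided the complete pipeline code. I have not tested it on Azure, but it should work.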
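
On the lost mappings: the Data Factory version 1 documentation describes a Copy activity with more than one input, where only the first input dataset is used for copying and the remaining inputs act as dependencies. Under that assumption, a sketch of the Copy activity's inputs and outputs could keep the original source dataset first and list the stored procedure's output dataset second. This fragment is untested, and the comment above reports a duplicate table name error from a similar attempt, so treat it as a starting point rather than a confirmed fix. All dataset names are taken from the pipelines in this thread.

```json
{
    "type": "Copy",
    "inputs": [
        {
            "name": "Traffic CRM - SytemUser Stage"
        },
        {
            "name": "Smart App - usp Truncate System User"
        }
    ],
    "outputs": [
        {
            "name": "Smart App - System User Stage Production"
        }
    ]
}
```

The rest of the Copy activity (typeProperties, policy, scheduler, name) would stay as in the pipeline above. If the duplicate table name error persists, one thing to check is whether the placeholder dataset points to the same table as another dataset used by the activity.
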
