Azure Data Factory Mapping Data Flow to CSV sink results in zero-byte files

Posted: 2020-04-14 21:06:48

Question:

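I am building up my Azure Data Factory skills by comparing the performance of a Copy activity against a Mapping Data Flow when writing a single CSV file to Azure Blob Storage.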

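When I write a single CSV through a dataset (azureBlobSingleCSVFileNameDataset) that uses an Azure Blob Storage linked service (azureBlobLinkedService), the Copy activity produces the output I expect in the blob storage container: for example, an output file MyData.csv in the container MyContainer under the folder /output/csv/singleFiles.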

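When I write a single CSV with a Mapping Data Flow through the same Blob Storage linked service but a different dataset (azureBlobSingleCSVNoFileNameDataset), I get the following:

MyContainer/output/csv/singleFiles (a zero-length file)
MyContainer/output/csv/singleFiles/MyData.csv (contains the data I expect)

I don't understand why the zero-length file is produced when I use the Mapping Data Flow.

Here are my source files: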

linkedService/azureBlobLinkedService

{
    "name": "azureBlobLinkedService",
    "type": "Microsoft.DataFactory/factories/linkedservices",
    "properties": {
        "type": "AzureBlobStorage",
        "parameters": {
            "azureBlobConnectionStringSecretName": {
                "type": "string"
            }
        },
        "annotations": [],
        "typeProperties": {
            "connectionString": {
                "type": "AzureKeyVaultSecret",
                "store": {
                    "referenceName": "AzureKeyVaultLinkedService",
                    "type": "LinkedServiceReference"
                },
                "secretName": "@linkedService().azureBlobConnectionStringSecretName"
            }
        }
    }
}

dataset/azureBlobSingleCSVFileNameDataset

{
    "name": "azureBlobSingleCSVFileNameDataset",
    "properties": {
        "linkedServiceName": {
            "referenceName": "azureBlobLinkedService",
            "type": "LinkedServiceReference",
            "parameters": {
                "azureBlobConnectionStringSecretName": {
                    "value": "@dataset().azureBlobConnectionStringSecretName",
                    "type": "Expression"
                }
            }
        },
        "parameters": {
            "azureBlobConnectionStringSecretName": {
                "type": "string"
            },
            "azureBlobSingleCSVFileName": {
                "type": "string"
            },
            "azureBlobSingleCSVFolderPath": {
                "type": "string"
            },
            "azureBlobSingleCSVContainerName": {
                "type": "string"
            }
        },
        "annotations": [],
        "type": "DelimitedText",
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "fileName": {
                    "value": "@dataset().azureBlobSingleCSVFileName",
                    "type": "Expression"
                },
                "folderPath": {
                    "value": "@dataset().azureBlobSingleCSVFolderPath",
                    "type": "Expression"
                },
                "container": {
                    "value": "@dataset().azureBlobSingleCSVContainerName",
                    "type": "Expression"
                }
            },
            "columnDelimiter": ",",
            "escapeChar": "\\",
            "firstRowAsHeader": true,
            "quoteChar": "\""
        },
        "schema": []
    },
    "type": "Microsoft.DataFactory/factories/datasets"
}

pipeline/Azure SQL Table to Blob Single CSV Copy Pipeline (this produces the expected results)

{
    "name": "Azure SQL Table to Blob Single CSV Copy Pipeline",
    "properties": {
        "activities": [
            {
                "name": "Copy Azure SQL Table to Blob Single CSV",
                "type": "Copy",
                "dependsOn": [],
                "policy": {
                    "timeout": "7.00:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "userProperties": [],
                "typeProperties": {
                    "source": {
                        "type": "AzureSqlSource",
                        "queryTimeout": "02:00:00"
                    },
                    "sink": {
                        "type": "DelimitedTextSink",
                        "storeSettings": {
                            "type": "AzureBlobStorageWriteSettings"
                        },
                        "formatSettings": {
                            "type": "DelimitedTextWriteSettings",
                            "quoteAllText": true,
                            "fileExtension": ".csv"
                        }
                    },
                    "enableStaging": false
                },
                "inputs": [
                    {
                        "referenceName": "azureSqlDatabaseTableDataset",
                        "type": "DatasetReference",
                        "parameters": {
                            "azureSqlDatabaseConnectionStringSecretName": {
                                "value": "@pipeline().parameters.sourceAzureSqlDatabaseConnectionStringSecretName",
                                "type": "Expression"
                            },
                            "azureSqlDatabaseTableSchemaName": {
                                "value": "@pipeline().parameters.sourceAzureSqlDatabaseTableSchemaName",
                                "type": "Expression"
                            },
                            "azureSqlDatabaseTableTableName": {
                                "value": "@pipeline().parameters.sourceAzureSqlDatabaseTableTableName",
                                "type": "Expression"
                            }
                        }
                    }
                ],
                "outputs": [
                    {
                        "referenceName": "azureBlobSingleCSVFileNameDataset",
                        "type": "DatasetReference",
                        "parameters": {
                            "azureBlobConnectionStringSecretName": {
                                "value": "@pipeline().parameters.sinkAzureBlobConnectionStringSecretName",
                                "type": "Expression"
                            },
                            "azureBlobSingleCSVFileName": {
                                "value": "@pipeline().parameters.sinkAzureBlobSingleCSVFileName",
                                "type": "Expression"
                            },
                            "azureBlobSingleCSVFolderPath": {
                                "value": "@pipeline().parameters.sinkAzureBlobSingleCSVFolderPath",
                                "type": "Expression"
                            },
                            "azureBlobSingleCSVContainerName": {
                                "value": "@pipeline().parameters.sinkAzureBlobSingleCSVContainerName",
                                "type": "Expression"
                            }
                        }
                    }
                ]
            }
        ],
        "parameters": {
            "sourceAzureSqlDatabaseConnectionStringSecretName": {
                "type": "string"
            },
            "sourceAzureSqlDatabaseTableSchemaName": {
                "type": "string"
            },
            "sourceAzureSqlDatabaseTableTableName": {
                "type": "string"
            },
            "sinkAzureBlobConnectionStringSecretName": {
                "type": "string"
            },
            "sinkAzureBlobSingleCSVContainerName": {
                "type": "string"
            },
            "sinkAzureBlobSingleCSVFolderPath": {
                "type": "string"
            },
            "sinkAzureBlobSingleCSVFileName": {
                "type": "string"
            }
        },
        "annotations": []
    },
    "type": "Microsoft.DataFactory/factories/pipelines"
}

dataset/azureBlobSingleCSVNoFileNameDataset (the Mapping Data Flow requires a dataset without a file name; the file name is set in the Mapping Data Flow)

{
    "name": "azureBlobSingleCSVNoFileNameDataset",
    "properties": {
        "linkedServiceName": {
            "referenceName": "azureBlobLinkedService",
            "type": "LinkedServiceReference",
            "parameters": {
                "azureBlobConnectionStringSecretName": {
                    "value": "@dataset().azureBlobConnectionStringSecretName",
                    "type": "Expression"
                }
            }
        },
        "parameters": {
            "azureBlobConnectionStringSecretName": {
                "type": "string"
            },
            "azureBlobSingleCSVFolderPath": {
                "type": "string"
            },
            "azureBlobSingleCSVContainerName": {
                "type": "string"
            }
        },
        "annotations": [],
        "type": "DelimitedText",
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "folderPath": {
                    "value": "@dataset().azureBlobSingleCSVFolderPath",
                    "type": "Expression"
                },
                "container": {
                    "value": "@dataset().azureBlobSingleCSVContainerName",
                    "type": "Expression"
                }
            },
            "columnDelimiter": ",",
            "escapeChar": "\\",
            "firstRowAsHeader": true,
            "quoteChar": "\""
        },
        "schema": []
    },
    "type": "Microsoft.DataFactory/factories/datasets"
}

dataflow/azureSqlDatabaseTableToAzureBlobSingleCSVDataFlow

{
    "name": "azureSqlDatabaseTableToAzureBlobSingleCSVDataFlow",
    "properties": {
        "type": "MappingDataFlow",
        "typeProperties": {
            "sources": [
                {
                    "dataset": {
                        "referenceName": "azureSqlDatabaseTableDataset",
                        "type": "DatasetReference"
                    },
                    "name": "readFromAzureSqlDatabase"
                }
            ],
            "sinks": [
                {
                    "dataset": {
                        "referenceName": "azureBlobSingleCSVNoFileNameDataset",
                        "type": "DatasetReference"
                    },
                    "name": "writeToAzureBlobSingleCSV"
                }
            ],
            "transformations": [
                {
                    "name": "enrichWithRuntimeMetadata"
                }
            ],
            "script": "\nparameters{\n\tsourceConnectionSecretName as string,\n\tsinkConnectionStringSecretName as string,\n\tsourceObjectName as string,\n\tsinkObjectName as string,\n\tdataFactoryName as string,\n\tdataFactoryPipelineName as string,\n\tdataFactoryPipelineRunId as string,\n\tsinkFileNameNoPath as string\n}\nsource(allowSchemaDrift: true,\n\tvalidateSchema: false,\n\tisolationLevel: 'READ_UNCOMMITTED',\n\tformat: 'table') ~> readFromAzureSqlDatabase\nreadFromAzureSqlDatabase derive(__sourceConnectionStringSecretName = $sourceConnectionSecretName,\n\t\t__sinkConnectionStringSecretName = $sinkConnectionStringSecretName,\n\t\t__sourceObjectName = $sourceObjectName,\n\t\t__sinkObjectName = $sinkObjectName,\n\t\t__dataFactoryName = $dataFactoryName,\n\t\t__dataFactoryPipelineName = $dataFactoryPipelineName,\n\t\t__dataFactoryPipelineRunId = $dataFactoryPipelineRunId) ~> enrichWithRuntimeMetadata\nenrichWithRuntimeMetadata sink(allowSchemaDrift: true,\n\tvalidateSchema: false,\n\tpartitionFileNames:[($sinkFileNameNoPath)],\n\tpartitionBy('hash', 1),\n\tquoteAll: true) ~> writeToAzureBlobSingleCSV"
        }
    }
}

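The "script" property above is a single JSON-escaped string, which makes the sink settings hard to read. As a rough sketch (not part of the original post), the string can be decoded with Python's standard json module to see it line by line. The document below is an abbreviated, hypothetical copy that keeps only the sink portion of the script; the helper name decode_script is an assumption made for illustration.

```python
import json


def decode_script(document: str) -> str:
    """Return the data flow script with its JSON escapes (\\n, \\t) decoded."""
    return json.loads(document)["script"]


# Abbreviated copy of the data flow JSON: only the sink portion of the
# "script" string from the post is kept. A raw string is used so the
# backslash escapes reach json.loads unchanged.
document = r'''
{
    "script": "enrichWithRuntimeMetadata sink(allowSchemaDrift: true,\n\tvalidateSchema: false,\n\tpartitionFileNames:[($sinkFileNameNoPath)],\n\tpartitionBy('hash', 1),\n\tquoteAll: true) ~> writeToAzureBlobSingleCSV"
}
'''

script = decode_script(document)
print(script)
```

Printed this way, the sink shows one setting per line, including partitionFileNames (fed by the $sinkFileNameNoPath parameter) and partitionBy('hash', 1).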
pipeline/Azure SQL Table to Blob Single CSV Data Flow Pipeline (this produces the expected results, plus a zero-byte file at the folder path)

{
    "name": "Azure SQL Table to Blob Single CSV Data Flow Pipeline",
    "properties": {
        "activities": [
            {
                "name": "Copy Sql Database Table To Blob Single CSV Data Flow",
                "type": "ExecuteDataFlow",
                "dependsOn": [],
                "policy": {
                    "timeout": "7.00:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "userProperties": [],
                "typeProperties": {
                    "dataflow": {
                        "referenceName": "azureSqlDatabaseTableToAzureBlobSingleCSVDataFlow",
                        "type": "DataFlowReference",
                        "parameters": {
                            "sourceConnectionSecretName": {
                                "value": "'@{pipeline().parameters.sourceAzureSqlDatabaseConnectionStringSecretName}'",
                                "type": "Expression"
                            },
                            "sinkConnectionStringSecretName": {
                                "value": "'@{pipeline().parameters.sinkAzureBlobConnectionStringSecretName}'",
                                "type": "Expression"
                            },
                            "sourceObjectName": {
                                "value": "'@{concat('[', pipeline().parameters.sourceAzureSqlDatabaseTableSchemaName, '].[', pipeline().parameters.sourceAzureSqlDatabaseTableTableName, ']')}'",
                                "type": "Expression"
                            },
                            "sinkObjectName": {
                                "value": "'@{concat(pipeline().parameters.sinkAzureBlobSingleCSVContainerName, '/', pipeline().parameters.sinkAzureBlobSingleCSVFolderPath, '/', \npipeline().parameters.sinkAzureBlobSingleCSVFileName)}'",
                                "type": "Expression"
                            },
                            "dataFactoryName": {
                                "value": "'@{pipeline().DataFactory}'",
                                "type": "Expression"
                            },
                            "dataFactoryPipelineName": {
                                "value": "'@{pipeline().Pipeline}'",
                                "type": "Expression"
                            },
                            "dataFactoryPipelineRunId": {
                                "value": "'@{pipeline().RunId}'",
                                "type": "Expression"
                            },
                            "sinkFileNameNoPath": {
                                "value": "'@{pipeline().parameters.sinkAzureBlobSingleCSVFileName}'",
                                "type": "Expression"
                            }
                        },
                        "datasetParameters": {
                            "readFromAzureSqlDatabase": {
                                "azureSqlDatabaseConnectionStringSecretName": {
                                    "value": "@pipeline().parameters.sourceAzureSqlDatabaseConnectionStringSecretName",
                                    "type": "Expression"
                                },
                                "azureSqlDatabaseTableSchemaName": {
                                    "value": "@pipeline().parameters.sourceAzureSqlDatabaseTableSchemaName",
                                    "type": "Expression"
                                },
                                "azureSqlDatabaseTableTableName": {
                                    "value": "@pipeline().parameters.sourceAzureSqlDatabaseTableTableName",
                                    "type": "Expression"
                                }
                            },
                            "writeToAzureBlobSingleCSV": {
                                "azureBlobConnectionStringSecretName": {
                                    "value": "@pipeline().parameters.sinkAzureBlobConnectionStringSecretName",
                                    "type": "Expression"
                                },
                                "azureBlobSingleCSVFolderPath": {
                                    "value": "@pipeline().parameters.sinkAzureBlobSingleCSVFolderPath",
                                    "type": "Expression"
                                },
                                "azureBlobSingleCSVContainerName": {
                                    "value": "@pipeline().parameters.sinkAzureBlobSingleCSVContainerName",
                                    "type": "Expression"
                                }
                            }
                        }
                    },
                    "compute": {
                        "coreCount": 8,
                        "computeType": "General"
                    }
                }
            }
        ],
        "parameters": {
            "sourceAzureSqlDatabaseConnectionStringSecretName": {
                "type": "string"
            },
            "sourceAzureSqlDatabaseTableSchemaName": {
                "type": "string"
            },
            "sourceAzureSqlDatabaseTableTableName": {
                "type": "string"
            },
            "sinkAzureBlobConnectionStringSecretName": {
                "type": "string"
            },
            "sinkAzureBlobSingleCSVContainerName": {
                "type": "string"
            },
            "sinkAzureBlobSingleCSVFolderPath": {
                "type": "string"
            },
            "sinkAzureBlobSingleCSVFileName": {
                "type": "string"
            }
        },
        "annotations": []
    },
    "type": "Microsoft.DataFactory/factories/pipelines"
}

Answer 1:

A 0-length (zero-byte) file means that, although your pipeline may have run successfully, it did not return or produce any output.

One good technique is to preview the output at each stage to make sure that every stage produces the output you expect.
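Alongside previewing each stage, the container listing itself can be checked to separate an empty output file from a zero-length entry that merely sits at a folder path, which is the situation the question describes. The following is a minimal sketch, assuming the listing has already been obtained by some means as (name, size in bytes) pairs; the function name and the 1024-byte size are hypothetical and chosen only for illustration.

```python
from typing import Iterable, List, Tuple


def zero_byte_folder_entries(listing: Iterable[Tuple[str, int]]) -> List[str]:
    """Return zero-length entries whose name is also a folder prefix of another entry."""
    entries = list(listing)
    names = [name for name, _ in entries]
    flagged = []
    for name, size in entries:
        if size != 0:
            continue  # only zero-length entries are of interest
        prefix = name.rstrip("/") + "/"
        # Flag the entry if some other blob lives "inside" this path.
        if any(other.startswith(prefix) for other in names if other != name):
            flagged.append(name)
    return flagged


# Listing modeled on the paths in the question; the CSV size is made up.
listing = [
    ("output/csv/singleFiles", 0),
    ("output/csv/singleFiles/MyData.csv", 1024),
]

print(zero_byte_folder_entries(listing))
```

An entry flagged here shares its name with the folder that holds the real output, so it is not the data file itself; a zero-length blob that is not flagged would be the case the answer above describes.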

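Comments:

"(This produces the expected results, plus a zero-byte file at the folder path.)" The output I am getting is what I expect, plus a zero-byte file at each folder level.

When I go through the same Blob Storage linked service, but a different dataset (azureBlobSingleCSVNoFileNameDataset), using the Mapping Data Flow, I get the following: MyContainer/output/csv/singleFiles (a zero-length file) and MyContainer/output/csv/singleFiles/MyData.csv (contains the data I expect). I don't understand why the zero-length file is produced when using the Mapping Data Flow.

Any update on this issue? I am seeing the same behavior.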
