Azure Data Factory Mapping Data Flow to CSV sink results in zero-byte files
Posted: 2020-04-14 21:06:48

I am ramping up my Azure Data Factory skills by comparing the performance of a Copy activity against a Mapping Data Flow when writing to a single CSV file in Azure Blob Storage.
When I write a single CSV using a Copy activity, going through the Azure Blob Storage linked service (azureBlobLinkedService) via a dataset (azureBlobSingleCSVFileNameDataset), I get the output in the blob storage container I expect. For example, an output file MyData.csv in container MyContainer under the folder /output/csv/singleFiles.
When I write a single CSV using a Mapping Data Flow, going through the same blob storage linked service but via a different dataset (azureBlobSingleCSVNoFileNameDataset), I get the following:
MyContainer/output/csv/singleFiles (a zero-length file)
MyContainer/output/csv/singleFiles/MyData.csv (contains the data I expect)
I do not understand why the zero-length file is generated when I use the Mapping Data Flow.
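The listing can be confirmed from the Azure CLI; this is only a sketch using the container and path from the example above, with a placeholder connection string:

    az storage blob list \
        --container-name MyContainer \
        --prefix output/csv/singleFiles \
        --connection-string "<storage-account-connection-string>" \
        --query "[].{name:name, length:properties.contentLength}" \
        --output table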
Here are my source files:
linkedService/azureBlobLinkedService
"name": "azureBlobLinkedService",
"type": "Microsoft.DataFactory/factories/linkedservices",
"properties":
"type": "AzureBlobStorage",
"parameters":
"azureBlobConnectionStringSecretName":
"type": "string"
,
"annotations": [],
"typeProperties":
"connectionString":
"type": "AzureKeyVaultSecret",
"store":
"referenceName": "AzureKeyVaultLinkedService",
"type": "LinkedServiceReference"
,
"secretName": "@linkedService().azureBlobConnectionStringSecretName"
dataset/azureBlobSingleCSVFileNameDataset
"name": "azureBlobSingleCSVFileNameDataset",
"properties":
"linkedServiceName":
"referenceName": "azureBlobLinkedService",
"type": "LinkedServiceReference",
"parameters":
"azureBlobConnectionStringSecretName":
"value": "@dataset().azureBlobConnectionStringSecretName",
"type": "Expression"
,
"parameters":
"azureBlobConnectionStringSecretName":
"type": "string"
,
"azureBlobSingleCSVFileName":
"type": "string"
,
"azureBlobSingleCSVFolderPath":
"type": "string"
,
"azureBlobSingleCSVContainerName":
"type": "string"
,
"annotations": [],
"type": "DelimitedText",
"typeProperties":
"location":
"type": "AzureBlobStorageLocation",
"fileName":
"value": "@dataset().azureBlobSingleCSVFileName",
"type": "Expression"
,
"folderPath":
"value": "@dataset().azureBlobSingleCSVFolderPath",
"type": "Expression"
,
"container":
"value": "@dataset().azureBlobSingleCSVContainerName",
"type": "Expression"
,
"columnDelimiter": ",",
"escapeChar": "\\",
"firstRowAsHeader": true,
"quoteChar": "\""
,
"schema": []
,
"type": "Microsoft.DataFactory/factories/datasets"
pipeline/Azure SQL Table to Blob Single CSV Copy Pipeline (this produces the expected result)
"name": "Azure SQL Table to Blob Single CSV Copy Pipeline",
"properties":
"activities": [
"name": "Copy Azure SQL Table to Blob Single CSV",
"type": "Copy",
"dependsOn": [],
"policy":
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
,
"userProperties": [],
"typeProperties":
"source":
"type": "AzureSqlSource",
"queryTimeout": "02:00:00"
,
"sink":
"type": "DelimitedTextSink",
"storeSettings":
"type": "AzureBlobStorageWriteSettings"
,
"formatSettings":
"type": "DelimitedTextWriteSettings",
"quoteAllText": true,
"fileExtension": ".csv"
,
"enableStaging": false
,
"inputs": [
"referenceName": "azureSqlDatabaseTableDataset",
"type": "DatasetReference",
"parameters":
"azureSqlDatabaseConnectionStringSecretName":
"value": "@pipeline().parameters.sourceAzureSqlDatabaseConnectionStringSecretName",
"type": "Expression"
,
"azureSqlDatabaseTableSchemaName":
"value": "@pipeline().parameters.sourceAzureSqlDatabaseTableSchemaName",
"type": "Expression"
,
"azureSqlDatabaseTableTableName":
"value": "@pipeline().parameters.sourceAzureSqlDatabaseTableTableName",
"type": "Expression"
],
"outputs": [
"referenceName": "azureBlobSingleCSVFileNameDataset",
"type": "DatasetReference",
"parameters":
"azureBlobConnectionStringSecretName":
"value": "@pipeline().parameters.sinkAzureBlobConnectionStringSecretName",
"type": "Expression"
,
"azureBlobSingleCSVFileName":
"value": "@pipeline().parameters.sinkAzureBlobSingleCSVFileName",
"type": "Expression"
,
"azureBlobSingleCSVFolderPath":
"value": "@pipeline().parameters.sinkAzureBlobSingleCSVFolderPath",
"type": "Expression"
,
"azureBlobSingleCSVContainerName":
"value": "@pipeline().parameters.sinkAzureBlobSingleCSVContainerName",
"type": "Expression"
]
],
"parameters":
"sourceAzureSqlDatabaseConnectionStringSecretName":
"type": "string"
,
"sourceAzureSqlDatabaseTableSchemaName":
"type": "string"
,
"sourceAzureSqlDatabaseTableTableName":
"type": "string"
,
"sinkAzureBlobConnectionStringSecretName":
"type": "string"
,
"sinkAzureBlobSingleCSVContainerName":
"type": "string"
,
"sinkAzureBlobSingleCSVFolderPath":
"type": "string"
,
"sinkAzureBlobSingleCSVFileName":
"type": "string"
,
"annotations": []
,
"type": "Microsoft.DataFactory/factories/pipelines"
dataset/azureBlobSingleCSVNoFileNameDataset (the mapping data flow requires the dataset to have no file name; the file name is set inside the mapping data flow)
"name": "azureBlobSingleCSVNoFileNameDataset",
"properties":
"linkedServiceName":
"referenceName": "azureBlobLinkedService",
"type": "LinkedServiceReference",
"parameters":
"azureBlobConnectionStringSecretName":
"value": "@dataset().azureBlobConnectionStringSecretName",
"type": "Expression"
,
"parameters":
"azureBlobConnectionStringSecretName":
"type": "string"
,
"azureBlobSingleCSVFolderPath":
"type": "string"
,
"azureBlobSingleCSVContainerName":
"type": "string"
,
"annotations": [],
"type": "DelimitedText",
"typeProperties":
"location":
"type": "AzureBlobStorageLocation",
"folderPath":
"value": "@dataset().azureBlobSingleCSVFolderPath",
"type": "Expression"
,
"container":
"value": "@dataset().azureBlobSingleCSVContainerName",
"type": "Expression"
,
"columnDelimiter": ",",
"escapeChar": "\\",
"firstRowAsHeader": true,
"quoteChar": "\""
,
"schema": []
,
"type": "Microsoft.DataFactory/factories/datasets"
dataflow/azureSqlDatabaseTableToAzureBlobSingleCSVDataFlow
"name": "azureSqlDatabaseTableToAzureBlobSingleCSVDataFlow",
"properties":
"type": "MappingDataFlow",
"typeProperties":
"sources": [
"dataset":
"referenceName": "azureSqlDatabaseTableDataset",
"type": "DatasetReference"
,
"name": "readFromAzureSqlDatabase"
],
"sinks": [
"dataset":
"referenceName": "azureBlobSingleCSVNoFileNameDataset",
"type": "DatasetReference"
,
"name": "writeToAzureBlobSingleCSV"
],
"transformations": [
"name": "enrichWithRuntimeMetadata"
],
"script": "\nparameters\n\tsourceConnectionSecretName as string,\n\tsinkConnectionStringSecretName as string,\n\tsourceObjectName as string,\n\tsinkObjectName as string,\n\tdataFactoryName as string,\n\tdataFactoryPipelineName as string,\n\tdataFactoryPipelineRunId as string,\n\tsinkFileNameNoPath as string\n\nsource(allowSchemaDrift: true,\n\tvalidateSchema: false,\n\tisolationLevel: 'READ_UNCOMMITTED',\n\tformat: 'table') ~> readFromAzureSqlDatabase\nreadFromAzureSqlDatabase derive(__sourceConnectionStringSecretName = $sourceConnectionSecretName,\n\t\t__sinkConnectionStringSecretName = $sinkConnectionStringSecretName,\n\t\t__sourceObjectName = $sourceObjectName,\n\t\t__sinkObjectName = $sinkObjectName,\n\t\t__dataFactoryName = $dataFactoryName,\n\t\t__dataFactoryPipelineName = $dataFactoryPipelineName,\n\t\t__dataFactoryPipelineRunId = $dataFactoryPipelineRunId) ~> enrichWithRuntimeMetadata\nenrichWithRuntimeMetadata sink(allowSchemaDrift: true,\n\tvalidateSchema: false,\n\tpartitionFileNames:[($sinkFileNameNoPath)],\n\tpartitionBy('hash', 1),\n\tquoteAll: true) ~> writeToAzureBlobSingleCSV"
pipeline/Azure SQL Table to Blob Single CSV Data Flow Pipeline (this produces the expected result, plus the zero-byte file in the folder path)
"name": "Azure SQL Table to Blob Single CSV Data Flow Pipeline",
"properties":
"activities": [
"name": "Copy Sql Database Table To Blob Single CSV Data Flow",
"type": "ExecuteDataFlow",
"dependsOn": [],
"policy":
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
,
"userProperties": [],
"typeProperties":
"dataflow":
"referenceName": "azureSqlDatabaseTableToAzureBlobSingleCSVDataFlow",
"type": "DataFlowReference",
"parameters":
"sourceConnectionSecretName":
"value": "'@pipeline().parameters.sourceAzureSqlDatabaseConnectionStringSecretName'",
"type": "Expression"
,
"sinkConnectionStringSecretName":
"value": "'@pipeline().parameters.sinkAzureBlobConnectionStringSecretName'",
"type": "Expression"
,
"sourceObjectName":
"value": "'@concat('[', pipeline().parameters.sourceAzureSqlDatabaseTableSchemaName, '].[', pipeline().parameters.sourceAzureSqlDatabaseTableTableName, ']')'",
"type": "Expression"
,
"sinkObjectName":
"value": "'@concat(pipeline().parameters.sinkAzureBlobSingleCSVContainerName, '/', pipeline().parameters.sinkAzureBlobSingleCSVFolderPath, '/', \npipeline().parameters.sinkAzureBlobSingleCSVFileName)'",
"type": "Expression"
,
"dataFactoryName":
"value": "'@pipeline().DataFactory'",
"type": "Expression"
,
"dataFactoryPipelineName":
"value": "'@pipeline().Pipeline'",
"type": "Expression"
,
"dataFactoryPipelineRunId":
"value": "'@pipeline().RunId'",
"type": "Expression"
,
"sinkFileNameNoPath":
"value": "'@pipeline().parameters.sinkAzureBlobSingleCSVFileName'",
"type": "Expression"
,
"datasetParameters":
"readFromAzureSqlDatabase":
"azureSqlDatabaseConnectionStringSecretName":
"value": "@pipeline().parameters.sourceAzureSqlDatabaseConnectionStringSecretName",
"type": "Expression"
,
"azureSqlDatabaseTableSchemaName":
"value": "@pipeline().parameters.sourceAzureSqlDatabaseTableSchemaName",
"type": "Expression"
,
"azureSqlDatabaseTableTableName":
"value": "@pipeline().parameters.sourceAzureSqlDatabaseTableTableName",
"type": "Expression"
,
"writeToAzureBlobSingleCSV":
"azureBlobConnectionStringSecretName":
"value": "@pipeline().parameters.sinkAzureBlobConnectionStringSecretName",
"type": "Expression"
,
"azureBlobSingleCSVFolderPath":
"value": "@pipeline().parameters.sinkAzureBlobSingleCSVFolderPath",
"type": "Expression"
,
"azureBlobSingleCSVContainerName":
"value": "@pipeline().parameters.sinkAzureBlobSingleCSVContainerName",
"type": "Expression"
,
"compute":
"coreCount": 8,
"computeType": "General"
],
"parameters":
"sourceAzureSqlDatabaseConnectionStringSecretName":
"type": "string"
,
"sourceAzureSqlDatabaseTableSchemaName":
"type": "string"
,
"sourceAzureSqlDatabaseTableTableName":
"type": "string"
,
"sinkAzureBlobConnectionStringSecretName":
"type": "string"
,
"sinkAzureBlobSingleCSVContainerName":
"type": "string"
,
"sinkAzureBlobSingleCSVFolderPath":
"type": "string"
,
"sinkAzureBlobSingleCSVFileName":
"type": "string"
,
"annotations": []
,
"type": "Microsoft.DataFactory/factories/pipelines"
Answer 1:
Getting a zero-length (zero-byte) file means that, although your pipeline may have run successfully, it did not return or produce any output.
One of the better techniques is to preview the output at each stage to make sure every stage produces the output you expect.
Comments:
(This produces the expected result, plus the zero-byte file in the folder path.) The output I get is what I expect at each folder location, plus the zero-byte file. When I go through the same blob storage linked service, but via a different dataset (azureBlobSingleCSVNoFileNameDataset), using a Mapping Data Flow I get the following: MyContainer/output/csv/singleFiles (a zero-length file) and MyContainer/output/csv/singleFiles/MyData.csv (which contains the data I expect). I do not understand why the zero-length file is generated when using the Mapping Data Flow.
Any update on this? I am seeing the same behavior.