Azure Data Factory mapping data flow to a CSV sink results in a zero-byte file
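I am getting up to speed on Azure Data Factory and comparing the performance of a Copy activity against a mapping data flow when writing to a single CSV file in Azure Blob Storage.
When I write a single CSV through a dataset (azureBlobSingleCSVFileNameDataset) over the Azure Blob Storage linked service (azureBlobLinkedService) using a Copy activity, I get the output I expect in the blob storage container. For example, an output file MyData.csv under the folder /output/csv/singleFiles in the container MyContainer.
When I write a single CSV through the same Blob Storage linked service but a different dataset (azureBlobSingleCSVNoFileNameDataset) using a mapping data flow, I get the following:
- MyContainer/output/csv/singleFiles (a zero-length file)
- MyContainer/output/csv/singleFiles/MyData.csv (contains the data I expect)
I do not understand why the zero-length file is produced when I use a mapping data flow.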
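To tell the two outputs apart in a container listing, a small check along the following lines may help. This is only a minimal sketch and not part of the pipeline above: the helper name `find_folder_placeholders` is hypothetical, the 2048-byte size in the sample listing is made up, and the listing is assumed to be (blob name, size in bytes) pairs obtained however you enumerate the container. It only classifies blobs; it does not explain why the data flow writes the extra one.

```python
from typing import Iterable, List, Tuple


def find_folder_placeholders(blobs: Iterable[Tuple[str, int]]) -> List[str]:
    """Return zero-length blobs whose name is also the folder prefix of another blob."""
    entries = list(blobs)
    names = [name for name, _ in entries]
    placeholders = []
    for name, size in entries:
        if size != 0:
            continue  # only zero-length blobs are candidates
        prefix = name.rstrip("/") + "/"
        # A placeholder shares its name with the "folder" part of some other blob.
        if any(other.startswith(prefix) for other in names if other != name):
            placeholders.append(name)
    return placeholders


# Hypothetical listing that mirrors the two blobs described in the question.
listing = [
    ("output/csv/singleFiles", 0),
    ("output/csv/singleFiles/MyData.csv", 2048),  # size is an invented example value
]
print(find_folder_placeholders(listing))
```

A zero-length blob that is not a prefix of any other blob (for example, an empty MyData.csv) is deliberately not reported, since that case would point to a run that wrote no rows rather than to an extra folder-level blob.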
Here are my source files:
linkedService/azureBlobLinkedService
{
"name": "azureBlobLinkedService",
"type": "Microsoft.DataFactory/factories/linkedservices",
"properties": {
"type": "AzureBlobStorage",
"parameters": {
"azureBlobConnectionStringSecretName": {
"type": "string"
}
},
"annotations": [],
"typeProperties": {
"connectionString": {
"type": "AzureKeyVaultSecret",
"store": {
"referenceName": "AzureKeyVaultLinkedService",
"type": "LinkedServiceReference"
},
"secretName": "@{linkedService().azureBlobConnectionStringSecretName}"
}
}
}
}
dataset/azureBlobSingleCSVFileNameDataset
{
"name": "azureBlobSingleCSVFileNameDataset",
"properties": {
"linkedServiceName": {
"referenceName": "azureBlobLinkedService",
"type": "LinkedServiceReference",
"parameters": {
"azureBlobConnectionStringSecretName": {
"value": "@dataset().azureBlobConnectionStringSecretName",
"type": "Expression"
}
}
},
"parameters": {
"azureBlobConnectionStringSecretName": {
"type": "string"
},
"azureBlobSingleCSVFileName": {
"type": "string"
},
"azureBlobSingleCSVFolderPath": {
"type": "string"
},
"azureBlobSingleCSVContainerName": {
"type": "string"
}
},
"annotations": [],
"type": "DelimitedText",
"typeProperties": {
"location": {
"type": "AzureBlobStorageLocation",
"fileName": {
"value": "@dataset().azureBlobSingleCSVFileName",
"type": "Expression"
},
"folderPath": {
"value": "@dataset().azureBlobSingleCSVFolderPath",
"type": "Expression"
},
"container": {
"value": "@dataset().azureBlobSingleCSVContainerName",
"type": "Expression"
}
},
"columnDelimiter": ",",
"escapeChar": "\\",
"firstRowAsHeader": true,
"quoteChar": "\""
},
"schema": []
},
"type": "Microsoft.DataFactory/factories/datasets"
}
pipeline/Azure SQL Table to Blob Single CSV Copy Pipeline (this produces the expected results)
{
"name": "Azure SQL Table to Blob Single CSV Copy Pipeline",
"properties": {
"activities": [
{
"name": "Copy Azure SQL Table to Blob Single CSV",
"type": "Copy",
"dependsOn": [],
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"source": {
"type": "AzureSqlSource",
"queryTimeout": "02:00:00"
},
"sink": {
"type": "DelimitedTextSink",
"storeSettings": {
"type": "AzureBlobStorageWriteSettings"
},
"formatSettings": {
"type": "DelimitedTextWriteSettings",
"quoteAllText": true,
"fileExtension": ".csv"
}
},
"enableStaging": false
},
"inputs": [
{
"referenceName": "azureSqlDatabaseTableDataset",
"type": "DatasetReference",
"parameters": {
"azureSqlDatabaseConnectionStringSecretName": {
"value": "@pipeline().parameters.sourceAzureSqlDatabaseConnectionStringSecretName",
"type": "Expression"
},
"azureSqlDatabaseTableSchemaName": {
"value": "@pipeline().parameters.sourceAzureSqlDatabaseTableSchemaName",
"type": "Expression"
},
"azureSqlDatabaseTableTableName": {
"value": "@pipeline().parameters.sourceAzureSqlDatabaseTableTableName",
"type": "Expression"
}
}
}
],
"outputs": [
{
"referenceName": "azureBlobSingleCSVFileNameDataset",
"type": "DatasetReference",
"parameters": {
"azureBlobConnectionStringSecretName": {
"value": "@pipeline().parameters.sinkAzureBlobConnectionStringSecretName",
"type": "Expression"
},
"azureBlobSingleCSVFileName": {
"value": "@pipeline().parameters.sinkAzureBlobSingleCSVFileName",
"type": "Expression"
},
"azureBlobSingleCSVFolderPath": {
"value": "@pipeline().parameters.sinkAzureBlobSingleCSVFolderPath",
"type": "Expression"
},
"azureBlobSingleCSVContainerName": {
"value": "@pipeline().parameters.sinkAzureBlobSingleCSVContainerName",
"type": "Expression"
}
}
}
]
}
],
"parameters": {
"sourceAzureSqlDatabaseConnectionStringSecretName": {
"type": "string"
},
"sourceAzureSqlDatabaseTableSchemaName": {
"type": "string"
},
"sourceAzureSqlDatabaseTableTableName": {
"type": "string"
},
"sinkAzureBlobConnectionStringSecretName": {
"type": "string"
},
"sinkAzureBlobSingleCSVContainerName": {
"type": "string"
},
"sinkAzureBlobSingleCSVFolderPath": {
"type": "string"
},
"sinkAzureBlobSingleCSVFileName": {
"type": "string"
}
},
"annotations": []
},
"type": "Microsoft.DataFactory/factories/pipelines"
}
dataset/azureBlobSingleCSVNoFileNameDataset (the file name is not needed in the dataset for the mapping data flow; it is set in the mapping data flow)
{
"name": "azureBlobSingleCSVNoFileNameDataset",
"properties": {
"linkedServiceName": {
"referenceName": "azureBlobLinkedService",
"type": "LinkedServiceReference",
"parameters": {
"azureBlobConnectionStringSecretName": {
"value": "@dataset().azureBlobConnectionStringSecretName",
"type": "Expression"
}
}
},
"parameters": {
"azureBlobConnectionStringSecretName": {
"type": "string"
},
"azureBlobSingleCSVFolderPath": {
"type": "string"
},
"azureBlobSingleCSVContainerName": {
"type": "string"
}
},
"annotations": [],
"type": "DelimitedText",
"typeProperties": {
"location": {
"type": "AzureBlobStorageLocation",
"folderPath": {
"value": "@dataset().azureBlobSingleCSVFolderPath",
"type": "Expression"
},
"container": {
"value": "@dataset().azureBlobSingleCSVContainerName",
"type": "Expression"
}
},
"columnDelimiter": ",",
"escapeChar": "\\",
"firstRowAsHeader": true,
"quoteChar": "\""
},
"schema": []
},
"type": "Microsoft.DataFactory/factories/datasets"
}
dataflow/azureSqlDatabaseTableToAzureBlobSingleCSVDataFlow
{
"name": "azureSqlDatabaseTableToAzureBlobSingleCSVDataFlow",
"properties": {
"type": "MappingDataFlow",
"typeProperties": {
"sources": [
{
"dataset": {
"referenceName": "azureSqlDatabaseTableDataset",
"type": "DatasetReference"
},
"name": "readFromAzureSqlDatabase"
}
],
"sinks": [
{
"dataset": {
"referenceName": "azureBlobSingleCSVNoFileNameDataset",
"type": "DatasetReference"
},
"name": "writeToAzureBlobSingleCSV"
}
],
"transformations": [
{
"name": "enrichWithRuntimeMetadata"
}
],
"script": "
parameters{
sourceConnectionSecretName as string,
sinkConnectionStringSecretName as string,
sourceObjectName as string,
sinkObjectName as string,
dataFactoryName as string,
dataFactoryPipelineName as string,
dataFactoryPipelineRunId as string,
sinkFileNameNoPath as string
}
source(allowSchemaDrift: true,
validateSchema: false,
isolationLevel: 'READ_UNCOMMITTED',
format: 'table') ~> readFromAzureSqlDatabase
readFromAzureSqlDatabase derive({__sourceConnectionStringSecretName} = $sourceConnectionSecretName,
{__sinkConnectionStringSecretName} = $sinkConnectionStringSecretName,
{__sourceObjectName} = $sourceObjectName,
{__sinkObjectName} = $sinkObjectName,
{__dataFactoryName} = $dataFactoryName,
{__dataFactoryPipelineName} = $dataFactoryPipelineName,
{__dataFactoryPipelineRunId} = $dataFactoryPipelineRunId) ~> enrichWithRuntimeMetadata
enrichWithRuntimeMetadata sink(allowSchemaDrift: true,
validateSchema: false,
partitionFileNames:[($sinkFileNameNoPath)],
partitionBy('hash', 1),
quoteAll: true) ~> writeToAzureBlobSingleCSV"
}
}
}
pipeline/Azure SQL Table to Blob Single CSV Data Flow Pipeline (this produces the expected results, plus a zero-byte file at the folder path)
{
"name": "Azure SQL Table to Blob Single CSV Data Flow Pipeline",
"properties": {
"activities": [
{
"name": "Copy Sql Database Table To Blob Single CSV Data Flow",
"type": "ExecuteDataFlow",
"dependsOn": [],
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"dataflow": {
"referenceName": "azureSqlDatabaseTableToAzureBlobSingleCSVDataFlow",
"type": "DataFlowReference",
"parameters": {
"sourceConnectionSecretName": {
"value": "'@{pipeline().parameters.sourceAzureSqlDatabaseConnectionStringSecretName}'",
"type": "Expression"
},
"sinkConnectionStringSecretName": {
"value": "'@{pipeline().parameters.sinkAzureBlobConnectionStringSecretName}'",
"type": "Expression"
},
"sourceObjectName": {
"value": "'@{concat('[', pipeline().parameters.sourceAzureSqlDatabaseTableSchemaName, '].[', pipeline().parameters.sourceAzureSqlDatabaseTableTableName, ']')}'",
"type": "Expression"
},
"sinkObjectName": {
"value": "'@{concat(pipeline().parameters.sinkAzureBlobSingleCSVContainerName, '/', pipeline().parameters.sinkAzureBlobSingleCSVFolderPath, '/', pipeline().parameters.sinkAzureBlobSingleCSVFileName)}'",
"type": "Expression"
},
"dataFactoryName": {
"value": "'@{pipeline().DataFactory}'",
"type": "Expression"
},
"dataFactoryPipelineName": {
"value": "'@{pipeline().Pipeline}'",
"type": "Expression"
},
"dataFactoryPipelineRunId": {
"value": "'@{pipeline().RunId}'",
"type": "Expression"
},
"sinkFileNameNoPath": {
"value": "'@{pipeline().parameters.sinkAzureBlobSingleCSVFileName}'",
"type": "Expression"
}
},
"datasetParameters": {
"readFromAzureSqlDatabase": {
"azureSqlDatabaseConnectionStringSecretName": {
"value": "@pipeline().parameters.sourceAzureSqlDatabaseConnectionStringSecretName",
"type": "Expression"
},
"azureSqlDatabaseTableSchemaName": {
"value": "@pipeline().parameters.sourceAzureSqlDatabaseTableSchemaName",
"type": "Expression"
},
"azureSqlDatabaseTableTableName": {
"value": "@pipeline().parameters.sourceAzureSqlDatabaseTableTableName",
"type": "Expression"
}
},
"writeToAzureBlobSingleCSV": {
"azureBlobConnectionStringSecretName": {
"value": "@pipeline().parameters.sinkAzureBlobConnectionStringSecretName",
"type": "Expression"
},
"azureBlobSingleCSVFolderPath": {
"value": "@pipeline().parameters.sinkAzureBlobSingleCSVFolderPath",
"type": "Expression"
},
"azureBlobSingleCSVContainerName": {
"value": "@pipeline().parameters.sinkAzureBlobSingleCSVContainerName",
"type": "Expression"
}
}
}
},
"compute": {
"coreCount": 8,
"computeType": "General"
}
}
}
],
"parameters": {
"sourceAzureSqlDatabaseConnectionStringSecretName": {
"type": "string"
},
"sourceAzureSqlDatabaseTableSchemaName": {
"type": "string"
},
"sourceAzureSqlDatabaseTableTableName": {
"type": "string"
},
"sinkAzureBlobConnectionStringSecretName": {
"type": "string"
},
"sinkAzureBlobSingleCSVContainerName": {
"type": "string"
},
"sinkAzureBlobSingleCSVFolderPath": {
"type": "string"
},
"sinkAzureBlobSingleCSVFileName": {
"type": "string"
}
},
"annotations": []
},
"type": "Microsoft.DataFactory/factories/pipelines"
}
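Getting a 0-length (zero-byte) file means that, although your pipeline may have run successfully, it did not return or produce any output.
One of the better techniques is to preview the output of each stage (for example, with the data preview available in data flow debug mode) to make sure each stage produces the output you expect.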
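The same kind of check can be applied to the written file after a run. The following is a minimal sketch under stated assumptions: the helper name `count_data_rows` and the sample content are hypothetical, the file is assumed to have been downloaded to a string already, and the format is assumed to match the datasets above (comma delimiter, quoted values, first row as header).

```python
import csv
import io


def count_data_rows(csv_text: str, has_header: bool = True) -> int:
    """Count non-empty data rows in CSV text, optionally skipping the header row."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    if has_header and rows:
        rows = rows[1:]  # drop the header row
    return len(rows)


# Hypothetical sample resembling a quote-all CSV with a header row.
sample = '"id","name"\n"1","alpha"\n"2","beta"\n'
print(count_data_rows(sample))  # prints 2
print(count_data_rows(""))      # prints 0 for a zero-byte file
```

A count of zero for the named output file would suggest the stage produced no rows, whereas a non-zero count alongside a separate zero-byte blob at the folder path matches the situation described in the question.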