Azure Data Factory mapping data flow to CSV sink results in a zero-byte file

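I'm ramping up on Azure Data Factory and comparing Copy activity performance against mapping data flows when writing to a single CSV file in Azure Blob Storage.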

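When I write a single CSV with a Copy activity, through a dataset (azureBlobSingleCSVFileNameDataset) over an Azure Blob Storage linked service (azureBlobLinkedService), I get the output I expect in the Blob Storage container: for example, a MyData.csv output file in the /output/csv/singleFiles folder of container MyContainer.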

When I write a single CSV with a mapping data flow, through the same Blob Storage linked service but a different dataset (azureBlobSingleCSVNoFileNameDataset), I get the following:

  • MyContainer/output/csv/singleFiles (zero-length file)
  • MyContainer/output/csv/singleFiles/MyData.csv (contains the data I expect)

I don't understand why a zero-length file is generated when I use a mapping data flow.
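To make the observation concrete, here is a minimal sketch that flags the kind of blob described: a zero-byte blob whose name is also the folder prefix of another blob. It works on a plain list of (name, size) pairs. In practice such a listing could come from the azure-storage-blob package (`ContainerClient.list_blobs(name_starts_with=...)`, whose items expose `name` and `size`), but that wiring, the helper name `find_folder_placeholders`, and the sizes in the example are assumptions for illustration only.

```python
from typing import Iterable, List, Tuple


def find_folder_placeholders(blobs: Iterable[Tuple[str, int]]) -> List[str]:
    """Return zero-byte blob names that other blobs use as a folder prefix.

    Hypothetical helper: `blobs` is an assumed list of (blob name, size in bytes)
    pairs, with names relative to the container (no container name).
    """
    listing = list(blobs)
    names = [name for name, _ in listing]
    placeholders = []
    for name, size in listing:
        if size != 0:
            continue
        prefix = name.rstrip("/") + "/"
        # Only count it if some other blob lives "under" this name.
        if any(other.startswith(prefix) for other in names if other != name):
            placeholders.append(name)
    return placeholders


if __name__ == "__main__":
    # Hypothetical listing mirroring the two blobs above; 1024 is a made-up size.
    listing = [
        ("output/csv/singleFiles", 0),
        ("output/csv/singleFiles/MyData.csv", 1024),
    ]
    print(find_folder_placeholders(listing))  # ['output/csv/singleFiles']
```

A zero-byte file with an unrelated name (for example an empty `a/empty.csv` with nothing beneath it) is deliberately not reported, so this only matches the folder-named case from the question.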

Here are my source files:

linkedService/azureBlobLinkedService

{
    "name": "azureBlobLinkedService",
    "type": "Microsoft.DataFactory/factories/linkedservices",
    "properties": {
        "type": "AzureBlobStorage",
        "parameters": {
            "azureBlobConnectionStringSecretName": {
                "type": "string"
            }
        },
        "annotations": [],
        "typeProperties": {
            "connectionString": {
                "type": "AzureKeyVaultSecret",
                "store": {
                    "referenceName": "AzureKeyVaultLinkedService",
                    "type": "LinkedServiceReference"
                },
                "secretName": "@{linkedService().azureBlobConnectionStringSecretName}"
            }
        }
    }
}

dataset/azureBlobSingleCSVFileNameDataset

{
    "name": "azureBlobSingleCSVFileNameDataset",
    "properties": {
        "linkedServiceName": {
            "referenceName": "azureBlobLinkedService",
            "type": "LinkedServiceReference",
            "parameters": {
                "azureBlobConnectionStringSecretName": {
                    "value": "@dataset().azureBlobConnectionStringSecretName",
                    "type": "Expression"
                }
            }
        },
        "parameters": {
            "azureBlobConnectionStringSecretName": {
                "type": "string"
            },
            "azureBlobSingleCSVFileName": {
                "type": "string"
            },
            "azureBlobSingleCSVFolderPath": {
                "type": "string"
            },
            "azureBlobSingleCSVContainerName": {
                "type": "string"
            }
        },
        "annotations": [],
        "type": "DelimitedText",
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "fileName": {
                    "value": "@dataset().azureBlobSingleCSVFileName",
                    "type": "Expression"
                },
                "folderPath": {
                    "value": "@dataset().azureBlobSingleCSVFolderPath",
                    "type": "Expression"
                },
                "container": {
                    "value": "@dataset().azureBlobSingleCSVContainerName",
                    "type": "Expression"
                }
            },
            "columnDelimiter": ",",
            "escapeChar": "\\",
            "firstRowAsHeader": true,
            "quoteChar": "\""
        },
        "schema": []
    },
    "type": "Microsoft.DataFactory/factories/datasets"
}

pipeline/Azure SQL Table to Blob Single CSV Copy Pipeline (this produces the expected result)

{
    "name": "Azure SQL Table to Blob Single CSV Copy Pipeline",
    "properties": {
        "activities": [
            {
                "name": "Copy Azure SQL Table to Blob Single CSV",
                "type": "Copy",
                "dependsOn": [],
                "policy": {
                    "timeout": "7.00:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "userProperties": [],
                "typeProperties": {
                    "source": {
                        "type": "AzureSqlSource",
                        "queryTimeout": "02:00:00"
                    },
                    "sink": {
                        "type": "DelimitedTextSink",
                        "storeSettings": {
                            "type": "AzureBlobStorageWriteSettings"
                        },
                        "formatSettings": {
                            "type": "DelimitedTextWriteSettings",
                            "quoteAllText": true,
                            "fileExtension": ".csv"
                        }
                    },
                    "enableStaging": false
                },
                "inputs": [
                    {
                        "referenceName": "azureSqlDatabaseTableDataset",
                        "type": "DatasetReference",
                        "parameters": {
                            "azureSqlDatabaseConnectionStringSecretName": {
                                "value": "@pipeline().parameters.sourceAzureSqlDatabaseConnectionStringSecretName",
                                "type": "Expression"
                            },
                            "azureSqlDatabaseTableSchemaName": {
                                "value": "@pipeline().parameters.sourceAzureSqlDatabaseTableSchemaName",
                                "type": "Expression"
                            },
                            "azureSqlDatabaseTableTableName": {
                                "value": "@pipeline().parameters.sourceAzureSqlDatabaseTableTableName",
                                "type": "Expression"
                            }
                        }
                    }
                ],
                "outputs": [
                    {
                        "referenceName": "azureBlobSingleCSVFileNameDataset",
                        "type": "DatasetReference",
                        "parameters": {
                            "azureBlobConnectionStringSecretName": {
                                "value": "@pipeline().parameters.sinkAzureBlobConnectionStringSecretName",
                                "type": "Expression"
                            },
                            "azureBlobSingleCSVFileName": {
                                "value": "@pipeline().parameters.sinkAzureBlobSingleCSVFileName",
                                "type": "Expression"
                            },
                            "azureBlobSingleCSVFolderPath": {
                                "value": "@pipeline().parameters.sinkAzureBlobSingleCSVFolderPath",
                                "type": "Expression"
                            },
                            "azureBlobSingleCSVContainerName": {
                                "value": "@pipeline().parameters.sinkAzureBlobSingleCSVContainerName",
                                "type": "Expression"
                            }
                        }
                    }
                ]
            }
        ],
        "parameters": {
            "sourceAzureSqlDatabaseConnectionStringSecretName": {
                "type": "string"
            },
            "sourceAzureSqlDatabaseTableSchemaName": {
                "type": "string"
            },
            "sourceAzureSqlDatabaseTableTableName": {
                "type": "string"
            },
            "sinkAzureBlobConnectionStringSecretName": {
                "type": "string"
            },
            "sinkAzureBlobSingleCSVContainerName": {
                "type": "string"
            },
            "sinkAzureBlobSingleCSVFolderPath": {
                "type": "string"
            },
            "sinkAzureBlobSingleCSVFileName": {
                "type": "string"
            }
        },
        "annotations": []
    },
    "type": "Microsoft.DataFactory/factories/pipelines"
}

dataset/azureBlobSingleCSVNoFileNameDataset (the mapping data flow doesn't need a file name in the dataset; it is set in the data flow itself)

{
    "name": "azureBlobSingleCSVNoFileNameDataset",
    "properties": {
        "linkedServiceName": {
            "referenceName": "azureBlobLinkedService",
            "type": "LinkedServiceReference",
            "parameters": {
                "azureBlobConnectionStringSecretName": {
                    "value": "@dataset().azureBlobConnectionStringSecretName",
                    "type": "Expression"
                }
            }
        },
        "parameters": {
            "azureBlobConnectionStringSecretName": {
                "type": "string"
            },
            "azureBlobSingleCSVFolderPath": {
                "type": "string"
            },
            "azureBlobSingleCSVContainerName": {
                "type": "string"
            }
        },
        "annotations": [],
        "type": "DelimitedText",
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "folderPath": {
                    "value": "@dataset().azureBlobSingleCSVFolderPath",
                    "type": "Expression"
                },
                "container": {
                    "value": "@dataset().azureBlobSingleCSVContainerName",
                    "type": "Expression"
                }
            },
            "columnDelimiter": ",",
            "escapeChar": "\\",
            "firstRowAsHeader": true,
            "quoteChar": "\""
        },
        "schema": []
    },
    "type": "Microsoft.DataFactory/factories/datasets"
}

dataflow/azureSqlDatabaseTableToAzureBlobSingleCSVDataFlow

{
    "name": "azureSqlDatabaseTableToAzureBlobSingleCSVDataFlow",
    "properties": {
        "type": "MappingDataFlow",
        "typeProperties": {
            "sources": [
                {
                    "dataset": {
                        "referenceName": "azureSqlDatabaseTableDataset",
                        "type": "DatasetReference"
                    },
                    "name": "readFromAzureSqlDatabase"
                }
            ],
            "sinks": [
                {
                    "dataset": {
                        "referenceName": "azureBlobSingleCSVNoFileNameDataset",
                        "type": "DatasetReference"
                    },
                    "name": "writeToAzureBlobSingleCSV"
                }
            ],
            "transformations": [
                {
                    "name": "enrichWithRuntimeMetadata"
                }
            ],
            "script": "
parameters{
	sourceConnectionSecretName as string,
	sinkConnectionStringSecretName as string,
	sourceObjectName as string,
	sinkObjectName as string,
	dataFactoryName as string,
	dataFactoryPipelineName as string,
	dataFactoryPipelineRunId as string,
	sinkFileNameNoPath as string
}
source(allowSchemaDrift: true,
	validateSchema: false,
	isolationLevel: 'READ_UNCOMMITTED',
	format: 'table') ~> readFromAzureSqlDatabase
readFromAzureSqlDatabase derive({__sourceConnectionStringSecretName} = $sourceConnectionSecretName,
		{__sinkConnectionStringSecretName} = $sinkConnectionStringSecretName,
		{__sourceObjectName} = $sourceObjectName,
		{__sinkObjectName} = $sinkObjectName,
		{__dataFactoryName} = $dataFactoryName,
		{__dataFactoryPipelineName} = $dataFactoryPipelineName,
		{__dataFactoryPipelineRunId} = $dataFactoryPipelineRunId) ~> enrichWithRuntimeMetadata
enrichWithRuntimeMetadata sink(allowSchemaDrift: true,
	validateSchema: false,
	partitionFileNames:[($sinkFileNameNoPath)],
	partitionBy('hash', 1),
	quoteAll: true) ~> writeToAzureBlobSingleCSV"
        }
    }
}
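The sink in the script above combines `partitionBy('hash', 1)` with `partitionFileNames:[($sinkFileNameNoPath)]`. As a rough mental model only (a toy in plain Python, not how Spark or Azure Data Factory actually implement it), hashing into a single partition sends every row to partition 0, and the one name in `partitionFileNames` is then attached to that single partition. The helper names and the use of `zlib.crc32` as the hash function are assumptions.

```python
import zlib
from typing import Dict, List, Sequence


def hash_partition(rows: Sequence[str], num_partitions: int) -> List[List[str]]:
    """Toy hash partitioner: each row goes to crc32(row) % num_partitions."""
    partitions: List[List[str]] = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[zlib.crc32(row.encode("utf-8")) % num_partitions].append(row)
    return partitions


def assign_file_names(partitions: List[List[str]], file_names: List[str]) -> Dict[str, List[str]]:
    """Pair each partition with one file name, as a per-partition naming list would."""
    if len(partitions) != len(file_names):
        raise ValueError("need exactly one file name per partition")
    return dict(zip(file_names, partitions))


if __name__ == "__main__":
    # With one partition, x % 1 == 0 for every hash, so all rows land together.
    parts = hash_partition(["row1", "row2", "row3"], 1)
    print(assign_file_names(parts, ["MyData.csv"]))
    # {'MyData.csv': ['row1', 'row2', 'row3']}
```

The point of the toy is only that a partition count of 1 leaves exactly one output file to name; it says nothing about the extra zero-byte blob.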

pipeline/Azure SQL Table to Blob Single CSV Data Flow Pipeline (this produces the expected result, plus the zero-byte file at the folder path)

{
    "name": "Azure SQL Table to Blob Single CSV Data Flow Pipeline",
    "properties": {
        "activities": [
            {
                "name": "Copy Sql Database Table To Blob Single CSV Data Flow",
                "type": "ExecuteDataFlow",
                "dependsOn": [],
                "policy": {
                    "timeout": "7.00:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "userProperties": [],
                "typeProperties": {
                    "dataflow": {
                        "referenceName": "azureSqlDatabaseTableToAzureBlobSingleCSVDataFlow",
                        "type": "DataFlowReference",
                        "parameters": {
                            "sourceConnectionSecretName": {
                                "value": "'@{pipeline().parameters.sourceAzureSqlDatabaseConnectionStringSecretName}'",
                                "type": "Expression"
                            },
                            "sinkConnectionStringSecretName": {
                                "value": "'@{pipeline().parameters.sinkAzureBlobConnectionStringSecretName}'",
                                "type": "Expression"
                            },
                            "sourceObjectName": {
                                "value": "'@{concat('[', pipeline().parameters.sourceAzureSqlDatabaseTableSchemaName, '].[', pipeline().parameters.sourceAzureSqlDatabaseTableTableName, ']')}'",
                                "type": "Expression"
                            },
                            "sinkObjectName": {
                                "value": "'@{concat(pipeline().parameters.sinkAzureBlobSingleCSVContainerName, '/', pipeline().parameters.sinkAzureBlobSingleCSVFolderPath, '/', pipeline().parameters.sinkAzureBlobSingleCSVFileName)}'",
                                "type": "Expression"
                            },
                            "dataFactoryName": {
                                "value": "'@{pipeline().DataFactory}'",
                                "type": "Expression"
                            },
                            "dataFactoryPipelineName": {
                                "value": "'@{pipeline().Pipeline}'",
                                "type": "Expression"
                            },
                            "dataFactoryPipelineRunId": {
                                "value": "'@{pipeline().RunId}'",
                                "type": "Expression"
                            },
                            "sinkFileNameNoPath": {
                                "value": "'@{pipeline().parameters.sinkAzureBlobSingleCSVFileName}'",
                                "type": "Expression"
                            }
                        },
                        "datasetParameters": {
                            "readFromAzureSqlDatabase": {
                                "azureSqlDatabaseConnectionStringSecretName": {
                                    "value": "@pipeline().parameters.sourceAzureSqlDatabaseConnectionStringSecretName",
                                    "type": "Expression"
                                },
                                "azureSqlDatabaseTableSchemaName": {
                                    "value": "@pipeline().parameters.sourceAzureSqlDatabaseTableSchemaName",
                                    "type": "Expression"
                                },
                                "azureSqlDatabaseTableTableName": {
                                    "value": "@pipeline().parameters.sourceAzureSqlDatabaseTableTableName",
                                    "type": "Expression"
                                }
                            },
                            "writeToAzureBlobSingleCSV": {
                                "azureBlobConnectionStringSecretName": {
                                    "value": "@pipeline().parameters.sinkAzureBlobConnectionStringSecretName",
                                    "type": "Expression"
                                },
                                "azureBlobSingleCSVFolderPath": {
                                    "value": "@pipeline().parameters.sinkAzureBlobSingleCSVFolderPath",
                                    "type": "Expression"
                                },
                                "azureBlobSingleCSVContainerName": {
                                    "value": "@pipeline().parameters.sinkAzureBlobSingleCSVContainerName",
                                    "type": "Expression"
                                }
                            }
                        }
                    },
                    "compute": {
                        "coreCount": 8,
                        "computeType": "General"
                    }
                }
            }
        ],
        "parameters": {
            "sourceAzureSqlDatabaseConnectionStringSecretName": {
                "type": "string"
            },
            "sourceAzureSqlDatabaseTableSchemaName": {
                "type": "string"
            },
            "sourceAzureSqlDatabaseTableTableName": {
                "type": "string"
            },
            "sinkAzureBlobConnectionStringSecretName": {
                "type": "string"
            },
            "sinkAzureBlobSingleCSVContainerName": {
                "type": "string"
            },
            "sinkAzureBlobSingleCSVFolderPath": {
                "type": "string"
            },
            "sinkAzureBlobSingleCSVFileName": {
                "type": "string"
            }
        },
        "annotations": []
    },
    "type": "Microsoft.DataFactory/factories/pipelines"
}
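The `sourceObjectName` and `sinkObjectName` data flow parameters above are plain string concatenations, and the outer single quotes in `'@{...}'` make the result a string literal for the data flow. A hedged Python mirror of the two `concat(...)` expressions follows; `dbo` and `MyTable` are hypothetical example values, and the function names are illustrative.

```python
def source_object_name(schema: str, table: str) -> str:
    # Mirrors concat('[', schema, '].[', table, ']') from sourceObjectName.
    return "[" + schema + "].[" + table + "]"


def sink_object_name(container: str, folder_path: str, file_name: str) -> str:
    # Mirrors concat(container, '/', folderPath, '/', fileName) from sinkObjectName.
    # Like concat, this does no slash normalization.
    return container + "/" + folder_path + "/" + file_name


if __name__ == "__main__":
    print(source_object_name("dbo", "MyTable"))  # [dbo].[MyTable]
    print(sink_object_name("MyContainer", "output/csv/singleFiles", "MyData.csv"))
    # MyContainer/output/csv/singleFiles/MyData.csv
```

Because the expression does no normalization, a folder path parameter ending in `/` would produce a doubled separator in `sinkObjectName`; the mirror behaves the same way.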

Answer

Getting a zero-length (0-byte) file means that, although your pipeline may have run successfully, it did not return or produce any output.

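One of the better techniques is to preview the output of each stage (for example, with data preview in data flow debug mode) to make sure every stage produces the output you expect.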
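Data preview is interactive; after a run, a similar check can be scripted against the final file. A minimal sketch, assuming the sink blob's contents have already been downloaded as bytes (for example with azure-storage-blob's `BlobClient.download_blob().readall()`) and that the file has a header row, matching `firstRowAsHeader: true` in the datasets above. The function name `count_data_rows` is hypothetical.

```python
import csv
import io


def count_data_rows(data: bytes, has_header: bool = True) -> int:
    """Count CSV data rows in downloaded sink output; 0 for a zero-byte blob."""
    if not data:
        return 0  # zero-byte file: nothing was written
    reader = csv.reader(io.StringIO(data.decode("utf-8")))
    rows = [row for row in reader if row]  # skip blank lines
    if has_header and rows:
        rows = rows[1:]  # drop the header row
    return len(rows)


if __name__ == "__main__":
    sample = b'"id","name"\r\n"1","a"\r\n"2","b"\r\n'
    print(count_data_rows(sample))  # 2
    print(count_data_rows(b""))  # 0
```

Checking for data rows rather than just a non-zero size also catches a file that contains only a header.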
