AWS Lambda: read a zip file, validate it, and unzip it to an S3 bucket if validation passes

Posted 2023-03-22


[Title]: AWS lambda read zip file perform validation and unzip to s3 bucket if validation is passed [Published]: 2020-10-06 14:10:28 [Question]:

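I have a requirement where a zip file arrives in an S3 bucket, and I need to write a Lambda in Python that reads the zip file, performs some validation, and unzips it into another S3 bucket.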
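As a minimal sketch of the reading step, assuming the whole archive fits in the Lambda's memory: the object body can be wrapped in `io.BytesIO` and handed to `zipfile.ZipFile` without writing to disk. The helper names `list_zip_members` and `build_sample_zip` are hypothetical; in the Lambda, the bytes would come from the S3 object body rather than from a sample archive.

```python
import io
import zipfile


def list_zip_members(zip_bytes: bytes) -> list[str]:
    """Return the member names of a zip archive held entirely in memory."""
    # The ZipFile has to be used while the BytesIO buffer is still open,
    # so both context managers share a single `with` statement.
    with io.BytesIO(zip_bytes) as buffer, zipfile.ZipFile(buffer, mode="r") as zipf:
        return zipf.namelist()


def build_sample_zip() -> bytes:
    """Build a small archive in memory, standing in for the S3 object body."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, mode="w") as zipf:
        zipf.writestr("a.csv", "id\n1\n")
        zipf.writestr("trigger_file.txt", "a.csv:1\n")
    return buffer.getvalue()


if __name__ == "__main__":
    print(list_zip_members(build_sample_zip()))
```

Keeping the `ZipFile` inside the same `with` statement as the buffer avoids reading from a buffer that has already been closed.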

The zip file contains the following:

a.csv b.csv c.csv trigger_file.txt

trigger_file.txt -- contains the names of the files in the zip and their record counts (e.g. a.csv:120, b.csv:10, c.csv:50)

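So, in the Lambda, I need to read the trigger file and check whether the number of files in the zip equals the number of files listed in the trigger file; if that check passes, unzip to the S3 bucket.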
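The check described above might be sketched as follows. This is only an illustration under assumptions: the trigger file is taken to hold `name:count` entries separated by commas or whitespace (based on the example), the member name `trigger_file.txt` is assumed, and the helpers `parse_trigger` and `validate_archive` are hypothetical. Comparing the two sets of names covers the file-count check as well; the record counts are parsed but not verified here.

```python
import io
import re
import zipfile

TRIGGER_NAME = "trigger_file.txt"  # assumed member name inside the archive


def parse_trigger(text: str) -> dict[str, int]:
    """Parse entries such as 'a.csv:120, b.csv:10' into {'a.csv': 120, ...}."""
    expected = {}
    for entry in re.split(r"[,\s]+", text.strip()):
        if not entry:
            continue
        # Split on the last colon; a malformed entry raises ValueError in int().
        name, _, count = entry.rpartition(":")
        expected[name] = int(count)
    return expected


def validate_archive(zipf: zipfile.ZipFile) -> bool:
    """Check that the CSV members are exactly the files named in the trigger file."""
    names = zipf.namelist()
    if TRIGGER_NAME not in names:
        return False
    expected = parse_trigger(zipf.read(TRIGGER_NAME).decode("utf-8"))
    csv_members = {name for name in names if name.lower().endswith(".csv")}
    return csv_members == set(expected)
```

Using `endswith(".csv")` instead of a substring test avoids matching names that merely contain the letters "csv".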

This is the code I have prepared so far:

def write_to_s3(config_dict):
    inp_bucket = config_dict["inp_bucket"]
    inp_key = config_dict["inp_key"]
    out_bucket = config_dict["out_bucket"]
    des_key = config_dict["des_key"]
    processed_key = config_dict["processed_key"]

    obj = S3_CLIENT.get_object(Bucket=inp_bucket, Key=inp_key)
    putObjects = []
    with io.BytesIO(obj["Body"].read()) as tf:
        # rewind the file
        tf.seek(0)

    # Read the file as a zipfile perform transformations and process the members
    with zipfile.ZipFile(tf, mode='r') as zipf:
        for file in zipf.infolist():
            fileName = file.filename
            print("file name before while loop :",fileName)
            try:
                found = False
                while not found :
                    if fileName == "Trigger_file.txt" :
                        with zipf.open(fileName , 'r') as thefile:
                            my_list = [i.decode('utf8').split(' ') for i in thefile]
                            my_list = str(my_list)[1:-1]
                            print("my_list :",my_list)
                            print("fileName :",fileName)
                            found = True
                            break
                            thefile.close()
                    else:
                        print("Trigger file not found ,try again")
            except Exception as exp_handler:
                    raise exp_handler

            if 'csv' in fileName :
                try:
                    if fileName in my_list:
                        print("Validation Success , all files in Trigger file  are present procced for extraction")
                    else:
                        print("Validation Failed")
                except Exception as exp_handler:
                    raise exp_handler

    # *****FUNCTION TO UNZIP ********


def lambda_handler(event, context):
    try:
        inp_bucket = event['Records'][0]['s3']['bucket']['name']
        inp_key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
        config_dict = build_conf_obj(os.environ['config_bucket'],os.environ['config_file'], os.environ['param_name'])
        write_to_s3(config_dict)
    except Exception as exp_handler:
        print("ERROR")

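Everything else works; the only problem I am facing is in the validation part. I think the while loop is wrong, because it goes into an infinite loop.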

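Expected:

Search for trigger_file.txt in the zip archive; if it is found, break out of the loop, run the validation, and unzip to the S3 folder. If it is not found, keep searching until the end of the dictionary.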
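The expected behaviour can be sketched with a plain `for` loop that returns as soon as the trigger file is seen and gives up once every member has been inspected. The helper name `find_trigger_member` is hypothetical, and the case-insensitive comparison is an assumption, made because the question spells the name both as `trigger_file.txt` and as `Trigger_file.txt`.

```python
import io
import zipfile
from typing import Optional


def find_trigger_member(zipf: zipfile.ZipFile,
                        trigger_name: str = "trigger_file.txt") -> Optional[str]:
    """Return the archive member matching trigger_name, or None if it is absent."""
    wanted = trigger_name.lower()
    for info in zipf.infolist():
        # Each iteration looks at a different member, so the loop always ends.
        if info.filename.lower() == wanted:
            return info.filename
    return None


if __name__ == "__main__":
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, mode="w") as zipf:
        zipf.writestr("a.csv", "id\n1\n")
        zipf.writestr("Trigger_file.txt", "a.csv:1\n")
    with zipfile.ZipFile(buffer, mode="r") as zipf:
        print(find_trigger_member(zipf))
```

A `None` result means the search reached the end of the member list without a match, which is the point at which validation should fail instead of retrying.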

Error output (timeout):

Response:

  "errorMessage": "2020-06-16T20:09:06.168Z 39253b98-db87-4e65-b288-b585d268ac5f Task timed out after 60.06 seconds"


Request ID:
"39253b98-db87-4e65-b288-b585d268ac5f"

Function Logs:
 again
Trigger file not found ,try again
Trigger file not found ,try again
Trigger file not found ,try again
Trigger file not found ,try again
Trigger file not found ,trEND RequestId: 39253b98-db87-4e65-b288-b585d268ac5f
REPORT RequestId: 39253b98-db87-4e65-b288-b585d268ac5f  Duration: 60060.06 ms   Billed Duration: 60000 ms   Memory Size: 3008 MB    Max Memory Used: 83 MB  Init Duration: 389.65 ms    
2020-06-16T20:09:06.168Z 39253

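[Comments]:

What gets printed (if anything)? What happens when you increase the Lambda function's timeout from 60 seconds to 15 minutes?

Could you also consider using Python's built-in tempfile library to save the text file locally before trying to read it?

You also shouldn't use a while loop here. An unnecessary infinite loop only makes debugging harder and is more likely to crash.

Please correct the indentation of your code.

So there is no solution to read the zip file on the fly and perform the validation???

When I increase the Lambda timeout from 60 seconds to 15 minutes, it goes into an infinite loop, printing "Trigger file not found ,try again" many times, and times out after 15 minutes.

[Answer 1]: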

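In the following while loop in your code, if fileName is not "Trigger_file.txt", execution gets stuck in an infinite loop: nothing inside the loop changes fileName or found, so the else branch runs forever.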

found = False
while not found:
    if fileName == "Trigger_file.txt":
        with zipf.open(fileName , 'r') as thefile:
            my_list = [i.decode('utf8').split(' ') for i in thefile]
            my_list = str(my_list)[1:-1]
            print("my_list :",my_list)
            print("fileName :",fileName)
            found = True
            break
            thefile.close()
    else:
        print("Trigger file not found ,try again")

I think you can replace part of the write_to_s3 function with the following code:

def write_to_s3(config_dict):

    ######################
    #### Do something ####
    ######################    

    # Read the file as a zipfile perform transformations and process the members
    with zipfile.ZipFile(tf, mode='r') as zipf:
        # Pass 1: look for the trigger file. The for loop ends on its own
        # once every member has been checked.
        found = False
        for file in zipf.infolist():
            fileName = file.filename
            if fileName == "Trigger_file.txt":
                # The with block closes the member automatically.
                with zipf.open(fileName, 'r') as thefile:
                    my_list = [i.decode('utf8').split(' ') for i in thefile]
                    my_list = str(my_list)[1:-1]
                    print("my_list :", my_list)
                    print("fileName :", fileName)
                found = True
                break

        if not found:
            print("Trigger file not found ,try again")
            return

        # Pass 2: every CSV member must be mentioned in the trigger file
        # (my_list is a string here, so this is a substring check).
        for file in zipf.infolist():
            fileName = file.filename
            if 'csv' in fileName:
                if fileName not in my_list:
                    print("Validation Failed")
                    return

        print("Validation Success, all files in Trigger file are present, proceed to extraction")

    # *****FUNCTION TO UNZIP ********
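The unzip step is left open in the code above. One way it might look, as a sketch only: each member is read into memory and uploaded with `put_object`. The helper name `unzip_to_bucket` is hypothetical, `s3_client` is expected to be a boto3 S3 client (such as the question's `S3_CLIENT`), and `out_bucket` and `des_key` mirror the question's config keys, with `des_key` assumed to be a key prefix. Every member is uploaded, including the trigger file.

```python
import io
import zipfile


def unzip_to_bucket(zipf: zipfile.ZipFile, s3_client, out_bucket: str, des_key: str) -> list[str]:
    """Upload every file in an open ZipFile under the des_key prefix; return the keys."""
    prefix = des_key.rstrip("/")
    uploaded = []
    for info in zipf.infolist():
        if info.is_dir():
            continue  # directory entries carry no data
        key = f"{prefix}/{info.filename}" if prefix else info.filename
        # zipf.read() loads the whole member into memory, so this sketch
        # assumes each file is small enough for the Lambda's memory limit.
        s3_client.put_object(Bucket=out_bucket, Key=key, Body=zipf.read(info))
        uploaded.append(key)
    return uploaded
```

Because the client is passed in as a parameter, the function can be exercised locally with a stand-in object that records the `put_object` calls.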

[Discussion]:

Thanks a lot, I also scrapped the old one and used exactly the same logic.

Hi Gorisanson, can you help me with ***.com/questions/66964780/…
