Reading files triggered by an S3 event
【Title】reading files triggered by s3 event 【Posted】2018-04-06 06:55:43 【Question】Here is what I want to do:
- A user uploads a CSV file to an AWS S3 bucket.
- Once the file is uploaded, the S3 bucket invokes a Lambda function I created.
- My Lambda function reads the contents of the CSV file, then sends an email containing the file's content and metadata.
Local environment:
Serverless Framework version 1.22.0
Python 2.7
Here is my serverless.yml file:
service: aws-python # NOTE: update this with your service name

provider:
  name: aws
  runtime: python2.7
  stage: dev
  region: us-east-1
  iamRoleStatements:
    - Effect: "Allow"
      Action:
        - "s3:*"
        - "ses:SendEmail"
        - "ses:SendRawEmail"
        - "s3:PutBucketNotification"
      Resource: "*"

functions:
  csvfile:
    handler: handler.csvfile
    description: send mail whenever a csv file is uploaded on S3
    events:
      - s3:
          bucket: mine2
          event: s3:ObjectCreated:*
          rules:
            - suffix: .csv
Here is my Lambda function:
import json
import boto3
import botocore
import logging
import sys
import traceback
import csv
from botocore.exceptions import ClientError
from pprint import pprint
from time import strftime, gmtime
from json import dumps, loads, JSONEncoder, JSONDecoder

# setup simple logging for INFO
logger = logging.getLogger()
logger.setLevel(logging.INFO)

def csvfile(event, context):
    """Send email whenever a csvfile is uploaded to S3 """
    body = {}
    emailcontent = ''
    status_code = 200
    # set email information
    email_from = '****@*****.com'
    email_to = '****@****.com'
    email_subject = 'new file is uploaded'
    try:
        s3 = boto3.resource(u's3')
        s3 = boto3.client('s3')
        for record in event['Records']:
            filename = record['s3']['object']['key']
            filesize = record['s3']['object']['size']
            source = record['requestParameters']['sourceIPAddress']
            eventTime = record['eventTime']
        # get a handle on the bucket that holds your file
        bucket = s3.Bucket(u'mine2')
        # get a handle on the object you want (i.e. your file)
        obj = bucket.Object(key=event[u'Records'][0][u's3'][u'object'][u'key'])
        # get the object
        response = obj.get()
        # read the contents of the file and split it into a list of lines
        lines = response[u'Body'].read().split()
        # now iterate over those lines
        for row in csv.DictReader(lines):
            print(row)
            emailcontent = emailcontent + '\n' + row
    except Exception as e:
        print(traceback.format_exc())
        status_code = 500
        body["message"] = json.dumps(e)
    email_body = ("File Name: " + filename + "\n" + "File Size: " + str(filesize) + "\n" +
                  "Upload Time: " + eventTime + "\n" + "User Details: " + source + "\n" +
                  "content of the csv file :" + emailcontent)
    ses = boto3.client('ses')
    ses.send_email(Source=email_from,
                   Destination={'ToAddresses': [email_to]},
                   Message={'Subject': {'Data': email_subject},
                            'Body': {'Text': {'Data': email_body}}})
    print('Function execution Completed')
I don't know what I'm doing wrong: the function works fine when I only fetch the file's metadata, but as soon as I add the reading part the Lambda function returns nothing.
【Comments】:
【Answer 1】I would suggest adding CloudWatch access to your IAM policy as well.
Your Lambda function doesn't actually return anything, but you can see your log output in CloudWatch. Since you already set up a logger, I strongly recommend using logger.info(message) instead of print.
I hope this helps you debug your function.
Apart from the sending part, I would rewrite it like this (just tested it in the AWS console):
import logging
import boto3

logger = logging.getLogger()
logger.setLevel(logging.INFO)

s3 = boto3.client('s3')

def lambda_handler(event, context):
    email_content = ''
    # retrieve bucket name and file_key from the S3 event
    bucket_name = event['Records'][0]['s3']['bucket']['name']
    file_key = event['Records'][0]['s3']['object']['key']
    logger.info('Reading {} from {}'.format(file_key, bucket_name))
    # get the object
    obj = s3.get_object(Bucket=bucket_name, Key=file_key)
    # get lines inside the csv
    lines = obj['Body'].read().split(b'\n')
    for r in lines:
        logger.info(r.decode())
        email_content = email_content + '\n' + r.decode()
    logger.info(email_content)
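The answer deliberately leaves out the sending step. The question's send_email call flattened the nested dicts that SES expects, so here is a minimal sketch of that payload; the helper name and addresses are placeholders, and actually sending requires an SES-verified sender identity:

```python
def build_ses_email(email_from, email_to, subject, body):
    # SES expects nested dicts: Destination holds the address lists,
    # and Message wraps Subject and Body, each keyed by 'Data'.
    return {
        'Source': email_from,
        'Destination': {'ToAddresses': [email_to]},
        'Message': {
            'Subject': {'Data': subject},
            'Body': {'Text': {'Data': body}},
        },
    }

# Inside the handler, after building email_content:
#   ses = boto3.client('ses')
#   ses.send_email(**build_ses_email(email_from, email_to,
#                                    'new file is uploaded', email_content))
```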
【Comments】:
Hey, this was really helpful, thanks a lot. I did find the answer: in my code I had written both s3 = boto3.resource(u's3') and s3 = boto3.client('s3'), and that was the mistake.
Just one small improvement: it's better to initialize database connections, SDK clients, etc. outside the handler function (at the global level). The Lambda service keeps the function's execution context around, so subsequent invocations run faster. More info: docs.aws.amazon.com/lambda/latest/dg/best-practices.html
Good point, I moved the initialization of the s3 client outside the lambda handler.
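The reuse behavior mentioned in that last comment can be illustrated without AWS at all. In this sketch, `_make_client` is a hypothetical stand-in for the expensive `boto3.client('s3')` setup, just to show that module-level code runs once per execution environment while the handler runs per invocation:

```python
SETUP_CALLS = []

def _make_client():
    # Stand-in for boto3.client('s3'): setup we want to pay for only once.
    SETUP_CALLS.append(1)
    return object()

# Module level: runs once when the execution environment starts (cold start).
client = _make_client()

def lambda_handler(event, context):
    # Warm invocations reuse `client`; there is no per-call setup cost.
    return client

# Two "warm" invocations share the same client and trigger no extra setup:
assert lambda_handler({}, None) is lambda_handler({}, None)
assert len(SETUP_CALLS) == 1
```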
嘿兄弟,这真的很有帮助,非常感谢,我确实得到了答案,事实上在我的代码中我确实写了两个 s3 =boto3.resource(u's3') s3 = boto3.client ('s3') 这是错误。 只是一点点改进 - 最好在处理函数之外初始化数据库连接、sdks 等(在全局级别)。 Lambda 服务保留了函数的上下文,因此后续调用的执行速度更快 - 更多信息请参见:docs.aws.amazon.com/lambda/latest/dg/best-practices.html 好点,我将 s3 客户端的初始化移到了 lambda 处理程序之外以上是关于读取由 s3 事件触发的文件的主要内容,如果未能解决你的问题,请参考以下文章