Reading files triggered by an S3 event
【Title】reading files triggered by s3 event 【Posted】2018-04-06 06:55:43 【Question】Here is what I want to do:
- A user uploads a CSV file to an AWS S3 bucket.
- Once the file is uploaded, the S3 bucket invokes a Lambda function I created.
- My Lambda function reads the contents of the CSV file, then sends an email containing the file's content and metadata.
Local environment:
Serverless Framework version 1.22.0
Python 2.7
Here is my serverless.yml file:
service: aws-python # NOTE: update this with your service name

provider:
  name: aws
  runtime: python2.7
  stage: dev
  region: us-east-1
  iamRoleStatements:
    - Effect: "Allow"
      Action:
        - "s3:*"
        - "ses:SendEmail"
        - "ses:SendRawEmail"
        - "s3:PutBucketNotification"
      Resource: "*"

functions:
  csvfile:
    handler: handler.csvfile
    description: send mail whenever a csv file is uploaded on S3
    events:
      - s3:
          bucket: mine2
          event: s3:ObjectCreated:*
          rules:
            - suffix: .csv
Here is my Lambda function:
import json
import boto3
import botocore
import logging
import sys
import traceback
import csv
from botocore.exceptions import ClientError
from pprint import pprint
from time import strftime, gmtime
from json import dumps, loads, JSONEncoder, JSONDecoder

# setup simple logging for INFO
logger = logging.getLogger()
logger.setLevel(logging.INFO)

def csvfile(event, context):
    """Send email whenever a csvfile is uploaded to S3 """
    body = {}
    emailcontent = ''
    status_code = 200
    # set email information
    email_from = '****@*****.com'
    email_to = '****@****.com'
    email_subject = 'new file is uploaded'
    try:
        s3 = boto3.resource(u's3')
        s3 = boto3.client('s3')
        for record in event['Records']:
            filename = record['s3']['object']['key']
            filesize = record['s3']['object']['size']
            source = record['requestParameters']['sourceIPAddress']
            eventTime = record['eventTime']
        # get a handle on the bucket that holds your file
        bucket = s3.Bucket(u'mine2')
        # get a handle on the object you want (i.e. your file)
        obj = bucket.Object(key=event[u'Records'][0][u's3'][u'object'][u'key'])
        # get the object
        response = obj.get()
        # read the contents of the file and split it into a list of lines
        lines = response[u'Body'].read().split()
        # now iterate over those lines
        for row in csv.DictReader(lines):
            print(row)
            emailcontent = emailcontent + '\n' + row
    except Exception as e:
        print(traceback.format_exc())
        status_code = 500
        body["message"] = json.dumps(e)
    email_body = ("File Name: " + filename + "\n" + "File Size: " + str(filesize) + "\n" +
                  "Upload Time: " + eventTime + "\n" + "User Details: " + source + "\n" +
                  "content of the csv file :" + emailcontent)
    ses = boto3.client('ses')
    ses.send_email(Source=email_from,
                   Destination={'ToAddresses': [email_to]},
                   Message={'Subject': {'Data': email_subject},
                            'Body': {'Text': {'Data': email_body}}})
    print('Function execution Completed')
I don't know what I'm doing wrong: the function works fine when I only fetch the file's metadata, but as soon as I add the reading part the Lambda function returns nothing.
【Comments】:
【Answer 1】I would suggest adding CloudWatch access to your IAM policy as well.
Your Lambda function doesn't actually return anything, but you can see your log output in CloudWatch. Since you already set up a logger, I strongly recommend using logger.info(message) instead of print.
I hope this helps you debug your function.
Apart from the sending part, I would rewrite it like this (just tested it in the AWS console):
import logging
import boto3

logger = logging.getLogger()
logger.setLevel(logging.INFO)

s3 = boto3.client('s3')

def lambda_handler(event, context):
    email_content = ''
    # retrieve bucket name and file_key from the S3 event
    bucket_name = event['Records'][0]['s3']['bucket']['name']
    file_key = event['Records'][0]['s3']['object']['key']
    logger.info('Reading {} from {}'.format(file_key, bucket_name))
    # get the object
    obj = s3.get_object(Bucket=bucket_name, Key=file_key)
    # get lines inside the csv
    lines = obj['Body'].read().split(b'\n')
    for r in lines:
        logger.info(r.decode())
        email_content = email_content + '\n' + r.decode()
    logger.info(email_content)
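The answer deliberately leaves out the sending step. The question's send_email call flattened the nested dicts that SES expects, so here is a minimal sketch of that payload; the helper name and addresses are placeholders, and actually sending requires an SES-verified sender identity:

```python
def build_ses_email(email_from, email_to, subject, body):
    # SES expects nested dicts: Destination holds the address lists,
    # and Message wraps Subject and Body, each keyed by 'Data'.
    return {
        'Source': email_from,
        'Destination': {'ToAddresses': [email_to]},
        'Message': {
            'Subject': {'Data': subject},
            'Body': {'Text': {'Data': body}},
        },
    }

# Inside the handler, after building email_content:
#   ses = boto3.client('ses')
#   ses.send_email(**build_ses_email(email_from, email_to,
#                                    'new file is uploaded', email_content))
```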
【Comments】:
Hey, this was really helpful, thanks a lot. I did find the answer: in my code I had written both s3 = boto3.resource(u's3') and s3 = boto3.client('s3'), and that was the mistake.
Just one small improvement: it's better to initialize database connections, SDK clients, etc. outside the handler function (at the global level). The Lambda service keeps the function's execution context around, so subsequent invocations run faster. More info: docs.aws.amazon.com/lambda/latest/dg/best-practices.html
Good point, I moved the initialization of the s3 client outside the lambda handler.
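The reuse behavior mentioned in that last comment can be illustrated without AWS at all. In this sketch, `_make_client` is a hypothetical stand-in for the expensive `boto3.client('s3')` setup, just to show that module-level code runs once per execution environment while the handler runs per invocation:

```python
SETUP_CALLS = []

def _make_client():
    # Stand-in for boto3.client('s3'): setup we want to pay for only once.
    SETUP_CALLS.append(1)
    return object()

# Module level: runs once when the execution environment starts (cold start).
client = _make_client()

def lambda_handler(event, context):
    # Warm invocations reuse `client`; there is no per-call setup cost.
    return client

# Two "warm" invocations share the same client and trigger no extra setup:
assert lambda_handler({}, None) is lambda_handler({}, None)
assert len(SETUP_CALLS) == 1
```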
嘿兄弟,这真的很有帮助,非常感谢,我确实得到了答案,事实上在我的代码中我确实写了两个 s3 =boto3.resource(u's3') s3 = boto3.client ('s3') 这是错误。 只是一点点改进 - 最好在处理函数之外初始化数据库连接、sdks 等(在全局级别)。 Lambda 服务保留了函数的上下文,因此后续调用的执行速度更快 - 更多信息请参见:docs.aws.amazon.com/lambda/latest/dg/best-practices.html 好点,我将 s3 客户端的初始化移到了 lambda 处理程序之外以上是关于读取由 s3 事件触发的文件的主要内容,如果未能解决你的问题,请参考以下文章