Spring Batch - Read multiple files from S3
[Title]: Spring Batch - Read multiple files from S3
[Posted]: 2021-01-04 07:17:33
[Question]: To read a single file from S3 in Spring Batch, we use:
@Bean
public FlatFileItemReader<Map<String, Object>> itemReader() {
    FlatFileItemReader<Map<String, Object>> reader = new FlatFileItemReader<>();
    // Map each JSON record to a Map<String, Object>
    reader.setLineMapper(new JsonLineMapper());
    reader.setRecordSeparatorPolicy(new JsonRecordSeparatorPolicy());
    // resourceLoader, amazonS3Bucket and file are fields of the enclosing configuration class
    reader.setResource(resourceLoader.getResource("s3://" + amazonS3Bucket + "/" + file));
    return reader;
}
But if I want to read all the files under a particular folder/key, is there something like MultiResourceItemReader, as shown below (which we use for the local file system)?
MultiResourceItemReader<UserData> reader = new MultiResourceItemReader<>();
reader.setResources(resources);
[Answer 1]: Create a MultiResourceItemReader like this:
// Uses the AWS SDK for Java v1 types AmazonS3 and S3ObjectSummary
@Autowired
private AmazonS3 s3;

@Autowired
private ResourceLoader resourceLoader;

public MultiResourceItemReader<String> fileItemReader() throws Exception {
    String bucket = "bucket"; // put your bucket name here
    String prefix = "path/";  // put your key prefix here (a key prefix, not an s3:// URI)
    List<Resource> resourceList = new ArrayList<>();
    // TODO: warning: a single listObjects call returns at most 1000 objects
    List<S3ObjectSummary> s3objects = s3.listObjects(bucket, prefix).getObjectSummaries();
    for (S3ObjectSummary it : s3objects) {
        resourceList.add(resourceLoader.getResource("s3://" + bucket + "/" + it.getKey()));
    }
    Resource[] resources = resourceList.toArray(new Resource[0]);
    MultiResourceItemReader<String> reader = new MultiResourceItemReader<>();
    reader.setResources(resources);
    reader.setDelegate(flatFileItemReader());
    return reader;
}
This reader needs a delegate and a LineMapper, which you can implement like this:
private FlatFileItemReader<String> flatFileItemReader() throws Exception {
    FlatFileItemReader<String> reader = new FlatFileItemReader<>();
    JsonLineMapper lineMapper = new JsonLineMapper();
    reader.setLineMapper(lineMapper);
    reader.afterPropertiesSet();
    return reader;
}

// Custom mapper defined in this answer (not Spring Batch's built-in JsonLineMapper)
public class JsonLineMapper implements LineMapper<String> {
    private ObjectMapper mapper = new ObjectMapper(); // not used yet; available for JSON parsing

    @Override
    public String mapLine(String s, int i) throws Exception {
        // Pass each line through unchanged
        return s;
    }
}
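The 1000-object limit noted in the listing code can be worked around by fetching the listing page by page. The sketch below shows only the loop shape, using plain Java: `KeyPage`, `PagedKeyCollector`, and the `fetchPage` function are hypothetical stand-ins, not AWS SDK types. With the AWS SDK for Java v1, the same role would likely be played by `ObjectListing.isTruncated()` together with `AmazonS3.listNextBatchOfObjects(...)`, but that wiring is an assumption and is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical stand-in for one page of a key listing (not an AWS SDK type). */
final class KeyPage {
    final List<String> keys;
    final String nextMarker; // null when this is the last page

    KeyPage(List<String> keys, String nextMarker) {
        this.keys = keys;
        this.nextMarker = nextMarker;
    }
}

final class PagedKeyCollector {
    private PagedKeyCollector() {}

    /**
     * Calls fetchPage repeatedly, starting with a null marker, until a page
     * reports no next marker, and returns every key in the order received.
     */
    static List<String> collectAll(Function<String, KeyPage> fetchPage) {
        List<String> all = new ArrayList<>();
        String marker = null;
        do {
            KeyPage page = fetchPage.apply(marker);
            all.addAll(page.keys);
            marker = page.nextMarker;
        } while (marker != null);
        return all;
    }
}
```

The collected keys can then be turned into resources exactly as in the loop of fileItemReader() above.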
[Comments]:
Using the same strategy in my question doesn't seem to work.
[Answer 2]: No, it is up to you to create the Resource array and pass it to MultiResourceItemReader.
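As a rough illustration of building that array, the sketch below turns a bucket name and a list of object keys into `s3://` location strings, each of which could then be handed to `resourceLoader.getResource(...)` as in the question. `S3Locations` and `toLocations` are hypothetical names introduced for this example. Skipping keys that end in "/" is an assumption: a listing may include zero-byte "folder" placeholder objects, which are usually not data files.

```java
import java.util.ArrayList;
import java.util.List;

final class S3Locations {
    private S3Locations() {}

    /** Builds one "s3://bucket/key" string per key, skipping folder placeholders. */
    static List<String> toLocations(String bucket, List<String> keys) {
        List<String> locations = new ArrayList<>();
        for (String key : keys) {
            // Assumption: keys ending in "/" are folder placeholders, not files to read.
            if (key.endsWith("/")) {
                continue;
            }
            locations.add("s3://" + bucket + "/" + key);
        }
        return locations;
    }
}
```

Mapping each returned string through resourceLoader.getResource and collecting the results into a Resource[] gives the array that MultiResourceItemReader.setResources expects.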
[Comments]:
I am asking: can I use MultiResourceItemReader to read multiple S3 files, and if so, how?
Yes, you can. You need to create an array of S3 resources and pass it to MultiResourceItemReader.
@GauravRaghav - Were you able to implement this? Can you show some code?
@Pra_A - Yes, I have just posted my answer.
@GauravRaghav If my answer helped, please accept it: ***.com/help/someone-answers. Thanks.