Python Series, Getting Started: HDFS
Introduction
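HDFS (Hadoop Distributed File System) is Hadoop's distributed file system. It is highly fault-tolerant and well suited to deployment on inexpensive machines. Python offers two kinds of interface to it: hdfscli (RESTful API calls) and pyhdfs (RPC calls). This section covers how to use hdfscli.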
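To make the "RESTful API call" part more concrete: hdfscli talks to the NameNode over WebHDFS, whose endpoints follow the pattern http://<host>:<port>/webhdfs/v1/<path>?op=<OPERATION>. The sketch below only builds such a URL with the standard library and sends nothing. The helper name webhdfs_url is my own invention and is not part of the hdfs package, and the host and port are simply the ones used later in this post.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(base_url, hdfs_path, op, params=None):
    """Build a WebHDFS REST URL (hypothetical helper, for illustration only)."""
    # "op" names the WebHDFS operation, e.g. GETFILESTATUS or MKDIRS.
    query = {"op": op}
    query.update(params or {})
    # quote() keeps "/" intact, so the HDFS path stays readable in the URL.
    return "{}/webhdfs/v1{}?{}".format(
        base_url.rstrip("/"), quote(hdfs_path), urlencode(query)
    )


# Roughly the kind of request a status lookup on /test/test.txt maps to.
print(webhdfs_url("http://localhost:50070", "/test/test.txt", "GETFILESTATUS"))
```

Passing params={"user.name": "hdfs"} adds the user.name query parameter, which is how WebHDFS identifies the caller when security is off.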
Code examples
Installation
pip install hdfs
Import the module
from hdfs import Client, InsecureClient
Create a client
# The hdfs package provides two kinds of client: Client and InsecureClient.
# Client: cannot specify the file owner.
# InsecureClient: can specify the file owner through user (default None).
# Port 50070 is the default NameNode web port on Hadoop 2.x.
hdfs_root_path = 'http://localhost:50070'
fs = Client(hdfs_root_path)
# Or, to act as a specific user:
fs = InsecureClient(hdfs_root_path, user='hdfs')
Create a directory
# Create the directory with permission 777 (octal); permission defaults to None.
fs.makedirs('/test', permission=777)
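Write a file
# Whether to append depends on whether the file already exists.
# strict=False: return None instead of raising an exception if the path doesn't exist.
# data is the content to write and must be defined beforehand.
hdfs_file_path = '/test/test.txt'
content = fs.content(hdfs_file_path, strict=False)
if content is None:
    # The file does not exist yet: create it.
    fs.write(hdfs_file_path, data=data, permission=777)
else:
    # The file exists: append to it.
    fs.write(hdfs_file_path, data=data, append=True)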
Upload a file
# overwrite defaults to False. Unless it is set to True, uploading a file that
# already exists in HDFS raises a "file already exists" error.
# hdfs_path is the destination in HDFS; local_path is the file on the local machine.
fs.upload(hdfs_path, local_path, overwrite=True)
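Summary
I have not yet found a method that checks whether a file exists, so the code examples use fs.content() as a substitute for now. If you know a better way, please share it with me.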