Configuring Various Object Stores on CDH 6.3.0
Posted by 醉舞斜陽
cm-hdfs:
ufile: requires adding a jar manually
S3: the jar ships with CDH
OSS: CDH 6.3.0 ships the needed jars; CDH 5 requires downloading them
Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml
fs.oss.endpoint oss-eu-west-1.aliyuncs.com #public OSS endpoint
fs.oss.accessKeyId
fs.oss.accessKeySecret
fs.oss.impl org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem
fs.oss.buffer.dir /tmp/oss
fs.oss.connection.secure.enabled false #whether to enable HTTPS; set as needed, since enabling HTTPS costs performance
fs.oss.connection.maximum 10000
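In Cloudera Manager the safety valve takes XML <property> entries rather than bare key/value pairs, so the settings above would be entered roughly as follows (a sketch; the credential values are placeholders you must fill in yourself):

```xml
<property>
  <name>fs.oss.endpoint</name>
  <value>oss-eu-west-1.aliyuncs.com</value>
</property>
<property>
  <name>fs.oss.accessKeyId</name>
  <value><!-- your AccessKey ID --></value>
</property>
<property>
  <name>fs.oss.accessKeySecret</name>
  <value><!-- your AccessKey Secret --></value>
</property>
<property>
  <name>fs.oss.impl</name>
  <value>org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem</value>
</property>
<property>
  <name>fs.oss.buffer.dir</name>
  <value>/tmp/oss</value>
</property>
<property>
  <name>fs.oss.connection.secure.enabled</name>
  <value>false</value>
</property>
<property>
  <name>fs.oss.connection.maximum</name>
  <value>10000</value>
</property>
```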
#Default jar locations:
/opt/cloudera/parcels/CDH-6.3.0-1.cdh6.3.0.p0.1279813/jars/aliyun-sdk-oss-2.8.3.jar
/opt/cloudera/parcels/CDH-6.3.0-1.cdh6.3.0.p0.1279813/jars/hadoop-aliyun-3.0.0-cdh6.3.0.jar
To be tested:
cp jindofs-sdk-2.3.0.jar /opt/cloudera/parcels/CDH-6.3.0-1.cdh6.3.0.p0.1279813/jars/
Reference: https://github.com/aliyun/alibabacloud-jindodata/blob/master/docs/jindofs_sdk_how_to_hadoop_cdh.md
Back up and remove the stock aliyun-sdk-oss-2.8.3.jar, restart the HDFS service, and see whether the new jar works.
#On CDH this is configured in CM (needs testing)
Modify hadoop-env.sh:
Open the file: vim $HADOOP_HOME/etc/hadoop/hadoop-env.sh
Add the following line in the appropriate place:
export HADOOP_OPTIONAL_TOOLS="hadoop-aliyun"
After the change, restart the Hadoop cluster.
Note: on CDH, redeploy the client configuration after restarting components.
#Verify
hdfs dfs -ls oss://dbbigdata/
Hive CREATE TABLE statement
create external table if not exists dim_sony_dev_list_oss (
`ymd` string comment 'date',
`uuid` string comment 'Dangbei user device UUID',
`chanel` string comment 'channel',
`brand` string comment 'brand',
`packagename` string comment 'package name',
`unit_type` string comment 'model number',
`model` string comment 'Sony model',
`vcode` string comment 'version code',
`vname` string comment 'version name',
`sony_user_id` string comment 'Sony user ID',
`user_id` string comment 'Dangbei user ID',
`ip` string comment 'user IP',
`province` string comment 'province',
`city` string comment 'city',
`region` string comment 'district',
`add_time` string comment 'device first-seen time (yyyy-MM-dd HH:mm:ss)',
`mac` string comment 'mac',
`cause1` string comment 'device seen before Feb 6',
`cause2` string comment 'more than 3 devices on the same IP',
`cause3` string comment 'in-store demo unit that reported com.sony.dtv.multiscreendemo (demo app)',
`cause4` string comment 'reported a channel other than sonyos_sonyos',
`cause5` string comment 'overseas IP',
`cause6` string comment 'abnormal version: below 1.0.1',
`cause7` string comment 'more than 3 UUIDs under one MAC',
`cause8` string comment 'user who logged in before Feb 6',
`cause9` string comment 'model not in the model list',
`ifblacklist` string comment 'in the blacklist: 1 = yes, 0 = no'
)
PARTITIONED BY (pt STRING)
row format delimited
fields terminated by '\001'
lines terminated by '\n'
STORED AS TEXTFILE
location 'oss://dbbigdata/hangwenping/dim_sony_dev_list_oss';
Insert statement
set mapreduce.map.memory.mb=3072;
set mapreduce.reduce.memory.mb=3072;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=10000;
set hive.exec.max.dynamic.partitions=10000;
set hive.exec.max.created.files=10000;
insert overwrite table dim_sony_dev_list_oss partition(pt) select * from dim_sony_dev_list;
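If only a single partition needs loading, a static-partition insert avoids the dynamic-partition settings above entirely. A sketch, with a hypothetical partition value of '2020-02-06' and the columns listed explicitly to match the DDL:

```sql
-- Static-partition sketch; '2020-02-06' is an illustrative value only.
INSERT OVERWRITE TABLE dim_sony_dev_list_oss PARTITION (pt = '2020-02-06')
SELECT ymd, uuid, chanel, brand, packagename, unit_type, model, vcode, vname,
       sony_user_id, user_id, ip, province, city, region, add_time, mac,
       cause1, cause2, cause3, cause4, cause5, cause6, cause7, cause8, cause9,
       ifblacklist
FROM dim_sony_dev_list
WHERE pt = '2020-02-06';
```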
The insert failed with:
Error while processing statement: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Spark job failed due to: Class org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem not found
Fix:
The OSS filesystem class referenced in core-site.xml must be on the classpath of Spark, Impala, and Hive, so create symlinks to the jars for all three. Run the commands below on every Hive, Impala, and Spark node (Ansible is handy for this). Note: ln -s should use absolute paths; relative paths are shown here for brevity.
Impala symlinks: go to Impala's lib directory on each Impala node
cd /opt/cloudera/parcels/CDH-6.3.0-1.cdh6.3.0.p0.1279813/lib/impala/lib
ln -s ../../../jars/hadoop-aliyun-3.0.0-cdh6.3.0.jar hadoop-aliyun-3.0.0-cdh6.3.0.jar
ln -s ../../../jars/aliyun-sdk-oss-2.8.3.jar aliyun-sdk-oss-2.8.3.jar
ln -s ../../../jars/jdom-1.1.jar jdom-1.1.jar
Go to Spark's jars directory
cd /opt/cloudera/parcels/CDH-6.3.0-1.cdh6.3.0.p0.1279813/lib/spark/jars
ln -s ../../../jars/hadoop-aliyun-3.0.0-cdh6.3.0.jar hadoop-aliyun-3.0.0-cdh6.3.0.jar
ln -s ../../../jars/aliyun-sdk-oss-2.8.3.jar aliyun-sdk-oss-2.8.3.jar
ln -s ../../../jars/jdom-1.1.jar jdom-1.1.jar
Go to Hive's lib directory and run
cd /opt/cloudera/parcels/CDH-6.3.0-1.cdh6.3.0.p0.1279813/lib/hive/lib/
ln -s ../../../jars/hadoop-aliyun-3.0.0-cdh6.3.0.jar hadoop-aliyun-3.0.0-cdh6.3.0.jar
ln -s ../../../jars/aliyun-sdk-oss-2.8.3.jar aliyun-sdk-oss-2.8.3.jar
ln -s ../../../jars/jdom-1.1.jar jdom-1.1.jar
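The nine ln -s commands above can be collapsed into one loop per node. The sketch below stages a throwaway copy of the parcel layout under a temp directory so it can run anywhere; on a real node you would instead set PARCEL to the actual /opt/cloudera/parcels/CDH-6.3.0-1.cdh6.3.0.p0.1279813 path and drop the staging steps.

```shell
# Stand-in for the real parcel root (staged in a temp dir for illustration).
PARCEL=$(mktemp -d)/CDH-6.3.0-1.cdh6.3.0.p0.1279813
mkdir -p "$PARCEL/jars" "$PARCEL/lib/impala/lib" "$PARCEL/lib/spark/jars" "$PARCEL/lib/hive/lib"
touch "$PARCEL/jars/hadoop-aliyun-3.0.0-cdh6.3.0.jar" \
      "$PARCEL/jars/aliyun-sdk-oss-2.8.3.jar" \
      "$PARCEL/jars/jdom-1.1.jar"

# Link the three OSS jars into Impala, Spark, and Hive, using absolute
# source paths as the note above recommends.
for dir in lib/impala/lib lib/spark/jars lib/hive/lib; do
  for jar in hadoop-aliyun-3.0.0-cdh6.3.0.jar aliyun-sdk-oss-2.8.3.jar jdom-1.1.jar; do
    ln -sfn "$PARCEL/jars/$jar" "$PARCEL/$dir/$jar"
  done
done
ls -l "$PARCEL/lib/hive/lib"
```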
Query statement
select * from dim_sony_dev_list_oss;
(Screenshots: Hive query results on the OSS table; Impala query results on the table; the data as stored in OSS.)