Importing Data into a Hive Bucketed Table

Posted 2023-01-23 by 动若脱兔

1 Create the bucketed table

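Step one is to enable bucketing, otherwise importing the data reports an error. (The Hive documentation marks this setting as needed only on Hive 0.x and 1.x; from Hive 2.x on, bucketing is always enforced.)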

set hive.enforce.bucketing = true;

-- bucketed table: rows are hashed on id into 4 buckets, i.e. 4 files
create table stu_buck(id int, name string)
clustered by(id)
into 4 buckets
row format delimited fields terminated by '\t';

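-- ordinary (non-bucketed) intermediate table with the same schema
create table tp(id int, name string)
row format delimited fields terminated by '\t';

-- load the local text file into the intermediate table
load data local inpath "/temp/student1/student.txt" into table tp;

-- insert from the intermediate table so Hive redistributes the rows into the 4 buckets
insert overwrite table stu_buck select * from tp;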

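You can see that the data was split into 4 buckets, which is why the table shows 4 files. I am currently using Hadoop 2.x and Hive 2.x; with Hive 3.x you should be able to load data into the bucketed table directly, without going through an intermediate table.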
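To check how the rows were spread across the 4 files, you can list the table directory, compute the bucket each id should land in, and read a single bucket back. This is a sketch under assumptions: it assumes Hive 2.x's default bucketing hash, where an `int` column hashes to its own value and the bucket is that hash modulo the bucket count (so ids 1, 5, 9, ... share a file). Hive 3.x tables created with `bucketing_version=2` use a Murmur3 hash instead, so the computed numbers will not match there. The `dfs -ls` path assumes the default database and the default `hive.metastore.warehouse.dir`.

```sql
-- list the bucket files (path assumes the default warehouse dir and database)
dfs -ls /user/hive/warehouse/stu_buck;

-- bucket each row should fall into, assuming the Hive 2.x hash (an int hashes to itself)
select id, name, pmod(hash(id), 4) as bucket_no
from tp;

-- read only the first bucket (the rows with bucket_no = 0) of the bucketed table
select * from stu_buck tablesample(bucket 1 out of 4 on id);
```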

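Things to watch out for when importing data into a bucketed table:
(1) Set the number of reducers to -1 so that the job decides how many reducers it needs, or set it to a value greater than or equal to the number of buckets in the bucketed table.
(2) Load the data into the bucketed table from HDFS, to avoid problems with the local file not being found.
(3) Do not use local mode.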
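Assuming Hive 3.x, where (as mentioned above) loading straight into a bucketed table should work, the three points might translate into the statements below. This is a hedged sketch: `/student/student.txt` is a hypothetical HDFS path that must already hold the file, and the property names are the standard Hadoop/Hive ones (`mapreduce.job.reduces`, `hive.exec.mode.local.auto`), so confirm them against your version.

```sql
-- (1) let the job choose the number of reducers (-1),
--     or set it to at least the number of buckets (4 here)
set mapreduce.job.reduces=-1;

-- (3) make sure Hive does not switch the job to local mode automatically
set hive.exec.mode.local.auto=false;

-- (2) load from HDFS (no LOCAL keyword); the path is a hypothetical example
load data inpath '/student/student.txt' into table stu_buck;
```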
