hive
CREATE TABLE t1(name string, id int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ';
LOAD DATA LOCAL INPATH '/Users/***/Desktop/test.txt' INTO TABLE t1;
Then check the result on HDFS (web UI on port 50070), or from the Hive CLI:
dfs -ls /user/wyq/hive;
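A quick way to confirm the load, sketched from the same Hive session; the table directory path below assumes the warehouse location shown above (/user/wyq/hive), so adjust it to your hive.metastore.warehouse.dir:
DESCRIBE t1;
SELECT * FROM t1 LIMIT 5;
-- the backing files should now sit under the table's own directory
dfs -ls /user/wyq/hive/t1;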
---------------------------------------------------
Write the UDF in Java (e.g., in Eclipse) and package it into a jar (jar cvf demoudf.jar ///.java):
import java.util.Date;
import java.text.DateFormat;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class UnixTodate extends UDF {
    // converts a unix timestamp (seconds, passed in as text) to a formatted date string
    public Text evaluate(Text text) {
        if (text == null) return null;
        long timestamp = Long.parseLong(text.toString());
        return new Text(toDate(timestamp));
    }

    private String toDate(long timestamp) {
        Date date = new Date(timestamp * 1000);  // java.util.Date expects milliseconds
        return DateFormat.getInstance().format(date);
    }
}
ADD JAR /Users/wyq/Desktop/demoudf.jar;
CREATE TEMPORARY FUNCTION userdate AS 'demoudf.UnixTodate';
CREATE TABLE test(id string, unixtime string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
LOAD DATA LOCAL INPATH '/Users/wyq/Desktop/udf_test.txt' INTO TABLE test;
SELECT * FROM test;
SELECT id, userdate(unixtime) FROM test;
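A temporary function disappears when the session ends. As a sketch of the alternative (Hive 0.13+ permanent functions; the HDFS jar path below is hypothetical, upload demoudf.jar wherever you like):
-- register the UDF permanently from a jar stored on HDFS
CREATE FUNCTION userdate AS 'demoudf.UnixTodate'
  USING JAR 'hdfs:///user/wyq/jars/demoudf.jar';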
cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
-----------------------------------------------------
Transforming table columns with a Python script (TRANSFORM):
row format delimited fields terminated by '\t'
load data local inpath '///' into table test;
add file ///.py;
insert overwrite table u_data_new
  select transform (col1, col2, unixtime) using 'python ...py' as (col1, col2, unixtime)
  from u_data;
python:
import sys
import datetime

for line in sys.stdin:
    line = line.strip()
    col1, col2, unixtime = line.split('\t')
    weekday = datetime.datetime.fromtimestamp(float(unixtime)).isoweekday()
    print '\t'.join([col1, col2, str(weekday)])
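For the INSERT OVERWRITE above to work, the target table must already exist with three columns. A minimal sketch, since the original post never shows its DDL (types are assumptions; TRANSFORM output arrives as strings):
-- hypothetical target table; names follow the AS clause above,
-- though the third column actually ends up holding the computed weekday
CREATE TABLE u_data_new (
  col1 STRING,
  col2 STRING,
  unixtime STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';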
-------------------------------------------------------
hive :
1. Component architecture: HiveServer2 (Beeline), the Hive driver/compiler, and the metastore database.
Among these, the Execution Engine is the component which executes the execution plan created by the compiler. The plan is a DAG of stages. The execution engine manages the dependencies between these different stages of the plan and executes these stages on the appropriate system components.
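You can see this stage DAG for yourself with EXPLAIN; a minimal sketch against the test table created earlier:
-- prints STAGE DEPENDENCIES and STAGE PLANS for the query instead of running it
EXPLAIN SELECT id, count(*) FROM test GROUP BY id;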
2. Ways to connect to HiveServer2: GUI, CLI, JDBC (Beeline).
3. Data sources: data is pulled in with Kafka, Sqoop, etc. and landed in HDFS; it comes in every shape: relational database tables, MongoDB or JSON documents, logs.
4. How does HQL get executed? Behind the scenes it runs MapReduce or Tez jobs (much like Pig executing Pig Latin scripts) and prints a tracking URL, e.g. for: insert into test values("wangyuq","123");
Stages? Before your data is moved to its final location it sits in a staging directory for a while; once the job finishes, the staging files are gone.
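The engine those jobs run on is a session setting; a small sketch (the property name is a standard Hive setting, the value shown is just an example and requires Tez to be installed):
-- show, then switch, the underlying execution engine (mr, tez, or spark in newer versions)
SET hive.execution.engine;
SET hive.execution.engine=tez;
-- staging files typically live in a hidden .hive-staging_* directory next to the target
-- until the job commits, which is why they vanish afterwards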
5. Strengths, weaknesses, and overall assessment. Pig is a good ETL tool for unstructured data.
Hive is not a relational database; it only maintains metadata about data stored in HDFS, which makes working with big data feel like running SQL against tables, even though HQL differs slightly from SQL. Hive keeps its table definitions in the metastore; the default is Derby, but it can be swapped for another database.
It lets us express MapReduce jobs in SQL and run queries over data sitting in HDFS.
---But:
Hive makes no promise of optimization, only of simplicity, so its performance cannot support real-time workloads.
Indexes and views are limited (use partitions and buckets instead).
Read-only: UPDATE is not supported.
Its data types do not match SQL's exactly.
New partitions can be added, but existing data cannot be updated in place (see the sketch after this list).
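A quick illustration of the last two points, using a hypothetical table logs partitioned by dt (column and table names are made up; classic Hive has no row-level UPDATE, that only came later with transactional tables):
-- adding a brand-new partition is fine
ALTER TABLE logs ADD PARTITION (dt='2016-01-01');
-- but changing existing rows means overwriting an entire table or partition
INSERT OVERWRITE TABLE logs PARTITION (dt='2016-01-01')
  SELECT col1, col2 FROM logs_staging;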
6. Relationship to HDFS? Hive's data lives inside HDFS.
7. So how is the data organized and processed? (partitions and buckets; semi-structured data -> structured)
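A minimal DDL sketch of a partitioned and bucketed table (the table and column names are made up for illustration):
-- partitions split the data into per-value directories; buckets hash rows into a fixed number of files
CREATE TABLE page_views (
  user_id STRING,
  url     STRING
)
PARTITIONED BY (dt STRING)
CLUSTERED BY (user_id) INTO 4 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';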
The LOAD statement: it moves files from their HDFS location into Hive, so the original HDFS path no longer holds the data; the real files are simply relocated under the table's directory.
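A sketch of watching that happen from the Hive CLI, assuming a file test.txt already uploaded to a hypothetical /user/wyq/input/ directory and the warehouse location used earlier:
-- the source path is emptied because LOAD (without LOCAL) moves rather than copies
dfs -ls /user/wyq/input/;
LOAD DATA INPATH '/user/wyq/input/test.txt' INTO TABLE t1;
-- test.txt is gone from the source directory and now sits under the table's directory
dfs -ls /user/wyq/input/;
dfs -ls /user/wyq/hive/t1/;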
8. So how is the data stored? The data lives in HDFS; the schema lives in the metastore.
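DESCRIBE FORMATTED makes this split visible: the column list comes from the metastore, while Location points at the HDFS directory holding the files (sketch, using the t1 table from earlier):
-- shows columns (metastore) plus Location, InputFormat, SerDe, etc. (pointing into HDFS)
DESCRIBE FORMATTED t1;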
9. Installation and errors
MySQL (user-management issues):
step 1: SET PASSWORD = PASSWORD('your new password');
step 2: ALTER USER 'root'@'localhost' PASSWORD EXPIRE NEVER;
step 3: flush privileges;
1.$mysql -u root -p
2.mysql> create user 'hive' identified by '123456';
Query OK, 0 rows affected (0.00 sec)
3.mysql> grant all privileges on *.* to 'hive' with grant option;
Query OK, 0 rows affected (0.00 sec)
4.mysql> flush privileges;
Query OK, 0 rows affected (0.01 sec)
create user 'hive'@'%' identified by 'hive';
grant all privileges on *.* to 'hive'@'%' with grant option;
flush privileges;
Start Hadoop:
hadoop namenode -format; start-all.sh