Hive Notes

Posted by 混沌战神阿瑞斯


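Big data work almost inevitably involves data warehouse (OLAP) architectures, and with them the use of Hive SQL to simplify distributed computation. This post is a rough summary of how to use Hive; corrections and suggestions are welcome!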

Summary:

  1. Hive installation

  2. Hive DDL commands

  3. Hive DML basics

  4. Advanced Hive DML

  5. Hive optimization and configuration parameters

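Contents:

  1. Hive installation

  Dependencies: MySQL, JDK, Hadoop

  Installation guide: see the official documentation. Note that by default Hive keeps its metadata in an embedded Derby database, which supports only one user session at a time. To change this configuration, see the official documentation, quoted here: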

Metadata Store

Metadata is in an embedded Derby database whose disk storage location is determined by the Hive configuration variable named javax.jdo.option.ConnectionURL. By default this location is ./metastore_db (see conf/hive-default.xml).

Right now, in the default configuration, this metadata can only be seen by one user at a time.

Metastore can be stored in any database that is supported by JPOX. The location and the type of the RDBMS can be controlled by the two variables javax.jdo.option.ConnectionURL and javax.jdo.option.ConnectionDriverName. Refer to JDO (or JPOX) documentation for more details on supported databases. The database schema is defined in JDO metadata annotations file package.jdo at src/contrib/hive/metastore/src/model.

In the future, the metastore itself can be a standalone server.

If you want to run the metastore as a network server so it can be accessed from multiple nodes, see Hive Using Derby in Server Mode.

  2. Hive DDL commands

  CREATE TABLE syntax:

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name    -- (Note: TEMPORARY available in Hive 0.14.0 and later)
  [(col_name data_type [COMMENT col_comment], ... [constraint_specification])]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]  -- partitioning
  [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]  -- bucketing
  [SKEWED BY (col_name, col_name, ...)                  -- (Note: Available in Hive 0.10.0 and later)
     ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
     [STORED AS DIRECTORIES]]
  [                -- storage format
   [ROW FORMAT row_format]
   [STORED AS file_format]
     | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]  -- (Note: Available in Hive 0.6.0 and later)
  ]
  [LOCATION hdfs_path]      -- storage path (typically given for external tables)
  [TBLPROPERTIES (property_name=property_value, ...)]   -- (Note: Available in Hive 0.6.0 and later)
  [AS select_statement];   -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)
 
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name  -- copy an existing table's definition
  LIKE existing_table_or_view_name
  [LOCATION hdfs_path];
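
As a minimal sketch of the syntax above, the following statements create a partitioned and bucketed managed table, an external table over an existing HDFS directory, and an empty copy of a table. The database `demo_db`, all table and column names, and the path `/data/demo/raw_logs` are hypothetical and chosen only for illustration.

```sql
-- Hypothetical database used only for these examples.
CREATE DATABASE IF NOT EXISTS demo_db;

-- Managed table, partitioned by date and bucketed by user id.
CREATE TABLE IF NOT EXISTS demo_db.page_views (
  user_id   BIGINT COMMENT 'visitor id',
  url       STRING COMMENT 'page visited',
  view_time TIMESTAMP
)
COMMENT 'page view log'
PARTITIONED BY (dt STRING)
CLUSTERED BY (user_id) SORTED BY (view_time ASC) INTO 8 BUCKETS
STORED AS ORC;

-- External table: dropping it removes only the metadata, not the files.
CREATE EXTERNAL TABLE IF NOT EXISTS demo_db.raw_logs (
  line STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/demo/raw_logs';

-- Copy only the table definition (columns, partitioning), not the data.
CREATE TABLE IF NOT EXISTS demo_db.page_views_copy LIKE demo_db.page_views;
```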

  Drop a table: DROP TABLE [IF EXISTS] table_name [PURGE];

  Truncate a table: TRUNCATE TABLE table_name [PARTITION partition_spec];

  Describe a table's structure:

  DESCRIBE [EXTENDED|FORMATTED] 
  table_name[.col_name ( [.field_name] | [.'$elem$'] | [.'$key$'] | [.'$value$'] )* ];
  For everything else, see the official DDL documentation.
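
A short sketch of these three commands, assuming a hypothetical managed table `demo_db.page_views` partitioned by a string column `dt`, and a hypothetical scratch table `demo_db.page_views_copy`:

```sql
-- Show columns plus storage, location and table properties in a readable layout.
DESCRIBE FORMATTED demo_db.page_views;

-- Remove all rows of one partition but keep the table and partition definitions.
-- TRUNCATE applies to managed tables, not external ones.
TRUNCATE TABLE demo_db.page_views PARTITION (dt = '2017-01-01');

-- Drop a table; PURGE (Hive 0.14.0 and later) skips the trash directory,
-- so the data cannot be recovered afterwards.
DROP TABLE IF EXISTS demo_db.page_views_copy PURGE;
```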
  3. Hive DML basics
  Load data into a Hive table: LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
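
For illustration, a sketch of both forms of LOAD DATA. The table `demo_db.page_views_text` (assumed to be a text-format table partitioned by `dt`) and both file paths are hypothetical. LOAD DATA only copies or moves files and does not transform them, so the files must already match the table's storage format.

```sql
-- LOCAL: the file is copied from the local file system of the client.
-- OVERWRITE replaces whatever the target partition already contains.
LOAD DATA LOCAL INPATH '/tmp/page_views_2017-01-01.txt'
OVERWRITE INTO TABLE demo_db.page_views_text PARTITION (dt = '2017-01-01');

-- Without LOCAL the path refers to HDFS and the files are moved, not copied.
LOAD DATA INPATH '/staging/page_views/2017-01-02'
INTO TABLE demo_db.page_views_text PARTITION (dt = '2017-01-02');
```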
  Insert data:
  INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1 FROM from_statement;
  INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement;
  INSERT INTO TABLE tablename [PARTITION (partcol1[=val1], partcol2[=val2] ...)] VALUES ( value [, value ...] ) [, ( value [, value ...] ) ...]
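
The three INSERT forms can be sketched as follows. The tables `demo_db.daily_views` (partitioned by `dt`), `demo_db.page_views_staging` and `demo_db.users` are hypothetical. The dynamic-partition variant assumes the two session settings shown are permitted on the cluster.

```sql
-- Static partition: replace the contents of one known partition.
INSERT OVERWRITE TABLE demo_db.daily_views PARTITION (dt = '2017-01-01')
SELECT user_id, url, view_time
FROM demo_db.page_views_staging
WHERE dt = '2017-01-01';

-- Dynamic partition: the partition value comes from the last SELECT column.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

INSERT INTO TABLE demo_db.daily_views PARTITION (dt)
SELECT user_id, url, view_time, dt
FROM demo_db.page_views_staging;

-- Literal rows (INSERT ... VALUES is available in Hive 0.14.0 and later).
INSERT INTO TABLE demo_db.users VALUES (1, 'alice'), (2, 'bob');
```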
  Export data from Hive:
  INSERT OVERWRITE [LOCAL] DIRECTORY directory1
  [ROW FORMAT row_format] [STORED AS file_format]  -- (Note: Only available starting with Hive 0.11.0)
  SELECT ... FROM ...
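
A minimal export sketch, assuming a hypothetical table `demo_db.page_views` and a hypothetical local target directory. Because of OVERWRITE, any existing contents of the target directory are replaced, so it is safer to point it at a dedicated directory.

```sql
-- Write one partition as comma-separated text files on the local file system.
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/page_views_export'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT user_id, url
FROM demo_db.page_views
WHERE dt = '2017-01-01';
```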
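  Update data (Hive 0.14.0 and later, only on tables that support ACID transactions): UPDATE tablename SET column = value [, column = value ...] [WHERE expression]
  Delete data (same requirements as UPDATE): DELETE FROM tablename [WHERE expression]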
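
UPDATE and DELETE work only on transactional tables. The sketch below assumes a cluster where the metastore is already configured for transactions, and uses a hypothetical table `demo_db.users_acid`. Before Hive 3, such a table also had to be bucketed and stored as ORC, which is why both clauses appear here.

```sql
-- Session settings commonly required for ACID operations.
SET hive.support.concurrency = true;
SET hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

-- Hypothetical transactional table.
CREATE TABLE IF NOT EXISTS demo_db.users_acid (
  id     BIGINT,
  name   STRING,
  status STRING
)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');

-- Change one column for the matching rows (bucketing columns cannot be updated).
UPDATE demo_db.users_acid SET status = 'inactive' WHERE id = 2;

-- Remove the matching rows.
DELETE FROM demo_db.users_acid WHERE status = 'inactive';
```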
  Query data:
SELECT [ALL | DISTINCT] select_expr, select_expr, ...
  FROM table_reference
  [WHERE where_condition]
  [GROUP BY col_list]
  [ORDER BY col_list]
  [CLUSTER BY col_list
    | [DISTRIBUTE BY col_list] [SORT BY col_list]
  ]
 [LIMIT [offset,] rows]
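
Two queries that sketch the difference between the ordering clauses, against a hypothetical table `demo_db.page_views`. ORDER BY produces a single total order, which forces the final sort through one reducer. DISTRIBUTE BY with SORT BY only sorts within each reducer, and CLUSTER BY col is shorthand for DISTRIBUTE BY col SORT BY col.

```sql
-- Top 10 URLs of one day: global ordering via ORDER BY.
SELECT url, count(*) AS views
FROM demo_db.page_views
WHERE dt = '2017-01-01'
GROUP BY url
ORDER BY views DESC
LIMIT 10;

-- All rows of a user go to the same reducer and are sorted there by time;
-- the output is ordered per reducer, not globally.
SELECT user_id, url, view_time
FROM demo_db.page_views
WHERE dt = '2017-01-01'
DISTRIBUTE BY user_id
SORT BY user_id, view_time;
```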

  Hive built-in functions

  Each entry gives the return type and signature, followed by a description.

  int year(string date)
    Returns the year part of a date or timestamp string: year("1970-01-01 00:00:00") = 1970, year("1970-01-01") = 1970

  string upper(string A)
    Returns the string resulting from converting all characters of A to upper case. For example, upper('fOoBaR') results in 'FOOBAR'

  string ucase(string A)
    Same as upper

  string trim(string A)
    Returns the string resulting from trimming spaces from both ends of A. For example, trim(' foobar ') results in 'foobar'

  string to_date(string timestamp)
    Returns the date part of a timestamp string: to_date("1970-01-01 00:00:00") = "1970-01-01"

  string substr(string A, int start, int length)
    Returns the substring of A starting from the start position (1-based) with the given length. For example, substr('foobar', 4, 2) results in 'ba'

  string substr(string A, int start)
    Returns the substring of A starting from the start position (1-based) to the end of A. For example, substr('foobar', 4) results in 'bar'

  int size(Map<K, V>)
    Returns the number of elements in the map

  int size(Array<T>)
    Returns the number of elements in the array

  string rtrim(string A)
    Returns the string resulting from trimming spaces from the end (right-hand side) of A. For example, rtrim(' foobar ') results in ' foobar'

  BIGINT round(double a)
    Returns the rounded BIGINT value of the double

  string regexp_replace(string A, string B, string C)
    Returns the string resulting from replacing all substrings of A that match the Java regular expression B with C. For example, regexp_replace('foobar', 'oo|ar', '') returns 'fb'

  double rand(), rand(int seed)
    Returns a random number (that changes from row to row). Specifying the seed makes the generated random number sequence deterministic.

  int month(string date)
    Returns the month part of a date or timestamp string: month("1970-11-01 00:00:00") = 11, month("1970-11-01") = 11

  string ltrim(string A)
    Returns the string resulting from trimming spaces from the beginning (left-hand side) of A. For example, ltrim(' foobar ') results in 'foobar '

  string lower(string A)
    Returns the string resulting from converting all characters of A to lower case. For example, lower('fOoBaR') results in 'foobar'

  string lcase(string A)
    Same as lower

  string get_json_object(string json_string, string path)
    Extracts a JSON object from a JSON string based on the JSON path specified, and returns the JSON string of the extracted object. Returns null if the input JSON string is invalid.

  string from_unixtime(int unixtime)
    Converts the number of seconds since the UNIX epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone, in the format "1970-01-01 00:00:00"

  BIGINT floor(double a)
    Returns the maximum BIGINT value that is equal to or less than the double

  int day(string date)
    Returns the day part of a date or timestamp string: day("1970-11-01 00:00:00") = 1, day("1970-11-01") = 1

  string concat(string A, string B,...)
    Returns the string resulting from concatenating B after A. For example, concat('foo', 'bar') results in 'foobar'. This function accepts an arbitrary number of arguments and returns the concatenation of all of them.

  BIGINT ceil(double a)
    Returns the minimum BIGINT value that is equal to or greater than the double
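
A few of these functions combined in one query, as a sketch. It assumes a Hive version that accepts SELECT without a FROM clause (0.13.0 and later); on older versions the same expressions can be selected from any one-row table. The expected values in the comments are the ones given in the descriptions above.

```sql
SELECT
  upper('fOoBaR')                      AS upper_case,   -- 'FOOBAR'
  lower('fOoBaR')                      AS lower_case,   -- 'foobar'
  trim(' foobar ')                     AS trimmed,      -- 'foobar'
  substr('foobar', 4, 2)               AS sub_fixed,    -- 'ba'
  substr('foobar', 4)                  AS sub_to_end,   -- 'bar'
  concat('foo', 'bar')                 AS joined,       -- 'foobar'
  regexp_replace('foobar', 'oo|ar', '') AS replaced,    -- 'fb'
  year('1970-01-01 00:00:00')          AS yr,           -- 1970
  month('1970-11-01')                  AS mon,          -- 11
  day('1970-11-01')                    AS dy,           -- 1
  to_date('1970-01-01 00:00:00')       AS date_part;    -- '1970-01-01'
```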

 

  Aggregate functions:

  BIGINT count(*), count(expr), count(DISTINCT expr[, expr ...])
    count(*): returns the total number of retrieved rows, including rows containing NULL values. count(expr): returns the number of rows for which the supplied expression is non-NULL. count(DISTINCT expr[, expr ...]): returns the number of rows for which the supplied expression(s) are unique and non-NULL.

  DOUBLE avg(col), avg(DISTINCT col)
    Returns the average of the elements in the group, or the average of the distinct values of the column in the group

  DOUBLE max(col)
    Returns the maximum value of the column in the group

  DOUBLE min(col)
    Returns the minimum value of the column in the group

  DOUBLE sum(col), sum(DISTINCT col)
    Returns the sum of the elements in the group, or the sum of the distinct values of the column in the group
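
The aggregate functions are typically used together with GROUP BY. The sketch below assumes a hypothetical table `demo_db.page_views` with a numeric column `duration_sec` in addition to `user_id`, `url` and the partition column `dt`.

```sql
SELECT
  dt,
  count(*)                AS total_rows,      -- every row, including rows with NULLs
  count(url)              AS rows_with_url,   -- only rows where url is not NULL
  count(DISTINCT user_id) AS unique_users,    -- distinct non-NULL user ids
  avg(duration_sec)       AS avg_duration,
  min(duration_sec)       AS min_duration,
  max(duration_sec)       AS max_duration,
  sum(duration_sec)       AS total_duration
FROM demo_db.page_views
GROUP BY dt;
```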

For details, see the DML sections of the official documentation (load/insert/update/delete/merge, import/export, explain plan).
