New columns cannot be written after adding a field to a Hive table: use CASCADE

Posted 2023-04-24

In practice, you often need to change the schema of a table, for example by adding a new column.

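If you add the column with a plain ALTER TABLE ... ADD COLUMNS statement (no CASCADE), the column col1 is added successfully. However, if table tb already has older partitions (for example, dt=20190101), col1 in those partitions reads as NULL and cannot be updated; even an INSERT OVERWRITE into such a partition has no effect.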
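A plain column addition, without CASCADE, can be sketched as follows. The helper name add_column_sql and the column type string are assumptions made for illustration; the helper only prints HiveQL so the statement can be inspected before it is run.

```shell
#!/bin/bash
# Hypothetical helper: prints a plain ADD COLUMNS statement (no CASCADE).
# The column type "string" used below is an assumption for illustration.
add_column_sql() {
  local table="$1" column="$2" type="$3"
  printf 'ALTER TABLE %s ADD COLUMNS (%s %s);\n' "$table" "$column" "$type"
}

# Print the statement for table tb and column col1.
add_column_sql tb col1 string
```

To execute it, the printed statement could be passed to the Hive CLI, for example with hive -e "$(add_column_sql tb col1 string)".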

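Solution:

The fix is simple: append the CASCADE keyword to the ADD COLUMNS statement when adding col1.

It is also easy to remember: "cascade" means the change is propagated down the chain, so it alters not only the schema (metadata) used by new partitions but also the schema of the existing partitions.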
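The CASCADE form can be sketched like this. As before, the helper name add_column_cascade_sql and the type string are illustrative assumptions, not part of the original post; the mode argument defaults to CASCADE and rejects anything other than the two keywords Hive accepts.

```shell
#!/bin/bash
# Hypothetical helper: prints an ADD COLUMNS statement with an explicit
# CASCADE or RESTRICT clause. The column type is an assumption.
add_column_cascade_sql() {
  local table="$1" column="$2" type="$3" mode="${4:-CASCADE}"
  case "$mode" in
    CASCADE|RESTRICT) ;;
    *) echo "mode must be CASCADE or RESTRICT" >&2; return 1 ;;
  esac
  printf 'ALTER TABLE %s ADD COLUMNS (%s %s) %s;\n' \
    "$table" "$column" "$type" "$mode"
}

# Print the statement that also updates the metadata of existing partitions.
add_column_cascade_sql tb col1 string CASCADE
```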

ADD COLUMNS lets you add new columns to the end of the existing columns but before the partition columns. This is supported for Avro backed tables as well, for Hive 0.14 and later.

REPLACE COLUMNS removes all existing columns and adds the new set of columns. This can be done only for tables with a native SerDe (DynamicSerDe, MetadataTypedColumnsetSerDe, LazySimpleSerDe and ColumnarSerDe). Refer to Hive SerDe for more information. REPLACE COLUMNS can also be used to drop columns. For example, "ALTER TABLE test_change REPLACE COLUMNS (a int, b int);" will remove column 'c' from test_change's schema.

The PARTITION clause is available in Hive 0.14.0 and later; see Upgrading Pre-Hive 0.13.0 Decimal Columns for usage.

The CASCADE|RESTRICT clause is available in Hive 1.1.0. ALTER TABLE ADD|REPLACE COLUMNS with CASCADE command changes the columns of a table's metadata, and cascades the same change to all the partition metadata. RESTRICT is the default, limiting column changes only to table metadata.
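If a column was already added without CASCADE, the PARTITION clause mentioned above suggests one possible way to inspect and patch a single old partition. The sketch below only prints the statements; the helper names, the string type, and the assumption that dt can be written as a quoted literal are all illustrative, and the behavior should be checked against the Hive version in use.

```shell
#!/bin/bash
# Hypothetical helpers that print HiveQL for a single partition.

# Prints a statement that shows the columns recorded for one partition.
describe_partition_sql() {
  local table="$1" part_col="$2" part_val="$3"
  printf "DESCRIBE %s PARTITION (%s='%s');\n" "$table" "$part_col" "$part_val"
}

# Prints a statement that adds the column to one partition's metadata only.
add_partition_column_sql() {
  local table="$1" part_col="$2" part_val="$3" column="$4" type="$5"
  printf "ALTER TABLE %s PARTITION (%s='%s') ADD COLUMNS (%s %s);\n" \
    "$table" "$part_col" "$part_val" "$column" "$type"
}

describe_partition_sql tb dt 20190101
add_partition_column_sql tb dt 20190101 col1 string
```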

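Data warehouse and data visualization: exporting from Hive to MySQL

The last step of the big data pipeline is to export the ADS-layer data from the warehouse to MySQL; after that, the work is handed over to the Java engineers.

1 Create the corresponding ADS tables in MySQL, with the same columns and types as the tables in the warehouse (omitted here).

2 The data export script.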

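① --update-mode

    updateonly: only update existing rows; new rows cannot be inserted.

    allowinsert: update existing rows and also insert new ones.

② --update-key: when updating is enabled, names the columns whose values must match for a row to be treated as the same record, so that it is updated instead of inserted. Separate multiple columns with commas.

③ --input-null-string and --input-null-non-string: name the string in the exported files that should be interpreted as NULL for string columns and for non-string columns, respectively. Hive stores NULL as \N in its underlying files, whereas MySQL stores NULL as a real NULL, so both options are set to '\\N' on export to keep the two sides consistent. When importing data, use --null-string and --null-non-string instead.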
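For the import direction mentioned in ③, the matching flags can be sketched as below. The function prints one argument per line instead of running Sqoop; the function name import_args and the --target-dir path are hypothetical, while the host and database name are taken from the export script in this post. The password is left out on purpose.

```shell
#!/bin/bash
db_name=gmall

# Hypothetical helper: prints the arguments of a sqoop import call,
# one per line. The target directory is an assumed, illustrative path.
import_args() {
  local table="$1"
  # Single quotes keep both backslashes, so Sqoop receives \\N as typed.
  printf '%s\n' \
    import \
    --connect "jdbc:mysql://hadoop102:3306/${db_name}" \
    --username root \
    --table "$table" \
    --target-dir "/tmp/sqoop_import/${table}" \
    --fields-terminated-by '\t' \
    --null-string '\\N' \
    --null-non-string '\\N' \
    --num-mappers 1
}

import_args ads_uv_count
```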

#!/bin/bash

db_name=gmall

# Export one ADS table from HDFS into the MySQL table of the same name.
# Note: the same --update-key is applied to every table exported here.
export_data() {
  /opt/module/sqoop/bin/sqoop export \
    --connect "jdbc:mysql://hadoop102:3306/${db_name}?useUnicode=true&characterEncoding=utf-8" \
    --username root \
    --password 000000 \
    --table "$1" \
    --num-mappers 1 \
    --export-dir "/warehouse/${db_name}/ads/$1" \
    --input-fields-terminated-by "\t" \
    --update-mode allowinsert \
    --update-key "tm_id,category1_id,stat_mn,stat_date" \
    --input-null-string '\\N' \
    --input-null-non-string '\\N'
}

# The first argument selects a single table, or "all" for every table.
case $1 in
  "ads_uv_count")
    export_data "ads_uv_count"
    ;;
  "ads_user_action_convert_day")
    export_data "ads_user_action_convert_day"
    ;;
  "ads_gmv_sum_day")
    export_data "ads_gmv_sum_day"
    ;;
  "all")
    export_data "ads_uv_count"
    export_data "ads_user_action_convert_day"
    export_data "ads_gmv_sum_day"
    ;;
esac
