Can HMS 2.x and HMS 3.x Access Each Other?
Posted by 咬定青松
This article was first published on the WeChat official account 码上观世界.
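As the heart of Hive, the Hive Metastore (HMS) manages all metadata about the data and connects data analysis with data storage. It can also be upgraded or replaced independently.
From 1.0.0 to the current 3.1.2, HMS has gone through many changes, and HMS 3 in particular was modified substantially compared with earlier versions. In practice, because upgrades lag behind, two versions often coexist and sometimes even have to access each other, for example in federated queries. Without verification, or with incomplete verification, it is hard to say definitively whether the two can interoperate, so users tread carefully and hope nothing goes wrong. HMS 2.x and HMS 3.x are the two most widely used Hive Metastore version lines today. Starting from the HMS protocol, this article walks through what changed in HMS 3.x relative to HMS 2.x, to give users some reassurance: whether the two can be mixed, and how to use them so as to avoid trouble.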
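HMS consists of two parts, the HMS Client and the HMS Server, which communicate over the Thrift RPC protocol. The protocol plays the role of an API: to allow for version upgrades, existing interfaces, including their names and parameters, are normally left unchanged, and when a change is unavoidable a new interface is added instead, so that the new version does not affect use of the existing ones. Because the interfaces are unchanged, the two HMS versions can call each other, but the results of a request may differ. The HMS Client is made up of the protocol definition and the API.
The protocol is defined in hive_metastore.thrift. We first go through how the protocol changed between HMS versions. Based on the changes and on what is in use, we split the path from HMS 2.x to HMS 3.x into two stages: HMS 2.1 to 2.3, and HMS 2.3 to 3.1.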
Protocol changes in 2.3 relative to 2.1
The main changes to hive_metastore.thrift in 2.3 relative to 2.1 are excerpted below:
#hive_metastore.thrift
struct PartitionValuesRequest {
  1: required string dbName,
  2: required string tblName,
  3: required list<FieldSchema> partitionKeys;
  4: optional bool applyDistinct = true;
  5: optional string filter;
  6: optional list<FieldSchema> partitionOrder;
  7: optional bool ascending = true;
  8: optional i64 maxParts = -1;
}
struct PartitionValuesRow {
  1: required list<string> row;
}
struct PartitionValuesResponse {
  1: required list<PartitionValuesRow> partitionValues;
}
PartitionValuesResponse get_partition_values(1:PartitionValuesRequest request)
  throws(1:MetaException o1, 2:NoSuchObjectException o2);
enum ClientCapability {
  TEST_CAPABILITY = 1
}
struct ClientCapabilities {
  1: required list<ClientCapability> values
}
struct GetTableRequest {
  1: required string dbName,
  2: required string tblName,
  3: optional ClientCapabilities capabilities
}
struct GetTableResult {
  1: required Table table
}
struct GetTablesRequest {
  1: required string dbName,
  2: optional list<string> tblNames,
  3: optional ClientCapabilities capabilities
}
struct GetTablesResult {
  1: required list<Table> tables
}
list<string> get_tables_by_type(1: string db_name, 2: string pattern, 3: string tableType) throws (1: MetaException o1)
GetTableResult get_table_req(1:GetTableRequest req) throws (1:MetaException o1, 2:NoSuchObjectException o2)
GetTablesResult get_table_objects_by_name_req(1:GetTablesRequest req)
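Judging from the change history, the protocol mainly added message structs and interfaces for reading partition values and for reading tables. An HMS 2.1 client calling an HMS 2.3 server is therefore unaffected, but an HMS 2.3 client cannot call these newly added interfaces on an HMS 2.1 server, which does not implement them.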
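The asymmetry can be pictured with a small, self-contained Java model. This is only a sketch: FakeServer, server21, server23 and supports are hypothetical names invented here, and the model merely mimics dispatch by RPC method name rather than using Thrift or Hive classes. The error text imitates the "Invalid method name" message quoted later in this article.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

final class MethodDispatchSketch {
    /** Hypothetical stand-in for a server that dispatches on the RPC method name. */
    static final class FakeServer {
        private final Map<String, UnaryOperator<String>> handlers = new HashMap<>();

        FakeServer register(String method, UnaryOperator<String> handler) {
            handlers.put(method, handler);
            return this;
        }

        /** Runs the handler for the method, or fails the way an unknown RPC name would. */
        String call(String method, String arg) {
            UnaryOperator<String> handler = handlers.get(method);
            if (handler == null) {
                throw new UnsupportedOperationException("Invalid method name: '" + method + "'");
            }
            return handler.apply(arg);
        }
    }

    /** A 2.1-like server: only the legacy get_table call exists. */
    static FakeServer server21() {
        return new FakeServer().register("get_table", t -> "table:" + t);
    }

    /** A 2.3-like server: keeps get_table and adds get_table_req. */
    static FakeServer server23() {
        return server21().register("get_table_req", t -> "table:" + t);
    }

    /** Reports whether the server accepts the call instead of propagating the error. */
    static boolean supports(FakeServer server, String method) {
        try {
            server.call(method, "t1");
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }
}
```

In this model an old client that only ever sends get_table works against both servers, while a new client that sends get_table_req succeeds only against the 2.3-like server.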
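Protocol changes in 3.1 relative to 2.3
The main changes to hive_metastore.thrift in 3.1 relative to 2.3 are excerpted below. They fall into several parts:
A. Catalog was introduced to manage metadata such as databases, partitions and constraints. In the protocol this shows up as new message structs and operations for creating, altering, reading and dropping catalogs.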
#hive_metastore.thrift
struct CreateCatalogRequest {
  1: Catalog catalog
}
struct AlterCatalogRequest {
  1: string name,
  2: Catalog newCat
}
struct GetCatalogRequest {
  1: string name
}
struct GetCatalogResponse {
  1: Catalog catalog
}
struct GetCatalogsResponse {
  1: list<string> names
}
struct DropCatalogRequest {
  1: string name
}
void create_catalog(1: CreateCatalogRequest catalog) throws (1:AlreadyExistsException o1, 2:InvalidObjectException o2, 3: MetaException o3)
void alter_catalog(1: AlterCatalogRequest rqst) throws (1:NoSuchObjectException o1, 2:InvalidOperationException o2, 3:MetaException o3)
GetCatalogResponse get_catalog(1: GetCatalogRequest catName) throws (1:NoSuchObjectException o1, 2:MetaException o2)
GetCatalogsResponse get_catalogs() throws (1:MetaException o1)
void drop_catalog(1: DropCatalogRequest catName) throws (1:NoSuchObjectException o1, 2:InvalidOperationException o2, 3:MetaException o3)
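B. The message structs for metadata that belongs to a Catalog were modified: an optional catalog-name field (catName, or catalogName in Database) was appended to the existing definitions.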
struct HiveObjectRef {
  1: HiveObjectType objectType,
  2: string dbName,
  3: string objectName,
  4: list<string> partValues,
  5: string columnName,
  6: optional string catName // newly added field
}
// namespace for tables
struct Database {
  1: string name,
  2: string description,
  3: string locationUri,
  4: map<string, string> parameters, // properties associated with the database
  5: optional PrincipalPrivilegeSet privileges,
  6: optional string ownerName,
  7: optional PrincipalType ownerType,
  8: optional string catalogName // newly added field
}
struct PartitionSpec {
  1: string dbName,
  2: string tableName,
  3: string rootPath,
  4: optional PartitionSpecWithSharedSD sharedSDPartitionSpec,
  5: optional PartitionListComposingSpec partitionList,
  6: optional string catName // newly added field
}
struct GetTableRequest {
  1: required string dbName,
  2: required string tblName,
  3: optional ClientCapabilities capabilities,
  4: optional string catName // newly added field
}
struct GetTablesRequest {
  1: required string dbName,
  2: optional list<string> tblNames,
  3: optional ClientCapabilities capabilities,
  4: optional string catName // newly added field
}
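Appending an optional field under a new field id is the usual way to evolve a Thrift struct without breaking peers: a reader that does not know an id skips it, and a reader that expects an optional field simply sees it as unset when it is absent. The sketch below models that behaviour in plain Java for a GetTableRequest-like message. It is an illustration under stated assumptions: the Map stands in for the wire encoding, and encode, readV23 and readV31 are hypothetical helpers, not Thrift or Hive APIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class OptionalFieldSketch {
    /** Encodes the message as field id -> value, omitting the optional catName when unset. */
    static Map<Integer, String> encode(String dbName, String tblName, String catName) {
        Map<Integer, String> wire = new LinkedHashMap<>();
        wire.put(1, dbName);
        wire.put(2, tblName);
        if (catName != null) {
            wire.put(4, catName);
        }
        return wire;
    }

    /** 2.3-style reader: understands ids 1 and 2 and skips every other id. */
    static String readV23(Map<Integer, String> wire) {
        String db = null;
        String tbl = null;
        for (Map.Entry<Integer, String> field : wire.entrySet()) {
            switch (field.getKey()) {
                case 1: db = field.getValue(); break;
                case 2: tbl = field.getValue(); break;
                default: break; // unknown field id: skipped, not an error
            }
        }
        return db + "." + tbl;
    }

    /** 3.1-style reader: additionally understands id 4 and tolerates its absence. */
    static String readV31(Map<Integer, String> wire) {
        String cat = wire.get(4); // null when the sender never set it
        return (cat == null ? "<unset>" : cat) + "." + wire.get(1) + "." + wire.get(2);
    }
}
```

The message still parses in both directions. What changes is the meaning the receiver can attach to it, which is consistent with the earlier remark that calls go through while results may differ.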
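C. The Index-related message structs and request methods were removed, and message structs and interface definitions for Constraints were added in their place.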
struct UniqueConstraintsRequest {
  1: required string catName,
  2: required string db_name,
  3: required string tbl_name,
}
struct UniqueConstraintsResponse {
  1: required list<SQLUniqueConstraint> uniqueConstraints
}
struct NotNullConstraintsRequest {
  1: required string catName,
  2: required string db_name,
  3: required string tbl_name,
}
struct NotNullConstraintsResponse {
  1: required list<SQLNotNullConstraint> notNullConstraints
}
struct DefaultConstraintsRequest {
  1: required string catName,
  2: required string db_name,
  3: required string tbl_name
}
struct DefaultConstraintsResponse {
  1: required list<SQLDefaultConstraint> defaultConstraints
}
struct CheckConstraintsRequest {
  1: required string catName,
  2: required string db_name,
  3: required string tbl_name
}
struct CheckConstraintsResponse {
  1: required list<SQLCheckConstraint> checkConstraints
}
struct DropConstraintRequest {
  1: required string dbname,
  2: required string tablename,
  3: required string constraintname,
  4: optional string catName
}
struct AddUniqueConstraintRequest {
  1: required list<SQLUniqueConstraint> uniqueConstraintCols
}
struct AddNotNullConstraintRequest {
  1: required list<SQLNotNullConstraint> notNullConstraintCols
}
struct AddDefaultConstraintRequest {
  1: required list<SQLDefaultConstraint> defaultConstraintCols
}
struct AddCheckConstraintRequest {
  1: required list<SQLCheckConstraint> checkConstraintCols
}
// other constraints
UniqueConstraintsResponse get_unique_constraints(1:UniqueConstraintsRequest request)
  throws(1:MetaException o1, 2:NoSuchObjectException o2)
NotNullConstraintsResponse get_not_null_constraints(1:NotNullConstraintsRequest request)
  throws(1:MetaException o1, 2:NoSuchObjectException o2)
DefaultConstraintsResponse get_default_constraints(1:DefaultConstraintsRequest request)
  throws(1:MetaException o1, 2:NoSuchObjectException o2)
CheckConstraintsResponse get_check_constraints(1:CheckConstraintsRequest request)
  throws(1:MetaException o1, 2:NoSuchObjectException o2)
void add_unique_constraint(1:AddUniqueConstraintRequest req)
  throws(1:NoSuchObjectException o1, 2:MetaException o2)
void add_not_null_constraint(1:AddNotNullConstraintRequest req)
  throws(1:NoSuchObjectException o1, 2:MetaException o2)
void add_default_constraint(1:AddDefaultConstraintRequest req)
  throws(1:NoSuchObjectException o1, 2:MetaException o2)
void add_check_constraint(1:AddCheckConstraintRequest req)
  throws(1:NoSuchObjectException o1, 2:MetaException o2)
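D. Some fields or data types changed, affecting the column statistics messages, dynamic partitions and transaction operations.
Changes to the Thrift varchar data types all end up as Java String after code generation, so they have no effect on how the API is used.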
Next, let's look at the API changes in detail. The interfaces are defined in IMetaStoreClient and RetryingMetaStoreClient.
RetryingMetaStoreClient is a proxy implementation that adds retry on failure.
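The idea of a retrying proxy can be sketched with a JDK dynamic proxy. This is a simplified illustration, not the Hive implementation: MetaClient, RetryingHandler, wrap and flaky are hypothetical names, and the sketch retries on every exception up to a fixed number of attempts, without the reconnect logic or exception filtering that a production client would need.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicInteger;

final class RetryProxySketch {
    /** Hypothetical stand-in for IMetaStoreClient; not the real interface. */
    interface MetaClient {
        String getDatabase(String name);
    }

    /** Re-invokes a failed call on the target until it succeeds or attempts run out. */
    static final class RetryingHandler implements InvocationHandler {
        private final Object target;
        private final int maxAttempts;

        RetryingHandler(Object target, int maxAttempts) {
            if (maxAttempts < 1) {
                throw new IllegalArgumentException("maxAttempts must be at least 1");
            }
            this.target = target;
            this.maxAttempts = maxAttempts;
        }

        @Override
        public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
            Throwable last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    return method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    last = e.getCause(); // the exception actually thrown by the target
                }
            }
            throw last; // every attempt failed: surface the most recent error
        }
    }

    /** Wraps a client so that callers only ever see the MetaClient interface. */
    static MetaClient wrap(MetaClient target, int maxAttempts) {
        return (MetaClient) Proxy.newProxyInstance(
                MetaClient.class.getClassLoader(),
                new Class<?>[] {MetaClient.class},
                new RetryingHandler(target, maxAttempts));
    }

    /** Test helper: a client that fails the first `failures` calls, then succeeds. */
    static MetaClient flaky(int failures, AtomicInteger calls) {
        return name -> {
            if (calls.incrementAndGet() <= failures) {
                throw new IllegalStateException("transient failure");
            }
            return "db:" + name;
        };
    }
}
```

Because the caller only holds the interface, retry behaviour can be added or tuned without touching call sites, which is presumably why a proxy was chosen for this role.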
API changes in 2.3 relative to 2.1
The following interfaces were added:
#IMetaStoreClient.java
List<String> getTables(String dbName, String tablePattern, TableType tableType)
throws MetaException, TException, UnknownDBException;
public PartitionValuesResponse listPartitionValues(PartitionValuesRequest request)
throws MetaException, TException, NoSuchObjectException;
void alter_partition(String dbName, String tblName, Partition newPart)
throws InvalidOperationException, MetaException, TException;
API changes in 3.1 relative to 2.3
Catalog create, alter, get and drop interfaces were added:
void createCatalog(Catalog catalog)
throws AlreadyExistsException, InvalidObjectException, MetaException, TException;
void alterCatalog(String catalogName, Catalog newCatalog)
throws NoSuchObjectException, InvalidObjectException, MetaException, TException;
Catalog getCatalog(String catName) throws NoSuchObjectException, MetaException, TException;
List<String> getCatalogs() throws MetaException, TException;
void dropCatalog(String catName)
throws NoSuchObjectException, InvalidOperationException, MetaException, TException;
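For the interfaces related to Table, Database, Schema and Field that involve Catalog, the original interfaces are kept and new ones are added alongside them. The new ones differ from the old only in the additional String catName parameter: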
List<String> getTables(String catName, String dbName, String tablePattern)
throws MetaException, TException, UnknownDBException;
List<TableMeta> getTableMeta(String catName, String dbPatterns, String tablePatterns,
List<String> tableTypes)
throws MetaException, TException, UnknownDBException;
List<String> getAllTables(String catName, String dbName)
throws MetaException, TException, UnknownDBException;
void dropDatabase(String catName, String dbName, boolean deleteData, boolean ignoreUnknownDb,
boolean cascade)
throws NoSuchObjectException, InvalidOperationException, MetaException, TException;
void dropTable(String catName, String dbName, String tableName, boolean deleteData,
boolean ignoreUnknownTable, boolean ifPurge)
throws MetaException, NoSuchObjectException, TException;
default void dropTable(String catName, String dbName, String tableName, boolean deleteData,
    boolean ignoreUnknownTable)
    throws MetaException, NoSuchObjectException, TException {
  dropTable(catName, dbName, tableName, deleteData, ignoreUnknownTable, false);
}
default void dropTable(String catName, String dbName, String tableName)
    throws MetaException, NoSuchObjectException, TException {
  dropTable(catName, dbName, tableName, true, true, false);
}
List<FieldSchema> getFields(String catName, String db, String tableName)
throws MetaException, TException, UnknownTableException,
UnknownDBException;
List<FieldSchema> getSchema(String catName, String db, String tableName)
throws MetaException, TException, UnknownTableException,
UnknownDBException;
Every partition-related interface likewise keeps its original form and gains a new overload with an extra String catName parameter:
Partition appendPartition(String catName, String dbName, String tableName, List<String> partVals)
throws InvalidObjectException, AlreadyExistsException, MetaException, TException;
Partition appendPartition(String catName, String dbName, String tableName, String name)
throws InvalidObjectException, AlreadyExistsException, MetaException, TException;
Partition getPartition(String catName, String dbName, String tblName, List<String> partVals)
throws NoSuchObjectException, MetaException, TException;
Partition getPartition(String catName, String dbName, String tblName, String name)
throws MetaException, UnknownTableException, NoSuchObjectException, TException;
List<Partition> listPartitions(String catName, String db_name, String tbl_name, int max_parts)
throws NoSuchObjectException, MetaException, TException;
...
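As the API shows, the major change in HMS 3.x relative to HMS 2.x is the new Catalog-related operations, while the existing interfaces are retained. In addition, the parameter type used when creating an IMetaStoreClient instance through RetryingMetaStoreClient changed from HiveConf to Configuration.
Here is RetryingMetaStoreClient in HMS 2.3.8: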
#RetryingMetaStoreClient @2.3.8
public class RetryingMetaStoreClient implements InvocationHandler {
  // Signatures only; method bodies are omitted in this excerpt.
  public static IMetaStoreClient getProxy(
      HiveConf hiveConf, boolean allowEmbedded);
  public static IMetaStoreClient getProxy(HiveConf hiveConf, HiveMetaHookLoader hookLoader,
      String mscClassName);
  public static IMetaStoreClient getProxy(HiveConf hiveConf, HiveMetaHookLoader hookLoader,
      ConcurrentHashMap<String, Long> metaCallTimeMap, String mscClassName, boolean allowEmbedded);
  //This constructor is meant for Hive internal use only. Please use getProxy(HiveConf hiveConf, HiveMetaHookLoader hookLoader) for external purpose.
  public static IMetaStoreClient getProxy(HiveConf hiveConf, Class<?>[] constructorArgTypes,
      Object[] constructorArgs, String mscClassName);
  //This constructor is meant for Hive internal use only. Please use getProxy(HiveConf hiveConf, HiveMetaHookLoader hookLoader) for external purpose.
  public static IMetaStoreClient getProxy(HiveConf hiveConf, Class<?>[] constructorArgTypes,
      Object[] constructorArgs, ConcurrentHashMap<String, Long> metaCallTimeMap,
      String mscClassName);
}
And here is RetryingMetaStoreClient in HMS 3.1.2:
#RetryingMetaStoreClient @3.1.2
public static IMetaStoreClient getProxy(Configuration hiveConf, boolean allowEmbedded) throws MetaException {
  return getProxy(hiveConf, new Class[]{Configuration.class, HiveMetaHookLoader.class, Boolean.class}, new Object[]{hiveConf, null, allowEmbedded}, (ConcurrentHashMap)null, HiveMetaStoreClient.class.getName());
}
@VisibleForTesting
public static IMetaStoreClient getProxy(Configuration hiveConf, HiveMetaHookLoader hookLoader, String mscClassName) throws MetaException {
  return getProxy(hiveConf, hookLoader, (ConcurrentHashMap)null, mscClassName, true);
}
public static IMetaStoreClient getProxy(Configuration hiveConf, HiveMetaHookLoader hookLoader, ConcurrentHashMap<String, Long> metaCallTimeMap, String mscClassName, boolean allowEmbedded) throws MetaException {
  return getProxy(hiveConf, new Class[]{Configuration.class, HiveMetaHookLoader.class, Boolean.class}, new Object[]{hiveConf, hookLoader, allowEmbedded}, metaCallTimeMap, mscClassName);
}
public static IMetaStoreClient getProxy(Configuration hiveConf, Class<?>[] constructorArgTypes, Object[] constructorArgs, String mscClassName) throws MetaException {
  return getProxy(hiveConf, constructorArgTypes, constructorArgs, (ConcurrentHashMap)null, mscClassName);
}
public static IMetaStoreClient getProxy(Configuration hiveConf, Class<?>[] constructorArgTypes, Object[] constructorArgs, ConcurrentHashMap<String, Long> metaCallTimeMap, String mscClassName) throws MetaException {
  Class<? extends IMetaStoreClient> baseClass = JavaUtils.getClass(mscClassName, IMetaStoreClient.class);
  RetryingMetaStoreClient handler = new RetryingMetaStoreClient(hiveConf, constructorArgTypes, constructorArgs, metaCallTimeMap, baseClass);
  return (IMetaStoreClient)Proxy.newProxyInstance(RetryingMetaStoreClient.class.getClassLoader(), baseClass.getInterfaces(), handler);
}
Experiments and conclusions
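1. A lower-version Hive Metastore client (2.x) can access a higher-version HMS Server (2.x or 3.x) normally.
2. A lower-version Hive Metastore client (3.x) can access a higher-version HMS Server (3.x) normally.
3. A higher-version Hive Metastore client (2.x) can call only the methods that already exist in a lower-version HMS Server (2.x); it cannot call methods added later.
4. A higher-version Hive Metastore client (3.x) can call only the methods that already exist in a lower-version HMS Server (3.x); it cannot call methods added later.
5. A higher-version Hive Metastore client (3.x) can call only some of the methods that already exist in a lower-version (2.x) HMS Server.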
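For example:
A Hive Metastore client 2.x querying a Hive Metastore Server 3.x can list databases normally, but in the reverse direction (a client 3.x against a Server 2.x) the database list cannot be retrieved: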
hiveMetaStoreClient.getCatalogs() #org.apache.thrift.TApplicationException: Invalid method name: 'get_catalogs'
hiveMetaStoreClient.getAllDatabases() # Empty Result
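In particular, when the Hive Metastore client is created through the RetryingMetaStoreClient proxy, it must be instantiated by the same major version: a Hive Metastore client 2.x can only be initialized by HMS 2.x, and likewise a Hive Metastore client 3.x can only be initialized by HMS 3.x.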
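The article does not state the cause of this restriction. One plausible contributor, offered here as an assumption, is the HiveConf to Configuration change in getProxy: Java links compiled calls by exact parameter types, so code compiled against getProxy(HiveConf, ...) does not find a method declared as getProxy(Configuration, ...), even though HiveConf is a subclass of Configuration. The sketch below shows the same exact-match rule through reflection. Configuration, HiveConf and Factory3x are hypothetical stand-ins defined locally, not the Hadoop or Hive classes.

```java
import java.lang.reflect.Method;

final class SignatureMismatchSketch {
    /** Hypothetical stand-ins for Hadoop's Configuration and Hive's HiveConf. */
    public static class Configuration {}
    public static class HiveConf extends Configuration {}

    /** Stand-in for a 3.x-style factory whose getProxy is declared with Configuration. */
    public static class Factory3x {
        public static String getProxy(Configuration conf, boolean allowEmbedded) {
            return "client";
        }
    }

    /** True if the factory declares a public getProxy(confType, boolean) with exactly these types. */
    static boolean hasGetProxy(Class<?> factory, Class<?> confType) {
        try {
            Method m = factory.getMethod("getProxy", confType, boolean.class);
            return m != null;
        } catch (NoSuchMethodException e) {
            return false; // no method with this exact parameter list
        }
    }
}
```

Recompiling the caller against the matching version resolves the mismatch at source level, since a HiveConf can be passed wherever a Configuration is expected. That fits the advice to keep the client and the code that instantiates it on the same major version.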