Connecting to Cloudera Impala / Hive from outside the cluster

【Title】: connection to Cloudera Impala / Hive outside the cluster 【Posted】: 2016-04-14 06:30:07 【Question】:

I am using Cloudera Impala server version 5.4.7. The first step was to make sure the port is open, which I verified with telnet.

        Class.forName("org.apache.hive.jdbc.HiveDriver");
        DriverManager.setLoginTimeout(30);
        // Host, port and database name in the URL are placeholders.
        try (java.sql.Connection connection = DriverManager.getConnection("jdbc:hive2://12.23.56.789:123456/someName;auth=noSasl")) {
            System.out.println("connected");
        }

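But I never manage to connect.

All I get is this timeout error:

What could be the problem? I am using exactly the same Hive version as the one shipped with this Cloudera release.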

  [14 Apr 2016 06:27:26,797] [ERROR] [main] [org.apache.hive.jdbc.HiveConnection] - Error opening session
org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
    at org.apache.hive.service.cli.thrift.TCLIService$Client.recv_OpenSession(TCLIService.java:156)
    at org.apache.hive.service.cli.thrift.TCLIService$Client.OpenSession(TCLIService.java:143)
    at org.apache.hive.jdbc.HiveConnection.openSession(HiveConnection.java:475)
    at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:181)
    at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
    at java.sql.DriverManager.getConnection(Unknown Source)
    at java.sql.DriverManager.getConnection(Unknown Source)
    at com.datorama.core.service.delivery.providers.DatabaseProvider.main(DatabaseProvider.java:330)
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(Unknown Source)
    at java.net.SocketInputStream.read(Unknown Source)
    at java.net.SocketInputStream.read(Unknown Source)
    at java.io.BufferedInputStream.fill(Unknown Source)
    at java.io.BufferedInputStream.read1(Unknown Source)
    at java.io.BufferedInputStream.read(Unknown Source)
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
    ... 13 more
Exception in thread "main" java.sql.SQLException: Could not establish connection to jdbc:hive2://54.69.2.250:21050/sage_global;auth=noSasl: java.net.SocketTimeoutException: Read timed out
    at org.apache.hive.jdbc.HiveConnection.openSession(HiveConnection.java:486)
    at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:181)
    at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
    at java.sql.DriverManager.getConnection(Unknown Source)
    at java.sql.DriverManager.getConnection(Unknown Source)
    at com.datorama.core.service.delivery.providers.DatabaseProvider.main(DatabaseProvider.java:330)
Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
    at org.apache.hive.service.cli.thrift.TCLIService$Client.recv_OpenSession(TCLIService.java:156)
    at org.apache.hive.service.cli.thrift.TCLIService$Client.OpenSession(TCLIService.java:143)
    at org.apache.hive.jdbc.HiveConnection.openSession(HiveConnection.java:475)
    ... 5 more
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(Unknown Source)
    at java.net.SocketInputStream.read(Unknown Source)
    at java.net.SocketInputStream.read(Unknown Source)
    at java.io.BufferedInputStream.fill(Unknown Source)
    at java.io.BufferedInputStream.read1(Unknown Source)
    at java.io.BufferedInputStream.read(Unknown Source)
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
    ... 13 more

【Question comments】:

【Answer 1】:

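We use JDBC to run plenty of queries from outside the cluster. While I believe it is possible to use the Hive JDBC driver, you will definitely need to set the proper port in your JDBC connection string, likely 21050 for Impala. You also need to make sure that your host name (or IP address) points to an instance that is running the Impala daemon (for Hive you would probably point to the name node). My guess is that the port number is wrong, since the error seems to be simply a failure to establish any connection at all.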
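As a rough illustration of the host and port check described above, here is a minimal sketch that builds the connection string and tests plain TCP reachability, which is roughly what telnet verifies. The class name, method names, and `impalad.example.com` are made up for this example, and 21050 is only the likely Impala port mentioned above, so confirm it against your own cluster.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ImpalaJdbcCheck {

    // Assumption: 21050 is the port suggested in the answer; verify it for your cluster.
    public static final int ASSUMED_IMPALA_PORT = 21050;

    /** Builds a hive2-style JDBC URL without SASL, in the same shape as the question's URL. */
    public static String buildUrl(String host, int port, String database) {
        if (port < 1 || port > 65535) {
            // Catches impossible ports early instead of failing inside the driver.
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return "jdbc:hive2://" + host + ":" + port + "/" + database + ";auth=noSasl";
    }

    /** Returns true if a plain TCP connection succeeds within timeoutMillis. */
    public static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical host name; pass the address of a node running the Impala daemon.
        String host = args.length > 0 ? args[0] : "impalad.example.com";
        System.out.println(buildUrl(host, ASSUMED_IMPALA_PORT, "default"));
        System.out.println("TCP reachable: " + isReachable(host, ASSUMED_IMPALA_PORT, 3000));
    }
}
```

Note that a successful TCP check only shows that something is listening on that port. It does not prove that the listener speaks the protocol the JDBC driver expects.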

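We decided to use the specific driver that Cloudera provides for Impala, though that probably isn't necessary. We also set up a load balancer, so there is one stable address to direct queries to rather than requiring each caller to pick a particular Impala instance. This also distributes the load evenly and lets us make changes in the cluster without requiring any changes from external callers.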
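To make the load balancer idea concrete, here is a small sketch in which callers only ever see one stable endpoint. The host `impala-lb.example.com`, the class, and the method names are hypothetical. The `jdbc:impala://` form with `AuthMech=0` (no authentication) is my understanding of the URL style used by Cloudera's Impala JDBC driver, so check it against the documentation for the driver version you install.

```java
public class ImpalaEndpoint {

    // Hypothetical load-balancer address; callers use it instead of an individual Impala daemon host.
    public static final String LB_HOST = "impala-lb.example.com";
    // Assumption: the balancer forwards the same port the Impala daemons listen on.
    public static final int LB_PORT = 21050;

    /** URL for the Hive JDBC driver (hive2 scheme, no SASL), as used in the question. */
    public static String hiveDriverUrl(String database) {
        return "jdbc:hive2://" + LB_HOST + ":" + LB_PORT + "/" + database + ";auth=noSasl";
    }

    /** URL in the style of Cloudera's Impala JDBC driver; AuthMech=0 is assumed to mean no authentication. */
    public static String clouderaDriverUrl(String database) {
        return "jdbc:impala://" + LB_HOST + ":" + LB_PORT + "/" + database + ";AuthMech=0";
    }

    public static void main(String[] args) {
        System.out.println(hiveDriverUrl("default"));
        System.out.println(clouderaDriverUrl("default"));
    }
}
```

Keeping the address in one place means that swapping or adding Impala nodes behind the balancer requires no change on the caller side.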

【Comments】:
