Jupyter Notebook connection to remote Hive

【Title】: Jupyter Notebook connection to remote Hive 【Posted】: 2018-06-14 04:54:15 【Question】:

I am trying to fetch data from Hive on our company's remote server. I use Anaconda3 (Windows, 64-bit), and my Hadoop cluster runs on Ambari.

This is what I have tried so far:

import findspark
findspark.init()
from pyspark import SparkContext, SparkConf
from pyspark.sql import HiveContext, SparkSession
sparkSession = (
    SparkSession.builder
    .appName('example-pyspark-read-from-hive')
    .config("hive.metastore.uris", "http://serv_ip:serv_port")
    .enableHiveSupport()
    .getOrCreate()
)
sparkSession.sql('show databases').show()

Maybe something is wrong with my configuration? Or perhaps I need to configure something in Hive before doing any of this. The error is:

<details>
  <summary>Error </summary>
  Py4JJavaError Traceback (most recent call last) D:\Alanuccio\Progs\spark-2.3.0-bin-hadoop2.7\python\pyspark\sql\utils.py in deco(*a, **kw) 62 try: ---> 63 return f(*a, **kw) 64 except py4j.protocol.Py4JJavaError as e: D:\Alanuccio\Progs\spark-2.3.0-bin-hadoop2.7\python\lib\py4j-0.10.6-src.zip\py4j\protocol.py
  in get_return_value(answer, gateway_client, target_id, name) 319 "An error occurred while calling 012.\n". --> 320 format(target_id, ".", name), value) 321 else: Py4JJavaError: An error occurred while calling o27.sql. : org.apache.spark.sql.AnalysisException:
  java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient; at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106) at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:194)
  at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:114) at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:102) at org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:39)
  at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog$lzycompute(HiveSessionStateBuilder.scala:54) at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:52) at org.apache.spark.sql.hive.HiveSessionStateBuilder$$anon$1.
  <init>(HiveSessionStateBuilder.scala:69) at org.apache.spark.sql.hive.HiveSessionStateBuilder.analyzer(HiveSessionStateBuilder.scala:69) at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$build$2.apply(BaseSessionStateBuilder.scala:293) at
    org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$build$2.apply(BaseSessionStateBuilder.scala:293) at org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:79) at org.apache.spark.sql.internal.SessionState.analyzer(SessionState.scala:79)
    at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57) at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55) at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:74) at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:638) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.GatewayConnection.run(GatewayConnection.java:214) at java.lang.Thread.run(Thread.java:748)
    Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522) at org.apache.spark.sql.hive.client.HiveClientImpl.newState(HiveClientImpl.scala:180)
    at org.apache.spark.sql.hive.client.HiveClientImpl.
    <init>(HiveClientImpl.scala:114) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
      at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264) at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:385)
      at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:287) at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:66) at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:65)
      at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply$mcZ$sp(HiveExternalCatalog.scala:195) at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:195) at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:195)
      at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97) ... 28 more Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523)
      at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.
      <init>(RetryingMetaStoreClient.java:86) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104) at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005)
        at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024) at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503) ... 43 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
        Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521) ... 49 more Caused by: java.lang.OutOfMemoryError: Java heap space During handling of the above exception, another exception occurred: AnalysisException Traceback
        (most recent call last)
        <ipython-input-12-9da3198f4ab3> in
          <module>() 4 print( help(sparkSession.sql) )''' 5 ----> 6 sparkSession.sql('show databases').show() D:\Alanuccio\Progs\spark-2.3.0-bin-hadoop2.7\python\pyspark\sql\session.py in sql(self, sqlQuery) 706 [Row(f1=1, f2=u'row1'), Row(f1=2, f2=u'row2'),
            Row(f1=3, f2=u'row3')] 707 """ --> 708 return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped) 709 710 @since(2.0) D:\Alanuccio\Progs\spark-2.3.0-bin-hadoop2.7\python\lib\py4j-0.10.6-src.zip\py4j\java_gateway.py in __call__(self,
            *args) 1158 answer = self.gateway_client.send_command(command) 1159 return_value = get_return_value( -> 1160 answer, self.gateway_client, self.target_id, self.name) 1161 1162 for temp_arg in temp_args: D:\Alanuccio\Progs\spark-2.3.0-bin-hadoop2.7\python\pyspark\sql\utils.py
            in deco(*a, **kw) 67 e.java_exception.getStackTrace())) 68 if s.startswith('org.apache.spark.sql.AnalysisException: '): ---> 69 raise AnalysisException(s.split(': ', 1)[1], stackTrace) 70 if s.startswith('org.apache.spark.sql.catalyst.analysis'):
            71 raise AnalysisException(s.split(': ', 1)[1], stackTrace) AnalysisException: 'java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient;'
</details>

【Comments】:

Please format this properly. Thanks.

【Answer 1】:

Try this:

config("hive.metastore.uris","thrift://serv_ip:serv_port")

The default port is 9083.
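A minimal sketch of how the suggestion above fits into the session setup from the question. The helper names `metastore_uri`, `validate_metastore_uri` and `build_session` are hypothetical and only for illustration; the host and port are whatever your cluster actually uses, with 9083 taken from the answer as the default.

```python
from urllib.parse import urlparse

DEFAULT_METASTORE_PORT = 9083  # default port quoted in the answer


def metastore_uri(host, port=DEFAULT_METASTORE_PORT):
    """Build a Thrift URI for the Hive metastore (hypothetical helper)."""
    return f"thrift://{host}:{port}"


def validate_metastore_uri(uri):
    """Raise ValueError unless the URI uses thrift:// and names a host and port."""
    parsed = urlparse(uri)
    if parsed.scheme != "thrift":
        raise ValueError(f"expected thrift:// scheme, got {parsed.scheme!r}")
    if not parsed.hostname or parsed.port is None:
        raise ValueError("metastore URI needs both a host and a port")
    return uri


def build_session(host, port=DEFAULT_METASTORE_PORT,
                  app_name="example-pyspark-read-from-hive"):
    """Create a Hive-enabled SparkSession pointed at a remote metastore."""
    # Imported lazily so the helpers above can be used without Spark installed.
    from pyspark.sql import SparkSession

    uri = validate_metastore_uri(metastore_uri(host, port))
    return (
        SparkSession.builder
        .appName(app_name)
        .config("hive.metastore.uris", uri)
        .enableHiveSupport()
        .getOrCreate()
    )
```

In a notebook this would be used as `spark = build_session("serv_ip")` followed by `spark.sql("show databases").show()`. The validation step only catches the scheme mix-up (`http://` instead of `thrift://`); it cannot tell whether the port really belongs to the metastore.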

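【Comments】:

I tried it, but it didn't help.

Which port are you using? The configuration above works for me.

serv_info: 172..:8080

Check the Ambari documentation: ambari.apache.org/1.2.3/installing-hadoop-using-ambari/content/…. The default port is 9083. You are using an incorrect namespace port.

Thanks @Kishore! After some research, this looks like the right direction.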
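Since the comments turn on which port is being used, a quick reachability check from the notebook machine can help narrow things down before involving Spark. This is a sketch using only the standard library; `port_is_open` is a hypothetical helper name, and an open port only shows that something is listening there, not that it is the Hive metastore.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

For example, `port_is_open("serv_ip", 9083)` checks the default metastore port mentioned in the answer; replace the host and port with the values for your own cluster.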
