Jupyter Notebook 连接到远程配置单元
Posted
技术标签:
【中文标题】Jupyter Notebook 连接到远程配置单元【英文标题】:Jupyter Notebook connection to remote hive 【发布时间】:2018-06-14 04:54:15 【问题描述】:我正在尝试从我们公司远程服务器的 Hive 获取数据。我使用 Anaconda3(Windows 64 位),我的 Hadoop 在 Ambari 上运行。
我已经尝试过像这些...
import findspark
findspark.init()
from pyspark import SparkContext, SparkConf
from pyspark.sql import HiveContext, SparkSession
sparkSession = (SparkSession.builder.appName('example-pyspark-read-from-hive').config("hive.metastore.uris","http://serv_ip:serv_port").enableHiveSupport().getOrCreate())
sparkSession.sql('show databases').show()
也许是我的配置有问题?也许我应该在所有这些之前在 Hive 中进行一些配置。 错误是...
<details>
<summary>Error </summary>
Py4JJavaError Traceback (most recent call last) D:\Alanuccio\Progs\spark-2.3.0-bin-hadoop2.7\python\pyspark\sql\utils.py in deco(*a, **kw) 62 try: ---> 63 return f(*a, **kw) 64 except py4j.protocol.Py4JJavaError as e: D:\Alanuccio\Progs\spark-2.3.0-bin-hadoop2.7\python\lib\py4j-0.10.6-src.zip\py4j\protocol.py
in get_return_value(answer, gateway_client, target_id, name) 319 "An error occurred while calling 012.\n". --> 320 format(target_id, ".", name), value) 321 else: Py4JJavaError: An error occurred while calling o27.sql. : org.apache.spark.sql.AnalysisException:
java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient; at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106) at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:194)
at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:114) at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:102) at org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:39)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog$lzycompute(HiveSessionStateBuilder.scala:54) at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:52) at org.apache.spark.sql.hive.HiveSessionStateBuilder$$anon$1.
<init>(HiveSessionStateBuilder.scala:69) at org.apache.spark.sql.hive.HiveSessionStateBuilder.analyzer(HiveSessionStateBuilder.scala:69) at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$build$2.apply(BaseSessionStateBuilder.scala:293) at
org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$build$2.apply(BaseSessionStateBuilder.scala:293) at org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:79) at org.apache.spark.sql.internal.SessionState.analyzer(SessionState.scala:79)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57) at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55) at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:74) at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:638) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.GatewayConnection.run(GatewayConnection.java:214) at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522) at org.apache.spark.sql.hive.client.HiveClientImpl.newState(HiveClientImpl.scala:180)
at org.apache.spark.sql.hive.client.HiveClientImpl.
<init>(HiveClientImpl.scala:114) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264) at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:385)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:287) at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:66) at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:65)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply$mcZ$sp(HiveExternalCatalog.scala:195) at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:195) at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:195)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97) ... 28 more Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.
<init>(RetryingMetaStoreClient.java:86) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104) at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024) at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503) ... 43 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521) ... 49 more Caused by: java.lang.OutOfMemoryError: Java heap space During handling of the above exception, another exception occurred: AnalysisException Traceback
(most recent call last)
<ipython-input-12-9da3198f4ab3> in
<module>() 4 print( help(sparkSession.sql) )''' 5 ----> 6 sparkSession.sql('show databases').show() D:\Alanuccio\Progs\spark-2.3.0-bin-hadoop2.7\python\pyspark\sql\session.py in sql(self, sqlQuery) 706 [Row(f1=1, f2=u'row1'), Row(f1=2, f2=u'row2'),
Row(f1=3, f2=u'row3')] 707 """ --> 708 return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped) 709 710 @since(2.0) D:\Alanuccio\Progs\spark-2.3.0-bin-hadoop2.7\python\lib\py4j-0.10.6-src.zip\py4j\java_gateway.py in __call__(self,
*args) 1158 answer = self.gateway_client.send_command(command) 1159 return_value = get_return_value( -> 1160 answer, self.gateway_client, self.target_id, self.name) 1161 1162 for temp_arg in temp_args: D:\Alanuccio\Progs\spark-2.3.0-bin-hadoop2.7\python\pyspark\sql\utils.py
in deco(*a, **kw) 67 e.java_exception.getStackTrace())) 68 if s.startswith('org.apache.spark.sql.AnalysisException: '): ---> 69 raise AnalysisException(s.split(': ', 1)[1], stackTrace) 70 if s.startswith('org.apache.spark.sql.catalyst.analysis'):
71 raise AnalysisException(s.split(': ', 1)[1], stackTrace) AnalysisException: 'java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient;'
</details>
【问题讨论】:
请您正确格式化。谢谢 【参考方案1】:试试这个,
config("hive.metastore.uris","thrift://serv_ip:serv_port")
默认端口是9083
【讨论】:
我试过了,但没有帮助 您使用的是哪个端口?因为上面的配置对我有用 serv_info: 172..:8080 检查 Ambari 文档 ambari.apache.org/1.2.3/installing-hadoop-using-ambari/content/…。默认端口是 9083。你是不正确的命名空间端口。 谢谢@Kishore!进行研究似乎是正确的方向。以上是关于Jupyter Notebook 连接到远程配置单元的主要内容,如果未能解决你的问题,请参考以下文章
在 Docker 容器中将 Spyder 连接到远程 Jupyter Notebook
如何从 Jupyter Notebook 中的 PySpark 远程连接到 Greenplum 数据库?
连接到远程服务器上的 docker 中运行的 jupyter notebook