Zeppelin does not load maven jar


【Posted】: 2020-05-16 18:50:44 【Question】:

Apache Zeppelin version 0.7.1

%dep
z.reset() // clean up previously added artifact and repository

// add maven repository
z.addRepo("Spark Cassandra Connector 2.0.10").url("https://mvnrepository.com/artifact/com.datastax.spark/spark-cassandra-connector")

// add artifact recursively
// z.load("groupId:artifactId:version")
z.load("com.datastax.spark:spark-cassandra-connector_2.11:2.0.10") 

java.lang.NullPointerException
    at org.sonatype.aether.impl.internal.DefaultRepositorySystem.resolveDependencies(DefaultRepositorySystem.java:352)
    at org.apache.zeppelin.spark.dep.SparkDependencyContext.fetchArtifactWithDep(SparkDependencyContext.java:171)
    at org.apache.zeppelin.spark.dep.SparkDependencyContext.fetch(SparkDependencyContext.java:121)
    at org.apache.zeppelin.spark.DepInterpreter.interpret(DepInterpreter.java:245)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:95)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:490)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
    at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

The following did not seem to help:

    "apache zeppelin additional repository import"
    https://zeppelin.apache.org/docs/0.7.1/interpreter/spark.html gives an example of setting zeppelin.dep.additionalRemoteRepository to http://dl.bintray.com/spark-packages/maven, but that repo does not have the jar version I need.
    https://zeppelin.apache.org/docs/0.7.1/interpreter/spark.html#3-dynamic-dependency-loading-via-sparkdep-interpreter
    "Zeppelin dynamic dependency loading fails on os-maven-plugin"

Now I get:

%dep
z.load("com.datastax.spark:spark-cassandra-connector_2.11:2.0.10")

org.sonatype.aether.resolution.DependencyResolutionException: Could not find artifact com.datastax.spark:spark-cassandra-connector_2.11:jar:2.0.10 in central (https://repo1.maven.org/maven2/)

【Comments】:

Why 0.7.1, may I ask? It is very old...

The customer does not want to upgrade, so we keep supporting it.

【Answer 1】:

I needed to update the Maven repository URL in zeppelin-env.sh from http to https:

export ZEPPELIN_INTERPRETER_DEP_MVNREPO="https://repo1.maven.org/maven2/"
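
As a rough illustration of the http-to-https point above, the sketch below classifies a repository URL by its scheme before it goes into zeppelin-env.sh. The function name `repo_scheme_status` is hypothetical and not part of Zeppelin. It only inspects the string and makes no network request.

```shell
#!/bin/sh
# Hypothetical helper (not a Zeppelin command): report the scheme of a
# Maven repository URL so a plain-http or scheme-less value is easy to spot.
repo_scheme_status() {
    case $1 in
        https://*) echo "https" ;;      # what the fix above switches to
        http://*)  echo "http" ;;       # plain http, the value being replaced
        *)         echo "no-scheme" ;;  # e.g. a host/path with no scheme at all
    esac
}

# Example: check the value that would be exported in zeppelin-env.sh
repo_scheme_status "https://repo1.maven.org/maven2/"
```

A result of `http` or `no-scheme` is worth double-checking, since Maven Central has required HTTPS since January 2020.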

There were also other, non-generic changes specific to our build system (a Jenkins pipeline).

If you control the server, it is easier to just download the jar you need (i.e. "groupArtifactVersion": "com.datastax.spark:spark-cassandra-connector_2.11:2.0.10") and then load the dependency from disk, for example:

%dep
z.load("/opt/zeppelin/spark-cassandra-connector-assembly/spark-cassandra-connector-assembly.jar")
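
To locate a jar for manual download, its coordinates can be mapped to a URL using the standard Maven repository layout (group ID dots become slashes, followed by artifact ID, version, and `artifactId-version.jar`). The sketch below does only that mapping. The function name `maven_jar_url` is hypothetical, and nothing is downloaded.

```shell
#!/bin/sh
# Hypothetical helper: turn "groupId:artifactId:version" into the jar URL
# under the standard Maven repository layout. Pure string handling only.
maven_jar_url() {
    repo=${1%/}              # repository base URL, one trailing slash removed
    coords=$2                # groupId:artifactId:version
    group=${coords%%:*}      # text before the first colon
    rest=${coords#*:}        # artifactId:version
    artifact=${rest%%:*}
    version=${rest#*:}
    group_path=$(printf '%s' "$group" | tr '.' '/')
    printf '%s/%s/%s/%s/%s-%s.jar\n' \
        "$repo" "$group_path" "$artifact" "$version" "$artifact" "$version"
}

maven_jar_url "https://repo1.maven.org/maven2/" \
    "com.datastax.spark:spark-cassandra-connector_2.11:2.0.10"
```

The printed URL can then be fetched with something like `curl -fL -o <target>.jar <url>`. Note that a single jar fetched this way does not include its transitive dependencies, which is presumably why the path above points to an assembly jar.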

【Comments】:

This worked for me: export ZEPPELIN_INTERPRETER_DEP_MVNREPO="repo.maven.apache.org/maven2"
