Spark not reading hive-site.xml?

Posted: 2017-08-02 11:49:18


I am trying to access the Hive metastore, and for that I am using Spark SQL. I have set up a SparkSession, but when I run my program and look at the logs I see this exception:

Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
    at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:188)
    ... 61 more
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
    at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005)
    at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
    ... 62 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521)
    ... 68 more
Caused by: javax.jdo.JDOFatalDataStoreException: Unable to open a test connection to the given database. JDBC url = jdbc:derby:;databaseName=metastore_db;create=true, username = APP. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: ------
java.sql.SQLException: Failed to create database 'metastore_db', see the next exception for details.

I am running a servlet that executes the following code:

import java.io.IOException;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.spark.sql.SparkSession;

public class HiveReadone extends HttpServlet {
    private static final long serialVersionUID = 1L;

    /**
     * @see HttpServlet#HttpServlet()
     */
    public HiveReadone() {
        super();
    }

    /**
     * @see HttpServlet#doGet(HttpServletRequest request, HttpServletResponse response)
     */
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        response.getWriter().append("Served at: ").append(request.getContextPath());

        // Build a Hive-enabled SparkSession; the warehouse dir points at HDFS.
        SparkSession spark = SparkSession
                .builder()
                .appName("Java Spark SQL basic example")
                .enableHiveSupport()
                .config("spark.sql.warehouse.dir", "hdfs://saurab:9000/user/hive/warehouse")
                .config("mapred.input.dir.recursive", true)
                .config("hive.mapred.supports.subdirectories", true)
                .config("hive.vectorized.execution.enabled", true)
                .master("local")
                .getOrCreate();
        response.getWriter().println(spark);
    }
}

The browser shows the output of response.getWriter().append("Served at: ").append(request.getContextPath()) (i.e. Served at: /hiveServ), but nothing else gets printed.

Please see my conf/hive-site.xml:

    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>
    <property>
        <name>hive.server2.enable.doAs</name>
        <value>true</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://saurab:3306/metastore_db?createDatabaseIfNotExist=true</value>
        <description>metadata is stored in a MySQL server</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>MySQL JDBC driver class</description>
    </property>
    <property>
        <name>hive.aux.jars.path</name>
        <value>/home/saurab/hadoopec/hive/lib/hive-serde-2.1.1.jar</value>
    </property>
    <property>
        <name>spark.sql.warehouse.dir</name>
        <value>hdfs://saurab:9000/user/hive/warehouse</value>
    </property>
    <property>
        <name>hive.metastore.uris</name>
        <!--Make sure that <value> points to the Hive Metastore URI in your cluster -->
        <value>thrift://saurab:9083</value>
        <description>URI for client to contact metastore server</description>
    </property>
    <property>
        <name>hive.server2.thrift.port</name>
        <value>10001</value>
        <description>Port number of HiveServer2 Thrift interface.
            Can be overridden by setting $HIVE_SERVER2_THRIFT_PORT
        </description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hiveuser</value>
        <description>user name for connecting to mysql server</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>hivepassword</value>
        <description>password for connecting to mysql server</description>
    </property>

As far as I know, if we configure hive.metastore.uris, Spark will connect to the Hive metastore, but in my case it does not and it gives me the error above. Note that the JDBC url in the exception is the embedded Derby default (jdbc:derby:;databaseName=metastore_db) rather than the MySQL url configured in hive-site.xml, which suggests the file is not being read at all.
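For comparison, here is a minimal sketch that passes the metastore settings to the builder directly instead of relying on hive-site.xml being found on the classpath. The thrift://saurab:9083 value is copied from the config above; whether this avoids the Derby fallback is an assumption, not a verified fix:

    // Sketch: hand the metastore settings from hive-site.xml to the builder
    // directly, so the session does not fall back to an embedded Derby metastore.
    SparkSession spark = SparkSession
            .builder()
            .appName("Java Spark SQL basic example")
            .master("local")
            .config("hive.metastore.uris", "thrift://saurab:9083")
            .config("spark.sql.warehouse.dir", "hdfs://saurab:9000/user/hive/warehouse")
            .enableHiveSupport()
            .getOrCreate();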


Answer 1:

To configure Spark to use Hive, try copying hive-site.xml into the spark/conf directory.
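If the file is already in place, one way to check whether the running session actually picked it up is to read the resolved Hadoop configuration. This is a hedged sketch reusing the spark and response variables from the servlet above:

    // Sketch: print which metastore URI the live session resolved. If this
    // prints "(not set)", hive-site.xml was not on the application classpath.
    String uris = spark.sparkContext().hadoopConfiguration()
            .get("hive.metastore.uris", "(not set)");
    response.getWriter().println("hive.metastore.uris = " + uris);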

Comments:

Thanks for your input, but as I mentioned, I have already created hive-site.xml in the /conf directory.

Is the same hive-site.xml also present in /hive/conf?
