Spark: load CSV into Hive via HiveContext

Posted by 天天好心情


// Prepare the CSV file (saved as /tmp/cars.csv)

year,make,model,comment,blank
"2012","Tesla","S","No comment",
"1997","Ford","E350","Go get one now they are going fast",
"2015","Chevy","Volt"

 

// Load the CSV and write it out as ORC without declaring a schema up front (column types are inferred)

import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.orc._

// sc is the SparkContext that spark-shell provides
val hiveContext = new HiveContext(sc)
// Read the CSV with the spark-csv package, treating the first line as the header
val df = hiveContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("/tmp/cars.csv")
val selectedData = df.select("year", "model")
// Write the two selected columns as ORC; the "header" option only applies to CSV, so it is not set here
selectedData.write.format("orc").save("/tmp/newcars")
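Before pointing a Hive table at the output, it can help to read the ORC files back and look at the column types that were stored, since the table definition has to be compatible with them. The following is only a sketch: it assumes the same spark-shell session as above (so `hiveContext` is already defined), and `readBack` is a name introduced here for illustration.

```scala
// Assumes the spark-shell session from the post, where hiveContext already exists.
// Read the ORC files back from the directory written in the previous step.
val readBack = hiveContext.read.format("orc").load("/tmp/newcars")

// Print the column names and the types that were stored in the ORC files.
readBack.printSchema()

// Display the first rows to confirm that only the selected columns were written.
readBack.show()
```

If the printed types differ from the ones declared for the Hive table, one option is to cast the columns before writing, or to declare the table with the stored types.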

 

// Creating the table as user hive fails with a permission error:

// org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.security.AccessControlException: Permission denied: user=hive, access=WRITE, inode="/tmp/newcars":hdfs:hdfs:drwxr-xr-x
// Resolved by updating the permissions of the /tmp/newcars_orc_cust17 directory
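The post does not show how the directory permissions were changed. One way to do it from the same spark-shell session is the Hadoop `FileSystem` API, sketched below. The path and the mode `777` are assumptions chosen for illustration (the post does not say which mode was used), and the call only succeeds when the session runs as the directory owner or an HDFS superuser.

```scala
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.fs.permission.FsPermission

// Assumes a spark-shell session, where sc is predefined.
val fs = FileSystem.get(sc.hadoopConfiguration)

// Assumption: the directory named in the error message; substitute the directory your table points at.
val target = new Path("/tmp/newcars")

// Assumption: mode 777 is for illustration only; choose the narrowest mode that lets user hive write.
val openPerms = new FsPermission("777")

// Changes the directory itself; this call is not recursive.
fs.setPermission(target, openPerms)

// Print the resulting permissions to confirm the change.
println(fs.getFileStatus(target).getPermission)
```

A narrower alternative would be to change ownership or group membership rather than opening the directory to everyone.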

hiveContext.sql("create external table newcars_orc_ext_cust17(year string,model string) stored as orc location '/tmp/newcars'")
hiveContext.sql("show tables").collect().foreach(println)
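The location in the DDL has to be delimited with plain ASCII single quotes, which is easy to get wrong when a statement is copied from a web page. A small helper that assembles the statement from its parts is one way to avoid this. The function `externalOrcTableDdl` below is hypothetical (it is not part of Spark or Hive) and is only a sketch of the idea.

```scala
// Hypothetical helper, not a Spark or Hive API: builds the DDL for an external ORC table.
def externalOrcTableDdl(table: String, columns: Seq[(String, String)], location: String): String = {
  // Render each (name, type) pair as "name type" and join the pairs with commas.
  val columnList = columns.map { case (name, dataType) => s"$name $dataType" }.mkString(", ")
  // The location is wrapped in straight single quotes.
  s"create external table $table($columnList) stored as orc location '$location'"
}

// Same table name, columns, and location as in the post.
val ddl = externalOrcTableDdl(
  "newcars_orc_ext_cust17",
  Seq("year" -> "string", "model" -> "string"),
  "/tmp/newcars")

println(ddl)

// In the spark-shell session from the post, the statement could then be run with:
// hiveContext.sql(ddl)
```

The helper does no escaping or validation, so it is only suitable for trusted, hard-coded names.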

 

 

// Query the external table created above
hiveContext.sql("select * from newcars_orc_ext_cust17").collect().foreach(println)
