Large negative decimal values are rounded in a Spark DataFrame with a decimal type
Editor's note: this article, compiled by the editors of cha138.com, covers rounding of large negative decimal values in Spark DataFrames with decimal types; hopefully it is of some reference value to you.
Looking for help with the following problem. We are using Spark to ingest data from Oracle, and one of the columns has data type number(28,5). It works fine for smaller values, but for large negative values the data is truncated: for example, -544205937126085.125 is converted to -544205937126085.100. I tried it locally and reproduced the same issue.
import org.apache.spark.sql.SparkSession

object DecimalIssue {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local")
      .appName("Decimal Issue")
      .getOrCreate()
    spark.sparkContext.setLogLevel("ERROR")
    import spark.implicits._

    val df = Seq(
      (1234, 1234.50),
      (1234, -544205937126085.125),
      (1234, 200.567),
      (1234, -200.567)
    ).toDF("smallvalue", "bigvalue")

    val df2 = df.select($"smallvalue", $"bigvalue".cast("decimal(28,5)"))
    df2.show(10, false)
    df2.printSchema()
  }
}
And the output of the above code:
+----------+----------------------+
|smallvalue|bigvalue |
+----------+----------------------+
|1234 |1234.50000 |
|1234 |-544205937126085.10000|
|1234 |200.56700 |
|1234 |-200.56700 |
+----------+----------------------+
root
|-- smallvalue: integer (nullable = false)
|-- bigvalue: decimal(28,5) (nullable = true)
Ideally, I am looking for this output:
+----------+----------------------+
|smallvalue|bigvalue |
+----------+----------------------+
|1234 |1234.50000 |
|1234 |-544205937126085.12500|
|1234 |200.56700 |
|1234 |-200.56700 |
+----------+----------------------+
root
|-- smallvalue: integer (nullable = false)
|-- bigvalue: decimal(28,5) (nullable = true)
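The truncation is arguably introduced before the cast ever runs: the Scala literal -544205937126085.125 is a Double, and a Double only round-trips about 15 to 17 significant decimal digits, while this value has 18. A minimal sketch outside Spark (plain Scala, no cluster needed) shows the same loss:

```scala
// Sketch: the digits are lost in the Double, before Spark's decimal cast.
// A Double round-trips at most ~17 significant decimal digits;
// -544205937126085.125 has 18, so its shortest string form drops the tail.
object DoublePrecisionDemo {
  def main(args: Array[String]): Unit = {
    val asDouble: Double = -544205937126085.125
    // BigDecimal(Double) goes through the Double's string form -> tail digit lost
    println(BigDecimal(asDouble))               // -544205937126085.1
    // Parsing the textual value directly keeps every digit
    println(BigDecimal("-544205937126085.125")) // -544205937126085.125
  }
}
```

This is why the value has to reach Spark as text (or already as a decimal), never as a Double.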
Edit: even positive values give truncated results.
[Added the data as a JSON message:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object DecimalIssue {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local")
      .appName("Decimal Issue")
      .getOrCreate()
    spark.sparkContext.setLogLevel("ERROR")
    import spark.implicits._

    // (1234, "1234.50"),
    // (1234, "544205937126085.125"),
    // (1234, "200.567"),
    // (1234, "-200.567")
    val customSchema = new StructType(Array(
      StructField("smallvalue", LongType, true),
      StructField("bigvalue", StringType, true)
    ))

    val data = """{"smallvalue":1234,"bigvalue":544205937126085.125}"""
    val df = Seq(data).toDF("data")
    val df1 = df.select(from_json($"data", customSchema).as("orig")).select("orig.*")
    df1.show(10, false)
    df1.printSchema()

    // Parse the string value with BigDecimal to avoid going through Double
    val tryBigDecimal: String => BigDecimal = BigDecimal(_)
    val bigUDF = udf(tryBigDecimal)
    val df2 = df1.select($"smallvalue", bigUDF($"bigvalue").cast("decimal(28,5)"))
    df2.show(10, false)
    df2.printSchema()
  }
}
But it gives the same result.
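For what it's worth, BigDecimal itself parses a string exactly, so if the UDF output is still truncated, the digits were most likely already lost upstream, for example if the JSON parser turned the unquoted number into a Double before handing its text to the UDF. A small plain-Scala sketch that reproduces both behaviors:

```scala
// Sketch: BigDecimal(String) keeps all digits, so if the output is still
// truncated, precision must be lost before the string reaches the UDF.
object StringToDecimalDemo {
  def main(args: Array[String]): Unit = {
    val exact = BigDecimal("544205937126085.125")
    println(exact.setScale(5))          // 544205937126085.12500 -- the wanted result
    // Round-tripping through Double reproduces the observed truncation
    println(BigDecimal(exact.toDouble)) // 544205937126085.1
  }
}
```

Quoting the number in the JSON message (so it is a JSON string, not a JSON number) would be one way to rule that path out.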
Thanks in advance.
Answer
Now the numbers in the .json file are read as decimal values by setting the option .option("prefersDecimal", true).
File decimal.json:
{"smallvalue":1234,"bigvalue":544205937126085.125}
{"smallvalue":1224,"bigvalue":54420593712608534.12521}
{"smallvalue":2224,"bigvalue":5420593712608534.32521}
{"smallvalue":1114,"bigvalue":950420593712608534.521}
Code:
import org.apache.spark.sql.SparkSession

object DecimalIssue {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local")
      .enableHiveSupport()
      .appName("Decimal Issue")
      .getOrCreate()
    spark.sparkContext.setLogLevel("ERROR")
    val sqlContext = spark.sqlContext
    import org.apache.spark.sql.functions._

    val input = "/home/cloudera/files/tests/decimal.json"
    val data = sqlContext
      .read
      .option("prefersDecimal", true)
      .json(input)

    data.printSchema()
    data.select(col("bigvalue").cast("decimal(28,5)"), col("smallvalue").cast("int")).show(truncate = false)
  }
}
And the expected output:
root
|-- bigvalue: decimal(23,5) (nullable = true)
|-- smallvalue: long (nullable = true)
+------------------------+----------+
|bigvalue |smallvalue|
+------------------------+----------+
|544205937126085.12500 |1234 |
|54420593712608534.12521 |1224 |
|5420593712608534.32521 |2224 |
|950420593712608534.52100|1114 |
+------------------------+----------+
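Note that the schema above shows decimal(23,5), the type inferred by prefersDecimal before the cast widens it to decimal(28,5). As a sanity check you can reason about capacity yourself: a SQL decimal(p,s) holds at most p-s digits before the point and s after it. A rough fit-check helper (fitsDecimal is a hypothetical name for this sketch, not a Spark API):

```scala
// Sketch: does a value fit in SQL decimal(p, s) without overflowing?
// decimal(p, s) allows at most p - s integer digits and s fractional digits.
object DecimalFit {
  def fitsDecimal(v: BigDecimal, precision: Int, scale: Int): Boolean = {
    // Round to the target scale first, then count the remaining integer digits
    val rounded = v.setScale(scale, BigDecimal.RoundingMode.HALF_UP)
    rounded.precision - rounded.scale <= precision - scale
  }

  def main(args: Array[String]): Unit = {
    val v = BigDecimal("950420593712608534.521") // 18 integer digits
    println(fitsDecimal(v, 28, 5)) // true: 18 <= 28 - 5
    println(fitsDecimal(v, 20, 5)) // false: 18 > 20 - 5
  }
}
```

In Spark, a value that overflows the target decimal type of a cast comes back as null rather than an error, so a quick check like this can save some debugging.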
You can find all the options in the Spark documentation:
http://spark.apache.org/docs/2.3.0/api/scala/index.html#org.apache.spark.sql.DataFrameReader
I hope it helps or gives you some clues.
Regards