How to turn off scientific notation in pyspark?

Posted 2023-04-13


【Title】How to turn off scientific notation in pyspark? 【Posted】2016-10-23 18:47:55 【Question】:

As the result of some aggregations, I ended up with the following Spark DataFrame:

+------------+-----------------+-----------------+
|sale_user_id|gross_profit     |total_sale_volume|
+------------+-----------------+-----------------+
|       20569|       -3322960.0|     2.12569482E8|
|       24269|       -1876253.0|      8.6424626E7|
|        9583|              0.0|       1.282272E7|
|       11722|          18229.0|        5653149.0|
|       37982|           6077.0|        1181243.0|
|       20428|           1665.0|        7011588.0|
|       41157|          73227.0|        1.18631E7|
|        9993|              0.0|        1481437.0|
|        9030|           8865.0|      4.4133791E7|
|         829|              0.0|          11355.0|
+------------+-----------------+-----------------+

The schema of the DataFrame is:

root
 |-- sale_user_id: string (nullable = true)
 |-- tapp_gross_profit: double (nullable = true)
 |-- total_sale_volume: double (nullable = true)

How can I disable scientific notation in the gross_profit and total_sale_volume columns?


【Answer 1】:

The simplest way is to cast the double column to a decimal, giving an appropriate precision and scale:

from pyspark.sql.types import DecimalType
df = df.withColumn('total_sale_volume', df.total_sale_volume.cast(DecimalType(18, 2)))

【Comments】:

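Any idea how to do this without specifying the number of decimal places (the scale)? I mean, having it inferred?

@BrunoAmbrozio You can always .collect() a DataFrame; you then have plain Python objects and much more control over how they are printed (***.com/questions/658763/…)

I now need almost the same thing, but to save the values to a file, and I cannot set the precision. I would appreciate it if someone has a solution. Here is the new question: ***.com/questions/64772851/…

DecimalType is also displayed in scientific notation, depending on the precision and scale.

【Answer 2】: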

DecimalType is deprecated in Spark 3.0+.

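If the column is a string type, cast it to DoubleType first and then to a bigint (LongType). There is no need to set a precision, but note that any fractional part is dropped: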

from pyspark.sql.types import DoubleType, LongType
df = df.withColumn('total_sale_volume', df.total_sale_volume.cast(DoubleType()).cast(LongType()))

Or, without any imports:

df = df.withColumn('total_sale_volume', df.total_sale_volume.cast('double').cast('bigint'))

【Comments】:

DecimalType is not deprecated in Spark 3.0+.
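Regarding the comments on the first answer about inferring the number of decimal places and about formatting values after .collect(): one possible approach, sketched here in plain Python, is to render each collected float in positional notation using only the digits that `repr()` reports. The helper name `plain` and the sample list standing in for collected values are hypothetical.

```python
from decimal import Decimal


def plain(value: float) -> str:
    """Render a float without an exponent, keeping the digits repr() shows."""
    # Decimal(repr(value)) preserves the shortest round-trip digits of the float;
    # the "f" format then forces positional (non-scientific) notation.
    return format(Decimal(repr(value)), "f")


# Hypothetical stand-in for values pulled out of df.collect()
collected = [2.12569482e8, 1.18631e7, 11355.0, 1e22, 1.5e-7]

for value in collected:
    print(plain(value))
```

No scale has to be chosen up front, at the cost of doing the formatting on the driver rather than inside Spark.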
