Is there a way to print a Pandas DataFrame like how PySpark displays DataFrames?

Posted 2023-04-15

【Title】: Is there a way to print a Pandas Dataframe like how Pyspark displays Dataframes? 【Posted】: 2021-09-29 18:12:18 【Question】:

I would like to print my pandas DataFrame in the same style as a PySpark table, without converting it to a PySpark DataFrame. Something like this:

> print(df.to_string(style='pyspark'))

| Id|groupId|matchId|assists|
+---+-------+-------+-------+
|  0|     24|      0|      0|
|  1| 440875|      1|      1|
|  2| 878242|      2|      0|

Instead of:

> print(df.to_string())

  Id  groupId  matchId  assists
0 0        24       0         0
1 1    440875       1         1
2 2    878242       2         0

Does anyone have a small script that can reformat it?

【Comments】:

The tabulate package can get very close, but I don't see an exact match. 【Answer 1】:

DataFrame.to_markdown exposes several tablefmt options through tabulate:

import pandas as pd

# The dict literal needs braces; without them this is a syntax error.
df = pd.DataFrame({
    'Id': [0, 1, 2],
    'groupId': [24, 440875, 878242],
    'matchId': [0, 1, 2],
    'assists': [0, 1, 0]
})

Some options that look similar include:

print(df.to_markdown(tablefmt="orgtbl", index=False))

|   Id |   groupId |   matchId |   assists |
|------+-----------+-----------+-----------|
|    0 |        24 |         0 |         0 |
|    1 |    440875 |         1 |         1 |
|    2 |    878242 |         2 |         0 |

print(df.to_markdown(tablefmt='pretty', index=False))

+----+---------+---------+---------+
| Id | groupId | matchId | assists |
+----+---------+---------+---------+
| 0  |   24    |    0    |    0    |
| 1  | 440875  |    1    |    1    |
| 2  | 878242  |    2    |    0    |
+----+---------+---------+---------+

print(df.to_markdown(tablefmt='psql', index=False))

+------+-----------+-----------+-----------+
|   Id |   groupId |   matchId |   assists |
|------+-----------+-----------+-----------|
|    0 |        24 |         0 |         0 |
|    1 |    440875 |         1 |         1 |
|    2 |    878242 |         2 |         0 |
+------+-----------+-----------+-----------+
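If an exact match to the layout in the question is wanted, a small formatter is one option. The sketch below is not part of pandas or tabulate: `to_pyspark_style` and its `min_width` parameter are hypothetical names, and the minimum column width of 3 and the right alignment are assumptions based on how PySpark's show() output usually looks. It does not truncate long values.

```python
import pandas as pd


def to_pyspark_style(df: pd.DataFrame, min_width: int = 3) -> str:
    """Render df as a bordered, right-aligned text table without the index.

    Hypothetical helper; the layout rules are assumptions, not a pandas API.
    """
    header = [str(c) for c in df.columns]
    rows = [[str(v) for v in row] for row in df.itertuples(index=False, name=None)]

    # Each column is as wide as its longest cell or header, but at least min_width.
    widths = [
        max(min_width, len(h), *(len(r[i]) for r in rows))
        for i, h in enumerate(header)
    ]

    rule = "+" + "+".join("-" * w for w in widths) + "+"

    def fmt(cells):
        # Right-align every cell, with no padding around the separators.
        return "|" + "|".join(c.rjust(w) for c, w in zip(cells, widths)) + "|"

    return "\n".join([rule, fmt(header), rule, *(fmt(r) for r in rows), rule])


df = pd.DataFrame({
    'Id': [0, 1, 2],
    'groupId': [24, 440875, 878242],
    'matchId': [0, 1, 2],
    'assists': [0, 1, 0]
})
print(to_pyspark_style(df))
```

The helper only builds a string, so it can be logged or written to a file as well as printed.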

【Comments】:

Awesome, exactly what I needed. Thanks, King Henry :D 【Answer 2】:

You can do this with tabulate:

from tabulate import tabulate
import pandas as pd

# The dict literal needs braces; without them this is a syntax error.
df = pd.DataFrame({'id': [1, 2, 3],
                   'col': ['a', 'b', 'c']})
print(tabulate(df, headers='keys', tablefmt='psql'))

+----+-----------+-------------+
|    |   id      | col         |
|----+-----------+-------------|
|  0 |    1      | a           |
|  1 |    2      | b           |
|  2 |    3      | c           |
+----+-----------+-------------+

【Comments】:

This is a great plan B, thanks @droebi
