pandas dataframe and multi line values printout as string?

Posted 2023-04-14

[Asked]: 2020-06-26 10:24:52 [Question]:

I want the same thing as in "pandas dataframe and multi line values", except with several columns of multi-line text:

import pandas as pd

data = [
       {'id': 1, 'col_one': 'very long text\ntext line 2\ntext line 3', 'col_two': 'very long text\ntext line 4\ntext line 5'},
       {'id': 2, 'col_one': 'short text', 'col_two': 'very long text\ntext line 6\ntext line 7'}
       ]
df = pd.DataFrame(data)
df.set_index('id', inplace=True)
print(df)

This prints as:

                                     col_one                                   col_two
id
1   very long text\ntext line 2\ntext line 3  very long text\ntext line 4\ntext line 5
2                                 short text  very long text\ntext line 6\ntext line 7

...and the output I want is:

id            col_one          col_two
1      very long text   very long text
       text line 2      text line 4
       text line 3      text line 5
2      short text       very long text
                        text line 6
                        text line 7

However, two of the answers there mention .stack(), which adds an extra index level next to the id column that I do not want; ...in fact, this:

print(df.col_one.str.split("\n", expand=True).stack())

# prints:
id
1   0    very long text
    1       text line 2
    2       text line 3
2   0        short text
dtype: object

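...might sort of work (I would have to suppress the printout of the new row index somehow), but it covers only a single column, and I want the whole table.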

And the remaining answer mentions this:

from IPython.display import display, HTML

def pretty_print(df):
    return display(HTML(df.to_html().replace("\\n","<br>")))

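...which seems to do what I want, but the problem is that display evidently targets interactive environments (e.g. Jupyter notebook). I want to use it in a PyQt5 application, though; when I try the function above, I get: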

<IPython.core.display.HTML object>

...printed in the terminal where I run the PyQt5 application, while the plainTextEdit that should contain this text shows nothing.

So, how can I do the same as the pretty_print function above, but get a plain, multi-line, formatted string as output that I can use elsewhere?
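For reference, the HTML text that pretty_print hands to display() is already an ordinary Python string. A minimal sketch, assuming HTML output is acceptable; the name df_to_html_string is made up for illustration and is not part of pandas or IPython:

```python
import pandas as pd


def df_to_html_string(df):
    """Return the table as an HTML string, with cell newlines shown as <br>."""
    # Same replacement as in pretty_print above: it relies on to_html()
    # writing a newline inside a cell as the two characters "\" and "n".
    return df.to_html().replace("\\n", "<br>")


data = [
    {'id': 1, 'col_one': 'very long text\ntext line 2\ntext line 3',
     'col_two': 'very long text\ntext line 4\ntext line 5'},
    {'id': 2, 'col_one': 'short text',
     'col_two': 'very long text\ntext line 6\ntext line 7'},
]
df = pd.DataFrame(data).set_index('id')

html_text = df_to_html_string(df)
print(html_text)
```

This only helps if the target widget can render HTML; in PyQt5 a rich-text widget such as QTextEdit (via setHtml) would be one candidate, which is an assumption about the application rather than something stated in the question. A plain-text widget still needs a plain-text table.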

[Answer 1]:

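OK, went the hard way and wrote a function for this. The caveat is that it loses the index, so the column headers/names are not printed on a line above the index header/name, but it is good enough for me, I guess.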

import pandas as pd

data = [
       {'id': 1, 'col_one': 'very long text\ntext line 2\ntext line 3', 'col_two': 'very long text\ntext line 4'},
       {'id': 2, 'col_one': 'short text', 'col_two': 'very long text\ntext line 6\ntext line 7'}
       ]
df = pd.DataFrame(data)
df.set_index('id', inplace=True)

def get_df_multiline_printstring(indf_in):
  broken_dfs = []
  #orig_index_name = indf_in.index.name
  #orig_index_dtype = indf_in.index.dtype
  #print("orig index", orig_index_name, orig_index_dtype)
  indf = indf_in.reset_index() #get back the index column? if so, pd.concat will fail with 'TypeError: object of type 'int' has no len()'; only way is to cast, then
  # iterate all columns
  for icol in range(indf.shape[1]):
    # Select column by index position using iloc[]; note, dtype is 'object' for the string columns here!
    columnSeriesObj = indf.iloc[: , icol]
    #print(icol, columnSeriesObj.name, columnSeriesObj.dtype)
    #columnSeriesObj = columnSeriesObj.astype(object) # cast column does not work
    columnSeriesObj = columnSeriesObj.apply(str) # converting all elements to str does;
    broken_dfs.append( columnSeriesObj.str.split("\n", expand=True).stack() ) # "AttributeError: Can only use .str accessor with string values!" here, if we do not have strings everywhere
  # note: without keys=, column names in the concat become 0, 1
  df_concat = pd.concat( broken_dfs, axis=1, keys=indf.columns )
  # "breaking" the short text will result with NaN's - clear them
  df_concat = df_concat.fillna("")
  # do not print index with index=False
  return df_concat.to_string(index=False)

print( get_df_multiline_printstring(df) )

This prints:

id         col_one         col_two
 1  very long text  very long text
       text line 2     text line 4
       text line 3
 2      short text  very long text
                       text line 6
                       text line 7
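The output above is right-aligned, whereas the layout asked for in the question is left-aligned. As a rough alternative sketch (not the answer's method), the same table can be built with plain string formatting; format_multiline_rows and its parameters are names made up for this illustration:

```python
def format_multiline_rows(headers, rows, sep="  "):
    """Render rows whose cells may contain newlines as a left-aligned text table."""
    # Split every cell into its physical lines (cells are converted to str first).
    split_rows = [[str(cell).split("\n") for cell in row] for row in rows]
    # Column width = longest header or longest single cell line in that column.
    widths = [len(str(h)) for h in headers]
    for row in split_rows:
        for i, lines in enumerate(row):
            widths[i] = max(widths[i], *(len(line) for line in lines))
    out = [sep.join(str(h).ljust(w) for h, w in zip(headers, widths)).rstrip()]
    for row in split_rows:
        # A table row is as tall as its tallest cell; shorter cells are padded.
        height = max(len(lines) for lines in row)
        for k in range(height):
            parts = [
                (lines[k] if k < len(lines) else "").ljust(w)
                for lines, w in zip(row, widths)
            ]
            out.append(sep.join(parts).rstrip())
    return "\n".join(out)


headers = ["id", "col_one", "col_two"]
rows = [
    [1, "very long text\ntext line 2\ntext line 3", "very long text\ntext line 4"],
    [2, "short text", "very long text\ntext line 6\ntext line 7"],
]
table_text = format_multiline_rows(headers, rows)
print(table_text)
```

With the DataFrame from the answer, the inputs could be taken from flat = df.reset_index() as list(flat.columns) and flat.values.tolist(). The function returns an ordinary string, so it can be passed to any plain-text widget; the columns only line up in a monospaced font.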
