pandas dataframe and multi line values printout as string?
Posted: 2020-06-26 10:24:52
Question: I want the same as in "pandas dataframe and multi line values", except with multiple columns of multi-line text:
import pandas as pd
data = [
    {'id': 1, 'col_one': 'very long text\ntext line 2\ntext line 3', 'col_two': 'very long text\ntext line 4\ntext line 5'},
    {'id': 2, 'col_one': 'short text', 'col_two': 'very long text\ntext line 6\ntext line 7'},
]
df = pd.DataFrame(data)
df.set_index('id', inplace=True)
print(df)
This prints as:
                                     col_one                                   col_two
id
1   very long text\ntext line 2\ntext line 3  very long text\ntext line 4\ntext line 5
2                                 short text  very long text\ntext line 6\ntext line 7
...and the output I want is:
id         col_one         col_two
1   very long text  very long text
       text line 2     text line 4
       text line 3     text line 5
2       short text  very long text
                       text line 6
                       text line 7
However, two of the answers there mention .stack(), which would add extra 1s in the id column that I do not want; in fact, this:
print(df.col_one.str.split("\n", expand=True).stack())
# prints:
id
1   0    very long text
    1       text line 2
    2       text line 3
2   0        short text
dtype: object
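...might sort of work (I would have to suppress the printout of the new row index somehow), but it covers only one column, and I want the whole table.
The remaining answer mentions this: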
from IPython.display import display, HTML
def pretty_print(df):
    return display(HTML(df.to_html().replace("\\n","<br>")))
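...which seems to do what I want, but the problem is that display apparently targets interactive environments (e.g. Jupyter notebook). I want to use this in a PyQt5 application instead; when I try the function above, I get: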
<IPython.core.display.HTML object>
...printed in the terminal where I run the PyQt5 application, while the plainTextEdit that should contain this text shows nothing.
So, how can I do the same as the pretty_print function above, but get a plain, multi-line, formatted string as output that I can use elsewhere?
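For the HTML route mentioned in the question, the table markup can be obtained as an ordinary string without calling display at all. The following is only a minimal sketch under stated assumptions: df_to_html_string is a hypothetical helper name (not from the original post), and instead of replacing a literal "\\n" in the rendered HTML it escapes each cell and converts real newlines to <br> before rendering.

```python
import html

import pandas as pd


def df_to_html_string(df):
    """Hypothetical helper: return an HTML table string with <br> line breaks."""
    # Escape each cell ourselves, then turn real newlines into <br> tags.
    cells = df.astype(str).apply(
        lambda col: col.map(lambda s: html.escape(s).replace("\n", "<br>"))
    )
    # escape=False keeps the <br> tags intact (column names are then not
    # escaped either); max_colwidth=None is meant to avoid truncated cells.
    with pd.option_context("display.max_colwidth", None):
        return cells.to_html(escape=False)


# Small demo frame, shaped like the one in the question.
demo = pd.DataFrame(
    {"col_one": ["very long text\ntext line 2", "short text"]},
    index=pd.Index([1, 2], name="id"),
)
html_text = df_to_html_string(demo)
```

The resulting html_text is a plain str, so it could presumably be handed to any widget that accepts HTML (for example a Qt rich-text widget); that part is an assumption and is not exercised here.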
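Answer 1: Well, I went the hard way and wrote a function for this. The caveat is that it loses the index, so the column headers/names are no longer printed on a row above the index header/name, but it is good enough for me, I guess.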
import pandas as pd
data = [
    {'id': 1, 'col_one': 'very long text\ntext line 2\ntext line 3', 'col_two': 'very long text\ntext line 4'},
    {'id': 2, 'col_one': 'short text', 'col_two': 'very long text\ntext line 6\ntext line 7'},
]
df = pd.DataFrame(data)
df.set_index('id', inplace=True)
def get_df_multiline_printstring(indf_in):
    broken_dfs = []
    # get back the index column; if its values stay int, pd.concat fails with
    # "TypeError: object of type 'int' has no len()", so every column is cast to str below
    indf = indf_in.reset_index()
    # iterate all columns
    for icol in range(indf.shape[1]):
        # select column by index position using iloc[]; note, dtype is 'object' for the string columns here!
        columnSeriesObj = indf.iloc[:, icol]
        # columnSeriesObj.astype(object) does not work; converting all elements to str does
        columnSeriesObj = columnSeriesObj.apply(str)
        # without strings everywhere, the next line raises
        # "AttributeError: Can only use .str accessor with string values!"
        broken_dfs.append(columnSeriesObj.str.split("\n", expand=True).stack())
    # note: without keys=, column names in the concat become 0, 1
    df_concat = pd.concat(broken_dfs, axis=1, keys=indf.columns)
    # "breaking" the short text results in NaNs - clear them
    df_concat = df_concat.fillna("")
    # do not print index with index=False
    return df_concat.to_string(index=False)

print(get_df_multiline_printstring(df))
This prints:
id         col_one         col_two
 1  very long text  very long text
       text line 2     text line 4
       text line 3
 2      short text  very long text
                       text line 6
                       text line 7
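The same idea (one physical output line per embedded text line) can also be sketched without .stack(), which may be useful because the NaN handling of stack() differs between pandas versions. This is an alternative, not the answer's method: df_to_multiline_string and its sep parameter are hypothetical names, and the columns here are left-aligned rather than right-aligned as in to_string().

```python
from itertools import zip_longest

import pandas as pd


def df_to_multiline_string(df, sep="  "):
    """Hypothetical helper: render df as plain text, splitting cells on newlines."""
    flat = df.reset_index()  # bring the index back as an ordinary column
    header = [str(c) for c in flat.columns]

    # For each logical row, build a block of physical lines; shorter cells
    # are padded with empty strings so all columns have the same height.
    blocks = []
    for row in flat.itertuples(index=False, name=None):
        cells = [str(value).split("\n") for value in row]
        blocks.append([list(parts) for parts in zip_longest(*cells, fillvalue="")])

    # Column width = widest text among the header and all physical lines.
    widths = [len(h) for h in header]
    for block in blocks:
        for line in block:
            widths = [max(w, len(c)) for w, c in zip(widths, line)]

    def fmt(cells):
        # Left-align each cell to its column width, drop trailing padding.
        return sep.join(c.ljust(w) for c, w in zip(cells, widths)).rstrip()

    lines = [fmt(header)]
    for block in blocks:
        lines.extend(fmt(line) for line in block)
    return "\n".join(lines)


# Same data as in the answer above.
example = pd.DataFrame([
    {"id": 1, "col_one": "very long text\ntext line 2\ntext line 3",
     "col_two": "very long text\ntext line 4"},
    {"id": 2, "col_one": "short text",
     "col_two": "very long text\ntext line 6\ntext line 7"},
]).set_index("id")
table_text = df_to_multiline_string(example)
```

The return value is an ordinary str, so it can be printed or passed on to other code; keeping the formatting in pure Python trades pandas' own alignment options for predictable behaviour.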