How can I left justify text in a pandas DataFrame column in an IPython notebook

Posted 2023-03-12

【Asked】: 2014-11-04 18:54:57 【Question】:

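I am trying to format the output in an IPython notebook. I tried using the to_string function, which neatly lets me eliminate the index column. But the text data is right-justified.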

In [10]:

import pandas as pd
columns = ['Text', 'Value']
a = pd.DataFrame({'Text': ['abcdef', 'x'], 'Value': [12.34, 4.2]})
print(a.to_string(index=False))

   Text  Value
 abcdef  12.34
      x   4.20

The same is true when just printing the dataframe.

In [12]:

print(a)

     Text  Value
0  abcdef  12.34
1       x   4.20

Surprisingly, the justify argument of the to_string function only justifies the column headings.

In [13]:

import pandas as pd
columns = ['Text', 'Value']
a = pd.DataFrame({'Text': ['abcdef', 'x'], 'Value': [12.34, 4.2]})
print(a.to_string(justify='left', index=False))
Text     Value
 abcdef  12.34
      x   4.20

How can I control the justification settings for individual columns?

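【Comments on the question】:

As a side note: this is not currently supported for the HTML rendering of dataframes.

【Answer 1】:

If you are willing to use another library, tabulate will do this: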

$ pip install tabulate

and then

import pandas as pd
from tabulate import tabulate
df = pd.DataFrame({'Text': ['abcdef', 'x'], 'Value': [12.34, 4.2]})
print(tabulate(df, showindex=False, headers=df.columns))

Text      Value
------  -------
abcdef    12.34
x          4.2

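It also has various other output formats.

【Comments】:

【Answer 2】:

You could use a['Text'].str.len().max() to compute the length of the longest string in a['Text'], and use that number, N, in a left-justified formatter '{:<Ns}'.format:

In [211]: print(a.to_string(formatters={'Text': '{{:<{}s}}'.format(a['Text'].str.len().max()).format}, index=False))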
   Text  Value
 abcdef  12.34
 x        4.20
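The nested braces in that one-liner are easy to misread, so here is a minimal sketch of the same steps in plain Python, without pandas. The names texts, width, template and left_align are illustrative only and are not part of the answer above.

```python
# Build the left-justifying formatter step by step.
texts = ['abcdef', 'x']

# Length of the longest string (what a['Text'].str.len().max() computes)
width = max(len(t) for t in texts)

# '{{' and '}}' are literal braces; the inner '{}' receives the width
template = '{{:<{}s}}'.format(width)

# The bound .format method is what gets passed as the column formatter
left_align = template.format

padded = [left_align(t) for t in texts]
print(template)
print(padded)
```

The first .format call only fills in the width; the second one, stored in left_align, is applied to every cell.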

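【Comments】:

This is pretty close to what I want. It left-justifies the data in the rows of that column, but leaves the column header "outdented" by one character, at least in this case, when I also use the 'justify' option.

This is what I wanted, thanks. But it is still very verbose. I think there should be a simpler way.

Thanks, it is working, but it still seems to add an extra space at the start.

【Answer 3】:

I like @unutbu's answer (it does not require any additional dependencies). @JS.'s additions are a step in the right direction (towards something reusable).

Since constructing the formatter dictionary is the difficult part, let's create a function that builds the formatter dictionary from a DataFrame and an optional list of columns to format.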

def make_lalign_formatter(df, cols=None):
    """
    Construct formatter dict to left-align columns.

    Parameters
    ----------
    df : pandas.core.frame.DataFrame
        The DataFrame to format
    cols : None or iterable of strings, optional
        The columns of df to left-align. The default, cols=None, will
        left-align all the columns of dtype object

    Returns
    -------
    dict
        Formatter dictionary

    """
    if cols is None:
        cols = df.columns[df.dtypes == 'object']

    # One '{:<N}s'-style formatter per column, N being its longest string
    return {col: f'{{:<{df[col].str.len().max()}s}}'.format for col in cols}

Let's create some example data to demonstrate how to use this function:

import pandas as pd

# Make some data
data = {'First': ['Tom', 'Dick', 'Harry'],
        'Last': ['Thumb', 'Whittington', 'Potter'],
        'Age': [183, 667, 23]}

# Make into a DataFrame
df = pd.DataFrame(data)

Left-align all columns of dtype object in our DataFrame:

# Left align all columns
print(df.to_string(formatters=make_lalign_formatter(df), 
                   index=False,
                   justify='left'))

Left-align only the 'First' column:

# Left align 'First' column
print(df.to_string(formatters=make_lalign_formatter(df, cols=['First']), 
                   index=False,
                   justify='left'))
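to_string applies the formatters to the data cells only and places the headers separately through justify, so the two can end up offset from each other. One way around that is to build the text by hand, padding header and cells to the same width. The following is only a sketch under that assumption; to_left_aligned_text, left_cols and sep are hypothetical names, and numbers are rendered with plain str(), not with pandas' float formatting.

```python
import pandas as pd


def to_left_aligned_text(df, left_cols=None, sep="  "):
    """Render df as text, left-aligning the chosen columns, headers included."""
    if left_cols is None:
        # Assumption: every non-numeric column is treated as text
        left_cols = [c for c in df.columns
                     if not pd.api.types.is_numeric_dtype(df[c])]
    columns = []
    for col in df.columns:
        # Header and values share one width, so they line up exactly
        cells = [str(col)] + [str(v) for v in df[col]]
        width = max(len(c) for c in cells)
        if col in left_cols:
            cells = [c.ljust(width) for c in cells]
        else:
            cells = [c.rjust(width) for c in cells]
        columns.append(cells)
    # zip(*columns) turns the per-column lists into rows
    return "\n".join(sep.join(row).rstrip() for row in zip(*columns))


df = pd.DataFrame({'First': ['Tom', 'Dick', 'Harry'],
                   'Last': ['Thumb', 'Whittington', 'Potter'],
                   'Age': [183, 667, 23]})
text = to_left_aligned_text(df)
print(text)
```

Passing left_cols=['First'] left-aligns only that column and right-aligns the rest.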

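【Comments】:

Nice function, I find it more convenient and reusable. The only thing I found is that the column names are still outdented.

Thanks @kulfi. Yes, I just noticed that too. It happens when using the justify='left' argument. I am not sure whether there is a workaround for this. I haven't found one yet.

【Answer 4】:

This works on Python 3.7 (functools is part of the standard library):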

# pylint: disable=C0103,C0200,R0205
from __future__ import print_function
import functools
import pandas as pd


def displayDataFrame(dataframe, displayNumRows=True, displayIndex=True, leftJustify=True):
    # type: (pd.DataFrame, bool, bool, bool) -> None
    """
    :param dataframe: pandas DataFrame
    :param displayNumRows: If True, show the number of rows in the output.
    :param displayIndex: If True, then show the indexes
    :param leftJustify: If True, then use technique to format columns left justified.
    :return: None
    """

    if leftJustify:
        formatters = {}

        for columnName in list(dataframe.columns):
            # Pick the format from the column's dtype (not from the type of its name)
            columnType = dataframe[columnName].dtype
            if pd.api.types.is_bool_dtype(columnType):
                form = "{{!s:<8}}".format()
            elif pd.api.types.is_float_dtype(columnType):
                form = "{{!s:<5}}".format()
            else:
                # Pad to the longest value; !s also handles non-string cells
                maxLen = dataframe[columnName].astype(str).str.len().max()
                form = "{{!s:<{}}}".format(maxLen)

            formatters[columnName] = functools.partial(str.format, form)

        print(dataframe.to_string(index=displayIndex, formatters=formatters), end="\n\n")
    else:
        print(dataframe.to_string(index=displayIndex), end="\n\n")

    if displayNumRows:
        print("Num Rows: {}".format(len(dataframe)), end="\n\n")

【Comments】:

【Answer 5】:

I converted @unutbu's approach into a function so that I could left-justify my dataframes.

import functools
import pandas as pd

my_df = pd.DataFrame({'StringVals': ["Text string One", "Text string Two", "Text string Three"]})

def left_justified(df):
    formatters = {}
    for li in list(df.columns):
        # Width of the longest string in the column
        max_len = df[li].str.len().max()
        form = "{{:<{}s}}".format(max_len)
        formatters[li] = functools.partial(str.format, form)
    return df.to_string(formatters=formatters, index=False)

So now this:

print(my_df.to_string())

          StringVals
0    Text string One
1    Text string Two
2  Text string Three

becomes this:

print(left_justified(my_df))

StringVals
Text string One  
Text string Two  
Text string Three

Note, however, that any non-string values in your dataframe will give you errors:

AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas

If you want it to work with non-string values, you will have to pass different format strings to .to_string():

my_df2 = pd.DataFrame({'Booleans'  : [False, True, True],
                       'Floats'    : [1.0, 0.4, 1.5],
                       'StringVals': ["Text string One", "Text string Two", "Text string Three"]})

FLOAT_COLUMNS = ('Floats',)
BOOLEAN_COLUMNS = ('Booleans',)

def left_justified2(df):
    formatters = {}

    # Pass a custom pattern to format(), based on
    # type of data
    for li in list(df.columns):
        if li in FLOAT_COLUMNS:
            form = "{{!s:<5}}".format()
        elif li in BOOLEAN_COLUMNS:
            form = "{{!s:<8}}".format()
        else:
            max_len = df[li].str.len().max()
            form = "{{:<{}s}}".format(max_len)
        formatters[li] = functools.partial(str.format, form)
    return df.to_string(formatters=formatters, index=False)

With floats and booleans:

print(left_justified2(my_df2))

Booleans Floats         StringVals
False     1.0    Text string One  
True      0.4    Text string Two  
True      1.5    Text string Three

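Note that this approach is a bit of a hack. Not only do you have to maintain the column names in separate lists, you also have to make a best guess at the data widths. Perhaps someone with better Pandas-Fu can demonstrate how to parse the dataframe info to generate the formatting automatically.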
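One possible way to drop both the hard-coded column lists and the guessed widths is to measure each column after converting it to text. This is only a sketch, not part of the answer above: auto_left_formatters is a hypothetical helper, it assumes a non-empty DataFrame, and it assumes the text produced by astype(str) has the same length as what str() yields for each cell.

```python
import pandas as pd


def auto_left_formatters(df):
    """Build a left-aligning formatter for every column of df."""
    formatters = {}
    for col in df.columns:
        # Width of the longest value once converted to text
        width = int(df[col].astype(str).str.len().max())
        # '{!s:<N}' converts any value with str() and pads it on the right
        formatters[col] = '{{!s:<{}}}'.format(width).format
    return formatters


my_df2 = pd.DataFrame({'Booleans': [False, True, True],
                       'Floats': [1.0, 0.4, 1.5],
                       'StringVals': ["Text string One", "Text string Two",
                                      "Text string Three"]})
formatters = auto_left_formatters(my_df2)
text = my_df2.to_string(formatters=formatters, index=False, justify='left')
print(text)
```

Using the !s conversion means the same pattern works for strings, floats and booleans, so no per-type branching is needed.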

【Comments】:
