Styling pandas DataFrames

Posted bigdata.ministep.cn


Reference: [Essential Techniques to Style Pandas DataFrames | Kaggle](https://www.kaggle.com/code/iamleonie/essential-techniques-to-style-pandas-dataframes)

Note: the styling shown in the original images was lost in this copy.

This is the accompanying Kaggle Notebook to my Medium article "Essential Techniques to Style Pandas DataFrames"

 


At the end of your data analysis, you need to decide how to communicate your findings. Tables can be more suitable than graphs when your audience needs to look up individual precise values and compare them to other values. However, tables contain a lot of information that your audience has to process by reading, which makes it difficult to grasp your message right away. Haphazard table design, such as too many colors, bold borders, or too much information, can additionally distract your audience. Purposeful formatting and styling, on the other hand, can guide your audience’s attention to the most important number in a table.

DataFrames from the pandas library are great to visualize data as tables in Python. Additionally, the pandas library provides methods to format and style the DataFrame via the style attribute. Therefore, this article discusses essential techniques to format and style pandas DataFrames to effectively communicate data.

For this tutorial, we will be using the following small fictional dataset:

In [1]:
import pandas as pd

df = pd.read_csv("../input/sample-dataset-for-dataframe-styling/sample_dataset.csv")

df
Out[1]:
  A B C D
0 3000 8 2.324234 0.10
1 2500 -1 0.892340 -0.99
2 1200 3 1.239841 -0.23
3 4000 -4 3.923840 0.75
4 1000 -10 0.923840 0.50
5 10000 5 NaN -0.50
 

Global Display Options

Before you get started with customizing the visualizations for individual DataFrames, you can adjust the global display behavior of pandas [1]. Two common tasks you can handle are displaying all columns of a DataFrame and adjusting the width of a DataFrame column.

When your DataFrame has too many columns, pandas does not render all columns but instead omits columns in the middle. To force pandas to display all columns, you can set:

In [2]:
pd.set_option("display.max_columns", None)
 

When you are working with long texts, pandas truncates the text in the column. To force pandas to display the full column contents by removing the column width limit, you can set:

In [3]:
pd.set_option("display.max_colwidth", None)
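
If you only want these settings for a single output, pandas also provides pd.option_context, which restores the previous values when the block ends. The following is a minimal sketch; the DataFrame `wide` is made up for illustration and is not part of the tutorial's dataset.

```python
import pandas as pd

# Hypothetical wide DataFrame, used only for this illustration
wide = pd.DataFrame([[i * j for j in range(30)] for i in range(3)])

default_max_columns = pd.get_option("display.max_columns")

# Inside the block, no column limit and no column-width limit apply
with pd.option_context("display.max_columns", None,
                       "display.max_colwidth", None):
    inside_max_columns = pd.get_option("display.max_columns")
    print(wide)

# After the block, the previous setting is active again
restored_max_columns = pd.get_option("display.max_columns")
```

This keeps the global display behavior unchanged for the rest of the notebook.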
 

General Tips

The following tips apply to all methods of the styler object.

Multiple Stylings

You can combine multiple stylings by chaining multiple functions together.

E.g. df.style.set_caption(...).format(...).bar(...).set_properties(...)

Column-wise vs. Row-wise Styling

By default, the styling is applied column-wise (axis = 0). If you want to apply the styling row-wise, use axis = 1 in the properties instead.

E.g. df.style.highlight_min(axis = 1)

In [4]:
display(df.style.set_caption("Highlight column-wise maximum with 'axis = 0'").highlight_max(axis = 0))

display(df.style.set_caption("Highlight row-wise maximum with 'axis = 1'").highlight_max(axis = 1))
 
Highlight column-wise maximum with 'axis = 0'
  A B C D
0 3000 8 2.324234 0.100000
1 2500 -1 0.892340 -0.990000
2 1200 3 1.239841 -0.230000
3 4000 -4 3.923840 0.750000
4 1000 -10 0.923840 0.500000
5 10000 5 nan -0.500000
 
Highlight row-wise maximum with 'axis = 1'
  A B C D
0 3000 8 2.324234 0.100000
1 2500 -1 0.892340 -0.990000
2 1200 3 1.239841 -0.230000
3 4000 -4 3.923840 0.750000
4 1000 -10 0.923840 0.500000
5 10000 5 nan -0.500000
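
Because the highlighting itself is not visible in this text copy, the cells each setting would pick can be reproduced with plain pandas. This sketch rebuilds the sample data by hand and uses `idxmax` as a stand-in for illustration; it is not how the Styler works internally.

```python
import numpy as np
import pandas as pd

# The sample dataset from above, typed in by hand
df_demo = pd.DataFrame({
    "A": [3000, 2500, 1200, 4000, 1000, 10000],
    "B": [8, -1, 3, -4, -10, 5],
    "C": [2.324234, 0.892340, 1.239841, 3.923840, 0.923840, np.nan],
    "D": [0.10, -0.99, -0.23, 0.75, 0.50, -0.50],
})

# axis = 0: one maximum per column (result: row label of the maximum)
col_max = df_demo.idxmax(axis=0)

# axis = 1: one maximum per row (result: column name of the maximum)
row_max = df_demo.idxmax(axis=1)

print(col_max)
print(row_max)
```

With axis = 0 one cell per column is selected, while with axis = 1 every selected cell falls in column A because A holds the largest value in each row.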
 

Styling Only a Subset

By default, the styling methods are applied to all columns. If you want to apply the stylings only to one column or a selected subset of columns, use the subset parameter as follows:

E.g. df.style.text_gradient(subset = ["A", "D"])

In [5]:
display(df.style.set_caption("Background gradient applied to all columns").background_gradient())

display(df.style.set_caption("Background gradient applied to columns A and D").background_gradient(subset = ["A", "D"]))
 
Background gradient applied to all columns
  A B C D
0 3000 8 2.324234 0.100000
1 2500 -1 0.892340 -0.990000
2 1200 3 1.239841 -0.230000
3 4000 -4 3.923840 0.750000
4 1000 -10 0.923840 0.500000
5 10000 5 nan -0.500000
 
Background gradient applied to columns A and D
  A B C D
0 3000 8 2.324234 0.100000
1 2500 -1 0.892340 -0.990000
2 1200 3 1.239841 -0.230000
3 4000 -4 3.923840 0.750000
4 1000 -10 0.923840 0.500000
5 10000 5 nan -0.500000
 

Formatting

Before we begin with any specific coloring, let’s have a look at some fundamental formatting techniques to make your DataFrame look more polished.

Caption

Adding captions to a table is almost always required. You can add the caption to the DataFrame with this method.

In [6]:
df.style.set_caption("Caption Text")
Out[6]:
Caption Text
  A B C D
0 3000 8 2.324234 0.100000
1 2500 -1 0.892340 -0.990000
2 1200 3 1.239841 -0.230000
3 4000 -4 3.923840 0.750000
4 1000 -10 0.923840 0.500000
5 10000 5 nan -0.500000
 

Renaming Columns

Sometimes, the column names are variable names or abbreviated and therefore not intuitive for the audience. Similarly to adding meaningful axis labels to a plot, renaming the column names to a more intuitive version can be helpful for your audience.

If you need to work with the DataFrame later on, it might make sense to create a copy of the DataFrame for visualization purposes only.

There are two options to rename your columns:

A. You can rename all columns at once:

In [7]:
#Create a copy of the DataFrame for visualization purposes
df_viz = df.copy()

# Rename all columns
df_viz.columns = ["New Column Name A", "New Column Name B", "New Column Name C", "New Column Name D"]

df_viz
Out[7]:
  New Column Name A New Column Name B New Column Name C New Column Name D
0 3000 8 2.324234 0.10
1 2500 -1 0.892340 -0.99
2 1200 3 1.239841 -0.23
3 4000 -4 3.923840 0.75
4 1000 -10 0.923840 0.50
5 10000 5 NaN -0.50
 

B. Or you can rename only a subset of columns:

In [8]:
#Create a copy of the DataFrame for visualization purposes
df_viz = df.copy()

# Rename selection of columns
df_viz.rename(columns = {"A": "New Column Name A", "B": "New Column Name B"}, inplace=True)

df_viz
Out[8]:
  New Column Name A New Column Name B C D
0 3000 8 2.324234 0.10
1 2500 -1 0.892340 -0.99
2 1200 3 1.239841 -0.23
3 4000 -4 3.923840 0.75
4 1000 -10 0.923840 0.50
5 10000 5 NaN -0.50
 

Hiding the Index

You can hide the index with the following method if it does not add any value.

In [9]:
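# Styler.hide_index() was deprecated in pandas 1.4 and removed in 2.0;
# on older versions, use df.style.hide_index() instead
df.style.hide(axis = "index")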
Out[9]:
A B C D
3000 8 2.324234 0.100000
2500 -1 0.892340 -0.990000
1200 3 1.239841 -0.230000
4000 -4 3.923840 0.750000
1000 -10 0.923840 0.500000
10000 5 nan -0.500000
 

Format Columns

Adding thousands-separators or truncating the floating-point numbers to fewer decimal places can increase the readability of your DataFrame. For this purpose, the Styler object can distinguish the display values from the actual values. By using the .format() method you can manipulate the display values according to a format spec string [3].

You could even add a unit before or after the number as part of the formatting. However, to avoid distracting the reader, I would recommend putting the unit in square brackets in the column name (see "Renaming Columns"). For example, "Salary [$]".

In [10]:
df.style.format({"A" : "{:,.0f}",
                 "B" : "{:d} $",
                 "C" : "{:.3f}",
                 "D" : "{:.2f}"})
Out[10]:
  A B C D
0 3,000 8 $ 2.324 0.10
1 2,500 -1 $ 0.892 -0.99
2 1,200 3 $ 1.240 -0.23
3 4,000 -4 $ 3.924 0.75
4 1,000 -10 $ 0.924 0.50
5 10,000 5 $ nan -0.50
 

Styling Properties

Sometimes, all you want to do might be to highlight a single column of the DataFrame by adjusting the background and font color. For this purpose, you can use the .set_properties() method to adjust some CSS properties of a DataFrame such as colors, fonts, borders, etc.

In [11]:
df.style.set_properties(subset = ["C"],
                        **{"background-color": "lightblue",
                           "color" : "white",
                           "border" : "0.5px solid white"})
Out[11]:
  A B C D
0 3000 8 2.324234 0.100000
1 2500 -1 0.892340 -0.990000
2 1200 3 1.239841 -0.230000
3 4000 -4 3.923840 0.750000
4 1000 -10 0.923840 0.500000
5 10000 5 nan -0.500000
 

Built-in Styling

The Styler class has some built-in methods for common styling tasks.

Highlighting

Highlighting individual cells is an easy way to guide your audience’s attention to what you want to show. Common values you might want to highlight are minimum, maximum, and null values. For these cases, you can use the respective built-in methods.

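You can adjust the highlight color with the parameter color for minimum and maximum highlighting and null_color for null highlighting. Note that null_color was deprecated in pandas 1.5 in favor of color and removed in pandas 2.0.

In [12]:
# pandas 1.5+: use color; on older versions, pass null_color = "yellow" instead
df.style.highlight_null(color = "yellow")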
Out[12]:
  A B C D
0 3000 8 2.324234 0.100000
1 2500 -1 0.892340 -0.990000
2 1200 3 1.239841 -0.230000
3 4000 -4 3.923840 0.750000
4 1000 -10 0.923840 0.500000
5 10000 5 nan -0.500000
 

If you want to highlight both minimum and maximum values, you can do so by chaining both functions together.

In [13]:
df.style.highlight_min(color = "red").highlight_max(color = "green")
Out[13]:
  A B C D
0 3000 8 2.324234 0.100000
1 2500 -1 0.892340 -0.990000
2 1200 3 1.239841 -0.230000
3 4000 -4 3.923840 0.750000
4 1000 -10 0.923840 0.500000
5 10000 5 nan -0.500000
 

Gradients

Adding gradient styles can help the audience understand the relationship of the numerical values within a column or a row. For example, gradients can indicate whether a value is large or small, positive or negative, or even good or bad.

There are also two techniques to add gradients to the DataFrame:

A. You can apply gradient styles to the text [2]:

In [14]:
df.style.text_gradient(subset = ["D"], 
                       cmap = "RdYlGn", 
                       vmin = -1, 
                       vmax = 1)
Out[14]:
  A B C D
0 3000 8 2.324234 0.100000
1 2500 -1 0.892340 -0.990000
2 1200 3 1.239841 -0.230000
3 4000 -4 3.923840 0.750000
4 1000 -10 0.923840 0.500000
5 10000 5 nan -0.500000
 

B. Or you can apply gradient styles to the background [2]:

In [15]:
df.style.background_gradient(subset = ["D"], 
                             cmap = "RdYlGn", 
                             vmin = -1, 
                             vmax = 1)
Out[15]:
  A B C D
0 3000 8 2.324234 0.100000
1 2500 -1 0.892340 -0.990000
2 1200 3 1.239841 -0.230000
3 4000 -4 3.923840 0.750000
4 1000 -10 0.923840 0.500000
5 10000 5 nan -0.500000
 

With the cmap parameter and vmin and vmax you can set the properties of the gradient.

In [16]:
df.style.background_gradient(subset = ["D"], 
                             cmap = "coolwarm", 
                             vmin = -1, 
                             vmax = 1)
Out[16]:
  A B C D
0 3000 8 2.324234 0.100000
1 2500 -1 0.892340 -0.990000
2 1200 3 1.239841 -0.230000
3 4000 -4 3.923840 0.750000
4 1000 -10 0.923840 0.500000
5 10000 5 nan -0.500000
 

Bars

Another way of visualizing the relationship and order within a column or a row is to draw bars in the cell’s background [2].

Again, there are two essential techniques to utilize bars in your DataFrames:

A. The straightforward application is to use a standard uni-colored bar:

In [17]:
df.style.bar(subset = ["A"], color = "lightblue", vmin = 0)
Out[17]:
  A B C D
0 3000 8 2.324234 0.100000
1 2500 -1 0.892340 -0.990000
2 1200 3 1.239841 -0.230000
3 4000 -4 3.923840 0.750000
4 1000 -10 0.923840 0.500000
5 10000 5 nan -0.500000
 

B. You can also create bi-colored bar charts by setting a mid value and colors for the negative and positive values. When using this method, I recommend combining it with some borders to make it clearer.

In [18]:
df.style.bar(subset = ["D"], 
             align = "mid", 
             color = ["salmon", "lightgreen"])\
         .set_properties(**{'border': '0.5px solid black'})
Out[18]:
  A B C D
0 3000 8 2.324234 0.100000
1 2500 -1 0.892340 -0.990000
2 1200 3 1.239841 -0.230000
3 4000 -4 3.923840 0.750000
4 1000 -10 0.923840 0.500000
5 10000 5 nan -0.500000
 

Custom Styling

If the built-in styling methods are not sufficient for your needs, you can write your own styling function and apply it to the DataFrame. You can either apply styling element-wise with the .applymap() method or column- or row-wise with the .apply() method [2].

A popular example of this is to display negative values of a DataFrame in red color as shown below.

In [19]:
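def custom_styling(val):
    # Negative values are shown in red, everything else in black
    color = "red" if val < 0 else "black"
    return f"color: {color}"

# On pandas 2.1+, Styler.applymap was renamed to Styler.map
df.style.applymap(custom_styling)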
Out[19]:
  A B C D

pandas Table Styles

 

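When a DataFrame is output directly via df in a Jupyter notebook, it is displayed as a table. The style attribute lets you customize the appearance of that table, similar to conditional formatting in Excel.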

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(5,4),columns=['A','B','C','D'])
s = df.style
print(s,type(s))
#<pandas.io.formats.style.Styler object at 0x000001CD7B409710> <class 'pandas.io.formats.style.Styler'>

 

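There are two ways to create styles for a table. Both require you to define a separate function that produces the style.

① df.style.applymap(func, subset=None, **kwargs): applies func to every element of the DataFrame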

# Set the font color to red for values below 0.2, otherwise to black
df = pd.DataFrame(np.random.rand(5,4),columns=['A','B','C','D'])
def lt_red(val):
    if val<0.2:
        color = 'red'
    else:
        color = 'black'
#     print(color)
    return ('color:%s'%color)
df.style.applymap(lt_red)

② df.style.apply(func, axis=0, subset=None, **kwargs): applies func to the DataFrame column by column or row by row. axis defaults to 0 (column-wise); axis = 1 processes row by row.

# Fill the background of the largest value among columns A, C and D in each row with yellow
# (axis = 1 works row by row; use axis = 0 to highlight the maximum of each column instead)
def highlight_max(s):
    is_max = s == s.max()
    l = []
    for v in is_max:
        if v:
            l.append('background-color: yellow')
        else:
            l.append('')
#     print(l)
    return l
df.style.apply(highlight_max,axis = 1,subset = ['A','C','D'])

 

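If you need to slice rows and columns at the same time in a style, use pandas' IndexSlice.

# For rows with index 2 to 5 and columns A, B and C, fill the row-wise maximum with a yellow background
df.style.apply(highlight_max,axis=1,subset = pd.IndexSlice[2:5,['A','B','C']])
## df.loc[2:5,['A','B','C']].style.apply(highlight_max,axis=1) also works
## The first approach displays the whole DataFrame and styles only the selected rows and columns; the second displays only the selected rows and columns and then styles them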


 

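Formatting the values in a DataFrame

df = pd.DataFrame(np.random.rand(5,4),columns=['A','B','C','D'])
# df.style.format('{:.2%}',subset=['B','C'])  # apply a single format to the selected columns; the whole format spec is wrapped in quotes
df.style.format({'A':'{:.2f}','B':'{:%}','C':'{:+}','D':'{:.2%}'}) # apply a different format per column; the argument is a dict with column names as keys and format specs as values
# Columns A, B, C and D are shown with 2 decimal places, as a percentage, with a leading sign, and as a percentage with 2 decimal places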
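
The format strings are ordinary Python format specs, so their effect can be checked with `str.format` before handing them to the Styler. The sample value 0.25 is chosen arbitrarily for this sketch.

```python
value = 0.25  # arbitrary sample value

two_decimals = "{:.2f}".format(value)     # fixed-point, 2 decimal places
percent = "{:%}".format(value)            # percentage, default 6 decimal places
signed = "{:+}".format(value)             # always show the sign
percent_2 = "{:.2%}".format(value)        # percentage, 2 decimal places
thousands = "{:,.0f}".format(10000)       # thousands separator, no decimals

print(two_decimals, percent, signed, percent_2, thousands)
# 0.25 25.000000% +0.25 25.00% 10,000
```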


 

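Locating null values: df.style.highlight_null(color='red') sets a background color for null values. (Before pandas 1.5 the parameter was called null_color; it was removed in pandas 2.0.)

The corresponding methods highlight_max() and highlight_min() take the parameters (subset=None, color='yellow', axis=0).

df = pd.DataFrame(np.random.rand(5,4),columns=['A','B','C','D'])
df.loc[2,'B'] = np.nan  # .loc avoids chained assignment
df.style.highlight_null(color='red')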


 

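Color mapping

df = pd.DataFrame(np.random.rand(5,4),columns=['A','B','C','D'])
df.style.background_gradient(cmap='Reds',axis = 1,low = 0,high = 1,subset = ['A','C','D'])
# axis = 1 processes row by row: smaller values get lighter colors of the colormap, larger values darker ones
# low and high extend the color range below the minimum and above the maximum as a multiple of the data range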


 

Bar charts

df = pd.DataFrame(np.random.rand(5,4),columns=['A','B','C','D'])
df.style.bar(width=100,subset=['A','C','D'],color='lightpink')


 

Building styles step by step

df.style.\
    bar(width=100,subset=['A'],color='lightpink').\
    highlight_max(axis = 1,color='red').\
    highlight_min(axis = 1,color='green')
# Every line except the last ends with a dot followed by a line-continuation backslash


(Reposted from)

https://www.cnblogs.com/Forever77/p/11336981.html

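# Set the display width
pd.set_option('display.width',100)
# Set the display precision (the bare key 'precision' is ambiguous in newer pandas versions)
pd.set_option('display.precision',4)
# Show all columns
pd.set_option('display.max_columns',None)
# Show all rows
pd.set_option('display.max_rows',None)
Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.set_option.html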

 
