pandas的dataframe样式设置
Posted bigdata.ministep.cn
tags:
篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了pandas的dataframe样式设置相关的知识,希望对你有一定的参考价值。
参考: [Essential Techniques to Style Pandas DataFrames | Kaggle](https://www.kaggle.com/code/iamleonie/essential-techniques-to-style-pandas-dataframes)
原始图片样式丢失了,囧
Image by author
At the end of your data analysis, you need to decide how to communicate your findings. Tables can be more suitable than graphs for communicating data when you need your audience to look up individual precise values and compare them to other values. However, tables contain a lot of information that your audience processes by reading, which makes it difficult for your audience to understand your message right away. Random design of the table, such as too many colors, bold borders, or too much information, can additionally distract your audience. However, purposeful usage of formatting and styling can guide your audience’s attention to the most important number in a table.
DataFrames from the pandas library are great to visualize data as tables in Python. Additionally, the pandas library provides methods to format and style the DataFrame via the style attribute. Therefore, this article discusses essential techniques to format and style pandas DataFrames to effectively communicate data.
For this tutorial, we will be using the following small fictional dataset:
import pandas as pd
df = pd.read_csv("../input/sample-dataset-for-dataframe-styling/sample_dataset.csv")
df
Global Display Options
Before you get started with customizing the visualizations for individual DataFrames, you can adjust the global display behavior of pandas [1]. Two common tasks you can handle are displaying all columns of a DataFrame and adjusting the width of a DataFrame column.
When your DataFrame has too many columns, pandas does not render all columns but instead omits columns in the middle. To force pandas to display all columns, you can set:
pd.set_option("display.max_columns", None)
When you are working with long texts pandas truncates the text in the column. To force pandas to display the column contents by increasing the column width, you can set:
pd.set_option(\'display.max_colwidth\', None)
General Tips
The following tips apply to all methods of the styler object.
Multiple Stylings
You can combine multiple stylings by chaining multiple functions together.
E.g. df.style.set_caption(...).format(...).bar(...).set_properties(...)
Column-wise vs. Row-wise Styling
By default, the styling is applied column-wise (axis = 0
). If you want to apply the styling row-wise, use axis = 1
in the properties instead.
E.g. df.style.highlight_min(axis = 1)
display(df.style.set_caption("Highlight column-wise maximum with \'axis = 0\'").highlight_max(axis = 0))
display(df.style.set_caption("Highlight row-wise maximum with \'axis = 1\'").highlight_max(axis = 1))
Styling Only a Subset
By default, the styling methods are applied to all columns. If you want to apply the stylings only to one column or a selected subset of columns, use the subset
parameter as follows:
E.g. df.style.text_gradient(subset = ["A", "D"])
display(df.style.set_caption("Background gradient applied to all columns").background_gradient())
display(df.style.set_caption("Background gradient applied to columns A and D").background_gradient(subset = ["A", "D"]))
Formatting
Before we begin with any specific coloring, let’s have a look at some fundamental formatting techniques to make your DataFrame look more polished.
Caption
Adding captions to a table is almost always required. You can add the caption to the DataFrame with this method.
df.style.set_caption("Caption Text")
Renaming Columns
Sometimes, the column names are variable names or abbreviated and therefore not intuitive for the audience. Similarly to adding meaningful axis labels to a plot, renaming the column names to a more intuitive version can be helpful for your audience.
If you need to work with the DataFrame later on, it might make sense to create a copy of the DataFrame for visualization purposes only.
There are two options to rename your columns:
A. You can rename all columns at once:
#Create a copy of the DataFrame for visualization purposes
df_viz = df.copy()
# Rename all columns
df_viz.columns = ["New Column Name A", "New Column Name B", "New Column Name C", "New Column Name D"]
df_viz
B. Or you can rename only a subset of columns:
#Create a copy of the DataFrame for visualization purposes
df_viz = df.copy()
# Rename selection of columns
df_viz.rename(columns = "A": "New Column Name A", "B": "New Column Name B", inplace=True)
df_viz
Hiding the Index
You can hide the index with the following method if it does not add any value.
df.style.hide_index()
Format Columns
Adding thousands-separators or truncating the floating-point numbers to fewer decimal places can increase the readability of your DataFrame. For this purpose, the Styler object can distinguish the display values from the actual values. By using the .format()
method you can manipulate the display values according to a format spec string [3].
You could even add a unit before or after the number as part of the formatting. However, to not disturb the attention, I would recommend putting the unit in square brackets in the column name (see "Renaming columns"). For example, "Salary [$]".
df.style.format("A" : ":,.0f",
"B" : ":d $",
"C" : ":.3f",
"D" : ":.2f")
Styling Properties
Sometimes, all you want to do might be to highlight a single column of the DataFrame by adjusting the background and font color. For this purpose, you can use the .set_properties()
method to adjust some CSS properties of a DataFrame such as colors, fonts, borders, etc.
df.style.set_properties(subset = ["C"],
**"background-color": "lightblue",
"color" : "white",
"border" : "0.5px solid white")
Built-in Styling
The Style class has some built-in methods for common styling tasks.
Highlighting
Highlighting individual cells is an easy way to guide your audience’s attention to what you want to show. Common values you might want to highlight are minimum, maximum, and null values. For these cases, you can use the respective built-in methods.
You can adjust the highlight color with the parameter color
for minimum and maximum highlighting and nullcolor
for null highlighting.
df.style.highlight_null(null_color = "yellow")
If you want to highlight both minimum and maximum values, you can do so by chaining both functions together.
df.style.highlight_min(color = "red").highlight_max(color = "green")
Gradients
Adding gradient styles can help the audience understand the relationship of the numerical values within a column or a row. For example, gradients can indicate whether a value is large or small, positive or negative, or even good or bad.
There are also two techniques to add gradients to the DataFrame:
A. You can apply gradient styles either to the text [2]
df.style.text_gradient(subset = ["D"],
cmap = "RdYlGn",
vmin = -1,
vmax = 1)
B. You can apply gradient styles either to the background [2].
df.style.background_gradient(subset = ["D"],
cmap = "RdYlGn",
vmin = -1,
vmax = 1)
With the cmap
parameter and vmin
and vmax
you can set the properties of the gradient.
df.style.background_gradient(subset = ["D"],
cmap = "coolwarm",
vmin = -1,
vmax = 1)
Bars
Another way of visualizing the relationship and order within a column or a row is to draw bars in the cell’s background [2].
Again, there are two essential techniques to utilize bars in your DataFrames:
A. The straightforward application is to use a standard uni-colored bar:
df.style.bar(subset = ["A"], color = "lightblue", vmin = 0)
B. You can also create bi-colored bar charts by setting a mid
value and colors for the negative and positive values. When using this method, I recommend combining it with some borders to make it clearer.
df.style.bar(subset = ["D"],
align = "mid",
color = ["salmon", "lightgreen"])\\
.set_properties(**\'border\': \'0.5px solid black\')
Custom Styling
If the built-in styling methods are not sufficient for your needs, you can write your own styling function and apply it to the DataFrame. You can either apply styling element-wise with the .applymap()
method or column- or row-wise with the .apply()
method [2].
A popular example of this is to display negative values of a DataFrame in red color as shown below.
def custom_styling(val):
color = "red" if val < 0 else "black"
return f"color: color"
df.style.applymap(custom_styling)