Posted by 败家先森
pandas: powerful Python data analysis toolkit
——Wes McKinney & PyData Development Team, Release 0.18.0, March 17, 2016
Customarily, we import as follows:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
1. Object Creation
1.1 Series
Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index. The basic method to create a Series is to call:
>>> s = pd.Series(data, index=index)
Here, data can be many different things:
- a Python dict
- an ndarray
- a scalar value
The passed index is a list of axis labels. Thus, this separates into a few cases depending on what data is:
From ndarray
If data is an ndarray, index must be the same length as data. If no index is passed, one will be created having values [0, ..., len(data)-1].
>>> s = pd.Series(np.random.randn(5),index = ['a','b','c','d','e'])
>>> s
a -0.159223
b 2.317106
c -0.341460
d -1.499552
e 0.400351
dtype: float64
>>> s.index
Index([u'a', u'b', u'c', u'd', u'e'], dtype='object')
>>> pd.Series(np.random.randn(5))
0 0.785536
1 -1.014011
2 -0.120812
3 0.289870
4 0.705393
dtype: float64
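As a quick sketch of the length requirement above (illustrative code, not from the original text): passing an index whose length differs from the data raises a ValueError.

```python
import numpy as np
import pandas as pd

data = np.random.randn(5)
try:
    # 2 labels for 5 values: lengths do not match
    pd.Series(data, index=['a', 'b'])
except ValueError as e:
    print('ValueError:', e)
```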
From dict
If data is a dict and index is passed, the values in data corresponding to the labels in the index will be pulled out. Otherwise, an index will be constructed from the sorted keys of the dict, if possible.
>>> d = {'a': 0., 'b': 1., 'c': 2.}
>>> pd.Series(d)
a 0
b 1
c 2
dtype: float64
>>> pd.Series(d,index = list('bcda'))
b 1
c 2
d NaN
a 0
dtype: float64
**Note:** NaN (not a number) is the standard missing data marker used in pandas.
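Because NaN compares unequal even to itself, pandas provides isnull/notnull for detecting it. A small illustrative sketch:

```python
import numpy as np
import pandas as pd

# NaN is never equal to anything, including itself
print(np.nan == np.nan)  # False

s = pd.Series([1.0, np.nan, 3.0])
# Use pd.isnull (or s.isnull()) to find missing entries
print(pd.isnull(s).tolist())  # [False, True, False]
```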
From scalar value
If data is a scalar value, an index must be provided. The value will be repeated to match the length of index.
>>> pd.Series(5.,index = ['a','b','c','d','e'])
a 5
b 5
c 5
d 5
e 5
dtype: float64
1.1.1 Series is ndarray-like
Series acts very similarly to an ndarray and is a valid argument to most NumPy functions. However, operations such as slicing will also slice the index.
>>> s[0]
-0.15922308848832653
>>> s[:3]
a -0.159223
b 2.317106
c -0.341460
dtype: float64
>>> s[s > s.median()]
b 2.317106
e 0.400351
dtype: float64
>>> s[[4,3,2]]
e 0.400351
d -1.499552
c -0.341460
dtype: float64
>>> np.exp(s)
a 0.852806
b 10.146272
c 0.710732
d 0.223230
e 1.492348
dtype: float64
1.1.2 Series is dict-like
>>> s['a']
-0.15922308848832653
>>> s['e'] = 12
>>> s
a -0.159223
b 2.317106
c -0.341460
d -1.499552
e 12.000000
dtype: float64
>>> 'e' in s
True
>>> 'f' in s
False
>>> s['f']
KeyError: 'f'
Using the get method, a missing label will return None or a specified default:
>>> s.get('f')
>>> s.get('f',np.nan)
nan
>>> s.get('e',np.nan)
12.0
1.1.3 Vectorized operations and label alignment with Series
When doing data analysis, as with raw NumPy arrays, looping through a Series value-by-value is usually not necessary. A Series can also be passed into most NumPy methods expecting an ndarray.
>>> s + s
a -0.318446
b 4.634213
c -0.682919
d -2.999105
e 24.000000
dtype: float64
>>> s * 2
>>> np.exp(s)
A key difference between Series and ndarray is that operations between Series automatically align the data based on label. Thus, you can write computations without giving consideration to whether the Series involved have the same labels.
>>> s[1:] + s[:-1]
a NaN
b 4.634213
c -0.682919
d -2.999105
e NaN
dtype: float64
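If the non-overlapping labels are not wanted in the result, the NaN entries produced by alignment can simply be dropped afterwards. An illustrative sketch with toy data:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c'])
# s[1:] lacks 'a' and s[:-1] lacks 'c', so the sum is NaN at both labels
total = s[1:] + s[:-1]
# dropna() keeps only the labels present on both sides
print(total.dropna())  # only label 'b' survives
```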
Creating a Series by passing a list of values, letting pandas create a default integer index:
>>> s = pd.Series([1,3,5,np.nan,6,8])
>>> s
0 1
1 3
2 5
3 NaN
4 6
5 8
dtype: float64
1.2 DataFrame
Creating a DataFrame by passing a numpy array,with a datetime index and labeled columns:
>>> dates = pd.date_range('20130101',periods=6)
>>> dates
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-01, ..., 2013-01-06]
Length: 6, Freq: D, Timezone: None
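date_range defaults to daily frequency ('D', as shown above); other frequencies can be requested with the freq argument. A small illustrative sketch using weekly frequency:

```python
import pandas as pd

# 'W' anchors to week ends (Sundays by default)
idx = pd.date_range('20130101', periods=3, freq='W')
print(idx)
```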
>>> df = pd.DataFrame(np.random.randn(6,4),index=dates,columns=list('ABCD'))
>>> df
A B C D
2013-01-01 0.716330 -1.782610 0.809990 0.319876
2013-01-02 -0.171806 -0.526268 0.206743 -1.246213
2013-01-03 -1.774970 1.890517 -0.773496 -0.930083
2013-01-04 0.537348 -0.870212 -1.227291 1.322823
2013-01-05 -0.897589 1.275171 1.064439 -2.021186
2013-01-06 0.130427 -1.067145 -1.273118 1.786337
[6 rows x 4 columns]
Creating a DataFrame by passing a dict of objects that can be converted to series-like:
>>> df2 = pd.DataFrame({'A': 1.,
                        'B': pd.Timestamp('20130102'),
                        'C': pd.Series(1, index=range(4), dtype='float32'),
                        'D': np.array([3]*4, dtype='int32'),
                        'E': pd.Categorical(['test','train','test','train']),
                        'F': 'foo'})
>>> df2
A B C D E F
0 1 2013-01-02 1 3 test foo
1 1 2013-01-02 1 3 train foo
2 1 2013-01-02 1 3 test foo
3 1 2013-01-02 1 3 train foo
[4 rows x 6 columns]
Having specific dtypes
>>> df2.dtypes
A float64
B datetime64[ns]
C float32
D int32
E category
F object
dtype: object
If you’re using IPython, tab completion for column names (as well as public attributes) is automatically enabled.
>>> df2.<TAB>
Display all 210 possibilities? (y or n)
df2.A df2.from_csv df2.rank
df2.B df2.from_dict df2.rdiv
df2.C df2.from_items df2.reindex
df2.D df2.from_records df2.reindex_axis
df2.E df2.ftypes df2.reindex_like
df2.F df2.ge df2.rename
df2.T df2.get df2.rename_axis
df2.abs df2.get_dtype_counts df2.reorder_levels
df2.add df2.get_ftype_counts df2.replace
df2.add_prefix df2.get_value df2.resample
df2.add_suffix df2.get_values df2.reset_index
df2.align df2.groupby df2.rfloordiv
df2.all df2.gt df2.rmod
df2.any df2.head df2.rmul
df2.append df2.hist df2.rpow
df2.apply df2.iat df2.rsub
df2.applymap df2.icol df2.rtruediv
df2.as_blocks df2.idxmax df2.save
df2.as_matrix df2.idxmin df2.select
df2.asfreq df2.iget_value df2.set_index
df2.astype df2.iloc df2.set_value
df2.at df2.index df2.shape
df2.at_time df2.info df2.shift
df2.axes df2.insert df2.skew
df2.between_time df2.interpolate df2.sort
df2.bfill df2.irow df2.sort_index
df2.blocks df2.is_copy df2.sortlevel
df2.bool df2.isin df2.squeeze
df2.boxplot df2.isnull df2.stack
df2.clip df2.iteritems df2.std
df2.clip_lower df2.iterkv df2.sub
df2.clip_upper df2.iterrows df2.subtract
df2.columns df2.itertuples df2.sum
df2.combine df2.ix df2.swapaxes
df2.combineAdd df2.join df2.swaplevel
df2.combineMult df2.keys df2.tail
df2.combine_first df2.kurt df2.take
df2.compound df2.kurtosis df2.to_clipboard
df2.consolidate df2.last df2.to_csv
df2.convert_objects df2.last_valid_index df2.to_dense
df2.copy df2.le df2.to_dict
df2.corr df2.load df2.to_excel
df2.corrwith df2.loc df2.to_gbq
df2.count df2.lookup df2.to_hdf
df2.cov df2.lt df2.to_html
df2.cummax df2.mad df2.to_json
df2.cummin df2.mask df2.to_latex
df2.cumprod df2.max df2.to_msgpack
df2.cumsum df2.mean df2.to_panel
df2.delevel df2.median df2.to_period
df2.describe df2.merge df2.to_pickle
df2.diff df2.min df2.to_records
df2.div df2.mod df2.to_sparse
df2.divide df2.mode df2.to_sql
df2.dot df2.mul df2.to_stata
df2.drop df2.multiply df2.to_string
df2.drop_duplicates df2.ndim df2.to_timestamp
df2.dropna df2.ne df2.to_wide
df2.dtypes df2.notnull df2.transpose
df2.duplicated df2.pct_change df2.truediv
df2.empty df2.pivot df2.truncate
df2.eq df2.pivot_table df2.tshift
df2.equals df2.plot df2.tz_convert
df2.eval df2.pop df2.tz_localize
df2.ffill df2.pow df2.unstack
df2.fillna df2.prod df2.update
df2.filter df2.product df2.values
df2.first df2.quantile df2.var
df2.first_valid_index df2.query df2.where
df2.floordiv df2.radd df2.xs
2. Viewing Data
See the top and bottom rows of the frame:
>>> df.head()
>>> df.tail()
>>> df.head(10)
>>> df.tail(10)
Display the index, columns, and the underlying NumPy data:
>>> df.index
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-01, ..., 2013-01-06]
Length: 6, Freq: D, Timezone: None
>>> df.columns
Index([u'A', u'B', u'C', u'D'], dtype='object')
>>> df.values
array([[ 0.71632974, -1.78261015, 0.80999048, 0.31987599],
[-0.17180562, -0.52626809, 0.20674317, -1.24621339],
[-1.77496978, 1.89051681, -0.77349583, -0.93008323],
[ 0.53734751, -0.87021202, -1.22729091, 1.32282329],
[-0.89758898, 1.27517093, 1.06443943, -2.02118609],
[ 0.13042695, -1.06714528, -1.27311829, 1.78633711]])
describe() shows a quick statistic summary of your data:
>>> df.describe()
A B C D
count 6.000000 6.000000 6.000000 6.000000
mean -0.243377 -0.180091 -0.198789 -0.128074
std 0.943312 1.439183 1.031516 1.513146
min -1.774970 -1.782610 -1.273118 -2.021186
25% -0.716143 -1.017912 -1.113842 -1.167181
50% -0.020689 -0.698240 -0.283376 -0.305104
75% 0.435617 0.824811 0.659179 1.072086
max 0.716330 1.890517 1.064439 1.786337
[8 rows x 4 columns]
Transposing your data
>>> df.T
2013-01-01 2013-01-02 2013-01-03 2013-01-04 2013-01-05 2013-01-06
A 0.716330 -0.171806 -1.774970 0.537348 -0.897589 0.130427
B -1.782610 -0.526268 1.890517 -0.870212 1.275171 -1.067145
C 0.809990 0.206743 -0.773496 -1.227291 1.064439 -1.273118
D 0.319876 -1.246213 -0.930083 1.322823 -2.021186 1.786337
[4 rows x 6 columns]
Sorting by an axis (the sort_index method):
>>> df.sort_index(axis=1,ascending=False)
D C B A
2013-01-01 0.319876 0.809990 -1.782610 0.716330
2013-01-02 -1.246213 0.206743 -0.526268 -0.171806
2013-01-03 -0.930083 -0.773496 1.890517 -1.774970
2013-01-04 1.322823 -1.227291 -0.870212 0.537348
2013-01-05 -2.021186 1.064439 1.275171 -0.897589
2013-01-06 1.786337 -1.273118 -1.067145 0.130427
[6 rows x 4 columns]
>>> df.sort_index(axis=0,ascending=False)
A B C D
2013-01-06 0.130427 -1.067145 -1.273118 1.786337
2013-01-05 -0.897589 1.275171 1.064439 -2.021186
2013-01-04 0.537348 -0.870212 -1.227291 1.322823
2013-01-03 -1.774970 1.890517 -0.773496 -0.930083
2013-01-02 -0.171806 -0.526268 0.206743 -1.246213
2013-01-01 0.716330 -1.782610 0.809990 0.319876
[6 rows x 4 columns]
>>> df.sort_index(axis=1,ascending=False).sort_index(axis=0,ascending=False)
D C B A
2013-01-06 1.786337 -1.273118 -1.067145 0.130427
2013-01-05 -2.021186 1.064439 1.275171 -0.897589
2013-01-04 1.322823 -1.227291 -0.870212 0.537348
2013-01-03 -0.930083 -0.773496 1.890517 -1.774970
2013-01-02 -1.246213 0.206743 -0.526268 -0.171806
2013-01-01 0.319876 0.809990 -1.782610 0.716330
[6 rows x 4 columns]
Sorting by values (the sort method; in the ascending list, 0 means descending and 1 means ascending):
>>> df.sort(['A','B'],ascending=[0,1])
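In later pandas releases df.sort was split into sort_values and sort_index, so the call above is written with sort_values today. A sketch with illustrative data:

```python
import pandas as pd

df = pd.DataFrame({'A': [2, 1, 2], 'B': [9, 7, 8]})
# Sort by A descending, breaking ties by B ascending
out = df.sort_values(['A', 'B'], ascending=[False, True])
print(out)
```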
3. Selection
3.1 Getting
Selecting a single column yields a Series; df.A is equivalent to df['A']:
>>> df.A
2013-01-01 0.716330
2013-01-02 -0.171806
2013-01-03 -1.774970
2013-01-04 0.537348
2013-01-05 -0.897589
2013-01-06 0.130427
Freq: D, Name: A, dtype: float64
>>> df['A']
2013-01-01 0.716330
2013-01-02 -0.171806
2013-01-03 -1.774970
2013-01-04 0.537348
2013-01-05 -0.897589
2013-01-06 0.130427
Freq: D, Name: A, dtype: float64
Selecting via [], which slices the rows:
>>> df[0:3]
A B C D
2013-01-01 0.716330 -1.782610 0.809990 0.319876
2013-01-02 -0.171806 -0.526268 0.206743 -1.246213
2013-01-03 -1.774970 1.890517 -0.773496 -0.930083
>>> df['20130101':'20130103']
A B C D
2013-01-01 0.716330 -1.782610 0.809990 0.319876
2013-01-02 -0.171806 -0.526268 0.206743 -1.246213
2013-01-03 -1.774970 1.890517 -0.773496 -0.930083
[3 rows x 4 columns]
3.2 Selecting by Label
For getting a cross section using a label:
>>> df.loc[dates[0]]
A 0.716330
B -1.782610
C 0.809990
D 0.319876
Name: 2013-01-01 00:00:00, dtype: float64
Selecting on a multi-axis by label
>>> df.loc[:,['A','B']]
A B
2013-01-01 0.716330 -1.782610
2013-01-02 -0.171806 -0.526268
2013-01-03 -1.774970 1.890517
2013-01-04 0.537348 -0.870212
2013-01-05 -0.897589 1.275171
2013-01-06 0.130427 -1.067145
[6 rows x 2 columns]
Showing label slicing; both endpoints are included:
>>> df.loc['20130102':'20130104',['A','B']]
A B
2013-01-02 -0.171806 -0.526268
2013-01-03 -1.774970 1.890517
2013-01-04 0.537348 -0.870212
[3 rows x 2 columns]
Reduction in the dimensions of the returned object
>>> df.loc['20130102',['A','B']]
A -0.171806
B -0.526268
Name: 2013-01-02 00:00:00, dtype: float64
For getting a scalar value:
>>> df.loc[dates[0],'A']
0.71632974391895454
For getting fast access to a scalar(equiv to the prior method)
>>> df.at[dates[0],'A']
0.71632974391895454
3.3 Selection by Position
Select via the position of the passed integers
>>> df.iloc[3]  # the fourth row
A 0.537348
B -0.870212
C -1.227291
D 1.322823
Name: 2013-01-04 00:00:00, dtype: float64
By integer slices, acting similar to numpy/python:
>>> df.iloc[3:5,0:2]
A B
2013-01-04 0.537348 -0.870212
2013-01-05 -0.897589 1.275171
[2 rows x 2 columns]
By lists of integer position locations, similar to the numpy/python style:
>>> df.iloc[[1,2,4],[0,2]]
A C
2013-01-02 -0.171806 0.206743
2013-01-03 -1.774970 -0.773496
2013-01-05 -0.897589 1.064439
[3 rows x 2 columns]
For slicing rows explicitly
>>> df.iloc[1:3,:]
A B C D
2013-01-02 -0.171806 -0.526268 0.206743 -1.246213
2013-01-03 -1.774970 1.890517 -0.773496 -0.930083
[2 rows x 4 columns]
For slicing columns explicitly
>>> df.iloc[:,1:3]
B C
2013-01-01 -1.782610 0.809990
2013-01-02 -0.526268 0.206743
2013-01-03 1.890517 -0.773496
2013-01-04 -0.870212 -1.227291
2013-01-05 1.275171 1.064439
2013-01-06 -1.067145 -1.273118
[6 rows x 2 columns]
For getting a value explicitly
>>> df.iloc[1,1]
-0.52626808513391488
For getting fast access to a scalar(equiv to the prior method)
>>> df.iat[1,1]
-0.52626808513391488
3.4 Boolean Indexing
Using a single column’s values to select data
>>> df[df.A>0]
>>> df[df['A']>0]
A B C D
2013-01-02 0.859761 0.755971 1.371420 0.271600
2013-01-03 0.606392 0.077458 0.251290 2.134013
2013-01-05 0.022155 -0.216343 -1.179598 0.431374
2013-01-06 2.676268 2.295133 -2.132639 0.702915
[4 rows x 4 columns]
A where operation for getting.
>>> df[df>0]
A B C D
2013-01-01 NaN NaN NaN 0.209321
2013-01-02 0.859761 0.755971 1.371420 0.271600
2013-01-03 0.606392 0.077458 0.251290 2.134013
2013-01-04 NaN NaN 0.946518 NaN
2013-01-05 0.022155 NaN NaN 0.431374
2013-01-06 2.676268 2.295133 NaN 0.702915
[6 rows x 4 columns]
Using the isin() method for filtering:
>>> df2 = df.copy()
>>> df2['E'] = ['one','one','two','three','four','three']
>>> df2
A B C D E
2013-01-01 -0.234954 -1.346601 -1.030691 0.209321 one
2013-01-02 0.859761 0.755971 1.371420 0.271600 one
2013-01-03 0.606392 0.077458 0.251290 2.134013 two
2013-01-04 -0.938926 -0.749240 0.946518 -0.248072 three
2013-01-05 0.022155 -0.216343 -1.179598 0.431374 four
2013-01-06 2.676268 2.295133 -2.132639 0.702915 three
[6 rows x 5 columns]
>>> df2[df2.E.isin(['two','four'])]
A B C D E
2013-01-03 0.606392 0.077458 0.251290 2.134013 two
2013-01-05 0.022155 -0.216343 -1.179598 0.431374 four
[2 rows x 5 columns]
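isin is not limited to a single column; called on a whole frame it returns an element-wise boolean mask. An illustrative sketch:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': ['x', 'y', 'z']})
# True wherever the element appears in the given collection
mask = df.isin([1, 3, 'y'])
print(mask)
```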
3.5 Setting
Setting a new column automatically aligns the data by the indexes
>>> s1 = pd.Series([1,2,3,4,5,6],index=pd.date_range('20130102',periods=6))
>>> s1
2013-01-02 1
2013-01-03 2
2013-01-04 3
2013-01-05 4
2013-01-06 5
2013-01-07 6
Freq: D, dtype: int64
>>> df['F'] = s1
Setting values by label
>>> df.at[dates[0],'A'] = 0
Setting values by position
>>> df.iat[0,1] = 0
Setting by assigning with a numpy array
>>> df.loc[:,'D'] = np.array([5 * len(df)])  # a single element, 30, broadcast to every row
The result of the prior setting operations:
>>> df
A B C D F
2013-01-01 0.000000 0.000000 -1.030691 30 NaN
2013-01-02 0.859761 0.755971 1.371420 30 1
2013-01-03 0.606392 0.077458 0.251290 30 2
2013-01-04 -0.938926 -0.749240 0.946518 30 3
2013-01-05 0.022155 -0.216343 -1.179598 30 4
2013-01-06 2.676268 2.295133 -2.132639 30 5
[6 rows x 5 columns]
A where operation with setting
>>> df2 = df.copy()
>>> df2[df2>0] = -df2
>>> df2
A B C D F
2013-01-01 0.000000 0.000000 -1.030691 -30 NaN
2013-01-02 -0.859761 -0.755971 -1.371420 -30 -1
2013-01-03 -0.606392 -0.077458 -0.251290 -30 -2
2013-01-04 -0.938926 -0.749240 -0.946518 -30 -3
2013-01-05 -0.022155 -0.216343 -1.179598 -30 -4
2013-01-06 -2.676268 -2.295133 -2.132639 -30 -5
[6 rows x 5 columns]
4. Missing Data
pandas primarily uses the value np.nan to represent missing data. It is by default not included in computations.
Reindexing allows you to change/add/delete the index on a specified axis. This returns a copy of the data.
>>> df1 = df.reindex(index = dates[0:4],columns = list(df.columns) + ['E'])
>>> df1.loc[dates[0]:dates[1],'E'] = 1
>>> df
A B C D F
2013-01-01 0.000000 0.000000 -1.030691 30 NaN
2013-01-02 0.859761 0.755971 1.371420 30 1
2013-01-03 0.606392 0.077458 0.251290 30 2
2013-01-04 -0.938926 -0.749240 0.946518 30 3
2013-01-05 0.022155 -0.216343 -1.179598 30 4
2013-01-06 2.676268 2.295133 -2.132639 30 5
[6 rows x 5 columns]
>>> df1
A B C D F E
2013-01-01 0.000000 0.000000 -1.030691 30 NaN 1
2013-01-02 0.859761 0.755971 1.371420 30 1 1
2013-01-03 0.606392 0.077458 0.251290 30 2 NaN
2013-01-04 -0.938926 -0.749240 0.946518 30 3 NaN
[4 rows x 6 columns]
To drop any rows that have missing data.
>>> df1.dropna(how='any')
A B C D F E
2013-01-02 0.859761 0.755971 1.37142 30 1 1
[1 rows x 6 columns]
Filling missing data:
>>> df1.fillna(value=5)
A B C D F E
2013-01-01 0.000000 0.000000 -1.030691 30 5 1
2013-01-02 0.859761 0.755971 1.371420 30 1 1
2013-01-03 0.606392 0.077458 0.251290 30 2 5
2013-01-04 -0.938926 -0.749240 0.946518 30 3 5
[4 rows x 6 columns]
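Besides a constant as in fillna(value=5), missing values can be filled from their neighbours with ffill (forward fill), which propagates the last valid observation. A small illustrative sketch:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0])
# ffill carries 1.0 forward across the two NaN positions
print(s.ffill().tolist())  # [1.0, 1.0, 1.0, 4.0]
```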
To get the boolean mask where values are nan
>>> pd.isnull(df1)  # equivalently, df1.isnull()
A B C D F E
2013-01-01 False False False False True False
2013-01-02 False False False False False False
2013-01-03 False False False False False True
2013-01-04 False False False False False True
[4 rows x 6 columns]
5. Operations
5.1 Stats
Operations in general exclude missing data.
Performing a descriptive statistic
>>> df.mean()
A 0.537608
B 0.360497
C -0.295617
D 30.000000
F 3.000000
dtype: float64
Same operation on the other axis
>>> df.mean(1)
2013-01-01 7.242327
2013-01-02 6.797430
2013-01-03 6.587028
2013-01-04 6.451670
2013-01-05 6.525243
2013-01-06 7.567752
Freq: D, dtype: float64
Operating with objects that have different dimensionality and need alignment. In addition, pandas automatically broadcasts along the specified dimension.
>>> s = pd.Series([1,3,5,np.nan,6,8], index=dates).shift(2)
>>> s
2013-01-01 NaN
2013-01-02 NaN
2013-01-03 1
2013-01-04 3
2013-01-05 5
2013-01-06 NaN
Freq: D, dtype: float64
>>> df.sub(s,axis = 'index')
A B C D F
2013-01-01 NaN NaN NaN NaN NaN
2013-01-02 NaN NaN NaN NaN NaN
2013-01-03 -0.393608 -0.922542 -0.748710 29 1
2013-01-04 -3.938926 -3.749240 -2.053482 27 0
2013-01-05 -4.977845 -5.216343 -6.179598 25 -1
2013-01-06 NaN NaN NaN NaN NaN
[6 rows x 5 columns]
5.2 Apply
Applying a function to the data:
>>> df.apply(np.cumsum)  # cumulative sum
A B C D F
2013-01-01 0.000000 0.000000 -1.030691 30 NaN
2013-01-02 0.859761 0.755971 0.340729 60 1
2013-01-03 1.466153 0.833429 0.592019 90 3
2013-01-04 0.527227 0.084189 1.538537 120 6
2013-01-05 0.549382 -0.132154 0.358939 150 10
2013-01-06 3.225651 2.162979 -1.773700 180 15
[6 rows x 5 columns]
>>> df.apply(lambda x:x.max() - x.min())
A 3.615194
B 3.044374
C 3.504059
D 0.000000
F 4.000000
dtype: float64
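apply works column-by-column by default; passing axis=1 applies the function across each row instead. A sketch with illustrative data:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [10, 20]})
# max() - min() computed per row rather than per column
print(df.apply(lambda row: row.max() - row.min(), axis=1).tolist())  # [9, 18]
```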
5.3 Histogramming
>>> s = pd.Series(np.random.randint(0,7,size=10))
>>> s
0 6
1 3
2 6
3 0
4 0
5 6
6 4
7 3
8 0
9 1
dtype: int64
>>> s.value_counts()
6 3
0 3
3 2
4 1
1 1
dtype: int64
5.4 String Methods
Series is equipped with a set of string processing methods in the str attribute that make it easy to operate on each element of the array, as in the code snippet below. Note that pattern-matching in str generally uses regular expressions by default (and in some cases always uses them)
>>> s = pd.Series(['A','B','C','AaBa','Baca',np.nan,'CABA','dog','cat'])
>>> s.str.lower()
0 a
1 b
2 c
3 aaba
4 baca
5 NaN
6 caba
7 dog
8 cat
dtype: object
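To see the regular-expression behaviour mentioned above, a small sketch with illustrative data: str.contains interprets its pattern as a regex by default.

```python
import pandas as pd

s = pd.Series(['cat', 'dog', 'CABA'])
# '^c' is a regex anchored at the start; case=False ignores letter case
print(s.str.contains('^c', case=False).tolist())  # [True, False, True]
```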
6. Merge
6.1 Concat