Data Wrangling
Posted by yitiaodahe
1. Dropping entries along an axis
import numpy as np
import pandas as pd
data = pd.DataFrame(np.arange(16).reshape(4, 4), index=['Shenzhen', 'Guangzhou', 'Beijing', 'Shanghai'], columns=['one', 'two', 'three', 'four'])
data
| | one | two | three | four |
|---|---|---|---|---|
| Shenzhen | 0 | 1 | 2 | 3 |
| Guangzhou | 4 | 5 | 6 | 7 |
| Beijing | 8 | 9 | 10 | 11 |
| Shanghai | 12 | 13 | 14 | 15 |
data.drop(['Shenzhen', 'Guangzhou'])
| | one | two | three | four |
|---|---|---|---|---|
| Beijing | 8 | 9 | 10 | 11 |
| Shanghai | 12 | 13 | 14 | 15 |
data.drop(['two'], axis=1)
This drops the second column ('two').
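As a minimal sketch of the two calls above: `drop` returns a new DataFrame and leaves `data` unchanged, and the `columns=` keyword is an equivalent spelling of `axis=1`. The variable names `rows_dropped` and `cols_dropped` are illustrative only.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Shenzhen", "Guangzhou", "Beijing", "Shanghai"],
    columns=["one", "two", "three", "four"],
)

# Drop two rows by index label (axis=0 is the default).
rows_dropped = data.drop(["Shenzhen", "Guangzhou"])

# Drop a column; equivalent to data.drop(["two"], axis=1).
cols_dropped = data.drop(columns=["two"])

# The original frame is untouched because drop returns a copy.
print(data.shape)
print(rows_dropped.index.tolist())
print(cols_dropped.columns.tolist())
```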
2. Function mapping
NumPy ufuncs (element-wise array functions) can also operate on pandas objects.
For example: np.fabs(frame)
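The following sketch shows a ufunc applied directly to pandas objects. The small `frame` here is made up for illustration and is not the frame used elsewhere in this post; the point is only that the result keeps the pandas type and labels.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, chosen so the results are easy to check by eye.
frame = pd.DataFrame(
    [[1.5, -2.5], [-0.25, 0.0]],
    index=["a", "b"],
    columns=["one", "two"],
)

# np.fabs computes element-wise absolute values and returns a DataFrame
# with the same index and columns.
abs_frame = np.fabs(frame)

# ufuncs work on a Series as well.
roots = np.sqrt(pd.Series([4.0, 9.0]))

print(abs_frame)
print(roots)
```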
DataFrame.apply
DataFrame.apply(func, axis=0, broadcast=None, raw=False, reduce=None, result_type=None, args=(), **kwds)
Apply a function along an axis of the DataFrame.
DataFrame.applymap
DataFrame.applymap(func)
Apply a function to a DataFrame elementwise.
Series.map
Series.map(arg, na_action=None)
Map values of Series using input correspondence (a dict, Series, or function).
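# frame is not defined in the original; assumed here to be a 4x3 DataFrame of random floats
frame = pd.DataFrame(np.random.randn(4, 3), index=['Shenzhen', 'Guangzhou', 'Shanghai', 'Beijing'], columns=['one', 'two', 'three'])

def f1(s):
    # range (max minus min) of one column
    x = s.max() - s.min()
    return x

f = lambda x: x.max() - x.min()  # equivalent lambda form
frame.apply(f1)  # column-wise (axis=0)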
one      3.168231
two      3.324250
three    2.111743
dtype: float64
f = lambda x: '%.2f' % x  # format each element to two decimal places
frame.applymap(f)
| | one | two | three |
|---|---|---|---|
| Shenzhen | 1.55 | -2.59 | -1.21 |
| Guangzhou | 0.42 | -0.16 | 0.17 |
| Shanghai | -1.62 | 0.73 | -0.87 |
| Beijing | 0.33 | 0.00 | 0.90 |
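The examples above cover apply and applymap but not Series.map. Here is a minimal sketch of the three kinds of input it accepts per its description (a dict or a function shown here), plus `na_action='ignore'`. The `cities` and `province` data are made up for illustration.

```python
import pandas as pd

cities = pd.Series(["Shenzhen", "Guangzhou", "Beijing", "Lhasa"])

# Hypothetical lookup table; "Lhasa" is deliberately left out.
province = {"Shenzhen": "Guangdong", "Guangzhou": "Guangdong", "Beijing": "Beijing"}

# Mapping with a dict: values missing from the dict become NaN.
by_dict = cities.map(province)

# Mapping with a function: applied to each element.
by_func = cities.map(len)

# na_action="ignore" skips missing values instead of passing them to the function.
upper = pd.Series(["a", None]).map(str.upper, na_action="ignore")

print(by_dict)
print(by_func)
print(upper)
```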
3. Sorting
sort_index / sort_values
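A short sketch of both methods, assuming a small made-up `frame`: `sort_index` orders by row or column labels, while `sort_values` orders rows by the data in one or more columns.

```python
import pandas as pd

# Hypothetical sample data with unsorted labels and values.
frame = pd.DataFrame(
    {"two": [5, 1, 9], "one": [2, 8, 4]},
    index=["Shenzhen", "Beijing", "Guangzhou"],
)

by_label = frame.sort_index()            # rows ordered by index label
by_col_label = frame.sort_index(axis=1)  # columns ordered by column name
by_value = frame.sort_values(by="one")   # rows ordered by the values in column "one"
by_value_desc = frame.sort_values(by="one", ascending=False)

print(by_label)
print(by_col_label)
print(by_value)
```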
4. Merging data
pandas.merge
DataFrame.merge(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)
Merge DataFrame objects by performing a database-style join operation by columns or indexes.
This works like a join between database tables: left, right, inner, and outer joins are supported.
Example:
df1 = pd.DataFrame({'key1': ['foo', 'bar', 'baz', 'foo'], 'data1': list(np.arange(1, 5))})
df2 = pd.DataFrame({'key2': ['foo', 'bar', 'qux', 'bar'], 'data2': list(np.arange(5, 9))})
print(df1)
print(df2)
  key1  data1
0  foo      1
1  bar      2
2  baz      3
3  foo      4
  key2  data2
0  foo      5
1  bar      6
2  qux      7
3  bar      8
df1.merge(df2, left_on='key1', right_on='key2', how='right')  # how selects the join type: 'inner', 'left', 'right', or 'outer'
| | key1 | data1 | key2 | data2 |
|---|---|---|---|---|
| 0 | foo | 1.0 | foo | 5 |
| 1 | foo | 4.0 | foo | 5 |
| 2 | bar | 2.0 | bar | 6 |
| 3 | bar | 2.0 | bar | 8 |
| 4 | NaN | NaN | qux | 7 |
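To compare the four join types on the same inputs, the sketch below counts the rows each `how` value produces and uses `indicator=True` to show where each row of an outer join came from. The names `row_counts` and `outer` are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({"key1": ["foo", "bar", "baz", "foo"], "data1": [1, 2, 3, 4]})
df2 = pd.DataFrame({"key2": ["foo", "bar", "qux", "bar"], "data2": [5, 6, 7, 8]})

# Row count for each join type on the same key pair.
row_counts = {
    how: len(df1.merge(df2, left_on="key1", right_on="key2", how=how))
    for how in ["inner", "left", "right", "outer"]
}
print(row_counts)

# indicator=True adds a "_merge" column: "both", "left_only", or "right_only".
outer = df1.merge(df2, left_on="key1", right_on="key2", how="outer", indicator=True)
print(outer)
```

The two `foo` rows in df1 each match the single `foo` row in df2, and the single `bar` row in df1 matches both `bar` rows in df2, so the inner join has four rows; `baz` and `qux` appear only in the left, right, or outer joins that keep unmatched keys.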
pandas.concat
DataFrame.combine_first
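Neither function has an example in this post, so here is a minimal sketch with two made-up Series. `pd.concat` glues objects together along an axis, while `combine_first` (a method on Series and DataFrame) fills missing values in one object with values from another, aligned on the index.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with partly overlapping index labels.
s1 = pd.Series([1.0, np.nan, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0, 40.0], index=["a", "b", "d"])

# Concatenate end to end (axis=0): index labels may repeat.
stacked = pd.concat([s1, s2])

# Concatenate side by side (axis=1): rows are aligned on the union of labels.
side_by_side = pd.concat([s1, s2], axis=1, keys=["s1", "s2"])

# Take s1 where it has a value; otherwise fall back to s2.
patched = s1.combine_first(s2)

print(stacked)
print(side_by_side)
print(patched)
```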
5. Reshaping data
DataFrame.stack / DataFrame.unstack
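As a rough sketch with a small made-up frame: `stack` pivots the column labels into an inner index level, producing a Series with a MultiIndex, and `unstack` pivots an index level back into columns.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
data = pd.DataFrame(
    np.arange(6).reshape(2, 3),
    index=["Shenzhen", "Beijing"],
    columns=["one", "two", "three"],
)

# Wide to long: each (row label, column label) pair becomes one entry.
long = data.stack()

# Long to wide: the inner level goes back to the columns.
wide = long.unstack()

# Unstacking the outer level instead puts the row labels in the columns.
flipped = long.unstack(level=0)

print(long)
print(wide)
print(flipped)
```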