pandas

Posted by 要坚持写博客


import pandas

food_info = pandas.read_csv("food_info.csv")  # read a CSV file into a DataFrame

food_info.head()   # show the first 5 rows of the data just read
food_info.head(3)  # show the first 3 rows
food_info.tail()   # show the last 5 rows

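food_info.columns   # column names
food_info.shape     # dimensions as (rows, columns)
food_info.loc[0]    # the first row (the row labeled 0)
food_info.loc[1:3]  # label-based slice: rows 1 through 3, endpoint included
food_info["xuye"]   # select every value in a column by its name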
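One thing worth checking for myself: a `loc` slice is label-based and includes its end label, while `iloc` is position-based and excludes it. A minimal sketch, using a small made-up DataFrame (not the real food_info.csv; the column names are my own placeholders):

```python
import pandas as pd

# Toy data invented for illustration; not the real food_info.csv.
df = pd.DataFrame(
    {
        "name": ["butter", "cheese", "milk", "egg", "rice"],
        "protein_g": [0.85, 24.9, 3.2, 12.6, 2.7],
    }
)

by_label = df.loc[1:3]      # label-based: includes both label 1 and label 3
by_position = df.iloc[1:3]  # position-based: stops before position 3

print(len(by_label))     # 3 (labels 1, 2, 3)
print(len(by_position))  # 2 (positions 1, 2)
print(df.shape)          # (5, 2): 5 rows, 2 columns
```

With the default 0-based index the two look similar, but they differ by one row at the end of the slice.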

# select two columns at once
columns = ["xuye1", "xuye2"]
zi = food_info[columns]

col_names = food_info.columns.tolist()  # collect all column names into a list

# find the columns whose names end with "(g)"
gram_columns = []
for c in col_names:
    if c.endswith("(g)"):
        gram_columns.append(c)
gram_df = food_info[gram_columns]
print(gram_df)
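The same column filter can probably be written more compactly. This is a sketch on a made-up DataFrame whose column names are my own assumption (the real file's names may differ); it shows a list comprehension and the vectorized `columns.str.endswith` variant:

```python
import pandas as pd

# Toy columns invented for illustration; the real file's column names may differ.
df = pd.DataFrame(
    {
        "name": ["butter", "cheese"],
        "Protein_(g)": [0.85, 24.9],
        "Water_(g)": [15.87, 42.41],
        "Energ_Kcal": [717, 353],
    }
)

# List comprehension: same logic as the explicit loop.
gram_columns = [c for c in df.columns if c.endswith("(g)")]
gram_df = df[gram_columns]

# Vectorized variant: a boolean mask over the column index.
same_df = df.loc[:, df.columns.str.endswith("(g)")]

print(gram_columns)  # ['Protein_(g)', 'Water_(g)']
```

Both variants select the same columns; which one reads better is a matter of taste.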

Initialization

food_info.sort_values("xuye", inplace=True)  # inplace=True sorts the DataFrame in place; ascending order is the default
food_info.sort_values("xuye", inplace=True, ascending=False)  # descending order
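Without `inplace=True`, `sort_values` returns a new sorted DataFrame and leaves the original alone, which is usually easier to reason about. A small sketch on invented data (the `item` and `score` columns are hypothetical):

```python
import pandas as pd

# Toy data invented for illustration.
df = pd.DataFrame({"item": ["a", "b", "c"], "score": [2.0, 3.0, 1.0]})

ascending = df.sort_values("score")                    # new DataFrame, ascending by default
descending = df.sort_values("score", ascending=False)  # new DataFrame, descending

print(ascending["item"].tolist())   # ['c', 'a', 'b']
print(descending["item"].tolist())  # ['b', 'a', 'c']
print(df["item"].tolist())          # ['a', 'b', 'c'] (original order unchanged)
```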
import numpy as np
import pandas as pd
 
titanic_survival = pd.read_csv("titanic_train.csv")
print(titanic_survival.head())

age = titanic_survival['Age']
print(age.loc[0:10])
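age_is_null = pd.isnull(age)  # Boolean mask: True where Age is missing
# print(age_is_null)  # print the missing-value mask
age_null_true = age[age_is_null]  # keep only the missing entries
# print(age_null_true)
age_null_count = len(age_null_true)
print(age_null_count)  # print the number of missing values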
0     22.0
1     38.0
2     26.0
3     35.0
4     35.0
5      NaN
6     54.0
7      2.0
8     27.0
9     14.0
10     4.0
Name: Age, dtype: float64
177
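A shorter way to count missing values is to sum the boolean mask, since `True` counts as 1. The sketch below uses a made-up `Age` series rather than the real Titanic file, and also shows that `mean()` skips NaN by default:

```python
import numpy as np
import pandas as pd

# Toy series invented for illustration; not the real Titanic data.
age = pd.Series([22.0, 38.0, np.nan, 35.0, np.nan], name="Age")

null_count = age.isnull().sum()  # True counts as 1, so the sum is the number of NaN
mean_age = age.mean()            # NaN entries are skipped by default (skipna=True)

print(null_count)  # 2
print(mean_age)    # about 31.67, i.e. (22 + 38 + 35) / 3
```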



# What is the average fare for each passenger class?
passenger_classes = [1, 2, 3]
fares_by_class = {}
for this_class in passenger_classes:
    # select the rows for passengers in class 1, 2, or 3
    pclass_rows = titanic_survival[titanic_survival['Pclass'] == this_class]
    # take the Fare column for that class
    pclass_fares = pclass_rows['Fare']
    # average the column
    fare_for_class = pclass_fares.mean()
    # store the average fare for the current class
    fares_by_class[this_class] = fare_for_class
print(fares_by_class)

{1: 84.15468749999992, 2: 20.66218315217391, 3: 13.675550101832997}
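The per-class loop is what `groupby` does in one step. A sketch on a tiny invented table (the fares here are made up, so the averages will not match the real output):

```python
import pandas as pd

# Toy data invented for illustration; fares are not from the real Titanic file.
df = pd.DataFrame(
    {
        "Pclass": [1, 1, 2, 3, 3],
        "Fare": [80.0, 60.0, 20.0, 8.0, 12.0],
    }
)

# Group rows by class, then average the Fare column within each group.
mean_fares = df.groupby("Pclass")["Fare"].mean()

# Convert to a plain dict to mirror the loop-based result.
fares_by_class = {int(k): float(v) for k, v in mean_fares.items()}
print(fares_by_class)  # {1: 70.0, 2: 20.0, 3: 10.0}
```

The result is a Series indexed by class, so there is no need to list the class values by hand.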




# Drop missing values
drop_na_columns = titanic_survival.dropna(axis=1)  # drop every column that contains a missing value
new_titanic_survival = titanic_survival.dropna(axis=0, subset=['Age', 'Sex'])
# drop every row that has a missing value in Age or Sex
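To see what the two `dropna` calls keep, here is a sketch on a three-row made-up table (values invented for illustration):

```python
import numpy as np
import pandas as pd

# Toy data invented for illustration.
df = pd.DataFrame(
    {
        "Age": [22.0, np.nan, 26.0],
        "Sex": ["male", "female", "female"],
        "Cabin": [np.nan, "C85", np.nan],
    }
)

# axis=1: drop every column that contains at least one missing value.
no_na_columns = df.dropna(axis=1)

# axis=0 with subset: drop rows where Age or Sex is missing; Cabin is ignored.
rows_with_age_sex = df.dropna(axis=0, subset=["Age", "Sex"])

print(no_na_columns.columns.tolist())    # ['Sex']
print(rows_with_age_sex.index.tolist())  # [0, 2]
```

Note that rows 0 and 2 survive even though their `Cabin` is missing, because `subset` limits the check to the listed columns.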

new_titanic_survival = titanic_survival.sort_values('Age',ascending=False)
print(new_titanic_survival[0:10])
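# After sorting, each row keeps its original index label, so the index is no longer 0, 1, 2, ...
titanic_reindexed = new_titanic_survival.reset_index(drop=True)
# After reset_index(drop=True), the index starts from 0 again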
print('------------')
print(titanic_reindexed.loc[0:10])
     PassengerId  Survived  Pclass                                  Name  \
630          631         1       1  Barkworth, Mr. Algernon Henry Wilson   
851          852         0       3                   Svensson, Mr. Johan   
493          494         0       1               Artagaveytia, Mr. Ramon   
96            97         0       1             Goldschmidt, Mr. George B   
116          117         0       3                  Connors, Mr. Patrick   
672          673         0       2           Mitchell, Mr. Henry Michael   
745          746         0       1          Crosby, Capt. Edward Gifford   
33            34         0       2                 Wheadon, Mr. Edward H   
54            55         0       1        Ostby, Mr. Engelhart Cornelius   
280          281         0       3                      Duane, Mr. Frank   
 
      Sex   Age  SibSp  Parch      Ticket     Fare Cabin Embarked  
630  male  80.0      0      0       27042  30.0000   A23        S  
851  male  74.0      0      0      347060   7.7750   NaN        S  
493  male  71.0      0      0    PC 17609  49.5042   NaN        C  
96   male  71.0      0      0    PC 17754  34.6542    A5        C  
116  male  70.5      0      0      370369   7.7500   NaN        Q  
672  male  70.0      0      0  C.A. 24580  10.5000   NaN        S  
745  male  70.0      1      1   WE/P 5735  71.0000   B22        S  
33   male  66.0      0      0  C.A. 24579  10.5000   NaN        S  
54   male  65.0      0      1      113509  61.9792   B30        C  
280  male  65.0      0      0      336439   7.7500   NaN        Q  
------------
    PassengerId  Survived  Pclass                                  Name   Sex  \
0           631         1       1  Barkworth, Mr. Algernon Henry Wilson  male   
1           852         0       3                   Svensson, Mr. Johan  male   
2           494         0       1               Artagaveytia, Mr. Ramon  male   
3            97         0       1             Goldschmidt, Mr. George B  male   
4           117         0       3                  Connors, Mr. Patrick  male   
5           673         0       2           Mitchell, Mr. Henry Michael  male   
6           746         0       1          Crosby, Capt. Edward Gifford  male   
7            34         0       2                 Wheadon, Mr. Edward H  male   
8            55         0       1        Ostby, Mr. Engelhart Cornelius  male   
9           281         0       3                      Duane, Mr. Frank  male   
10          457         0       1             Millet, Mr. Francis Davis  male   
 
     Age  SibSp  Parch      Ticket     Fare Cabin Embarked  
0   80.0      0      0       27042  30.0000   A23        S  
1   74.0      0      0      347060   7.7750   NaN        S  
2   71.0      0      0    PC 17609  49.5042   NaN        C  
3   71.0      0      0    PC 17754  34.6542    A5        C  
4   70.5      0      0      370369   7.7500   NaN        Q  
5   70.0      0      0  C.A. 24580  10.5000   NaN        S  
6   70.0      1      1   WE/P 5735  71.0000   B22        S  
7   66.0      0      0  C.A. 24579  10.5000   NaN        S  
8   65.0      0      1      113509  61.9792   B30        C  
9   65.0      0      0      336439   7.7500   NaN        Q  
10  65.0      0      0       13509  26.5500   E38        S  
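The difference between the two printouts comes down to index labels versus positions. A minimal sketch on made-up data showing that `loc[0]` on the sorted frame still means the row labeled 0, not the first row:

```python
import pandas as pd

# Toy data invented for illustration.
df = pd.DataFrame({"Name": ["A", "B", "C"], "Age": [30.0, 80.0, 55.0]})

sorted_df = df.sort_values("Age", ascending=False)
print(sorted_df.index.tolist())  # [1, 2, 0] (original labels travel with the rows)

reindexed = sorted_df.reset_index(drop=True)
print(reindexed.index.tolist())  # [0, 1, 2]

print(sorted_df.loc[0, "Name"])   # A (the row labeled 0, which is now last)
print(sorted_df.iloc[0]["Name"])  # B (the first row by position)
print(reindexed.loc[0, "Name"])   # B (label and position agree again)
```

`drop=True` discards the old labels; without it, they would be kept as a new `index` column.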



def is_minor(row):
    if row['Age'] < 18:
        return True
    else:
        return False
    
minors = titanic_survival.apply(is_minor,axis=1)
print(minors)
 
def generate_age_label(row):
    age = row['Age']
    if pd.isnull(age):
        return "Unknown"
    elif age < 18:
        return "Minor"
    else:
        return "Adult"
    
age_labels = titanic_survival.apply(generate_age_label,axis=1)
print(age_labels)
0      False
1      False
2      False
3      False
4      False
       ...  
886    False
887    False
888    False
889    False
890    False
Length: 891, dtype: bool
0        Adult
1        Adult
2        Adult
3        Adult
4        Adult
        ...   
886      Adult
887      Adult
888    Unknown
889      Adult
890      Adult
Length: 891, dtype: object
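Row-wise `apply` works, but the same labels can likely be produced with vectorized operations, which tend to be faster on large tables. This sketch swaps in `numpy.select` (my substitution, not what the code above uses) on a made-up `Age` column:

```python
import numpy as np
import pandas as pd

# Toy data invented for illustration; not the real Titanic file.
df = pd.DataFrame({"Age": [22.0, 4.0, np.nan, 17.0, 54.0]})

# Comparing NaN with a number yields False, so unknown ages are not minors here.
is_minor = df["Age"] < 18

# Conditions are checked in order; the first match wins, otherwise the default.
conditions = [df["Age"].isnull(), df["Age"] < 18]
choices = ["Unknown", "Minor"]
df["age_label"] = np.select(conditions, choices, default="Adult")

print(is_minor.tolist())         # [False, True, False, True, False]
print(df["age_label"].tolist())  # ['Adult', 'Minor', 'Unknown', 'Minor', 'Adult']
```

The missing-age check goes first for the same reason `generate_age_label` tests `pd.isnull` before comparing.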


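The first part above is my own summary. I later found that this post (https://blog.csdn.net/f2157120/article/details/104109024) covers very similar material, so I stopped summarizing here and will read that one first.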
