pandas
Posted 要坚持写博客
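import pandas

food_info = pandas.read_csv("food_info.csv")  # read a CSV file into a DataFrame
food_info.head()    # show the first 5 rows of the data just read
food_info.head(3)   # show the first 3 rows
food_info.tail()    # show the last 5 rows
food_info.columns   # the column names
food_info.shape     # the dimensions, as (rows, columns)
food_info.loc[0]    # the row labeled 0, which is the first row here
food_info.loc[1:3]  # a slice by label: rows 1 through 3, end included
food_info["xuye"]   # all values of one column, looked up by column name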
# select two columns at once
columns = ["xuye1","xuye2"]
zi = food_info[columns]
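A small sketch of the row and column selection above, so the difference between label slicing and position slicing is visible. The `demo` DataFrame and its column names are made up for illustration; they are not from food_info.csv.

```python
import pandas as pd

# Hypothetical stand-in for food_info; the column names are invented.
demo = pd.DataFrame(
    {
        "name": ["butter", "cheese", "milk", "egg", "rice"],
        "protein_g": [0.9, 25.0, 3.4, 12.6, 2.7],
    }
)

by_label = demo.loc[1:3]       # label-based slice: rows 1, 2 and 3 (end included)
by_position = demo.iloc[1:3]   # position-based slice: rows 1 and 2 (end excluded)
two_columns = demo[["name", "protein_g"]]  # a list of names returns a DataFrame

print(len(by_label), len(by_position))  # 3 2
print(two_columns.shape)                # (5, 2)
```

With the default 0-based index, `loc` and `iloc` pick the same starting row, but only `loc` includes the end of the slice.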
col_names = food_info.columns.tolist()  # put all column names into a list
gram_columns = []
for c in col_names:
    # collect the columns whose names end with "(g)"
    if c.endswith("(g)"):
        gram_columns.append(c)
gram_df = food_info[gram_columns]
print(gram_df)
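The same column filter can probably be written more compactly with a list comprehension. This is only a sketch on a one-row DataFrame whose column names are assumed, not the real header of food_info.csv.

```python
import pandas as pd

# Hypothetical columns; the real file may name them differently.
demo = pd.DataFrame(
    {
        "Name": ["butter"],
        "Water_(g)": [15.87],
        "Protein_(g)": [0.85],
        "Energ_Kcal": [717],
    }
)

# Keep only the column names that end with "(g)".
gram_columns = [c for c in demo.columns if c.endswith("(g)")]
gram_df = demo[gram_columns]

print(gram_columns)  # ['Water_(g)', 'Protein_(g)']
```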
Sorting
food_info.sort_values("xuye", inplace=True)  # inplace=True sorts the DataFrame itself; ascending order by default
food_info.sort_values("xuye", inplace=True, ascending=False)  # descending order
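For comparison, here is a sketch of `sort_values` without `inplace=True`: it returns a new sorted DataFrame and leaves the original untouched. The `demo` data is invented; only the column name "xuye" is borrowed from above.

```python
import pandas as pd

# Hypothetical data for illustration.
demo = pd.DataFrame({"item": ["a", "b", "c"], "xuye": [3, 1, 2]})

ascending = demo.sort_values("xuye")                    # new DataFrame, smallest first
descending = demo.sort_values("xuye", ascending=False)  # new DataFrame, largest first

print(ascending["xuye"].tolist())   # [1, 2, 3]
print(descending["xuye"].tolist())  # [3, 2, 1]
print(demo["xuye"].tolist())        # [3, 1, 2], the original order is unchanged
```

Assigning the result to a new name keeps the unsorted data available, which can be easier to follow than sorting in place.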
import numpy as np
import pandas as pd
titanic_survival = pd.read_csv("titanic_train.csv")
print(titanic_survival.head())
age = titanic_survival['Age']
print(age.loc[0:10])
age_is_null = pd.isnull(age)  # boolean mask: True where the value is missing
# print(age_is_null)  # print the mask
age_null_true = age[age_is_null]  # keep only the missing entries
# print(age_null_true)
age_null_count = len(age_null_true)
print(age_null_count)  # print the number of missing values
0 22.0
1 38.0
2 26.0
3 35.0
4 35.0
5 NaN
6 54.0
7 2.0
8 27.0
9 14.0
10 4.0
Name: Age, dtype: float64
177
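The count of missing values can also be taken straight from the boolean mask, since True counts as 1 when summed. A minimal sketch on a made-up Series rather than the real Age column:

```python
import numpy as np
import pandas as pd

# Hypothetical ages; two of the five values are missing.
age = pd.Series([22.0, 38.0, np.nan, 35.0, np.nan], name="Age")

age_null_count = int(age.isnull().sum())  # True is counted as 1
known_mean = age.mean()                   # mean() skips NaN by default

print(age_null_count)  # 2
print(known_mean)
```

Here `known_mean` is the average of 22, 38 and 35 only, because the missing entries are left out of the calculation.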
# What is the average fare for each passenger class?
passenger_classes = [1, 2, 3]
fares_by_class = {}
for this_class in passenger_classes:
    # select the rows for passengers in class 1, 2 or 3
    pclass_rows = titanic_survival[titanic_survival['Pclass'] == this_class]
    # take the Fare column of those rows
    pclass_fares = pclass_rows['Fare']
    # compute the mean of that column
    fare_for_class = pclass_fares.mean()
    # store the average fare for the current class
    fares_by_class[this_class] = fare_for_class
print(fares_by_class)
{1: 84.15468749999992, 2: 20.66218315217391, 3: 13.675550101832997}
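The per-class loop can likely be replaced by a single `groupby`. The sketch below uses a tiny invented table, so its numbers are not the Titanic results shown above.

```python
import pandas as pd

# Hypothetical fares; not the real Titanic data.
demo = pd.DataFrame(
    {
        "Pclass": [1, 1, 2, 3, 3],
        "Fare": [80.0, 90.0, 20.0, 10.0, 14.0],
    }
)

# Group the rows by class, then average the Fare column within each group.
fares_by_class = demo.groupby("Pclass")["Fare"].mean().to_dict()

print(fares_by_class)
```

The result has one entry per class, the same shape as the dictionary built by the loop.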
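# drop missing values
drop_na_columns = titanic_survival.dropna(axis=1)  # drop every column that contains a missing value
# drop every row that has a missing value in the Age or Sex column
new_titanic_survival = titanic_survival.dropna(axis=0, subset=['Age', 'Sex'])
# sort by Age, oldest first (this reassigns new_titanic_survival from the full table)
new_titanic_survival = titanic_survival.sort_values('Age', ascending=False)
print(new_titanic_survival[0:10])
# after sorting, each row keeps its original index label, so the index no longer counts up from 0
titanic_reindexed = new_titanic_survival.reset_index(drop=True)
# after reset_index, the index starts from 0 again
print('------------')
print(titanic_reindexed.loc[0:10])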
PassengerId Survived Pclass Name \
630 631 1 1 Barkworth, Mr. Algernon Henry Wilson
851 852 0 3 Svensson, Mr. Johan
493 494 0 1 Artagaveytia, Mr. Ramon
96 97 0 1 Goldschmidt, Mr. George B
116 117 0 3 Connors, Mr. Patrick
672 673 0 2 Mitchell, Mr. Henry Michael
745 746 0 1 Crosby, Capt. Edward Gifford
33 34 0 2 Wheadon, Mr. Edward H
54 55 0 1 Ostby, Mr. Engelhart Cornelius
280 281 0 3 Duane, Mr. Frank
Sex Age SibSp Parch Ticket Fare Cabin Embarked
630 male 80.0 0 0 27042 30.0000 A23 S
851 male 74.0 0 0 347060 7.7750 NaN S
493 male 71.0 0 0 PC 17609 49.5042 NaN C
96 male 71.0 0 0 PC 17754 34.6542 A5 C
116 male 70.5 0 0 370369 7.7500 NaN Q
672 male 70.0 0 0 C.A. 24580 10.5000 NaN S
745 male 70.0 1 1 WE/P 5735 71.0000 B22 S
33 male 66.0 0 0 C.A. 24579 10.5000 NaN S
54 male 65.0 0 1 113509 61.9792 B30 C
280 male 65.0 0 0 336439 7.7500 NaN Q
------------
PassengerId Survived Pclass Name Sex \
0 631 1 1 Barkworth, Mr. Algernon Henry Wilson male
1 852 0 3 Svensson, Mr. Johan male
2 494 0 1 Artagaveytia, Mr. Ramon male
3 97 0 1 Goldschmidt, Mr. George B male
4 117 0 3 Connors, Mr. Patrick male
5 673 0 2 Mitchell, Mr. Henry Michael male
6 746 0 1 Crosby, Capt. Edward Gifford male
7 34 0 2 Wheadon, Mr. Edward H male
8 55 0 1 Ostby, Mr. Engelhart Cornelius male
9 281 0 3 Duane, Mr. Frank male
10 457 0 1 Millet, Mr. Francis Davis male
Age SibSp Parch Ticket Fare Cabin Embarked
0 80.0 0 0 27042 30.0000 A23 S
1 74.0 0 0 347060 7.7750 NaN S
2 71.0 0 0 PC 17609 49.5042 NaN C
3 71.0 0 0 PC 17754 34.6542 A5 C
4 70.5 0 0 370369 7.7500 NaN Q
5 70.0 0 0 C.A. 24580 10.5000 NaN S
6 70.0 1 1 WE/P 5735 71.0000 B22 S
7 66.0 0 0 C.A. 24579 10.5000 NaN S
8 65.0 0 1 113509 61.9792 B30 C
9 65.0 0 0 336439 7.7500 NaN Q
10 65.0 0 0 13509 26.5500 E38 S
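To see the drop, sort and reindex steps on something small enough to check by eye, here is a sketch with a four-row table. The names and values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical passengers; NaN marks a missing value.
demo = pd.DataFrame(
    {
        "Name": ["A", "B", "C", "D"],
        "Age": [30.0, np.nan, 80.0, 5.0],
        "Cabin": [np.nan, "B2", "C3", np.nan],
    }
)

no_na_columns = demo.dropna(axis=1)             # keeps only columns without any NaN
with_age = demo.dropna(axis=0, subset=["Age"])  # drops the rows where Age is NaN
by_age = with_age.sort_values("Age", ascending=False)
reindexed = by_age.reset_index(drop=True)       # discard the old labels, renumber from 0

print(list(no_na_columns.columns))  # ['Name']
print(by_age.index.tolist())        # [2, 0, 3]
print(reindexed.index.tolist())     # [0, 1, 2]
```

Unlike the snippet above, this sketch sorts the already filtered table, so the rows with a missing Age never reach the sorted result.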
def is_minor(row):
    if row['Age'] < 18:
        return True
    else:
        return False
minors = titanic_survival.apply(is_minor, axis=1)
print(minors)
def generate_age_label(row):
    age = row['Age']
    if pd.isnull(age):
        return "Unknown"
    elif age < 18:
        return "Minor"
    else:
        return "Adult"
age_labels = titanic_survival.apply(generate_age_label, axis=1)
print(age_labels)
0 False
1 False
2 False
3 False
4 False
...
886 False
887 False
888 False
889 False
890 False
Length: 891, dtype: bool
0 Adult
1 Adult
2 Adult
3 Adult
4 Adult
...
886 Adult
887 Adult
888 Unknown
889 Adult
890 Adult
Length: 891, dtype: object
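The two `apply` calls above run a Python function once per row. A vectorized version is sketched below as an alternative, not as the method used above; it relies on `np.select`, and the ages are invented.

```python
import numpy as np
import pandas as pd

# Hypothetical ages, including one missing value and the boundary case 18.0.
ages = pd.Series([22.0, np.nan, 4.0, 17.9, 18.0], name="Age")

minors = ages < 18  # a comparison with NaN gives False

# The first matching condition wins; rows matching none get the default.
labels = pd.Series(
    np.select(
        [ages.isnull().to_numpy(), (ages < 18).to_numpy()],
        ["Unknown", "Minor"],
        default="Adult",
    ),
    index=ages.index,
)

print(minors.tolist())  # [False, False, True, True, False]
print(labels.tolist())  # ['Adult', 'Unknown', 'Minor', 'Minor', 'Adult']
```

As in `is_minor`, a missing age is not counted as a minor, because any comparison with NaN is False.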
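The part above is my own summary. I later found that this post covers much the same ground, so I did not continue summarizing; for now, see the other author's write-up: https://blog.csdn.net/f2157120/article/details/104109024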