python: Python code snippets for data exploration (e.g., in a data science project)
### single
Y = data['TV'] # column
Y = data.TV
df[:2] # first two rows
df.loc['Maricopa'] # view a row by index label (.ix was removed in pandas 1.0)
df.loc[:, 'coverage'] # view a column
df.loc['Yuma', 'coverage'] # view the value at a given row and column label
# select rows
df.iloc[:2] # first two rows, by position
df.iloc[1:3] # Select the second and third row (the end position is exclusive)
df.iloc[2:] # Select the third row and every row after it
df.loc[:'Arizona'] # all rows up to and including the index label 'Arizona'
df.loc[['Arizona', 'Texas']] # rows by a list of labels. The old .ix mixed label and positional lookup and was removed in pandas 1.0; use .loc for labels and .iloc for positions
df[df['coverage'] > 50] # all rows where coverage is more than 50
df[(df['deaths'] > 500) | (df['deaths'] < 50)]
df[~(df['regiment'] == 'Dragoons')] # Select all the regiments not named "Dragoons"
# select cells
df.loc['Arizona', df.columns[2]] # Select the third cell in the row named Arizona
df['deaths'].iloc[2] # Select the third cell down in the column named deaths
# select columns
df.iloc[:,:2] # Select the first 2 columns
feature_cols = ['TV','Radio','Newspaper']
x = data[feature_cols]
data[['TV','Radio','Newspaper']]
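The difference between label-based and position-based selection is easier to see on a concrete frame. Below is a minimal sketch that uses only `.loc`, `.iloc`, and boolean masks; the state names and numbers are made up for illustration and are not from the original data.

```python
import pandas as pd

# Hypothetical data: the index labels and values are invented for this sketch.
df = pd.DataFrame(
    {
        "coverage": [25, 94, 57, 62],
        "deaths": [523, 52, 25, 616],
    },
    index=["Arizona", "California", "Texas", "Utah"],
)

row_by_label = df.loc["Texas"]            # one row, selected by index label
first_two = df.iloc[:2]                   # rows at positions 0 and 1
up_to_label = df.loc[:"California"]       # label slices include the end label
cell = df.loc["Texas", "coverage"]        # single value by row and column label
third_in_col = df["deaths"].iloc[2]       # third value down the deaths column
high_coverage = df[df["coverage"] > 50]   # boolean mask keeps matching rows
```

Note that `.iloc` slices exclude the end position while `.loc` slices include the end label, which is a common source of off-by-one mistakes.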
### general
data.head()
data.tail()
data.tail().transpose()
data.shape # (rows, columns); shape is an attribute, not a method
#List unique values in the df['name'] column
df.name.unique()
#Truncate the dataframe
df.truncate(before='1/2/2014', after='1/3/2014') #time series
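As a rough illustration of the general-purpose calls above, the sketch below builds a small daily time series and applies `shape`, `unique`, and `truncate` to it. The frame `ts`, its column, and its values are hypothetical, and ISO date strings are used here instead of the `1/2/2014` style to avoid day/month ambiguity.

```python
import pandas as pd

# Hypothetical daily series; the column name and values are made up.
ts = pd.DataFrame(
    {"name": ["a", "b", "a", "c", "b"]},
    index=pd.date_range("2014-01-01", periods=5, freq="D"),
)

n_rows, n_cols = ts.shape            # shape is an attribute, not a method
distinct = ts["name"].unique()       # unique values in order of first appearance

# Keep only rows whose index lies between the two dates (both ends are kept).
# truncate expects a sorted index.
window = ts.truncate(before="2014-01-02", after="2014-01-03")
```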
####
# Stats
####
# Descriptive statistics by group
df['preTestScore'].groupby(df['company']).describe()
df['preTestScore'].describe()
# Count the number of non-NA values
df['preTestScore'].count()
df['preTestScore'].min()
# Correlation Matrix Of Values
df.corr(numeric_only=True) # restrict to numeric columns; pandas 2.0+ raises on text columns otherwise
#####
# Mean preTestScores grouped by regiment and company
#####
# regiment    company
# Dragoons    1st         3.5
#             2nd        27.5
# Nighthawks  1st        14.0
#             2nd        16.5
# Scouts      1st         2.5
#             2nd         2.5
# dtype: float64
df['preTestScore'].groupby([df['regiment'], df['company']]).mean()
####
# Mean preTestScores grouped by regiment and company without hierarchical indexing
####
# company      1st   2nd
# regiment
# Dragoons     3.5  27.5
# Nighthawks  14.0  16.5
# Scouts       2.5   2.5
df['preTestScore'].groupby([df['regiment'], df['company']]).mean().unstack()
#####
# Group the entire dataframe by regiment and company
#####
#                      preTestScore  postTestScore
# regiment    company
# Dragoons    1st               3.5           47.5
#             2nd              27.5           75.5
# Nighthawks  1st              14.0           59.5
#             2nd              16.5           59.5
# Scouts      1st               2.5           66.0
#             2nd               2.5           66.0
df.groupby(['regiment', 'company']).mean(numeric_only=True) # skip non-numeric columns, which raise in pandas 2.0+
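The three groupby variants above can be run end to end with a small frame. The sketch below is an assumption, not the original dataset: the scores were chosen so that the group means reproduce the commented output above.

```python
import pandas as pd

# Hypothetical data, chosen so the group means match the commented output.
df = pd.DataFrame(
    {
        "regiment": ["Nighthawks"] * 4 + ["Dragoons"] * 4 + ["Scouts"] * 4,
        "company": ["1st", "1st", "2nd", "2nd"] * 3,
        "preTestScore": [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
        "postTestScore": [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70],
    }
)

# Series with a two-level (regiment, company) index
means = df.groupby(["regiment", "company"])["preTestScore"].mean()

# Same numbers pivoted into a regiment x company table
wide = means.unstack()

# Means of every numeric column at once
all_means = df.groupby(["regiment", "company"]).mean(numeric_only=True)
```

Selecting the column after `groupby(...)` is equivalent to the `df['preTestScore'].groupby([...])` form used above and tends to read more naturally when several columns are involved.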
# Count the number of times each number of deaths occurs in each regiment
# Input
#       guardCorps  corps1  corps2  corps3  corps4  corps5  corps6  corps7  corps8  corps9  corps10  corps11  corps14  corps15
# 1875           0       0       0       0       0       0       0       1       1       0        0        0        1        0
# 1876           2       0       0       0       1       0       0       0       0       0        0        0        1        1
# 1877           2       0       0       0       0       0       1       1       0       0        1        0        2        0
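# pd.value_counts is deprecated (pandas 2.1+); apply the Series method to each column instead
result = horsekick.apply(pd.Series.value_counts).fillna(0)
result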
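To see what the per-column value count produces, here is a minimal sketch that uses four of the columns from the three input rows shown above (the remaining columns are left out for brevity). It assumes `horsekick` is a frame indexed by year with one column per corps.

```python
import pandas as pd

# Four of the columns from the 1875-1877 rows shown above; the rest are omitted.
horsekick = pd.DataFrame(
    {
        "guardCorps": [0, 2, 2],
        "corps1": [0, 0, 0],
        "corps7": [1, 0, 1],
        "corps14": [1, 1, 2],
    },
    index=[1875, 1876, 1877],
)

# Each result row is a distinct death count; each cell is the number of
# years in which that corps recorded that count (0 where it never occurred).
result = horsekick.apply(pd.Series.value_counts).fillna(0).sort_index()
```

For example, `result.loc[2, "guardCorps"]` is 2 because guardCorps recorded two deaths in both 1876 and 1877.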
# Create a crosstab table by company and regiment
#      regiment   company experience      name  preTestScore  postTestScore
# 0  Nighthawks  infantry    veteran    Miller             4             25
# 1  Nighthawks  infantry     rookie  Jacobson            24             94
# 2  Nighthawks   cavalry    veteran       Ali            31             57
# -->
# company     cavalry  infantry  All
# regiment
# Dragoons          2         2    4
# Nighthawks        2         2    4
# Counting the number of observations by regiment and category
pd.crosstab(df.regiment, df.company, margins=True)
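The crosstab call can be tried on a small frame like the one sketched below. Only the regiment and company columns matter for the count; the eight rows are hypothetical and were laid out so that each regiment has two cavalry and two infantry entries, as in the commented output.

```python
import pandas as pd

# Hypothetical rows: two cavalry and two infantry entries per regiment.
df = pd.DataFrame(
    {
        "regiment": ["Nighthawks"] * 4 + ["Dragoons"] * 4,
        "company": ["infantry", "infantry", "cavalry", "cavalry"] * 2,
    }
)

# Count observations for each regiment/company pair.
# margins=True appends an "All" row and column holding the totals.
ct = pd.crosstab(df["regiment"], df["company"], margins=True)
```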
df['preTestScore'].idxmax() # index label of the row with the max value