Python code snippets for data exploration (e.g., in data science projects)




### single
Y = data['TV'] # column
Y = data.TV

df[:2] # first two rows
df.loc['Maricopa'] # view a row (.ix was removed in pandas 1.0; use .loc for labels)
df.loc[:, 'coverage'] # view a column
df.loc['Yuma', 'coverage'] # view the value based on a row and column

# select rows
df.iloc[:2] # rows by row number
df.iloc[1:3] # Select the second and third rows (the end position is exclusive)
df.iloc[2:] # Select every row from the third row onward
df.loc[:'Arizona'] # all rows up to and including the index label 'Arizona'
df.loc[['Arizona', 'Texas']]   # select rows by a list of labels; .ix (a mix of .loc and .iloc) was removed in pandas 1.0
df[df['coverage'] > 50] # all rows where coverage is more than 50
df[(df['deaths'] > 500) | (df['deaths'] < 50)]
df[~(df['regiment'] == 'Dragoons')] # Select all the regiments not named "Dragoons"

# select cells
df.loc['Arizona'].iloc[2]  # Select the third cell in the row named Arizona
df['deaths'].iloc[2]  # Select the third cell down in the column named deaths

# select columns
df.iloc[:,:2] # Select the first 2 columns
feature_cols = ['TV','Radio','Newspaper']
x = data[feature_cols]
data[['TV','Radio','Newspaper']]
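
The selection patterns above can be tried on a small frame. This is only a sketch: the values and the `reports` column are made up for illustration, and `.loc`/`.iloc` stand in for the removed `.ix` accessor.

```python
import pandas as pd

# Hypothetical data; only the column and index names echo the snippets above.
df = pd.DataFrame(
    {"coverage": [25, 94, 57], "deaths": [523, 52, 25], "reports": [4, 24, 31]},
    index=["Maricopa", "Yuma", "Arizona"],
)

row = df.loc["Maricopa"]                   # one row, by label
col = df.loc[:, "coverage"]                # one column, by label
cell = df.loc["Yuma", "coverage"]          # single value, by row and column label
third_in_row = df.loc["Arizona"].iloc[2]   # label for the row, position for the cell
third_in_col = df["deaths"].iloc[2]        # label for the column, position for the cell
second_third = df.iloc[1:3]                # positions 1 and 2; the end is exclusive
up_to_yuma = df.loc[:"Yuma"]               # label slices include the end label
high_or_low = df[(df["deaths"] > 500) | (df["deaths"] < 50)]  # boolean mask
```

Label slices with `.loc` include the end label, while position slices with `.iloc` exclude the end position, which is why `df.iloc[1:3]` returns two rows.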




### general


data.head()

data.tail()
data.tail().transpose()


data.shape # (rows, columns); shape is an attribute, not a method


#List unique values in the df['name'] column
df.name.unique()


#Truncate the dataframe
df.truncate(before='1/2/2014', after='1/3/2014') #time series
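
A minimal sketch of `truncate` on a time series, assuming a sorted `DatetimeIndex`. The frame `ts` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical daily series; truncate expects a sorted index.
idx = pd.date_range("2014-01-01", periods=5, freq="D")
ts = pd.DataFrame({"value": [10, 20, 30, 40, 50]}, index=idx)

# Keep only the rows between the two dates; both bounds are inclusive.
window = ts.truncate(before="2014-01-02", after="2014-01-03")
```

ISO-style date strings are used here to avoid the month/day ambiguity of `'1/2/2014'`.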


####
# Stats
####

# Descriptive statistics by group
df['preTestScore'].groupby(df['company']).describe()
df['preTestScore'].describe()
# Count the number of non-NA values
df['preTestScore'].count()
df['preTestScore'].min()
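
For illustration, the same statistics on a tiny frame with one missing score. The data are made up; the point is that `count()` skips NA values.

```python
import pandas as pd

# Hypothetical scores; None becomes NaN and is ignored by count().
scores = pd.DataFrame(
    {
        "company": ["1st", "1st", "2nd", "2nd"],
        "preTestScore": [4, 24, None, 2],
    }
)

n_valid = scores["preTestScore"].count()   # non-NA values only
lowest = scores["preTestScore"].min()      # NaN is skipped by default
by_company = scores.groupby("company")["preTestScore"].describe()
```

`by_company` has one row per company and one column per statistic (`count`, `mean`, `std`, `min`, quartiles, `max`).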

# Correlation Matrix Of Values
df.corr(numeric_only=True) # skip text columns; numeric_only needs pandas >= 1.5
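
If the frame mixes text and numeric columns, one version-independent option is to select the numeric columns before calling `corr()`. A sketch with made-up data:

```python
import pandas as pd

# Hypothetical frame with one text column and two numeric columns.
df = pd.DataFrame(
    {
        "regiment": ["Nighthawks", "Nighthawks", "Dragoons", "Dragoons"],
        "preTestScore": [4, 24, 31, 2],
        "postTestScore": [8, 48, 62, 4],   # exactly 2 * preTestScore
    }
)

# Drop non-numeric columns first, then compute pairwise Pearson correlations.
corr = df.select_dtypes(include="number").corr()
```

Because `postTestScore` is an exact multiple of `preTestScore` in this toy data, their correlation comes out as 1.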


#####
# Mean preTestScores grouped by regiment and company
#####
#regiment    company
#Dragoons    1st         3.5
#            2nd        27.5
#Nighthawks  1st        14.0
#            2nd        16.5
#Scouts      1st         2.5
#            2nd         2.5
#dtype: float64
df['preTestScore'].groupby([df['regiment'], df['company']]).mean()

####
# Mean preTestScores grouped by regiment and company without hierarchical indexing
####
#company     1st   2nd
#regiment
#Dragoons    3.5  27.5
#Nighthawks 14.0  16.5
#Scouts      2.5   2.5
df['preTestScore'].groupby([df['regiment'], df['company']]).mean().unstack()

#####
# Group the entire dataframe by regiment and company
#####
#                     preTestScore  postTestScore
#regiment    company
#Dragoons    1st               3.5           47.5
#            2nd              27.5           75.5
#Nighthawks  1st              14.0           59.5
#            2nd              16.5           59.5
#Scouts      1st               2.5           66.0
#            2nd               2.5           66.0

df.groupby(['regiment', 'company']).mean(numeric_only=True) # average numeric columns only; pandas 2.0+ raises on text columns otherwise
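
The three grouped means above can be reproduced end to end. The rows below are an assumption: they were chosen so that the group means match the outputs shown in the comments, and they are not taken from the original dataset.

```python
import pandas as pd

# Assumed data, picked so the group means match the commented outputs above.
df = pd.DataFrame(
    {
        "regiment": ["Nighthawks"] * 4 + ["Dragoons"] * 4 + ["Scouts"] * 4,
        "company": ["1st", "1st", "2nd", "2nd"] * 3,
        "preTestScore": [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
        "postTestScore": [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70],
    }
)

# Series indexed by (regiment, company)
by_group = df["preTestScore"].groupby([df["regiment"], df["company"]]).mean()

# Same numbers as a regiment x company table
wide = by_group.unstack()

# Both score columns at once; selecting them explicitly avoids text columns
both = df.groupby(["regiment", "company"])[["preTestScore", "postTestScore"]].mean()
```

Selecting the score columns before `mean()` is one way to keep the call working when the frame also holds text columns such as names.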


# Count the number of times each number of deaths occurs in each regiment
# Input
#	guardCorps	corps1	corps2	corps3	corps4	corps5	corps6	corps7	corps8	corps9	corps10	corps11	corps14	corps15
# 1875	0	0	0	0	0	0	0	1	1	0	0	0	1	0
# 1876	2	0	0	0	1	0	0	0	0	0	0	0	1	1
# 1877	2	0	0	0	0	0	1	1	0	0	1	0	2	0


result = horsekick.apply(pd.Series.value_counts).fillna(0); result # pd.value_counts is deprecated since pandas 2.1
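
A small sketch of the per-column count, using three of the columns from the input shown above (`guardCorps`, `corps1`, `corps14`) for the years 1875 to 1877. The variable name `horsekick` follows the snippet.

```python
import pandas as pd

# Three columns copied from the input table above; rows are the years.
horsekick = pd.DataFrame(
    {"guardCorps": [0, 2, 2], "corps1": [0, 0, 0], "corps14": [1, 1, 2]},
    index=[1875, 1876, 1877],
)

# For every column, count how often each death count occurs.
# Counts that never occur in a column come back as NaN, so fill them with 0.
result = horsekick.apply(lambda col: col.value_counts()).fillna(0)
```

The rows of `result` are the distinct death counts (0, 1, 2) and each cell is the number of years in which that count occurred for that corps.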

# Create a crosstab table by company and regiment


# regiment	company	experience	name	preTestScore	postTestScore
# 0	Nighthawks	infantry	veteran	Miller	4	25
# 1	Nighthawks	infantry	rookie	Jacobson	24	94
# 2	Nighthawks	cavalry	veteran	Ali	31	57
# -->
# company	cavalry	infantry	All
# regiment			
# Dragoons	2	2	4
# Nighthawks	2	2	4

# Counting the number of observations by regiment and category

pd.crosstab(df.regiment, df.company, margins=True)
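
As a runnable illustration of the crosstab, here is a frame with the same `regiment` and `company` columns. The rows are assumed so that each regiment has two entries per company, as in the table above.

```python
import pandas as pd

# Assumed rows: two cavalry and two infantry entries per regiment.
df = pd.DataFrame(
    {
        "regiment": ["Nighthawks"] * 4 + ["Dragoons"] * 4,
        "company": ["infantry", "infantry", "cavalry", "cavalry"] * 2,
    }
)

# Count observations per (regiment, company); margins=True adds "All" totals.
table = pd.crosstab(df.regiment, df.company, margins=True)
```

With `margins=True` the result gains an `All` column and an `All` row holding the row and column totals.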


df['preTestScore'].idxmax() # index label of the row with the max value
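
`idxmax()` returns an index label rather than the row itself, so a typical follow-up is to pass that label to `.loc`. A sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical frame with string index labels.
df = pd.DataFrame(
    {"name": ["Miller", "Jacobson", "Ali"], "preTestScore": [4, 24, 31]},
    index=["a", "b", "c"],
)

label = df["preTestScore"].idxmax()   # label of the first row holding the maximum
best_row = df.loc[label]              # the full row for that label
```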
