Python | Kaggle Machine Learning Series: Pandas Basics Exercises
Posted 海轰Pro
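Preface
Hello, everyone!
Thank you very much for reading 海轰's article. If you spot any mistakes, please point them out~
About me ଘ(੭ˊᵕˋ)੭
Nickname: 海轰
Tags: programmer | C++ user | student
Bio: I got into programming through C and later switched to a computer science major. I have been lucky enough to win some national and provincial awards, and I have been recommended for graduate school. I am currently learning C++/Linux/Python.
Study tips: solid fundamentals + plenty of notes + plenty of coding + plenty of thinking + good English!
New to Python, still at the beginner stage
This article is only my own study notes, used for building up a knowledge framework and for review
It's not about how many exercises you do: work through one, understand one
Know what works, and know why it works!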
Introduction
In this set of exercises we will work with the Wine Reviews dataset.
Run the code
Import the dataset and the packages needed for this exercise
import pandas as pd
reviews = pd.read_csv("../input/wine-reviews/winemag-data-130k-v2.csv", index_col=0)
pd.set_option("display.max_rows", 5)
from learntools.core import binder; binder.bind(globals())
from learntools.pandas.indexing_selecting_and_assigning import *
print("Setup complete.")
Look at an overview of your data by running the following line.
Run the code to view the imported data
reviews.head()
This is the data used throughout the exercises.
Exercises
1.
Question
Select the description column from reviews and assign the result to the variable desc.
Solution
Task:
# reviews is the dataset in use; it was already imported at the start
Extract the description column on its own and assign it to desc
desc = reviews.description
Alternative solution:
desc = reviews["description"]
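To check that the two forms really give the same thing, here is a minimal sketch on a tiny made-up frame (the name toy and all of its values are invented for illustration and are not taken from the Wine Reviews data):

```python
import pandas as pd

# Hypothetical stand-in for reviews; the values are made up.
toy = pd.DataFrame({
    "country": ["Italy", "Portugal", "US"],
    "description": ["Aromas of fruit", "Ripe and smooth", "Tart and snappy"],
})

by_attribute = toy.description   # attribute access
by_key = toy["description"]      # bracket access

# Both forms return the same pandas Series.
print(by_attribute.equals(by_key))
print(type(by_key).__name__)
```

The bracket form is the more general one: it also works for column names that contain spaces or that clash with a DataFrame method name, which attribute access cannot reach.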
2.
Question
Select the first value from the description column of reviews, assigning it to variable first_description.
Solution
Task:
Extract the first value in the description column
first_description = reviews.description.iloc[0]
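iloc picks by position, not by label. A small sketch, assuming a made-up frame whose index deliberately does not start at 0 (toy and its values are invented for illustration):

```python
import pandas as pd

# Made-up data with a non-default index, to separate position from label.
toy = pd.DataFrame(
    {"description": ["Aromas of fruit", "Ripe and smooth", "Tart and snappy"]},
    index=[10, 20, 30],
)

first_by_position = toy.description.iloc[0]   # position 0
first_by_label = toy.description.loc[10]      # label 10

print(first_by_position)
print(first_by_position == first_by_label)
```

In reviews the index labels start at 0, so position 0 and label 0 happen to point at the same row.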
3.
Question
Select the first row of data (the first record) from reviews, assigning it to the variable first_row.
Solution
Task:
Extract the first row of the reviews data
first_row = reviews.iloc[0]
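It is worth looking at what kind of object comes back here. A minimal sketch on a made-up two-row frame (toy and its values are invented for illustration):

```python
import pandas as pd

# Hypothetical stand-in for reviews; the values are made up.
toy = pd.DataFrame({
    "country": ["Italy", "Portugal"],
    "points": [87, 90],
})

first_row = toy.iloc[0]

print(type(first_row).__name__)   # a Series, not a DataFrame
print(list(first_row.index))      # the column names become the Series index
print(first_row["country"])
```

So a single row selected with iloc is a Series whose index is the list of column names.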
4.
Question
Select the first 10 values from the description column in reviews, assigning the result to variable first_descriptions.
Hint: format your output as a pandas Series.
Solution
Task:
The first ten elements of the description column
first_descriptions = reviews.description.iloc[:10]
Alternative solutions:
first_descriptions = reviews.description.head(10)
first_descriptions = reviews.loc[:9, "description"]
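The three versions agree only because the index is the default 0, 1, 2, ... sequence. A small sketch on a made-up 15-row frame (toy and its values are invented for illustration):

```python
import pandas as pd

# Made-up frame with 15 rows and the default index 0..14.
toy = pd.DataFrame({"description": [f"review {i}" for i in range(15)]})

a = toy.description.iloc[:10]     # positions 0..9, end excluded
b = toy.description.head(10)      # first 10 rows
c = toy.loc[:9, "description"]    # labels up to and including 9

print(len(a), len(b), len(c))
print(a.equals(b) and b.equals(c))
```

Note that the loc version stops at 9, not 10, because loc includes the end label.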
5.
Question
Select the records with index labels 1, 2, 3, 5, and 8, assigning the result to the variable sample_reviews.
In other words, generate the following DataFrame:
Solution
Task:
Extract the rows whose index labels are 1, 2, 3, 5, and 8, and assign them to sample_reviews
indices = [1, 2, 3, 5, 8]
sample_reviews = reviews.loc[indices]
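Passing a list of labels to loc returns the matching rows in the order given. A minimal sketch on a made-up frame (toy and its values are invented for illustration):

```python
import pandas as pd

# Made-up frame with index labels 0..9.
toy = pd.DataFrame({"points": [80 + i for i in range(10)]})

indices = [1, 2, 3, 5, 8]
sample = toy.loc[indices]

print(sample.index.tolist())
print(sample.points.tolist())
```

The result is a DataFrame even though only some rows were picked, and its index keeps the original labels.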
6.
Question
Create a variable df containing the country, province, region_1, and region_2 columns of the records with the index labels 0, 1, 10, and 100. In other words, generate the following DataFrame:
Solution
Task:
Extract the country, province, region_1, and region_2 columns for the rows with index labels 0, 1, 10, and 100, and assign the result to df
cols = ['country', 'province', 'region_1', 'region_2']
indices = [0, 1, 10, 100]
df = reviews.loc[indices, cols]
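loc takes the row labels first and the column labels second. A small sketch of the same pattern on a made-up frame (toy, rows, and the values are invented for illustration):

```python
import pandas as pd

# Hypothetical stand-in for reviews; the values are made up.
toy = pd.DataFrame({
    "country": ["Italy", "Portugal", "US", "France"],
    "province": ["Sicily", "Douro", "Oregon", "Alsace"],
    "points": [87, 87, 87, 90],
})

rows = [0, 2]
cols = ["country", "province"]
subset = toy.loc[rows, cols]

print(subset.shape)
print(subset.columns.tolist())
```

Only the listed rows and the listed columns survive, so the points column is gone from subset.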
7.
Question
Create a variable df containing the country and variety columns of the first 100 records.
Hint: you may use loc or iloc. When working on the answer to this question and several of the ones that follow, keep in mind the following “gotcha” described in the tutorial:
iloc uses the Python stdlib indexing scheme, where the first element of the range is included and the last one excluded. loc, meanwhile, indexes inclusively.
This is particularly confusing when the DataFrame index is a simple numerical list, e.g. 0,...,1000. In this case df.iloc[0:1000] will return 1000 entries, while df.loc[0:1000] returns 1001 of them! To get 1000 elements using loc, you will need to go one lower and ask for df.loc[0:999].
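The off-by-one difference between the two indexers is easy to verify. A minimal sketch, assuming a made-up frame whose index labels run from 0 to 1000 (toy is invented for illustration):

```python
import pandas as pd

# Made-up frame with 1001 rows, index labels 0..1000.
toy = pd.DataFrame({"value": range(1001)})

by_position = toy.iloc[0:1000]   # positions 0..999, end excluded
by_label = toy.loc[0:1000]       # labels 0..1000, end included
trimmed = toy.loc[0:999]         # labels 0..999, end included

print(len(by_position), len(by_label), len(trimmed))
```

With loc the slice end is a label that is kept, so asking for 1000 rows means stopping at label 999.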
Solution
Task:
Extract the country and variety columns of the first 100 rows and assign the result to df
cols = ['country','variety']
df = reviews.loc[0:99,cols]
Alternative solution:
cols_idx = [0, 11]
df = reviews.iloc[:100, cols_idx]
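The positions 0 and 11 have to be counted by hand. One way to avoid that, sketched here on a made-up three-column frame, is to look the positions up with columns.get_indexer (toy and its values are invented for illustration, so the positions differ from those in reviews):

```python
import pandas as pd

# Hypothetical stand-in for reviews; the values are made up.
toy = pd.DataFrame({
    "country": ["Italy", "Portugal", "US"],
    "points": [87, 87, 87],
    "variety": ["White Blend", "Portuguese Red", "Pinot Gris"],
})

cols = ["country", "variety"]
cols_idx = toy.columns.get_indexer(cols)   # positions of the named columns

by_label = toy.loc[0:1, cols]              # labels 0..1, end included
by_position = toy.iloc[:2, cols_idx]       # positions 0..1, end excluded

print(cols_idx.tolist())
print(by_label.equals(by_position))
```

Both selections give the same two rows and two columns; the loc slice ends one lower than the iloc slice for the reason given in the hint.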
8.
Question
Create a DataFrame italian_wines containing reviews of wines made in Italy.
Hint: reviews.country equals what?
Solution
Task:
Extract all records where the country column equals 'Italy'
italian_wines = reviews[reviews.country == 'Italy']
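The comparison inside the brackets produces a boolean Series, and the brackets then keep the rows where it is True. A small sketch on a made-up frame (toy, mask, and the values are invented for illustration):

```python
import pandas as pd

# Hypothetical stand-in for reviews; the values are made up.
toy = pd.DataFrame({
    "country": ["Italy", "Portugal", "Italy", "US"],
    "points": [87, 87, 90, 86],
})

mask = toy.country == "Italy"   # boolean Series, one entry per row
italian = toy[mask]

print(mask.tolist())
print(italian.index.tolist())
```

The filtered frame keeps the original index labels of the rows that matched.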
9.
Question
Create a DataFrame top_oceania_wines containing all reviews with at least 95 points (out of 100) for wines from Australia or New Zealand.
Solution
Task:
Extract all records where country is Australia or New Zealand and points is greater than or equal to 95
top_oceania_wines = reviews.loc[
(reviews.country.isin(['Australia', 'New Zealand']))
& (reviews.points >= 95)
]
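isin is a shorthand for chaining equality tests with |. A minimal sketch comparing the two spellings on a made-up frame (toy, from_oceania, high_score, and the values are invented for illustration):

```python
import pandas as pd

# Hypothetical stand-in for reviews; the values are made up.
toy = pd.DataFrame({
    "country": ["Australia", "New Zealand", "Italy", "Australia"],
    "points": [96, 94, 97, 95],
})

from_oceania = toy.country.isin(["Australia", "New Zealand"])
high_score = toy.points >= 95

top = toy.loc[from_oceania & high_score]

# The same filter written with explicit equality tests joined by |.
same = toy.loc[
    ((toy.country == "Australia") | (toy.country == "New Zealand"))
    & (toy.points >= 95)
]

print(top.index.tolist())
print(top.equals(same))
```

The parentheses around each comparison matter: in Python, & and | bind more tightly than == and >=, so leaving them out changes what gets compared.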
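Closing remarks
This article is only a set of study notes, recording the journey from 0 to 1
I hope it helps you. If there are any mistakes, corrections are welcome~
I'm 海轰 ଘ(੭ˊᵕˋ)੭
If you think it is well written, please give it a like
Thanks for your support ❤️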