Python | Kaggle Machine Learning Series: Pandas Basics Exercises

Posted by 海轰Pro

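Preface

Hello, friends!
Thank you for reading 海轰's article. If you spot any mistakes, please point them out.

About me ଘ(੭ˊᵕˋ)੭
Nickname: 海轰
Tags: programmer | C++ coder | student
Bio: I got into programming through C and later switched to a computer science major. I was lucky enough to win some national and provincial awards and have secured a recommended place in graduate school. I am currently learning C++/Linux/Python.
Study tips: solid fundamentals + lots of notes + lots of coding + lots of thinking + good English!

New to Python, still at the beginner stage.
These posts are my own study notes, used to build up a knowledge framework and for review.
It is not about the number of exercises: learn one, understand one.
Know what works, and know why it works!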

Introduction

In this set of exercises we will work with the Wine Reviews dataset.

Run the code below to import the dataset and the packages used in this exercise.

import pandas as pd

reviews = pd.read_csv("../input/wine-reviews/winemag-data-130k-v2.csv", index_col=0)
pd.set_option("display.max_rows", 5)

from learntools.core import binder; binder.bind(globals())
from learntools.pandas.indexing_selecting_and_assigning import *
print("Setup complete.")

Look at an overview of your data by running the following line.

Run the code to view the imported data.

reviews.head()

Exercises

1.

Question

Select the description column from reviews and assign the result to the variable desc.

Solution

Task:

reviews is the dataset that was loaded during setup.
Extract the description column on its own and assign it to desc.

desc = reviews.description

Alternative solution:

desc = reviews["description"]
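The two spellings above return the same Series. Here is a minimal sketch on a made-up DataFrame (toy_reviews and its values are invented for illustration and are not taken from the Wine Reviews data). It also shows one case where they differ: attribute access cannot be used for a column name that is not a valid Python identifier.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset
toy_reviews = pd.DataFrame({
    "description": ["Ripe and fruity", "Tart and snappy"],
    "tasting notes": ["plum", "lime"],  # column name containing a space
})

# Attribute access and bracket access select the same column
by_attribute = toy_reviews.description
by_bracket = toy_reviews["description"]
same_result = by_attribute.equals(by_bracket)

# Only bracket access works when the name is not a valid identifier
notes = toy_reviews["tasting notes"]

print(same_result)
print(list(notes))
```

Bracket access is the safer habit when column names come from external files, since it works for any name.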

2.

Question

Select the first value from the description column of reviews, assigning it to variable first_description.

Solution

Task:

Extract the first value of the description column.

first_description = reviews.description.iloc[0]
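iloc[0] is used here because the question asks for the first value by position. With the default 0, 1, 2, ... index, loc[0] would give the same value, but the two are not the same thing in general. A small sketch, assuming a made-up DataFrame whose index labels are deliberately out of order:

```python
import pandas as pd

# Hypothetical toy data; the index labels are deliberately not 0, 1, 2
toy = pd.DataFrame(
    {"description": ["first", "second", "third"]},
    index=[10, 0, 5],
)

by_position = toy.description.iloc[0]  # value in the first row
by_label = toy.description.loc[0]      # value in the row labelled 0

print(by_position)
print(by_label)
```

After filtering or sorting a DataFrame the labels no longer match the positions, which is when this difference starts to matter.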

3.

Question

Select the first row of data (the first record) from reviews, assigning it to the variable first_row.

Solution

Task:

Extract the first row of the reviews data.

first_row = reviews.iloc[0]

4.

Question

Select the first 10 values from the description column in reviews, assigning the result to variable first_descriptions.

Hint: format your output as a pandas Series.

Solution

Task:

Extract the first ten values of the description column.

first_descriptions = reviews.description.iloc[:10]

Alternative solutions:

first_descriptions = reviews.description.head(10)
first_descriptions = reviews.loc[:9, "description"]
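To check that the three spellings really agree, here is a minimal sketch on a made-up 20-row DataFrame (toy and its values are invented for illustration). It assumes the default 0, 1, 2, ... index, which is what makes the label slice :9 line up with the position slice :10.

```python
import pandas as pd

# Hypothetical toy data with the default RangeIndex 0..19
toy = pd.DataFrame({"description": [f"wine {i}" for i in range(20)]})

a = toy.description.iloc[:10]      # positions 0..9, end excluded
b = toy.description.head(10)       # first 10 rows
c = toy.loc[:9, "description"]     # labels 0..9, end included

all_equal = a.equals(b) and a.equals(c)

print(len(a), len(b), len(c))
print(all_equal)
```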

5.

Question

Select the records with index labels 1, 2, 3, 5, and 8, assigning the result to the variable sample_reviews.

In other words, generate the following DataFrame:

Solution

Task:

Extract the rows whose index labels are 1, 2, 3, 5, and 8, and assign them to sample_reviews.

indices = [1, 2, 3, 5, 8]
sample_reviews = reviews.loc[indices]

6.

Question

Create a variable df containing the country, province, region_1, and region_2 columns of the records with the index labels 0, 1, 10, and 100. In other words, generate the following DataFrame:

Solution

Task:

Extract the country, province, region_1, and region_2 columns of the rows whose index labels are 0, 1, 10, and 100, and assign the result to df.

cols = ['country', 'province', 'region_1', 'region_2']
indices = [0, 1, 10, 100]
df = reviews.loc[indices, cols]

7.

Question

Create a variable df containing the country and variety columns of the first 100 records.

Hint: you may use loc or iloc. When working on the answer to this question and several of the ones that follow, keep in mind the following "gotcha" described in the tutorial:

iloc uses the Python stdlib indexing scheme, where the first element of the range is included and the last one excluded.
loc, meanwhile, indexes inclusively.

This is particularly confusing when the DataFrame index is a simple numerical list, e.g. 0,...,1000. In this case df.iloc[0:1000] will return 1000 entries, while df.loc[0:1000] returns 1001 of them! To get 1000 elements using loc, you will need to go one lower and ask for df.loc[0:999].
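The counts in this hint are easy to verify. A minimal sketch, assuming a made-up DataFrame (demo is a hypothetical name) with the default index 0..1999:

```python
import pandas as pd

# Hypothetical toy data; index labels run from 0 to 1999
demo = pd.DataFrame({"value": range(2000)})

n_iloc = len(demo.iloc[0:1000])      # positions 0..999, end excluded
n_loc = len(demo.loc[0:1000])        # labels 0..1000, end included
n_loc_fixed = len(demo.loc[0:999])   # labels 0..999, end included

print(n_iloc, n_loc, n_loc_fixed)
```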

Solution

Task:

Extract the country and variety columns of the first 100 rows, and assign the result to df.

cols = ['country', 'variety']
df = reviews.loc[0:99, cols]

Alternative solution:

cols_idx = [0, 11]
df = reviews.iloc[:100, cols_idx]
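The iloc version depends on knowing that country and variety sit at positions 0 and 11. One way to avoid hard-coding those numbers is to look the positions up from the column names. This is only a sketch on a made-up three-column DataFrame (toy and its values are invented for illustration), so the positions differ from the real dataset:

```python
import pandas as pd

# Hypothetical toy data with fewer columns than the real dataset
toy = pd.DataFrame({
    "country": ["Italy", "US", "France"],
    "points": [87, 90, 92],
    "variety": ["White Blend", "Pinot Gris", "Gamay"],
})

# Translate column names into integer positions for iloc
cols_idx = toy.columns.get_indexer(["country", "variety"])

# First two rows, selected columns, all by position
subset = toy.iloc[:2, cols_idx]

print(list(cols_idx))
print(subset)
```

Index.get_indexer returns -1 for a name that is not found, so it is worth checking the result before passing it to iloc.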

8.

Question

Create a DataFrame italian_wines containing reviews of wines made in Italy.

Hint: reviews.country equals what?

Solution

Task:

Extract all records whose country column equals 'Italy'.

italian_wines = reviews[reviews.country == 'Italy']
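To answer the hint: the comparison reviews.country == 'Italy' produces a boolean Series, one True or False per row, and indexing with it keeps only the True rows. A minimal sketch on a made-up DataFrame (toy and its values are invented for illustration):

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "country": ["Italy", "US", "Italy"],
    "points": [87, 90, 92],
})

# Element-wise comparison gives one boolean per row
mask = toy.country == "Italy"

# Indexing with the mask keeps the rows where it is True
italian = toy[mask]

print(mask.tolist())
print(italian)
```

The selected rows keep their original index labels, here 0 and 2.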

9.

Question

Create a DataFrame top_oceania_wines containing all reviews with at least 95 points (out of 100) for wines from Australia or New Zealand.

Solution

Task:

Extract all records whose country is Australia or New Zealand and whose points score is at least 95.

top_oceania_wines = reviews.loc[
    (reviews.country.isin(['Australia', 'New Zealand']))
    & (reviews.points >= 95)
]
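The isin call is shorthand for joining two equality tests with |. A small sketch comparing the two forms on a made-up DataFrame (toy and its values are invented for illustration):

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "country": ["Australia", "New Zealand", "Italy", "Australia"],
    "points": [96, 95, 97, 90],
})

# Form 1: membership test with isin
with_isin = toy.loc[
    toy.country.isin(["Australia", "New Zealand"]) & (toy.points >= 95)
]

# Form 2: explicit OR; each comparison needs its own parentheses
# because & and | bind more tightly than == and >=
with_or = toy.loc[
    ((toy.country == "Australia") | (toy.country == "New Zealand"))
    & (toy.points >= 95)
]

print(with_isin.index.tolist())
print(with_isin.equals(with_or))
```

isin stays readable as the list of countries grows, while the chained form needs one more comparison per country.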

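Closing

This article is just a set of study notes, recording the journey from 0 to 1.

I hope it helps. If you find any mistakes, please let me know.

I am 海轰 ଘ(੭ˊᵕˋ)੭

If you think it is well written, please give it a like.

Thanks for your support ❤️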
