hands-on-data-analysis: Unit 2, Sections 2 and 3

Posted 沧夜2021


Section 2: Data Restructuring

As always, start by importing the basic libraries:

# Import the basic libraries
import numpy as np
import pandas as pd

2.1. Merging data: horizontal merge with concat

Official documentation:

pandas.concat — pandas 1.4.2 documentation (pydata.org)

Given two tables, text_left_up and text_right_up, merge them horizontally into a single table (that is, place their columns side by side).

text_left_up:

   PassengerId  Survived  Pclass  Name
0  1            0         3       Braund, Mr. Owen Harris
1  2            1         1       Cumings, Mrs. John Bradley (Florence Briggs Th...
2  3            1         3       Heikkinen, Miss. Laina
3  4            1         1       Futrelle, Mrs. Jacques Heath (Lily May Peel)
4  5            0         3       Allen, Mr. William Henry

text_right_up:

   Sex     Age   SibSp  Parch  Ticket            Fare     Cabin  Embarked
0  male    22.0  1.0    0.0    A/5 21171         7.2500   NaN    S
1  female  38.0  1.0    0.0    PC 17599          71.2833  C85    C
2  female  26.0  0.0    0.0    STON/O2. 3101282  7.9250   NaN    S
3  female  35.0  1.0    0.0    113803            53.1000  C123   S
4  male    35.0  0.0    0.0    373450            8.0500   NaN    S

# Put the two tables in a list and concatenate along axis=1 (columns)
list_up = [text_left_up, text_right_up]
result_up = pd.concat(list_up, axis=1)
result_up.head()

This gives:

   PassengerId  Survived  Pclass  Name                                               Sex     Age   SibSp  Parch  Ticket            Fare     Cabin  Embarked
0  1.0          0.0       3.0     Braund, Mr. Owen Harris                            male    22.0  1.0    0.0    A/5 21171         7.2500   NaN    S
1  2.0          1.0       1.0     Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0  1.0    0.0    PC 17599          71.2833  C85    C
2  3.0          1.0       3.0     Heikkinen, Miss. Laina                             female  26.0  0.0    0.0    STON/O2. 3101282  7.9250   NaN    S
3  4.0          1.0       1.0     Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35.0  1.0    0.0    113803            53.1000  C123   S
4  5.0          0.0       3.0     Allen, Mr. William Henry                           male    35.0  0.0    0.0    373450            8.0500   NaN    S

It is like combining a table that holds Xiao Ming's student ID with a table that holds Xiao Ming's grades.
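A minimal sketch of the same idea on small made-up tables (the frames `left` and `right` are illustrative stand-ins, not the course data). It also shows that `pd.concat` with `axis=1` matches rows by index label rather than by position, so a label missing from one table produces NaN in that table's columns.

```python
import pandas as pd

# Hypothetical stand-ins for text_left_up and text_right_up
left = pd.DataFrame({"PassengerId": [1, 2, 3], "Survived": [0, 1, 1]})
right = pd.DataFrame({"Sex": ["male", "female", "female"],
                      "Age": [22.0, 38.0, 26.0]})

# axis=1 places the columns side by side, aligning rows on the index
wide = pd.concat([left, right], axis=1)
print(wide)

# Drop the last row of the right-hand table: index label 2 now exists
# only in `left`, so the Sex and Age cells of that row become NaN
ragged = pd.concat([left, right.iloc[:2]], axis=1)
print(ragged)
```

If the two tables are known to line up row for row but carry different index labels, resetting the index on both before concatenating is one way to avoid unintended NaN cells.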

2.2. Merging data: vertical merge with concat

Official documentation:

pandas.concat — pandas 1.4.2 documentation (pydata.org)

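Merge the train-left-down and train-right-down tables (text_left_down and text_right_down in the code) horizontally into one table and save it as result_down. Then merge result_up from above and result_down vertically into result.

The data in text_left_down: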

   PassengerId  Survived  Pclass  Name
0  440          0         2       Kvillner, Mr. Johan Henrik Johannesson
1  441          1         2       Hart, Mrs. Benjamin (Esther Ada Bloomfield)
2  442          0         3       Hampe, Mr. Leon
3  443          0         3       Petterson, Mr. Johan Emil
4  444          1         2       Reynaldo, Ms. Encarnacion

The data in text_right_down:

   Sex     Age   SibSp  Parch  Ticket        Fare    Cabin  Embarked
0  male    31.0  0      0      C.A. 18723    10.500  NaN    S
1  female  45.0  1      1      F.C.C. 13529  26.250  NaN    S
2  male    20.0  0      0      345769        9.500   NaN    S
3  male    25.0  1      0      347076        7.775   NaN    S
4  female  28.0  0      0      230434        13.000  NaN    S

# Merge the lower pair horizontally, then stack it under result_up (default axis=0)
list_down = [text_left_down, text_right_down]
result_down = pd.concat(list_down, axis=1)
result = pd.concat([result_up, result_down])
result.head()

The merged table:

   PassengerId  Survived  Pclass  Name                                               Sex     Age   SibSp  Parch  Ticket            Fare     Cabin  Embarked
0  1.0          0.0       3.0     Braund, Mr. Owen Harris                            male    22.0  1.0    0.0    A/5 21171         7.2500   NaN    S
1  2.0          1.0       1.0     Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0  1.0    0.0    PC 17599          71.2833  C85    C
2  3.0          1.0       3.0     Heikkinen, Miss. Laina                             female  26.0  0.0    0.0    STON/O2. 3101282  7.9250   NaN    S
3  4.0          1.0       1.0     Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35.0  1.0    0.0    113803            53.1000  C123   S
4  5.0          0.0       3.0     Allen, Mr. William Henry                           male    35.0  0.0    0.0    373450            8.0500   NaN    S
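A small sketch, using made-up frames rather than the course data, of what vertical concatenation does to the index. By default `pd.concat` keeps each table's original index labels, so labels can repeat in the result; passing `ignore_index=True` renumbers the rows instead. The names `top` and `bottom` are illustrative only.

```python
import pandas as pd

# Hypothetical stand-ins for result_up and result_down
top = pd.DataFrame({"PassengerId": [1, 2], "Fare": [7.25, 71.2833]})
bottom = pd.DataFrame({"PassengerId": [440, 441], "Fare": [10.5, 26.25]})

# Default axis=0 stacks the rows; the original index labels are kept
stacked = pd.concat([top, bottom])
print(stacked.index.tolist())

# ignore_index=True discards the old labels and renumbers from 0
renumbered = pd.concat([top, bottom], ignore_index=True)
print(renumbered.index.tolist())
```

Repeated labels matter later on: selecting `stacked.loc[0]` returns two rows, one from each source table.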

2.3. Merging data: join

Official documentation:
pandas.DataFrame.join — pandas 1.4.2 documentation (pydata.org)

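According to the official documentation, join is fairly flexible.

It can join the columns of another DataFrame on the index. It can also efficiently join several DataFrame objects by index at once when passed a list.

Its signature is:

DataFrame.join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)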
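The parameters above can be sketched on small made-up frames (all names and values below are illustrative, not the course data). The example covers the default left join on the index, `how="inner"`, the suffixes required when column names overlap, and passing a list of frames.

```python
import pandas as pd

ids = pd.DataFrame({"PassengerId": [1, 2, 3], "Survived": [0, 1, 1]})
ages = pd.DataFrame({"Age": [22.0, 38.0]}, index=[0, 1])
fares = pd.DataFrame({"Fare": [7.25, 71.2833, 7.925]})

# Default how='left': every row of `ids` is kept; index 2 has no Age, so NaN
joined = ids.join(ages)

# how='inner': only index labels present in both frames survive
inner = ids.join(ages, how="inner")

# Overlapping column names need lsuffix/rsuffix to be told apart
other = pd.DataFrame({"Survived": [1, 0, 0]})
suffixed = ids.join(other, lsuffix="_left", rsuffix="_right")

# A list joins several frames by index in one call
multi = ids.join([ages, fares])

print(joined)
print(suffixed.columns.tolist())
print(multi.columns.tolist())
```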

2.4. Comparing concat and join

A comparison of concat, join, and related methods:

[Merge, join, concatenate and compare — pandas 1.4.2 documentation (pydata.org)](https://pandas.pydata.org/docs/user_guide/merging.html?highlight=concat join#comparing-objects)
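One practical difference is the default handling of index labels that appear in only one table: `pd.concat(..., axis=1)` defaults to `join="outer"`, while `DataFrame.join` defaults to `how="left"`. A minimal sketch on made-up frames (names and values are illustrative):

```python
import pandas as pd

left = pd.DataFrame({"PassengerId": [1, 2]}, index=[0, 1])
right = pd.DataFrame({"Age": [22.0, 38.0]}, index=[1, 2])

# concat keeps the union of index labels by default (outer)
via_concat = pd.concat([left, right], axis=1)

# join keeps only the caller's index labels by default (left)
via_join = left.join(right)

# Either default can be overridden
concat_inner = pd.concat([left, right], axis=1, join="inner")
join_outer = left.join(right, how="outer")

print(via_concat.index.tolist())
print(via_join.index.tolist())
print(concat_inner.index.tolist())
print(join_outer.index.tolist())
```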

Section 3: The GroupBy Interface

Official documentation:

pandas.DataFrame.groupby — pandas 1.4.2 documentation (pydata.org)
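As a rough illustration of the interface, here is a minimal sketch on a tiny made-up table that borrows Titanic-style column names. The rows are illustrative and the choice of aggregations is an assumption, not something taken from the course.

```python
import pandas as pd

# Illustrative rows only, not the course data
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Survived": [0, 1, 1, 0],
    "Fare": [7.25, 71.2833, 7.925, 8.05],
})

# Split the rows by Sex, then reduce one column per group
mean_fare = df.groupby("Sex")["Fare"].mean()
survivors = df.groupby("Sex")["Survived"].sum()

# agg applies a different reduction to each column in one pass
summary = df.groupby("Sex").agg({"Fare": "mean", "Survived": "sum"})

print(mean_fare)
print(survivors)
print(summary)
```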
