Kaggle Titanic, Part 3

Posted Freeman耀


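Building on the analysis in the previous two posts, the data still needs some processing. Looking at the Age and Fare attributes, the values vary widely from passenger to passenger. With logistic regression trained by gradient descent, features on very different scales slow convergence considerably and can even prevent it. We therefore use the preprocessing module in scikit-learn to scale Age and Fare. Note that StandardScaler standardizes each feature to zero mean and unit variance; it does not confine the values to [-1, 1], and an outlier such as a Fare of 263 still maps to a scaled value well above 1.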
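As a rough illustration of what this scaling computes, the sketch below standardizes a handful of sample Age values by hand and compares the result with StandardScaler. The five values are illustrative only, so the numbers will not match statistics computed over the full dataset.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative sample of Age values (an assumption for this sketch, not the full column).
ages = np.array([22.0, 38.0, 26.0, 35.0, 35.0])

# Standardization by hand: subtract the mean, divide by the population std (ddof=0).
manual = (ages - ages.mean()) / ages.std()

# StandardScaler computes the same thing but expects a 2-D array
# of shape (n_samples, n_features), hence the reshape.
scaled = StandardScaler().fit_transform(ages.reshape(-1, 1)).ravel()

print(np.allclose(manual, scaled))
print(scaled)
```

The scaled values have zero mean and unit variance, but nothing bounds them to a fixed interval; a value far from the mean simply gets a large positive or negative score.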

# Standardize the features with a wide value range (zero mean, unit variance);
# this speeds up convergence of logistic regression
import sklearn.preprocessing as preprocessing
# scikit-learn expects 2-D input, so each column is selected as a one-column DataFrame.
# A separate scaler per column keeps the fitted parameters of each feature.
age_scaler = preprocessing.StandardScaler().fit(df[['Age']])
df['Age_scaled'] = age_scaler.transform(df[['Age']]).ravel()
fare_scaler = preprocessing.StandardScaler().fit(df[['Fare']])
df['Fare_scaled'] = fare_scaler.transform(df[['Fare']]).ravel()
print(df)

 

     PassengerId  Survived        Age  SibSp  Parch      Fare  Cabin_No  Cabin_Yes  Embarked_C  Embarked_Q  Embarked_S  Sex_female  Sex_male  Pclass_1  Pclass_2  Pclass_3  Age_scaled  Fare_scaled
  0            1         0  22.000000      1      0    7.2500         1          0           0           0           1           0         1         0         0         1   -0.561417    -0.502445
  1            2         1  38.000000      1      0   71.2833         0          1           1           0           0           1         0         1         0         0    0.613177     0.786845
  2            3         1  26.000000      0      0    7.9250         1          0           0           0           1           1         0         0         0         1   -0.267768    -0.488854
  3            4         1  35.000000      1      0   53.1000         0          1           0           0           1           1         0         1         0         0    0.392941     0.420730
  4            5         0  35.000000      0      0    8.0500         1          0           0           0           1           0         1         0         0         1    0.392941    -0.486337
  5            6         0  23.828953      0      0    8.4583         1          0           0           1           0           0         1         0         0         1   -0.427149    -0.478116
  6            7         0  54.000000      0      0   51.8625         0          1           0           0           1           0         1         1         0         0    1.787771     0.395814
  7            8         0   2.000000      3      1   21.0750         1          0           0           0           1           0         1         0         0         1   -2.029659    -0.224083
  8            9         1  27.000000      0      2   11.1333         1          0           0           0           1           1         0         0         0         1   -0.194356    -0.424256
  9           10         1  14.000000      1      0   30.0708         1          0           1           0           0           1         0         0         1         0   -1.148714    -0.042956
 10           11         1   4.000000      1      1   16.7000         0          1           0           0           1           1         0         0         0         1   -1.882835    -0.312172
 11           12         1  58.000000      0      0   26.5500         0          1           0           0           1           1         0         1         0         0    2.081420    -0.113846
 12           13         0  20.000000      0      0    8.0500         1          0           0           0           1           0         1         0         0         1   -0.708241    -0.486337
 13           14         0  39.000000      1      5   31.2750         1          0           0           0           1           0         1         0         0         1    0.686589    -0.018709
 14           15         0  14.000000      0      0    7.8542         1          0           0           0           1           1         0         0         0         1   -1.148714    -0.490280
 15           16         1  55.000000      0      0   16.0000         1          0           0           0           1           1         0         0         1         0    1.861183    -0.326267
 16           17         0   2.000000      4      1   29.1250         1          0           0           1           0           0         1         0         0         1   -2.029659    -0.061999
 17           18         1  32.066493      0      0   13.0000         1          0           0           0           1           0         1         0         1         0    0.177586    -0.386671
 18           19         0  31.000000      1      0   18.0000         1          0           0           0           1           1         0         0         0         1    0.099292    -0.285997
 19           20         1  29.518205      0      0    7.2250         1          0           1           0           0           1         0         0         0         1   -0.009489    -0.502949
 20           21         0  35.000000      0      0   26.0000         1          0           0           0           1           0         1         0         1         0    0.392941    -0.124920
 21           22         1  34.000000      0      0   13.0000         0          1           0           0           1           0         1         0         1         0    0.319529    -0.386671
 22           23         1  15.000000      0      0    8.0292         1          0           0           1           0           1         0         0         0         1   -1.075302    -0.486756
 23           24         1  28.000000      0      0   35.5000         0          1           0           0           1           0         1         1         0         0   -0.120944     0.066360
 24           25         0   8.000000      3      1   21.0750         1          0           0           0           1           1         0         0         0         1   -1.589186    -0.224083
 25           26         1  38.000000      1      5   31.3875         1          0           0           0           1           1         0         0         0         1    0.613177    -0.016444
 26           27         0  29.518205      0      0    7.2250         1          0           1           0           0           0         1         0         0         1   -0.009489    -0.502949
 27           28         0  19.000000      3      2  263.0000         0          1           0           0           1           0         1         1         0         0   -0.781653     4.647001
 28           29         1  22.380113      0      0    7.8792         1          0           0           1           0           1         0         0         0         1   -0.533512    -0.489776
 29           30         0  27.947206      0      0    7.8958         1          0           0           0           1           0         1         0         0         1   -0.124820    -0.489442
...
861          862         0  21.000000      1      0   11.5000         1          0           0           0           1           0         1         0         1         0   -0.634829    -0.416873
862          863         1  48.000000      0      0   25.9292         0          1           0           0           1           1         0         1         0         0    1.347299    -0.126345
863          864         0  10.888325      8      2   69.5500         1          0           0           0           1           1         0         0         0         1   -1.377148     0.751946
864          865         0  24.000000      0      0   13.0000         1          0           0           0           1           0         1         0         1         0   -0.414592    -0.386671
865          866         1  42.000000  [output truncated in source]
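When the same preprocessing is later applied to the test set, a common practice is to reuse the scalers fitted on the training data rather than refitting them, so that both sets are scaled with the same mean and standard deviation. The following is a minimal sketch under that assumption; the `train` and `test` frames and their values are hypothetical stand-ins, not the data from this post.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-ins for the training and test frames (illustrative values).
train = pd.DataFrame({"Age": [22.0, 38.0, 26.0, 35.0],
                      "Fare": [7.25, 71.2833, 7.925, 53.1]})
test = pd.DataFrame({"Age": [34.5, 47.0],
                     "Fare": [7.8292, 7.0]})

# One scaler per column, fitted on the training data only.
scalers = {col: StandardScaler().fit(train[[col]]) for col in ["Age", "Fare"]}

for col, scaler in scalers.items():
    train[col + "_scaled"] = scaler.transform(train[[col]]).ravel()
    # Reuse the training mean and std; the scaler is not refitted on the test set.
    test[col + "_scaled"] = scaler.transform(test[[col]]).ravel()

print(test)
```

Keeping one fitted scaler per column makes the stored parameters unambiguous, since calling `fit` again on a shared scaler would overwrite the earlier statistics.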
