Data Cleaning: Handling Duplicate Values

Posted wx62c62b36cedf9



Handling Duplicate Values

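  • Data cleaning usually starts with duplicate values and missing values.
  • Duplicate values are usually handled by deleting them.
  • Some duplicates must not be deleted, however, such as rows in order-detail or transaction-detail data.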
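
The deletion approach described above can be sketched as follows. This is a minimal illustration, not the author's code: the `toy` DataFrame and its values are made up for the example, and only the pandas methods `duplicated()` and `drop_duplicates()` are real API.

```python
import pandas as pd

# Hypothetical toy data (not from MotorcycleData.csv): rows 0 and 2 are identical.
toy = pd.DataFrame({
    "Make": ["BMW", "Honda", "BMW", "Honda"],
    "Model": ["R-Series", "CB", "R-Series", "CB"],
    "Price": [3872.0, 2500.0, 3872.0, 2600.0],
})

# Flag rows that fully repeat an earlier row (the first occurrence is not flagged).
dup_mask = toy.duplicated()
n_dups = int(dup_mask.sum())

# Deletion approach: keep the first occurrence of each fully identical row.
deduped = toy.drop_duplicates()

# Compare only selected columns when those columns alone define a duplicate.
deduped_by_key = toy.drop_duplicates(subset=["Make", "Model"], keep="first")

print(n_dups, len(deduped), len(deduped_by_key))
```

For order-detail or transaction-detail data, where repeated rows can be legitimate, it may be safer to inspect the repeats with `toy[toy.duplicated(keep=False)]` before deciding whether to drop anything.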
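
import pandas as pd
import numpy as np
import os

# Show the current working directory.
os.getcwd()
# Output: 'D:\\Jupyter\\notebook\\Python数据清洗实战\\数据清洗之数据预处理'

# Switch to the directory that holds the data files.
os.chdir('D:\\Jupyter\\notebook\\Python数据清洗实战\\数据')

# Load the motorcycle listings: the file is GBK-encoded and 'Na' marks missing values.
df = pd.read_csv('MotorcycleData.csv', encoding='gbk', na_values='Na')
df.head(5)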



  | Condition | Condition_Desc | Price | Location | Model_Year | Mileage | Exterior_Color | Make | Warranty | Model | ... | Vehicle_Title | OBO | Feedback_Perc | Watch_Count | N_Reviews | Seller_Status | Vehicle_Tile | Auction | Buy_Now | Bid_Count
0 | Used | mint!!! very low miles | $11,412 | McHenry, Illinois, United States | 2013.0 | 16,000 | Black | Harley-Davidson | Unspecified | Touring | ... | NaN | FALSE | 8.1 | NaN | 2427 | Private Seller | Clear | True | FALSE | 28.0
1 | Used | Perfect condition | $17,200 | Fort Recovery, Ohio, United States | 2016.0 | 60 | Black | Harley-Davidson | Vehicle has an existing warranty | Touring | ... | NaN | FALSE | 100 | 17 | 657 | Private Seller | Clear | True | TRUE | 0.0
2 | Used | NaN | $3,872 | Chicago, Illinois, United States | 1970.0 | 25,763 | Silver/Blue | BMW | Vehicle does NOT have an existing warranty | R-Series | ... | NaN | FALSE | 100 | NaN | 136 | NaN | Clear | True | FALSE | 26.0
3 | Used | CLEAN TITLE READY TO RIDE HOME | $6,575 | Green Bay, Wisconsin, United States | 2009.0 | 33,142 | Red | Harley-Davidson | NaN | Touring | ... | NaN | FALSE | 100 | NaN | 2920 | Dealer | Clear | True | FALSE | 11.0
4 | Used | NaN | $10,000 | West Bend, Wisconsin, United States | 2012.0 | 17,800 | Blue | Harley-Davidson | NO WARRANTY | Touring | ... | NaN | FALSE | 100 | 13 | 271 | OWNER | Clear | True | TRUE | 0.0

5 rows × 22 columns

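# Convert a price or mileage string such as '$11,412' or '16,000' to a float.
def f(x):
    if '$' in str(x):
        x = str(x).strip('$')
        x = str(x).replace(',', '')
    else:
        x = str(x).replace(',', '')
    return float(x)

# Apply the conversion to both columns, then preview the result.
df['Price'] = df['Price'].apply(f)
df['Mileage'] = df['Mileage'].apply(f)
df.head(5)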



  | Condition | Condition_Desc | Price | Location | Model_Year | Mileage | Exterior_Color | Make | Warranty | Model | ... | Vehicle_Title | OBO | Feedback_Perc | Watch_Count | N_Reviews | Seller_Status | Vehicle_Tile | Auction | Buy_Now | Bid_Count
0 | Used | mint!!! very low miles | 11412.0 | McHenry, Illinois, United States | 2013.0 | 16000.0 | Black | Harley-Davidson | Unspecified | Touring | ... | NaN | FALSE | 8.1 | NaN | 2427 | Private Seller | Clear | True | FALSE | 28.0
1 | Used | Perfect condition | 17200.0 | Fort Recovery, Ohio, United States | 2016.0 | 60.0 | Black | Harley-Davidson | Vehicle has an existing warranty | Touring | ... | NaN | FALSE | 100 | 17 | 657 | Private Seller | Clear | True | TRUE | 0.0
2 | Used | NaN | 3872.0 | Chicago, Illinois, United States | 1970.0 | 25763.0 | Silver/Blue | BMW | Vehicle does NOT have an existing warranty | R-Series | ... | NaN | FALSE | 100 | NaN | 136 | NaN | Clear | True | FALSE | 26.0
3 | Used | CLEAN TITLE READY TO RIDE HOME | 6575.0 | Green Bay, Wisconsin, United States | 2009.0 | 33142.0 | Red | Harley-Davidson | NaN | Touring | ... | NaN | FALSE | 100
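
As an alternative to the row-by-row `apply(f)` conversion above, the same cleanup can be sketched with vectorized string methods. This is only an illustration under stated assumptions: the `raw` DataFrame is a made-up sample that mirrors the raw `Price` and `Mileage` strings, and `to_number` is a hypothetical helper name, not part of the original post.

```python
import pandas as pd

# Hypothetical sample mirroring the raw Price/Mileage strings (not the real file).
raw = pd.DataFrame({
    "Price": ["$11,412", "$17,200", "$3,872"],
    "Mileage": ["16,000", "60", None],
})

def to_number(col: pd.Series) -> pd.Series:
    """Strip '$' and ',' from each value and convert to float.

    Values that cannot be parsed (including missing ones) become NaN.
    """
    cleaned = col.astype(str).str.replace(r"[$,]", "", regex=True)
    return pd.to_numeric(cleaned, errors="coerce").astype(float)

raw["Price"] = to_number(raw["Price"])
raw["Mileage"] = to_number(raw["Mileage"])
print(raw)
```

One difference from `f` is worth noting: with `errors="coerce"`, an unexpected string is turned into NaN instead of raising a `ValueError`, which may or may not be the desired behavior for a given dataset.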
