Movie Data Visualization Project: Data Cleaning

Posted by 天天学点数据分析


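Questions:

1. How have movie genres changed over time?

2. How do Universal Pictures and Paramount Pictures compare?

3. How do movies adapted from novels compare with original movies? (Judged by the "based on novel" entry in the keywords variable.)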

Import the required libraries

import pandas as pd
import numpy as np
import re

Load the CSV data into a pandas DataFrame

movies = pd.read_csv("movies.csv")
movies.head()


   id      imdb_id    popularity  budget     revenue     original_title  \
0  135397  tt0369610  32.985763   150000000  1513528810  Jurassic World
1  76341   tt1392190  28.419936   150000000  378436354   Mad Max: Fury Road
2  262500  tt2908446  13.112507   110000000  295238201   Insurgent
3  140607  tt2488496  11.173104   200000000  2068178225  Star Wars: The Force Awakens
4  168259  tt2820852  9.335014    190000000  1506249360  Furious 7

   cast                                               homepage  \
0  Chris Pratt|Bryce Dallas Howard|Irrfan Khan|Vi...  http://www.jurassicworld.com/
1  Tom Hardy|Charlize Theron|Hugh Keays-Byrne|Nic...  http://www.madmaxmovie.com/
2  Shailene Woodley|Theo James|Kate Winslet|Ansel...  http://www.thedivergentseries.movie/#insurgent
3  Harrison Ford|Mark Hamill|Carrie Fisher|Adam D...  http://www.starwars.com/films/star-wars-episod...
4  Vin Diesel|Paul Walker|Jason Statham|Michelle ...  http://www.furious7.com/

   director          tagline                        ...  overview  \
0  Colin Trevorrow   The park is open.              ...  Twenty-two years after the events of Jurassic ...
1  George Miller     What a Lovely Day.             ...  An apocalyptic story set in the furthest reach...
2  Robert Schwentke  One Choice Can Destroy You     ...  Beatrice Prior must confront her inner demons ...
3  J.J. Abrams       Every generation has a story.  ...  Thirty years after defeating the Galactic Empi...
4  James Wan         Vengeance Hits Home            ...  Deckard Shaw seeks revenge against Dominic Tor...

   runtime  genres                                     production_companies  \
0  124      Action|Adventure|Science Fiction|Thriller  Universal Studios|Amblin Entertainment|Legenda...
1  120      Action|Adventure|Science Fiction|Thriller  Village Roadshow Pictures|Kennedy Miller Produ...
2  119      Adventure|Science Fiction|Thriller         Summit Entertainment|Mandeville Films|Red Wago...
3  136      Action|Adventure|Science Fiction|Fantasy   Lucasfilm|Truenorth Productions|Bad Robot
4  137      Action|Crime|Thriller                      Universal Pictures|Original Film|Media Rights ...

   release_date  vote_count  vote_average  release_year  budget_adj    revenue_adj
0  2015-06-09    5562        6.5           2015          1.379999e+08  1.392446e+09
1  2015-05-13    6185        7.1           2015          1.379999e+08  3.481613e+08
2  2015-03-18    2480        6.3           2015          1.012000e+08  2.716190e+08
3  2015-12-15    5292        7.5           2015          1.839999e+08  1.902723e+09
4  2015-04-01    2947        7.3           2015          1.747999e+08  1.385749e+09

5 rows × 21 columns

Data cleaning

Question 1: How have movie genres changed over time?

Build a DataFrame subset, movies_genres

movies_genres = movies[['id', 'original_title', 'genres']].reset_index(drop=True)
movies_genres.head()


   id      original_title                genres
0  135397  Jurassic World                Action|Adventure|Science Fiction|Thriller
1  76341   Mad Max: Fury Road            Action|Adventure|Science Fiction|Thriller
2  262500  Insurgent                     Adventure|Science Fiction|Thriller
3  140607  Star Wars: The Force Awakens  Action|Adventure|Science Fiction|Fantasy
4  168259  Furious 7                     Action|Crime|Thriller

Use the split function to break genres into separate columns

movies_genres[['genres1', 'genres2', 'genres3', 'genres4', 'genres5']] = movies_genres['genres'].str.split('|', expand=True)
del movies_genres['genres']
movies_genres.head()


   id      original_title                genres1    genres2          genres3          genres4   genres5
0  135397  Jurassic World                Action     Adventure        Science Fiction  Thriller  None
1  76341   Mad Max: Fury Road            Action     Adventure        Science Fiction  Thriller  None
2  262500  Insurgent                     Adventure  Science Fiction  Thriller         None      None
3  140607  Star Wars: The Force Awakens  Action     Adventure        Science Fiction  Fantasy   None
4  168259  Furious 7                     Action     Crime            Thriller         None      None

Unpivot (melt) the five new genres columns

movies_genres = pd.melt(movies_genres, id_vars=['id', 'original_title'], value_name='genres', var_name='genre_n')
movies_genres.dropna(axis=0, subset=['genres'], inplace=True)
movies_genres.head()


   id      original_title                genre_n  genres
0  135397  Jurassic World                genres1  Action
1  76341   Mad Max: Fury Road            genres1  Action
2  262500  Insurgent                     genres1  Adventure
3  140607  Star Wars: The Force Awakens  genres1  Action
4  168259  Furious 7                     genres1  Action
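The split step above names exactly five target columns, so it works only when the widest row splits into exactly five genres. As a hedged alternative, the sketch below reaches the same one-row-per-genre layout with str.split followed by DataFrame.explode (assuming pandas 0.25 or later). The toy frame and its values are invented for illustration, and this variant does not keep a genre_n position column.

```python
import pandas as pd

# Toy frame shaped like the movies_genres subset (values are illustrative only).
toy = pd.DataFrame({
    "id": [1, 2, 3],
    "original_title": ["A", "B", "C"],
    "genres": ["Action|Thriller", "Drama", None],
})

# Split each pipe-delimited string into a list, then emit one row per list element.
long_genres = (
    toy.assign(genres=toy["genres"].str.split("|"))
       .explode("genres")
       .dropna(subset=["genres"])   # drop films that had no genres at all
       .reset_index(drop=True)
)
print(long_genres)
```

The number of genres per film never has to be known in advance here, which is the main reason to consider it.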

Delete the original genres column from movies, then merge the two DataFrames

del movies['genres']

movies_cleaned_genres = pd.merge(movies, movies_genres, how='right', on=['id', 'original_title'])
movies_cleaned_genres.head()


   id      imdb_id    popularity  budget     revenue     original_title  \
0  135397  tt0369610  32.985763   150000000  1513528810  Jurassic World
1  135397  tt0369610  32.985763   150000000  1513528810  Jurassic World
2  135397  tt0369610  32.985763   150000000  1513528810  Jurassic World
3  135397  tt0369610  32.985763   150000000  1513528810  Jurassic World
4  76341   tt1392190  28.419936   150000000  378436354   Mad Max: Fury Road

   cast                                               homepage  \
0  Chris Pratt|Bryce Dallas Howard|Irrfan Khan|Vi...  http://www.jurassicworld.com/
1  Chris Pratt|Bryce Dallas Howard|Irrfan Khan|Vi...  http://www.jurassicworld.com/
2  Chris Pratt|Bryce Dallas Howard|Irrfan Khan|Vi...  http://www.jurassicworld.com/
3  Chris Pratt|Bryce Dallas Howard|Irrfan Khan|Vi...  http://www.jurassicworld.com/
4  Tom Hardy|Charlize Theron|Hugh Keays-Byrne|Nic...  http://www.madmaxmovie.com/

   director         tagline             ...  runtime  production_companies  \
0  Colin Trevorrow  The park is open.   ...  124      Universal Studios|Amblin Entertainment|Legenda...
1  Colin Trevorrow  The park is open.   ...  124      Universal Studios|Amblin Entertainment|Legenda...
2  Colin Trevorrow  The park is open.   ...  124      Universal Studios|Amblin Entertainment|Legenda...
3  Colin Trevorrow  The park is open.   ...  124      Universal Studios|Amblin Entertainment|Legenda...
4  George Miller    What a Lovely Day.  ...  120      Village Roadshow Pictures|Kennedy Miller Produ...

   release_date  vote_count  vote_average  release_year  budget_adj    revenue_adj   genre_n  genres
0  2015-06-09    5562        6.5           2015          1.379999e+08  1.392446e+09  genres1  Action
1  2015-06-09    5562        6.5           2015          1.379999e+08  1.392446e+09  genres2  Adventure
2  2015-06-09    5562        6.5           2015          1.379999e+08  1.392446e+09  genres3  Science Fiction
3  2015-06-09    5562        6.5           2015          1.379999e+08  1.392446e+09  genres4  Thriller
4  2015-05-13    6185        7.1           2015          1.379999e+08  3.481613e+08  genres1  Action

5 rows × 22 columns

Save the file produced by the genres processing

movies_cleaned_genres.to_csv('movies_cleaned_genres.csv', index=False)

Question 2: How do the attributes differ between Universal Pictures and Paramount Pictures?

Build a DataFrame subset, movies_production_companies

movies_production_companies = movies[['id', 'original_title', 'production_companies']].reset_index(drop=True)
movies_production_companies.head()


   id      original_title                production_companies
0  135397  Jurassic World                Universal Studios|Amblin Entertainment|Legenda...
1  76341   Mad Max: Fury Road            Village Roadshow Pictures|Kennedy Miller Produ...
2  262500  Insurgent                     Summit Entertainment|Mandeville Films|Red Wago...
3  140607  Star Wars: The Force Awakens  Lucasfilm|Truenorth Productions|Bad Robot
4  168259  Furious 7                     Universal Pictures|Original Film|Media Rights ...

Use regular expressions to match Universal and Paramount

def find_Universal(production_company):
    # Return the first company name containing "Universal", or None.
    try:
        match = re.search(r"(\|{0,1}[\w\s]*Universal[\w\s]*\|{0,1})", production_company)
        if match:
            return match.group(0)
        else:
            return None
    except TypeError:
        # Missing values (NaN) are floats, which re.search rejects.
        return None

filtering_Universal = set()
for text in movies_production_companies['production_companies'].tolist():
    results = find_Universal(text)
    filtering_Universal.add(results)
print(filtering_Universal)

{'Universal Pictures International ', '|Universal Music', '|Universal Studios Home Entertainment|', '|Universal|', 'Universal Studios', 'Universal Studios Home Entertainment', 'Universal Cartoon Studios|', 'Universal Productions France S', '|Universal Home Video', '|Universal Family and Home Entertainment', '|Universal City Studios|', None, 'Universal Pictures', '|Universal Pictures International ', '|Universal City Studios', 'Universal', 'NBC Universal Television', '|Universal CGI|', 'Universal Studios Home Entertainment Family Productions|', 'Universal Home Entertainment', 'Universal Pictures Corporation', 'Universal Pictures Germany GmbH', '|Universal Television', '|NBCUniversal Global Networks|', 'Universal TV|', '|Universal International Pictures ', 'Universal Studios|', '|Universal Pictures', 'Universal TV', 'Universal Cable Productions|', '|Universal Network Television|', '|Universal Pictures|', '|Universal Studios Home Entertainment', 'Universal Pictures|', '|Universal Studios Sound Facilities', '|Universal Home Entertainment', 'Universal Cartoon Studios', '|Universal 1440 Entertainment|', 'NBC Universal Television|', '|Universal 1440 Entertainment', 'Universal 1440 Entertainment', '|Universal Cartoon Studios|'}

def find_Paramount(production_company):
    # Return the first company name containing "Paramount", or None.
    try:
        match = re.search(r"(\|{0,1}[\w\s]*Paramount[\w\s]*\|{0,1})", production_company)
        if match:
            return match.group(0)
        else:
            return None
    except TypeError:
        # Missing values (NaN) are floats, which re.search rejects.
        return None

filtering_Paramount = set()
for text in movies_production_companies['production_companies'].tolist():
    results = find_Paramount(text)
    filtering_Paramount.add(results)
print(filtering_Paramount)

{'|Paramount Classics|', '|Paramount Pictures', 'Paramount Pictures|', None, 'Paramount Pictures', '|Paramount Pictures Digital Entertainment|', '|Paramount Television|', '|Paramount Classics', 'Paramount|', 'Paramount Famous Productions', '|Paramount Home Entertainment', 'Paramount Home Entertainment', 'Paramount Vantage', '|Paramount Vantage|', 'Paramount Vantage|', 'Paramount Pictures Digital Entertainment', '|Paramount Vantage', '|Paramount Animation', 'Paramount Classics'}
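Because re.search returns only the first match in each row, and the pattern keeps any adjacent | character, the printed sets mix variants such as 'Paramount Pictures' and '|Paramount Pictures'. A minimal sketch of another way to list the distinct names, assuming pandas 0.25 or later for Series.explode; the sample strings and the helper name names_containing are invented for illustration:

```python
import pandas as pd

# Illustrative stand-in for movies['production_companies'].
companies = pd.Series([
    "Universal Pictures|Amblin Entertainment",
    "Paramount Pictures|Universal Studios",
    "Lucasfilm|Bad Robot",
    None,
])

def names_containing(series, keyword):
    """Return the set of individual company names that contain `keyword`."""
    # One element per company: drop missing rows, split on "|", flatten, trim.
    names = series.dropna().str.split("|").explode().str.strip()
    return set(names[names.str.contains(keyword, regex=False)])

print(names_containing(companies, "Universal"))
print(names_containing(companies, "Paramount"))
```

Every company in a row is inspected, so a studio listed second or third is not missed.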

Convert the contents of the production_companies column

def modified_as_Universal_or_Paramount(production_company):
    # "Universal" is checked first, so it wins when both names appear.
    try:
        Universal = re.search(r"(\|{0,1}[\w\s]*Universal[\w\s]*\|{0,1})", production_company)
        Paramount = re.search(r"(\|{0,1}[\w\s]*Paramount[\w\s]*\|{0,1})", production_company)
        if Universal:
            return 'Universal'
        elif Paramount:
            return 'Paramount'
        else:
            return None
    except TypeError:
        return None

# Call the function directly through the DataFrame
movies['production_companies'] = movies['production_companies'].apply(modified_as_Universal_or_Paramount)
movies.head()


   id      imdb_id    popularity  budget     revenue     original_title  \
0  135397  tt0369610  32.985763   150000000  1513528810  Jurassic World
1  76341   tt1392190  28.419936   150000000  378436354   Mad Max: Fury Road
2  262500  tt2908446  13.112507   110000000  295238201   Insurgent
3  140607  tt2488496  11.173104   200000000  2068178225  Star Wars: The Force Awakens
4  168259  tt2820852  9.335014    190000000  1506249360  Furious 7

   cast                                               homepage  \
0  Chris Pratt|Bryce Dallas Howard|Irrfan Khan|Vi...  http://www.jurassicworld.com/
1  Tom Hardy|Charlize Theron|Hugh Keays-Byrne|Nic...  http://www.madmaxmovie.com/
2  Shailene Woodley|Theo James|Kate Winslet|Ansel...  http://www.thedivergentseries.movie/#insurgent
3  Harrison Ford|Mark Hamill|Carrie Fisher|Adam D...  http://www.starwars.com/films/star-wars-episod...
4  Vin Diesel|Paul Walker|Jason Statham|Michelle ...  http://www.furious7.com/

   director          tagline                        keywords  \
0  Colin Trevorrow   The park is open.              monster|dna|tyrannosaurus rex|velociraptor|island
1  George Miller     What a Lovely Day.             future|chase|post-apocalyptic|dystopia|australia
2  Robert Schwentke  One Choice Can Destroy You     based on novel|revolution|dystopia|sequel|dyst...
3  J.J. Abrams       Every generation has a story.  android|spaceship|jedi|space opera|3d
4  James Wan         Vengeance Hits Home            car race|speed|revenge|suspense|car

   overview                                           runtime  production_companies  \
0  Twenty-two years after the events of Jurassic ...  124      Universal
1  An apocalyptic story set in the furthest reach...  120      None
2  Beatrice Prior must confront her inner demons ...  119      None
3  Thirty years after defeating the Galactic Empi...  136      None
4  Deckard Shaw seeks revenge against Dominic Tor...  137      Universal

   release_date  vote_count  vote_average  release_year  budget_adj    revenue_adj
0  2015-06-09    5562        6.5           2015          1.379999e+08  1.392446e+09
1  2015-05-13    6185        7.1           2015          1.379999e+08  3.481613e+08
2  2015-03-18    2480        6.3           2015          1.012000e+08  2.716190e+08
3  2015-12-15    5292        7.5           2015          1.839999e+08  1.902723e+09
4  2015-04-01    2947        7.3           2015          1.747999e+08  1.385749e+09
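Note that modified_as_Universal_or_Paramount tests for Universal first, so a film that lists both studios is labelled 'Universal'. If that matters for the comparison, one option is to keep two boolean flags rather than a single label. The sketch below illustrates the idea on invented data (the company strings are made up); it is not the approach used in this project.

```python
import pandas as pd

# Illustrative stand-in for the raw production_companies column.
companies = pd.Series([
    "Universal Pictures|Example Films",
    "Paramount Pictures|Example Films",
    "Paramount Pictures|Universal Pictures",
    "Lucasfilm|Bad Robot",
    None,
])

# One flag per studio; na=False treats missing values as "not this studio".
flags = pd.DataFrame({
    "is_universal": companies.str.contains("Universal", regex=False, na=False),
    "is_paramount": companies.str.contains("Paramount", regex=False, na=False),
})

# Films credited to both studios can now be counted, excluded, or kept in both groups.
flags["co_production"] = flags["is_universal"] & flags["is_paramount"]
print(flags)
```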

Add the calculated columns profit, profit_rate, and profit_adj

movies['profit'] = movies['revenue'] - movies['budget']
movies['profit_rate'] = (movies['revenue'] - movies['budget']) * 100 / movies['budget']
movies['profit_adj'] = movies['revenue_adj'] - movies['budget_adj']
movies.head()


   id      imdb_id    popularity  budget     revenue     original_title  \
0  135397  tt0369610  32.985763   150000000  1513528810  Jurassic World
1  76341   tt1392190  28.419936   150000000  378436354   Mad Max: Fury Road
2  262500  tt2908446  13.112507   110000000  295238201   Insurgent
3  140607  tt2488496  11.173104   200000000  2068178225  Star Wars: The Force Awakens
4  168259  tt2820852  9.335014    190000000  1506249360  Furious 7

   cast                                               homepage  \
0  Chris Pratt|Bryce Dallas Howard|Irrfan Khan|Vi...  http://www.jurassicworld.com/
1  Tom Hardy|Charlize Theron|Hugh Keays-Byrne|Nic...  http://www.madmaxmovie.com/
2  Shailene Woodley|Theo James|Kate Winslet|Ansel...  http://www.thedivergentseries.movie/#insurgent
3  Harrison Ford|Mark Hamill|Carrie Fisher|Adam D...  http://www.starwars.com/films/star-wars-episod...
4  Vin Diesel|Paul Walker|Jason Statham|Michelle ...  http://www.furious7.com/

   director          tagline                        ...  production_companies  \
0  Colin Trevorrow   The park is open.              ...  Universal
1  George Miller     What a Lovely Day.             ...  None
2  Robert Schwentke  One Choice Can Destroy You     ...  None
3  J.J. Abrams       Every generation has a story.  ...  None
4  James Wan         Vengeance Hits Home            ...  Universal

   release_date  vote_count  vote_average  release_year  budget_adj    revenue_adj  \
0  2015-06-09    5562        6.5           2015          1.379999e+08  1.392446e+09
1  2015-05-13    6185        7.1           2015          1.379999e+08  3.481613e+08
2  2015-03-18    2480        6.3           2015          1.012000e+08  2.716190e+08
3  2015-12-15    5292        7.5           2015          1.839999e+08  1.902723e+09
4  2015-04-01    2947        7.3           2015          1.747999e+08  1.385749e+09

   profit      profit_rate  profit_adj
0  1363528810  909.019207   1.254446e+09
1  228436354   152.290903   2.101614e+08
2  185238201   168.398365   1.704191e+08
3  1868178225  934.089113   1.718723e+09
4  1316249360  692.762821   1.210949e+09

5 rows × 23 columns
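One thing to watch in the profit_rate formula: if any row has a budget of 0, the division yields inf (or NaN when revenue is also 0), which can distort later averages and charts. A minimal sketch of one way to guard against this, on invented numbers; treating a 0 budget as "unknown" is an assumption, not something the original data documents.

```python
import numpy as np
import pandas as pd

# Invented figures; the middle row has no recorded budget.
toy = pd.DataFrame({"budget": [100, 0, 50], "revenue": [250, 40, 25]})

# Treat a budget of 0 as missing so the ratio becomes NaN instead of inf.
budget = toy["budget"].replace(0, np.nan)

toy["profit"] = toy["revenue"] - toy["budget"]
toy["profit_rate"] = (toy["revenue"] - budget) * 100 / budget
print(toy)
```

Rows with NaN in profit_rate are then skipped by default in pandas aggregations such as mean().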

#movies.to_csv('movies_production_company.csv')

Question 3: How have movies based on novels performed relative to movies not based on novels? (Judged by the "based on novel" entry in the keywords variable.)

Build a sub-DataFrame, movies_novels

movies_novels = movies[['id', 'original_title', 'keywords', 'tagline']].reset_index(drop=True)
movies_novels.head()


   id      original_title                keywords  \
0  135397  Jurassic World                monster|dna|tyrannosaurus rex|velociraptor|island
1  76341   Mad Max: Fury Road            future|chase|post-apocalyptic|dystopia|australia
2  262500  Insurgent                     based on novel|revolution|dystopia|sequel|dyst...
3  140607  Star Wars: The Force Awakens  android|spaceship|jedi|space opera|3d
4  168259  Furious 7                     car race|speed|revenge|suspense|car

   tagline
0  The park is open.
1  What a Lovely Day.
2  One Choice Can Destroy You
3  Every generation has a story.
4  Vengeance Hits Home

Use regular expressions to find the records containing "novel" in the keywords and tagline columns

def find_novel_from_keywords(keywords):
    try:
        match = re.search(r'(\|{0,1}[\w\s]*novel[\w\s]*\|{0,1})', keywords)
        if match:
            return match.group(0)
        else:
            return None
    except TypeError:
        # Handles the error raised for missing (NaN) values
        return None

words_with_novel = set()
for text in movies_novels['keywords'].tolist():
    results = find_novel_from_keywords(text)
    words_with_novel.add(results)

print(words_with_novel)

{'|stolen novel|', 'based on novel|', '|tell all novel|', '|based on graphic novel', '|novelist', None, '|novelist|', '|based on graphic novel|', 'based on novel', 'based on graphic novel|', '|inspired by novel', '|based on novel|', '|based on novel'}

def find_novel_based_movie(keywords):
    # True when a keyword starts with "based" or "inspired" and ends with "novel".
    try:
        match1 = re.search(r'(\|{0,1}based[\w\s]*novel\|{0,1})', keywords)
        match2 = re.search(r'(\|{0,1}inspired[\w\s]*novel\|{0,1})', keywords)
        if match1 or match2:
            return True
        else:
            return False
    except TypeError:
        return None

movies_novels['based_on_novel_0'] = movies_novels['keywords'].apply(find_novel_based_movie)
movies_novels.head()


   id      original_title                keywords  \
0  135397  Jurassic World                monster|dna|tyrannosaurus rex|velociraptor|island
1  76341   Mad Max: Fury Road            future|chase|post-apocalyptic|dystopia|australia
2  262500  Insurgent                     based on novel|revolution|dystopia|sequel|dyst...
3  140607  Star Wars: The Force Awakens  android|spaceship|jedi|space opera|3d
4  168259  Furious 7                     car race|speed|revenge|suspense|car

   tagline                        based_on_novel_0
0  The park is open.              False
1  What a Lovely Day.             False
2  One Choice Can Destroy You     True
3  Every generation has a story.  False
4  Vengeance Hits Home            False
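The same keyword test can also be written as a single vectorized call. The sketch below is a hedged alternative on invented keyword strings, using Series.str.contains with one pattern; unlike the function above, it returns False rather than None for missing keywords (via na=False), which is a deliberate simplification.

```python
import pandas as pd

# Invented keyword strings covering the variants seen in the printed set.
keywords = pd.Series([
    "based on novel|revolution|dystopia",
    "monster|dna|island",
    "writer|novelist",
    "love|inspired by novel",
    "superhero|based on graphic novel",
    None,
])

# (?:^|\|)  : the keyword starts at the beginning of the string or after a pipe
# [^|]*     : stay inside that single keyword
# (?:\||$)  : "novel" must be the last word of the keyword
pattern = r"(?:^|\|)(?:based|inspired)[^|]*novel(?:\||$)"
based_on_novel = keywords.str.contains(pattern, regex=True, na=False)
print(based_on_novel.tolist())
```

Non-capturing groups are used so that pandas does not warn about unused match groups.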

def find_novel_from_tagline(tagline):
    try:
        match = re.search(r'(.*novel.*)', tagline)
        if match:
            return match.group(0)
        else:
            return None
    except TypeError:
        return None

words_with_novel_tagline = set()
for text in movies_novels['tagline'].tolist():
    results = find_novel_from_tagline(text)
    words_with_novel_tagline.add(results)

print(words_with_novel_tagline)

{'Based on the novel by Henry James', 'Based on the novel of Chico Xavier', None, 'The #1 novel of the year - now a motion picture!', 'Based on the best-selling novel'}

def find_novel_based_movie_1(tagline):
    try:
        match = re.search(r'(.*novel.*)', tagline)
        if match:
            return True
        else:
            return False
    except TypeError:
        return None

movies_novels['based_on_novel_1'] = movies_novels['tagline'].apply(find_novel_based_movie_1)
movies_novels.head()


   id      original_title                keywords  \
0  135397  Jurassic World                monster|dna|tyrannosaurus rex|velociraptor|island
1  76341   Mad Max: Fury Road            future|chase|post-apocalyptic|dystopia|australia
2  262500  Insurgent                     based on novel|revolution|dystopia|sequel|dyst...
3  140607  Star Wars: The Force Awakens  android|spaceship|jedi|space opera|3d
4  168259  Furious 7                     car race|speed|revenge|suspense|car

   tagline                        based_on_novel_0  based_on_novel_1
0  The park is open.              False             False
1  What a Lovely Day.             False             False
2  One Choice Can Destroy You     True              False
3  Every generation has a story.  False             False
4  Vengeance Hits Home            False             False

Combine the "novel" matches from the tagline and keywords columns by adding the boolean values

movies_novels['based_on_novel'] = movies_novels['based_on_novel_0'] + movies_novels['based_on_novel_1']
movies_novels.drop(['keywords', 'tagline', 'based_on_novel_0', 'based_on_novel_1'], axis=1, inplace=True)
movies_novels.head()


   id      original_title                based_on_novel
0  135397  Jurassic World                0
1  76341   Mad Max: Fury Road            0
2  262500  Insurgent                     1
3  140607  Star Wars: The Force Awakens  0
4  168259  Furious 7                     0

print(movies_novels['based_on_novel'].unique())

[0 1 nan 2]

movies_novels[movies_novels['based_on_novel']==2]


       id     original_title  based_on_novel
10660  10671  Airport         2

movies.loc[movies["original_title"]=="Airport",["original_title","keywords","tagline"]]


       original_title  keywords                                           tagline
10660  Airport         bomb|based on novel|airport|desperation|snow s...  The #1 novel of the year - now a motion picture!

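Convert to boolean values

def convert_to_bool(value):
    # 1 = matched in one column, 2 = matched in both; anything else (0, NaN) is False.
    if value == 1 or value == 2:
        return True
    else:
        return False

movies_novels['based_on_novel'] = movies_novels['based_on_novel'].apply(convert_to_bool)
movies_novels.head()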


   id      original_title                based_on_novel
0  135397  Jurassic World                False
1  76341   Mad Max: Fury Road            False
2  262500  Insurgent                     True
3  140607  Star Wars: The Force Awakens  False
4  168259  Furious 7                     False
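Adding the two flag columns and then mapping 1 and 2 back to True works, but the intermediate values 0, 1, 2, and nan are easy to misread. A hedged sketch of a more direct route, assuming the two flag columns still exist and hold True, False, or None: compare each column to True and combine the results with a logical OR. The sample values are invented.

```python
import pandas as pd

# Invented flags; None stands for a missing keywords or tagline value.
flags = pd.DataFrame({
    "based_on_novel_0": [False, True, None, True],
    "based_on_novel_1": [False, False, None, True],
})

# .eq(True) yields a clean boolean Series: None and False both become False.
from_keywords = flags["based_on_novel_0"].eq(True)
from_tagline = flags["based_on_novel_1"].eq(True)

# A film counts as novel-based when either source says so.
flags["based_on_novel"] = from_keywords | from_tagline
print(flags["based_on_novel"].tolist())
```

This gives the same True/False outcome as the addition followed by convert_to_bool, without a separate conversion step.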

Merge movies and movies_novels, deleting keywords and tagline before the merge

movies.drop(['keywords', 'tagline'], axis=1, inplace=True)

movies = pd.merge(movies, movies_novels, how='left', on=['id', 'original_title'])
movies.head()


   id      imdb_id    popularity  budget     revenue     original_title  \
0  135397  tt0369610  32.985763   150000000  1513528810  Jurassic World
1  76341   tt1392190  28.419936   150000000  378436354   Mad Max: Fury Road
2  262500  tt2908446  13.112507   110000000  295238201   Insurgent
3  140607  tt2488496  11.173104   200000000  2068178225  Star Wars: The Force Awakens
4  168259  tt2820852  9.335014    190000000  1506249360  Furious 7

   cast                                               homepage  \
0  Chris Pratt|Bryce Dallas Howard|Irrfan Khan|Vi...  http://www.jurassicworld.com/
1  Tom Hardy|Charlize Theron|Hugh Keays-Byrne|Nic...  http://www.madmaxmovie.com/
2  Shailene Woodley|Theo James|Kate Winslet|Ansel...  http://www.thedivergentseries.movie/#insurgent
3  Harrison Ford|Mark Hamill|Carrie Fisher|Adam D...  http://www.starwars.com/films/star-wars-episod...
4  Vin Diesel|Paul Walker|Jason Statham|Michelle ...  http://www.furious7.com/

   director          overview                                           ...  \
0  Colin Trevorrow   Twenty-two years after the events of Jurassic ...  ...
1  George Miller     An apocalyptic story set in the furthest reach...  ...
2  Robert Schwentke  Beatrice Prior must confront her inner demons ...  ...
3  J.J. Abrams       Thirty years after defeating the Galactic Empi...  ...
4  James Wan         Deckard Shaw seeks revenge against Dominic Tor...  ...

   release_date  vote_count  vote_average  release_year  budget_adj    revenue_adj  \
0  2015-06-09    5562        6.5           2015          1.379999e+08  1.392446e+09
1  2015-05-13    6185        7.1           2015          1.379999e+08  3.481613e+08
2  2015-03-18    2480        6.3           2015          1.012000e+08  2.716190e+08
3  2015-12-15    5292        7.5           2015          1.839999e+08  1.902723e+09
4  2015-04-01    2947        7.3           2015          1.747999e+08  1.385749e+09

   profit      profit_rate  profit_adj    based_on_novel
0  1363528810  909.019207   1.254446e+09  False
1  228436354   152.290903   2.101614e+08  False
2  185238201   168.398365   1.704191e+08  True
3  1868178225  934.089113   1.718723e+09  False
4  1316249360  692.762821   1.210949e+09  False

5 rows × 22 columns
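A left merge keeps every row of movies, but it silently duplicates rows if the key pair (id, original_title) is not unique on either side. As a hedged sanity check, pd.merge accepts a validate argument that raises an error in that case. The frames below are invented stand-ins for movies and movies_novels.

```python
import pandas as pd

# Invented stand-ins for movies and movies_novels.
left = pd.DataFrame({"id": [1, 2, 3], "original_title": ["A", "B", "C"], "budget": [10, 20, 30]})
right = pd.DataFrame({"id": [1, 2, 3], "original_title": ["A", "B", "C"],
                      "based_on_novel": [False, True, False]})

# validate="one_to_one" asserts that the keys are unique in both frames.
merged = pd.merge(left, right, how="left", on=["id", "original_title"], validate="one_to_one")
print(merged.shape)

# Simulate a duplicated key on the right-hand side to show the failure mode.
dup_right = pd.concat([right, right.iloc[[0]]], ignore_index=True)
try:
    pd.merge(left, dup_right, how="left", on=["id", "original_title"], validate="one_to_one")
    duplicate_detected = False
except pd.errors.MergeError:
    duplicate_detected = True
print(duplicate_detected)
```

Comparing len(movies) before and after the merge is a simpler check that serves the same purpose.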

movies.to_csv('movies_cleaned.csv', index=False)

References

1. Tidy data in Python: http://www.jeannicholashould.com/tidy-data-in-python.html

2. Udacity

3. https://www.youtube.com/watch?v=2CwzOjYbi-w&list=PLXbU-2B80FvCKj0aqdpudCqpif2vNuING&index=1

4. Original data: https://d17h27t6h515a5.cloudfront.net/topher/2017/January/587e7057_movies/movies.csv
