Working With Data Sources 10
Posted by 阿难的机器学习计划
Preparing Data for SQL:
Sometimes we want to store data in a SQL database, but the dataset usually needs to be cleaned before it is sent there. Here we use pandas to clean a dataset stored in a .csv file.
1. Read the file with read_csv(), setting the encoding:
import pandas as pd
file = pd.read_csv("academy_awards.csv", encoding="ISO-8859-1")
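Why the encoding argument matters can be shown with a small sketch. It assumes, as the choice of ISO-8859-1 suggests, that the file is Latin-1 rather than UTF-8; the sample bytes and the helper names read_latin1 and read_default are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content encoded as Latin-1: "é" becomes the single byte 0xE9,
# which is not a valid UTF-8 sequence in this position.
raw = "Year,Nominee\n2010,Nominee A\n2010,Penélope\n".encode("ISO-8859-1")


def read_latin1(data: bytes) -> pd.DataFrame:
    """Decode the bytes as ISO-8859-1, as step 1 does for academy_awards.csv."""
    return pd.read_csv(io.BytesIO(data), encoding="ISO-8859-1")


def read_default(data: bytes):
    """Use pandas' default UTF-8 decoding; return None if decoding fails."""
    try:
        return pd.read_csv(io.BytesIO(data))
    except UnicodeDecodeError:
        return None


df = read_latin1(raw)
print(df["Nominee"].tolist())  # ['Nominee A', 'Penélope']
print(read_default(raw))       # None
```

If a file really is UTF-8, the default works and no encoding argument is needed; the argument only matters when the bytes come from a different code page.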
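2. Use the .str accessor to keep only the first four characters (the year) of every string in the Year column, convert them to integers, and keep the rows later than 2000:
file["Year"] = file["Year"].str[0:4].astype("int64")
later_than_2000 = file[file["Year"] > 2000]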
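The slicing in step 2 works element by element through the .str accessor. A minimal sketch on made-up values; the "(83rd)"-style suffix is only an assumed example of text that might follow the year, and the integer conversion shows why the sliced strings are usually turned into numbers before filtering:

```python
import pandas as pd

# Hypothetical "Year" strings where the four-digit year is followed by extra text.
years = pd.Series(["2010 (83rd)", "2000 (73rd)", "1999 (72nd)"])

# .str[0:4] applies the slice s[0:4] to every string in the Series.
first_four = years.str[0:4]
print(first_four.tolist())  # ['2010', '2000', '1999']

# After converting to integers, numeric comparisons such as > 2000 work as expected.
as_int = first_four.astype("int64")
print(as_int[as_int > 2000].tolist())  # [2010]
```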
3. Use .isin() to keep only the rows whose Category is one of the four acting awards:
award_categories = ["Actor -- Leading Role", "Actor -- Supporting Role", "Actress -- Leading Role", "Actress -- Supporting Role"]
nominations = later_than_2000[later_than_2000["Category"].isin(award_categories)]
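A small sketch of how the boolean mask from .isin() selects rows. The frame below is a made-up stand-in for later_than_2000, with placeholder nominee names:

```python
import pandas as pd

award_categories = ["Actor -- Leading Role", "Actor -- Supporting Role",
                    "Actress -- Leading Role", "Actress -- Supporting Role"]

# Hypothetical rows standing in for later_than_2000.
later_than_2000 = pd.DataFrame({
    "Year": [2010, 2010, 2010],
    "Category": ["Actor -- Leading Role", "Best Picture", "Actress -- Supporting Role"],
    "Nominee": ["Nominee A", "Film B", "Nominee C"],
})

# Series.isin gives one boolean per row: True when the value appears in the list.
mask = later_than_2000["Category"].isin(award_categories)
print(mask.tolist())  # [True, False, True]

# Indexing with the boolean Series keeps only the rows marked True.
nominations = later_than_2000[mask]
print(nominations["Nominee"].tolist())  # ['Nominee A', 'Nominee C']
```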
4. Use .map() with a dictionary to replace every value in the Won? column (NO becomes 0, YES becomes 1):
won_dic = {
    "NO": 0,
    "YES": 1,
}
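# nominations was filtered out of later_than_2000, so pandas may treat it as a view and warn
# (SettingWithCopyWarning) when we assign to it. An explicit copy avoids this; the older
# nominations.is_copy = False trick no longer works in current pandas.
nominations = nominations.copy()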
nominations["Won?"] = nominations["Won?"].map(won_dic)
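A sketch of the copy-then-map pattern on a made-up frame. The lowercase "no" is a deliberately hypothetical value that is not a key in won_dic, to show what .map() does with values the dictionary does not cover:

```python
import pandas as pd

won_dic = {"NO": 0, "YES": 1}

# Hypothetical parent frame; the lowercase "no" is deliberately not a key in won_dic.
parent = pd.DataFrame({"Category": ["A", "B", "A"], "Won?": ["NO", "YES", "no"]})

# .copy() gives an independent frame, so assigning to it never touches parent.
nominations = parent[parent["Category"] == "A"].copy()
nominations["Won?"] = nominations["Won?"].map(won_dic)

# The first value maps to 0; "no" has no entry in won_dic, so it becomes NaN.
print(nominations["Won?"].tolist())
print(parent["Won?"].tolist())  # ['NO', 'YES', 'no']
```

Any value that is not a key in the dictionary becomes NaN, which also turns the column into floats, so checking nominations["Won?"].isna().any() after the mapping is a cheap way to catch unexpected spellings.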
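5. Use .drop() to remove the columns we do not need (axis=1 means the labels refer to columns, not rows):
# delete_list is a list of the names of the columns to remove
final_nominations = nominations.drop(delete_list, axis=1)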
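The contents of delete_list are not spelled out, so this sketch uses a hypothetical one-column list on a made-up frame to show what axis=1 does and that .drop() returns a new DataFrame instead of changing the old one:

```python
import pandas as pd

# Hypothetical frame; "Unused" stands for whatever columns delete_list names.
nominations = pd.DataFrame({
    "Year": [2010],
    "Category": ["Actor -- Leading Role"],
    "Nominee": ["Nominee A"],
    "Unused": ["x"],
})
delete_list = ["Unused"]  # hypothetical contents

# axis=1 makes drop() look the labels up among the columns instead of the row index.
final_nominations = nominations.drop(delete_list, axis=1)
# The same call written with the columns= keyword.
same_result = nominations.drop(columns=delete_list)

print(list(final_nominations.columns))        # ['Year', 'Category', 'Nominee']
print(final_nominations.equals(same_result))  # True
print("Unused" in nominations.columns)        # True: the original frame is unchanged
```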
6. Use vectorized string methods to clean every string in the Additional Info column and split it into a movie name and a character name:
additional_info_one = final_nominations["Additional Info"].str.rstrip("'}")  # rstrip removes any trailing ' and } characters from each string
additional_info_two = additional_info_one.str.split("{.")  # pandas compiles a multi-character pattern like this as a regular expression: "{" followed by any one character
movie_names = additional_info_two.str[0]
characters = additional_info_two.str[1]
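A small sketch of step 6 on made-up strings, assuming the Additional Info values look like Movie {'Character'} (an assumption about the data, not something shown here). The Series is built with dtype=object so pandas takes its classic string path, where a multi-character split pattern is compiled as a regular expression. The extra .str.strip() call is an addition to the original code:

```python
import pandas as pd

# Hypothetical values, assuming "Additional Info" looks like "Movie {'Character'}".
info = pd.Series(["Film A {'Character A'}", "Film B"], dtype=object)

# rstrip removes any run of ' and } characters from the right end of each string.
additional_info_one = info.str.rstrip("'}")
# A multi-character pattern is treated as a regular expression, so "{." matches
# a literal "{" followed by any single character (here the opening quote).
additional_info_two = additional_info_one.str.split("{.")

# str[0] still ends with the space before "{", so strip() tidies it.
movie_names = additional_info_two.str[0].str.strip()
# str[1] is NaN where the string had no "{...}" part to split on.
characters = additional_info_two.str[1]

print(movie_names.tolist())  # ['Film A', 'Film B']
print(characters.tolist())   # ['Character A', nan]
```

Rows without a character part come back as NaN rather than raising an error, which is convenient when only some nominations carry that information.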
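7. Use to_sql() to write the cleaned DataFrame to a table in the database:
final_nominations.to_sql("nominations", conn, index=False)  # conn is an open sqlite3 connection or a SQLAlchemy engine/connection; index=False keeps the row index out of the table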
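Which database conn points to is not stated. A minimal sketch assuming SQLite through Python's built-in sqlite3 module, with a made-up frame standing in for final_nominations (the Won column name and the nominee values are illustrative):

```python
import sqlite3

import pandas as pd

# Hypothetical cleaned frame standing in for final_nominations.
final_nominations = pd.DataFrame({
    "Year": [2010, 2010],
    "Category": ["Actor -- Leading Role", "Actress -- Supporting Role"],
    "Nominee": ["Nominee A", "Nominee B"],
    "Won": [1, 0],
})

# An in-memory SQLite database; passing a file path instead would persist the table.
conn = sqlite3.connect(":memory:")

# index=False keeps the DataFrame's row index from becoming an extra column.
final_nominations.to_sql("nominations", conn, index=False)

# Read the rows back with plain SQL to confirm what was written.
rows = conn.execute("SELECT Nominee, Won FROM nominations ORDER BY Nominee").fetchall()
print(rows)  # [('Nominee A', 1), ('Nominee B', 0)]

# pd.read_sql_query returns the table as a DataFrame again.
round_trip = pd.read_sql_query("SELECT * FROM nominations", conn)
print(list(round_trip.columns))  # ['Year', 'Category', 'Nominee', 'Won']
conn.close()
```

By default to_sql uses if_exists="fail", so running the same call twice raises a ValueError; pass if_exists="replace" or if_exists="append" when re-running the script against an existing table.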