Data Cleaning 3

Posted by 阿难的机器学习计划


1. Find the correlation between each pair of columns by using corr()

  correlations = combined.corr(method = "pearson")
  print(correlations["sat_score"])

Note: A correlation value ranges from -1 to 1. If the value is close to 1, the two columns are positively correlated. If it is close to -1, they are negatively correlated. If it is close to 0, they have no linear correlation.
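To make the -1 to 1 range concrete, here is a minimal sketch that computes Pearson's r by hand. The helper name `pearson_r` and all of the numbers are made up for illustration; they are not taken from the `combined` dataset.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, divided by the product of the
    # root sums of squared deviations (the 1/n factors cancel out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Invented illustrative values.
base = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # moves with base
falling = [10, 8, 6, 4, 2]   # moves against base
unrelated = [1, 2, 0, 2, 1]  # no linear trend against base

r_up = pearson_r(base, rising)
r_down = pearson_r(base, falling)
r_none = pearson_r(base, unrelated)

print(round(r_up, 3), round(r_down, 3), round(r_none, 3))
```

The three results land at the top, the bottom, and the middle of the range, which is how the values printed by corr() can be read.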

2. Then we can plot the data by using the plot() function.

  %matplotlib inline

  import matplotlib.pyplot as plt

  combined.plot('total_enrollment', 'sat_score', kind="scatter")  # plot(x, y, kind)

3. Then we can filter the data to dig out the information we need.
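One common way to filter is with boolean masks. The sketch below is only an assumption about what such a filter might look like: the column names `total_enrollment` and `sat_score` come from the plot above, while the small DataFrame, the `school_name` column, and the thresholds of 1000 are hypothetical stand-ins.

```python
import pandas as pd

# Hypothetical stand-in for `combined`; the values are invented.
combined = pd.DataFrame({
    "school_name": ["A", "B", "C", "D"],
    "total_enrollment": [350, 2800, 600, 1500],
    "sat_score": [950, 1600, 1250, 980],
})

# Each comparison yields a boolean Series (a mask), one value per row.
small = combined["total_enrollment"] < 1000
low_sat = combined["sat_score"] < 1000

# Combine masks with & (and) or | (or), then index the DataFrame with the result.
low_enrollment = combined[small & low_sat]

print(low_enrollment["school_name"].tolist())
```

Keeping each mask in its own variable makes it easy to reuse one condition or to inspect how many rows it matches.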

4. We map out the schools we need in a certain area.

  from mpl_toolkits.basemap import Basemap

  # llcrnrlat/llcrnrlon = lower-left corner latitude/longitude;
  # urcrnrlat/urcrnrlon = upper-right corner latitude/longitude.
  m = Basemap(projection="merc", resolution="i",
              llcrnrlat=40.496044, urcrnrlat=40.915256,
              llcrnrlon=-74.255735, urcrnrlon=-73.700272)
  m.drawmapboundary(fill_color='#85A6D9')
  m.drawcoastlines(color='#6D5F47', linewidth=.4)
  m.drawrivers(color='#6D5F47', linewidth=.4)

  latitudes = combined["lat"].tolist()
  longitudes = combined["lon"].tolist()

  m.scatter(longitudes, latitudes, s=20, zorder=2, latlon=True)  # pass lists to scatter(), hence tolist() above

5. We can change the parameters of scatter() to change how the points are drawn.
  plt.show()
