Logistic Regression: One-Hot Encoding the Data + Visualizing the Results

Posted by soyosuyang


#-*- coding: utf-8 -*-
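'''
Data processing and feature engineering often involve categorical data: gender as [male, female] (leaving other values aside for now), mobile carrier as [China Mobile, China Unicom, China Telecom], and so on. We usually map these to numbers such as [0,1] or [-1,0,1] before feeding them to a model, but models tend to treat such numbers as continuous values.
One-hot encoding addresses this problem. It uses an N-bit state register to encode N states: each state has its own register bit, and exactly one bit is active at any time.
Put another way, a feature with m possible values becomes m binary features (each 0 or 1) after one-hot encoding, and exactly one of them is active for any sample.
Tree-based methods such as random forests, bagging and boosting do not need feature normalization. Parametric models and distance-based models generally do.
@author: soyo
'''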
import numpy
import pandas as pd
import matplotlib.pyplot as pl
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
from sklearn.utils.extmath import cartesian

#numpy.set_printoptions(threshold=numpy.inf)  # print full arrays instead of eliding the middle
data = pd.read_csv("/home/soyo/文档/LogisticRegression.csv")
print(data)
print(data.head(5))
# One-hot encode the categorical column; drop_first=True drops rank_1, which becomes the baseline level
data_dum = pd.get_dummies(data, prefix='rank', columns=['rank'], drop_first=True, dtype=float)
print(data_dum.head(5))
print("*********************")
print(data_dum.iloc[:, 1:].head(5))
print(data_dum.iloc[:, 0].head(5))
# x: features, y: class label; a random 90/10 hold-out split (not cross-validation)
x_train, x_test, y_train, y_test = train_test_split(data_dum.iloc[:, 1:], data_dum.iloc[:, 0], test_size=0.1, random_state=1)
print(len(x_train), len(x_test))
print(x_train)
numpy.savetxt('/home/soyo/文档/new.csv', x_train, fmt="%g", delimiter=',')  # %g keeps the decimals of gpa; %d would truncate them
print(len(y_train), len(y_test))
numpy.savetxt('/home/soyo/文档/new2.csv', y_train, fmt="%d", delimiter=',')
print(y_train)
print("***********")
print(y_test)
lr = LogisticRegression(solver='liblinear')  # liblinear was scikit-learn's default solver before version 0.22
lr.fit(x_train, y_train)
print("Predictions:")
print(lr.predict(x_test))
print("True labels:")
print(numpy.array(y_test))
print("Logistic regression accuracy: {0:.3f}".format(lr.score(x_test, y_test)))  # score() is a fraction in [0, 1], not a percentage
print("Analyzing relationships in the data over a grid of feature combinations")
gres = numpy.linspace(data['gre'].min(), data['gre'].max(), 20)
print(gres)
gpas = numpy.linspace(data['gpa'].min(), data['gpa'].max(), 20)
print(gpas)
# numpy.set_printoptions(threshold=numpy.inf)  # print full arrays instead of eliding the middle
print(cartesian([gres, gpas, [1, 2, 3, 4], [1.]]))  # Cartesian product: 20*20*4*1 = 1600 rows
data_new = pd.DataFrame(cartesian([gres, gpas, [1, 2, 3, 4], [1.]]))
print(data_new)
data_new.columns = ['gre', 'gpa', 'ranks', 'intercept']
print(data_new)
dummy_ranks = pd.get_dummies(data_new['ranks'], prefix='ranks', dtype=float)  # prefix: column-name prefix
# print(dummy_ranks)
dummy_ranks.columns = ['ranks_1', 'ranks_2', 'ranks_3', 'ranks_4']
print(dummy_ranks)
cols_to_keep = ['gre', 'gpa']
cobs = data_new[cols_to_keep].join(dummy_ranks.loc[:, 'ranks_2':])
print("*********6")
print(cobs)
# Recent scikit-learn versions check feature names at predict time,
# so give the grid the same column names the model was trained on (rank_2..rank_4)
cobs_model = cobs.copy()
cobs_model.columns = x_train.columns
print(lr.predict(cobs_model))
data_new['predict_admit'] = lr.predict(cobs_model)  # predicted class labels (0/1), not probabilities
# data_new['predict_admit'] = numpy.linspace(5, 100, 1600)
print(data_new)
grouped = pd.pivot_table(data_new, values=['predict_admit'], index=['gre', 'ranks'], aggfunc='mean')
print(grouped)
print("*********9")
print(grouped.index.get_level_values(1))
print(grouped.loc[grouped.index.get_level_values(1) == 2].index.get_level_values(0))

def target_plot(x):
    # pivot_table aggregates the data: mean predicted label for each (x, ranks) pair
    grouped = pd.pivot_table(data_new, values=['predict_admit'], index=[x, 'ranks'], aggfunc='mean')
    colors = 'rbgyrbgy'
    for col in data_new.ranks.unique():
        plt_data = grouped.loc[grouped.index.get_level_values(1) == col]
        pl.plot(plt_data.index.get_level_values(0), plt_data['predict_admit'], color=colors[int(col)])
    pl.xlabel(x)
    pl.ylabel("P(admit=1)")
    pl.legend(['1', '2', '3', '4'], loc='upper left', title='ranks')
    pl.title("soyo")
    pl.show()
target_plot('gpa')
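
The docstring's remark about normalization can be illustrated with a small sketch. It assumes synthetic data (the `gre_like` and `gpa_like` columns and the labels are made up here, not read from the post's CSV) and wraps `LogisticRegression` in a scikit-learn pipeline with `StandardScaler`, which is one common way to normalize features for a parametric model. The script above fits on unscaled features.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
# Synthetic stand-in features on very different scales (hypothetical data)
gre_like = rng.uniform(220, 800, size=200)
gpa_like = rng.uniform(2.26, 4.0, size=200)
X = np.column_stack([gre_like, gpa_like])
# Synthetic labels loosely tied to both features
y = ((gre_like - 500) / 150 + (gpa_like - 3.0) / 0.4 + rng.normal(size=200) > 0).astype(int)

# The scaler is fitted on the training data, then the classifier sees scaled features
scaled_lr = make_pipeline(StandardScaler(), LogisticRegression())
scaled_lr.fit(X, y)

# After scaling, each column has mean 0 and standard deviation 1
X_scaled = scaled_lr.named_steps['standardscaler'].transform(X)
col_means = X_scaled.mean(axis=0)
col_stds = X_scaled.std(axis=0)
print(col_means.round(6), col_stds.round(6))
print(scaled_lr.score(X, y))
```

Keeping the scaler inside the pipeline means the same transformation is applied automatically whenever `predict` or `score` is called on new data.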

Results:

     admit  gre   gpa  rank
0        0  380  3.61     3
1        1  660  3.67     3
2        1  800  4.00     1
3        1  640  3.19     4
4        0  520  2.93     4
5        1  760  3.00     2
6        1  560  2.98     1
7        0  400  3.08     2
8        1  540  3.39     3
9        0  700  3.92     2
10       0  800  4.00     4
11       0  440  3.22     1
12       1  760  4.00     1
13       0  700  3.08     2
14       1  700  4.00     1
15       0  480  3.44     3
16       0  780  3.87     4
17       0  360  2.56     3
18       0  800  3.75     2
19       1  540  3.81     1
20       0  500  3.17     3
21       1  660  3.63     2
22       0  600  2.82     4
23       0  680  3.19     4
24       1  760  3.35     2
25       1  800  3.66     1
26       1  620  3.61     1
27       1  520  3.74     4
28       1  780  3.22     2
29       0  520  3.29     1
..     ...  ...   ...   ...
370      1  540  3.77     2
371      1  680  3.76     3
372      1  680  2.42     1
373      1  620  3.37     1
374      0  560  3.78     2
375      0  560  3.49     4
376      0  620  3.63     2
377      1  800  4.00     2
378      0  640  3.12     3
379      0  540  2.70     2
380      0  700  3.65     2
381      1  540  3.49     2
382      0  540  3.51     2
383      0  660  4.00     1
384      1  480  2.62     2
385      0  420  3.02     1
386      1  740  3.86     2
387      0  580  3.36     2
388      0  640  3.17     2
389      0  640  3.51     2
390      1  800  3.05     2
391      1  660  3.88     2
392      1  600  3.38     3
393      1  620  3.75     2
394      1  460  3.99     3
395      0  620  4.00     2
396      0  560  3.04     3
397      0  460  2.63     2
398      0  700  3.65     2
399      0  600  3.89     3

[400 rows x 4 columns]
   admit  gre   gpa  rank
0      0  380  3.61     3
1      1  660  3.67     3
2      1  800  4.00     1
3      1  640  3.19     4
4      0  520  2.93     4
   admit  gre   gpa  rank_2  rank_3  rank_4
0      0  380  3.61     0.0     1.0     0.0
1      1  660  3.67     0.0     1.0     0.0
2      1  800  4.00     0.0     0.0     0.0
3      1  640  3.19     0.0     0.0     1.0
4      0  520  2.93     0.0     0.0     1.0
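
The `rank_2`..`rank_4` columns above come from `pd.get_dummies` with `drop_first=True`. A minimal sketch on a made-up `carrier` column (the column name and values are illustrative, not from the dataset) puts the full m-column encoding next to the m-1-column version:

```python
import pandas as pd

# Hypothetical toy column with m = 3 categories
toy = pd.DataFrame({'carrier': ['mobile', 'unicom', 'telecom', 'unicom']})

# Full one-hot encoding: one binary column per category
full = pd.get_dummies(toy, columns=['carrier'], prefix='carrier', dtype=int)
# drop_first=True drops the first category in sorted order, which becomes the baseline
reduced = pd.get_dummies(toy, columns=['carrier'], prefix='carrier', drop_first=True, dtype=int)

print(full)
print(reduced)
# In the full encoding exactly one column is active per row
print(full.sum(axis=1).tolist())
```

With one level dropped, a row of all zeros denotes the baseline level, which is why the rank 1 row in the output above has 0.0 in every `rank_*` column.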
*********************
   gre   gpa  rank_2  rank_3  rank_4
0  380  3.61     0.0     1.0     0.0
1  660  3.67     0.0     1.0     0.0
2  800  4.00     0.0     0.0     0.0
3  640  3.19     0.0     0.0     1.0
4  520  2.93     0.0     0.0     1.0
0    0
1    1
2    1
3    1
4    0
Name: admit, dtype: int64
360 40
     gre   gpa  rank_2  rank_3  rank_4
268  680  3.46     1.0     0.0     0.0
204  600  3.89     0.0     0.0     0.0
171  540  2.81     0.0     1.0     0.0
62   640  3.67     0.0     1.0     0.0
385  420  3.02     0.0     0.0     0.0
85   520  2.98     1.0     0.0     0.0
389  640  3.51     1.0     0.0     0.0
307  580  3.51     1.0     0.0     0.0
314  540  3.46     0.0     0.0     1.0
278  680  3.00     0.0     0.0     1.0
65   600  3.59     1.0     0.0     0.0
225  720  3.50     0.0     1.0     0.0
229  720  3.42     1.0     0.0     0.0
18   800  3.75     1.0     0.0     0.0
296  560  3.16     0.0     0.0     0.0
286  800  3.22     0.0     0.0     0.0
272  680  3.67     1.0     0.0     0.0
117  700  3.72     1.0     0.0     0.0
258  520  3.51     1.0     0.0     0.0
360  520  4.00     0.0     0.0     0.0
107  480  3.13     1.0     0.0     0.0
67   620  3.30     0.0     0.0     0.0
234  800  3.53     0.0     0.0     0.0
246  680  3.34     1.0     0.0     0.0
354  540  3.78     1.0     0.0     0.0
222  480  3.02     0.0     0.0     0.0
106  700  3.56     0.0     0.0     0.0
310  560  4.00     0.0     1.0     0.0
270  640  3.95     1.0     0.0     0.0
312  660  3.77     0.0     1.0     0.0
..   ...   ...     ...     ...     ...
317  780  3.63     0.0     0.0     1.0
319  540  3.28     0.0     0.0     0.0
7    400  3.08     1.0     0.0     0.0
141  700  3.52     0.0     0.0     1.0
86   600  3.32     1.0     0.0     0.0
352  580  3.12     0.0     1.0     0.0
241  520  3.81     0.0     0.0     0.0
215  660  2.91     0.0     1.0     0.0
68   580  3.69     0.0     0.0     0.0
50   640  3.86     0.0     1.0     0.0
156  560  2.52     1.0     0.0     0.0
252  520  4.00     1.0     0.0     0.0
357  720  3.31     0.0     0.0     0.0
254  740  3.52     0.0     0.0     1.0
276  460  3.77     0.0     1.0     0.0
178  620  3.33     0.0     1.0     0.0
281  360  3.27     0.0     1.0     0.0
237  480  4.00     1.0     0.0     0.0
71   300  2.92     0.0     0.0     1.0
129  460  3.15     0.0     0.0     1.0
144  580  3.40     0.0     0.0     1.0
335  620  3.71     0.0     0.0     0.0
133  500  3.08     0.0     1.0     0.0
203  420  3.92     0.0     0.0     1.0
393  620  3.75     1.0     0.0     0.0
255  640  3.35     0.0     1.0     0.0
72   480  3.39     0.0     0.0     1.0
396  560  3.04     0.0     1.0     0.0
235  620  3.05     1.0     0.0     0.0
37   520  2.90     0.0     1.0     0.0

[360 rows x 5 columns]
360 40
268    1
204    1
171    0
62     0
385    0
85     0
389    0
307    0
314    0
278    1
65     0
225    1
229    1
18     0
296    0
286    1
272    1
117    0
258    0
360    1
107    0
67     0
234    1
246    0
354    1
222    1
106    1
310    0
270    1
312    0
      ..
317    1
319    0
7      0
141    1
86     0
352    1
241    1
215    1
68     0
50     0
156    0
252    1
357    0
254    1
276    0
178    0
281    0
237    0
71     0
129    0
144    0
335    1
133    0
203    0
393    1
255    0
72     0
396    0
235    0
37     0
Name: admit, dtype: int64
***********
398    0
125    0
328    0
339    1
172    0
342    0
197    1
291    0
29     0
284    1
174    0
372    1
188    0
324    0
321    0
227    0
371    1
5      1
78     0
223    0
122    0
242    1
382    0
214    1
17     0
92     0
366    0
201    1
361    1
207    1
81     0
4      0
165    0
275    1
6      1
80     0
58     0
102    0
397    0
139    1
Name: admit, dtype: int64
Predictions:
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0
 0 0 1]
True labels:
[0 0 0 1 0 0 1 0 0 1 0 1 0 0 0 0 1 1 0 0 0 1 0 1 0 0 0 1 1 1 0 0 0 1 1 0 0
 0 0 1]
Logistic regression accuracy: 0.675
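
To put the accuracy in context, the two arrays printed above can be compared directly. This sketch copies them by hand from the output and uses plain NumPy. The majority-class baseline and the confusion counts are additions for comparison, not something the original script computes.

```python
import numpy as np

# Copied from the printed output above
y_pred = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                   0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1])
y_true = np.array([0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0,
                   0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1])

# Fraction of test rows where prediction and label agree
accuracy = float((y_pred == y_true).mean())
# Accuracy of always predicting the more frequent class
baseline = float(max(y_true.mean(), 1 - y_true.mean()))

# Confusion counts, with "admit" (1) as the positive class
tp = int(np.sum((y_pred == 1) & (y_true == 1)))
fp = int(np.sum((y_pred == 1) & (y_true == 0)))
fn = int(np.sum((y_pred == 0) & (y_true == 1)))
tn = int(np.sum((y_pred == 0) & (y_true == 0)))

print(accuracy, baseline)
print(tn, fp, fn, tp)
```

The model predicts "admit" for only 3 of the 40 test rows, so its accuracy of 0.675 is only slightly above the 0.650 obtained by always predicting 0.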
Analyzing relationships in the data over a grid of feature combinations
[ 220.          250.52631579  281.05263158  311.57894737  342.10526316
  372.63157895  403.15789474  433.68421053  464.21052632  494.73684211
  525.26315789  555.78947368  586.31578947  616.84210526  647.36842105
  677.89473684  708.42105263  738.94736842  769.47368421  800.        ]
[ 2.26        2.35157895  2.44315789  2.53473684  2.62631579  2.71789474
  2.80947368  2.90105263  2.99263158  3.08421053  3.17578947  3.26736842
  3.35894737  3.45052632  3.54210526  3.63368421  3.72526316  3.81684211
  3.90842105  4.        ]
[[ 220.      2.26    1.      1.  ]
 [ 220.      2.26    2.      1.  ]
 [ 220.      2.26    3.      1.  ]
 ..., 
 [ 800.      4.      2.      1.  ]
 [ 800.      4.      3.      1.  ]
 [ 800.      4.      4.      1.  ]]
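
The grid above is a Cartesian product of the candidate values. As a rough equivalent using only the standard library and NumPy (an illustration, not how `sklearn.utils.extmath.cartesian` is implemented), `itertools.product` yields the same rows in the same order, with the last factor varying fastest:

```python
import itertools
import numpy as np

gres = np.linspace(220, 800, 20)    # GRE range taken from the printed output
gpas = np.linspace(2.26, 4.0, 20)   # GPA range taken from the printed output

# One row per combination: 20 * 20 * 4 * 1 = 1600 rows
grid = np.array(list(itertools.product(gres, gpas, [1, 2, 3, 4], [1.0])))

print(grid.shape)
print(grid[:3])
print(grid[-1])
```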
          0         1    2    3
0     220.0  2.260000  1.0  1.0
1     220.0  2.260000  2.0  1.0
2     220.0  2.260000  3.0  1.0
3     220.0  2.260000  4.0  1.0
4     220.0  2.351579  1.0  1.0
5     220.0  2.351579  2.0  1.0
6     220.0  2.351579  3.0  1.0
7     220.0  2.351579  4.0  1.0
8     220.0  2.443158  1.0  1.0
9     220.0  2.443158  2.0  1.0
10    220.0  2.443158  3.0  1.0
11    220.0  2.443158  4.0  1.0
12    220.0  2.534737  1.0  1.0
13    220.0  2.534737  2.0  1.0
14    220.0  2.534737  3.0  1.0
15    220.0  2.534737  4.0  1.0
16    220.0  2.626316  1.0  1.0
17    220.0  2.626316  2.0  1.0
18    220.0  2.626316  3.0  1.0
19    220.0  2.626316  4.0  1.0
20    220.0  2.717895  1.0  1.0
21    220.0  2.717895  2.0  1.0
22    220.0  2.717895  3.0  1.0
23    220.0  2.717895  4.0  1.0
24    220.0  2.809474  1.0  1.0
25    220.0  2.809474  2.0  1.0
26    220.0  2.809474  3.0  1.0
27    220.0  2.809474  4.0  1.0
28    220.0  2.901053  1.0  1.0
29    220.0  2.901053  2.0  1.0
...     ...       ...  ...  ...
1570  800.0  3.358947  3.0  1.0
1571  800.0  3.358947  4.0  1.0
1572  800.0  3.450526  1.0  1.0
1573  800.0  3.450526  2.0  1.0
1574  800.0  3.450526  3.0  1.0
1575  800.0  3.450526  4.0  1.0
1576  800.0  3.542105  1.0  1.0
1577  800.0  3.542105  2.0  1.0
1578  800.0  3.542105  3.0  1.0
1579  800.0  3.542105  4.0  1.0
1580  800.0  3.633684  1.0  1.0
1581  800.0  3.633684  2.0  1.0
1582  800.0  3.633684  3.0  1.0
1583  800.0  3.633684  4.0  1.0
1584  800.0  3.725263  1.0  1.0
1585  800.0  3.725263  2.0  1.0
1586  800.0  3.725263  3.0  1.0
1587  800.0  3.725263  4.0  1.0
1588  800.0  3.816842  1.0  1.0
1589  800.0  3.816842  2.0  1.0
1590  800.0  3.816842  3.0  1.0
1591  800.0  3.816842  4.0  1.0
1592  800.0  3.908421  1.0  1.0
1593  800.0  3.908421  2.0  1.0
1594  800.0  3.908421  3.0  1.0
1595  800.0  3.908421  4.0  1.0
1596  800.0  4.000000  1.0  1.0
1597  800.0  4.000000  2.0  1.0
1598  800.0  4.000000  3.0  1.0
1599  800.0  4.000000  4.0  1.0

[1600 rows x 4 columns]
        gre       gpa  ranks  intercept
0     220.0  2.260000    1.0        1.0
1     220.0  2.260000    2.0        1.0
2     220.0  2.260000    3.0        1.0
3     220.0  2.260000    4.0        1.0
4     220.0  2.351579    1.0        1.0
5     220.0  2.351579    2.0        1.0
6     220.0  2.351579    3.0        1.0
7     220.0  2.351579    4.0        1.0
8     220.0  2.443158    1.0        1.0
9     220.0  2.443158    2.0        1.0
10    220.0  2.443158    3.0        1.0
11    220.0  2.443158    4.0        1.0
12    220.0  2.534737    1.0        1.0
13    220.0  2.534737    2.0        1.0
14    220.0  2.534737    3.0        1.0
15    220.0  2.534737    4.0        1.0
16    220.0  2.626316    1.0        1.0
17    220.0  2.626316    2.0        1.0
18    220.0  2.626316    3.0        1.0
19    220.0  2.626316    4.0        1.0
20    220.0  2.717895    1.0        1.0
21    220.0  2.717895    2.0        1.0
22    220.0  2.717895    3.0        1.0
23    220.0  2.717895    4.0        1.0
24    220.0  2.809474    1.0        1.0
25    220.0  2.809474    2.0        1.0
26    220.0  2.809474    3.0        1.0
27    220.0  2.809474    4.0        1.0
28    220.0  2.901053    1.0        1.0
29    220.0  2.901053    2.0        1.0
...     ...       ...    ...        ...
1570  800.0  3.358947    3.0        1.0
1571  800.0  3.358947    4.0        1.0
1572  800.0  3.450526    1.0        1.0
1573  800.0  3.450526    2.0        1.0
1574  800.0  3.450526    3.0        1.0
1575  800.0  3.450526    4.0        1.0
1576  800.0  3.542105    1.0        1.0
1577  800.0  3.542105    2.0        1.0
1578  800.0  3.542105    3.0        1.0
1579  800.0  3.542105    4.0        1.0
1580  800.0  3.633684    1.0        1.0
1581  800.0  3.633684    2.0        1.0
1582  800.0  3.633684    3.0        1.0
1583  800.0  3.633684    4.0        1.0
1584  800.0  3.725263    1.0        1.0
1585  800.0  3.725263    2.0        1.0
1586  800.0  3.725263    3.0        1.0
1587  800.0  3.725263    4.0        1.0
1588  800.0  3.816842    1.0        1.0
1589  800.0  3.816842    2.0        1.0
1590  800.0  3.816842    3.0        1.0
1591  800.0  3.816842    4.0        1.0
1592  800.0  3.908421    1.0        1.0
1593  800.0  3.908421    2.0        1.0
1594  800.0  3.908421    3.0        1.0
1595  800.0  3.908421    4.0        1.0
1596  800.0  4.000000    1.0        1.0
1597  800.0  4.000000    2.0        1.0
1598  800.0  4.000000    3.0        1.0
1599  800.0  4.000000    4.0        1.0

[1600 rows x 4 columns]
      ranks_1  ranks_2  ranks_3  ranks_4
0         1.0      0.0      0.0      0.0
1         0.0      1.0      0.0      0.0
2         0.0      0.0      1.0      0.0
3         0.0      0.0      0.0      1.0
4         1.0      0.0      0.0      0.0
5         0.0      1.0      0.0      0.0
6         0.0      0.0      1.0      0.0
7         0.0      0.0      0.0      1.0
8         1.0      0.0      0.0      0.0
9         0.0      1.0      0.0      0.0
10        0.0      0.0      1.0      0.0
11        0.0      0.0      0.0      1.0
12        1.0      0.0      0.0      0.0
13        0.0      1.0      0.0      0.0
14        0.0      0.0      1.0      0.0
15        0.0      0.0      0.0      1.0
16        1.0      0.0      0.0      0.0
17        0.0      1.0      0.0      0.0
18        0.0      0.0      1.0      0.0
19        0.0      0.0      0.0      1.0
20        1.0      0.0      0.0      0.0
21        0.0      1.0      0.0      0.0
22        0.0      0.0      1.0      0.0
23        0.0      0.0      0.0      1.0
24        1.0      0.0      0.0      0.0
25        0.0      1.0      0.0      0.0
26        0.0      0.0      1.0      0.0
27        0.0      0.0      0.0      1.0
28        1.0      0.0      0.0      0.0
29        0.0      1.0      0.0      0.0
...       ...      ...      ...      ...
1570      0.0      0.0      1.0      0.0
1571      0.0      0.0      0.0      1.0
1572      1.0      0.0      0.0      0.0
1573      0.0      1.0      0.0      0.0
1574      0.0      0.0      1.0      0.0
1575      0.0      0.0      0.0      1.0
1576      1.0      0.0      0.0      0.0
1577      0.0      1.0      0.0      0.0
1578      0.0      0.0      1.0      0.0
1579      0.0      0.0      0.0      1.0
1580      1.0      0.0      0.0      0.0
1581      0.0      1.0      0.0      0.0
1582      0.0      0.0      1.0      0.0
1583      0.0      0.0      0.0      1.0
1584      1.0      0.0      0.0      0.0
1585      0.0      1.0      0.0      0.0
1586      0.0      0.0      1.0      0.0
1587      0.0      0.0      0.0      1.0
1588      1.0      0.0      0.0      0.0
1589      0.0      1.0      0.0      0.0
1590      0.0      0.0      1.0      0.0
1591      0.0      0.0      0.0      1.0
1592      1.0      0.0      0.0      0.0
1593      0.0      1.0      0.0      0.0
1594      0.0      0.0      1.0      0.0
1595      0.0      0.0      0.0      1.0
1596      1.0      0.0      0.0      0.0
1597      0.0      1.0      0.0      0.0
1598      0.0      0.0      1.0      0.0
1599      0.0      0.0      0.0      1.0

[1600 rows x 4 columns]
*********6
        gre       gpa  ranks_2  ranks_3  ranks_4
0     220.0  2.260000      0.0      0.0      0.0
1     220.0  2.260000      1.0      0.0      0.0
2     220.0  2.260000      0.0      1.0      0.0
3     220.0  2.260000      0.0      0.0      1.0
4     220.0  2.351579      0.0      0.0      0.0
5     220.0  2.351579      1.0      0.0      0.0
6     220.0  2.351579      0.0      1.0      0.0
7     220.0  2.351579      0.0      0.0      1.0
8     220.0  2.443158      0.0      0.0      0.0
9     220.0  2.443158      1.0      0.0      0.0
10    220.0  2.443158      0.0      1.0      0.0
11    220.0  2.443158      0.0      0.0      1.0
12    220.0  2.534737      0.0      0.0      0.0
13    220.0  2.534737      1.0      0.0      0.0
14    220.0  2.534737      0.0      1.0      0.0
15    220.0  2.534737      0.0      0.0      1.0
16    220.0  2.626316      0.0      0.0      0.0
17    220.0  2.626316      1.0      0.0      0.0
18    220.0  2.626316      0.0      1.0      0.0
19    220.0  2.626316      0.0      0.0      1.0
20    220.0  2.717895      0.0      0.0      0.0
21    220.0  2.717895      1.0      0.0      0.0
22    220.0  2.717895      0.0      1.0      0.0
23    220.0  2.717895      0.0      0.0      1.0
24    220.0  2.809474      0.0      0.0      0.0
25    220.0  2.809474      1.0      0.0      0.0
26    220.0  2.809474      0.0      1.0      0.0
27    220.0  2.809474      0.0      0.0      1.0
28    220.0  2.901053      0.0      0.0      0.0
29    220.0  2.901053      1.0      0.0      0.0
...     ...       ...      ...      ...      ...
1570  800.0  3.358947      0.0      1.0      0.0
1571  800.0  3.358947      0.0      0.0      1.0
1572  800.0  3.450526      0.0      0.0      0.0
1573  800.0  3.450526      1.0      0.0      0.0
1574  800.0  3.450526      0.0      1.0      0.0
1575  800.0  3.450526      0.0      0.0      1.0
1576  800.0  3.542105      0.0      0.0      0.0
1577  800.0  3.542105      1.0      0.0      0.0
1578  800.0  3.542105      0.0      1.0      0.0
1579  800.0  3.542105      0.0      0.0      1.0
1580  800.0  3.633684      0.0      0.0      0.0
1581  800.0  3.633684      1.0      0.0      0.0
1582  800.0  3.633684      0.0      1.0      0.0
1583  800.0  3.633684      0.0      0.0      1.0
1584  800.0  3.725263      0.0      0.0      0.0
1585  800.0  3.725263      1.0      0.0      0.0
1586  800.0  3.725263      0.0      1.0      0.0
1587  800.0  3.725263      0.0      0.0      1.0
1588  800.0  3.816842      0.0      0.0      0.0
1589  800.0  3.816842      1.0      0.0      0.0
1590  800.0  3.816842      0.0      1.0      0.0
1591  800.0  3.816842      0.0      0.0      1.0
1592  800.0  3.908421      0.0      0.0      0.0
1593  800.0  3.908421      1.0      0.0      0.0
1594  800.0  3.908421      0.0      1.0      0.0
1595  800.0  3.908421      0.0      0.0      1.0
1596  800.0  4.000000      0.0      0.0      0.0
1597  800.0  4.000000      1.0      0.0      0.0
1598  800.0  4.000000      0.0      1.0      0.0
1599  800.0  4.000000      0.0      0.0      1.0

[1600 rows x 5 columns]
[0 0 0 ..., 0 0 0]
        gre       gpa  ranks  intercept  predict_admit
0     220.0  2.260000    1.0        1.0              0
1     220.0  2.260000    2.0        1.0              0
2     220.0  2.260000    3.0        1.0              0
3     220.0  2.260000    4.0        1.0              0
4     220.0  2.351579    1.0        1.0              0
5     220.0  2.351579    2.0        1.0              0
6     220.0  2.351579    3.0        1.0              0
7     220.0  2.351579    4.0        1.0              0
8     220.0  2.443158    1.0        1.0              0
9     220.0  2.443158    2.0        1.0              0
10    220.0  2.443158    3.0        1.0              0
11    220.0  2.443158    4.0        1.0              0
12    220.0  2.534737    1.0        1.0              0
13    220.0  2.534737    2.0        1.0              0
14    220.0  2.534737    3.0        1.0              0
15    220.0  2.534737    4.0        1.0              0
16    220.0  2.626316    1.0        1.0              0
17    220.0  2.626316    2.0        1.0              0
18    220.0  2.626316    3.0        1.0              0
19    220.0  2.626316    4.0        1.0              0
20    220.0  2.717895    1.0        1.0              0
21    220.0  2.717895    2.0        1.0              0
22    220.0  2.717895    3.0        1.0              0
23    220.0  2.717895    4.0        1.0              0
24    220.0  2.809474    1.0        1.0              0
25    220.0  2.809474    2.0        1.0              0
26    220.0  2.809474    3.0        1.0              0
27    220.0  2.809474    4.0        1.0              0
28    220.0  2.901053    1.0        1.0              0
29    220.0  2.901053    2.0        1.0              0
...     ...       ...    ...        ...            ...
1570  800.0  3.358947    3.0        1.0              0
1571  800.0  3.358947    4.0        1.0              0
1572  800.0  3.450526    1.0        1.0              1
1573  800.0  3.450526    2.0        1.0              0
1574  800.0  3.450526    3.0        1.0              0
1575  800.0  3.450526    4.0        1.0              0
1576  800.0  3.542105    1.0        1.0              1
1577  800.0  3.542105    2.0        1.0              0
1578  800.0  3.542105    3.0        1.0              0
1579  800.0  3.542105    4.0        1.0              0
1580  800.0  3.633684    1.0        1.0              1
1581  800.0  3.633684    2.0        1.0              0
1582  800.0  3.633684    3.0        1.0              0
1583  800.0  3.633684    4.0        1.0              0
1584  800.0  3.725263    1.0        1.0              1
1585  800.0  3.725263    2.0        1.0              0
1586  800.0  3.725263    3.0        1.0              0
1587  800.0  3.725263    4.0        1.0              0
1588  800.0  3.816842    1.0        1.0              1
1589  800.0  3.816842    2.0        1.0              0
1590  800.0  3.816842    3.0        1.0              0
1591  800.0  3.816842    4.0        1.0              0
1592  800.0  3.908421    1.0        1.0              1
1593  800.0  3.908421    2.0        1.0              0
1594  800.0  3.908421    3.0        1.0              0
1595  800.0  3.908421    4.0        1.0              0
1596  800.0  4.000000    1.0        1.0              1
1597  800.0  4.000000    2.0        1.0              0
1598  800.0  4.000000    3.0        1.0              0
1599  800.0  4.000000    4.0        1.0              0

[1600 rows x 5 columns]
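
The `predict_admit` column above holds hard 0/1 labels, because `predict` returns the predicted class rather than a probability. If a smooth curve of P(admit=1) is wanted, scikit-learn's `predict_proba` returns the probabilities themselves. A minimal sketch on synthetic data (the data and variable names are invented for illustration; this is not the post's fitted model):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(1)
X = rng.normal(size=(300, 2))  # synthetic features
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=300) > 0).astype(int)

model = LogisticRegression().fit(X, y)

labels = model.predict(X)              # hard class labels, 0 or 1
proba = model.predict_proba(X)[:, 1]   # column 1 is P(y=1), following model.classes_

print(labels[:5])
print(proba[:5].round(3))
```

For a binary model the label is 1 exactly when the probability of class 1 exceeds 0.5, so averaging labels over a grid, as the script does, gives a coarser picture than averaging probabilities.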
                  predict_admit
gre        ranks               
220.000000 1.0             0.00
           2.0             0.00
           3.0             0.00
           4.0             0.00
250.526316 1.0             0.00
           2.0             0.00
           3.0             0.00
           4.0             0.00
281.052632 1.0             0.00
           2.0             0.00
           3.0             0.00
           4.0             0.00
311.578947 1.0             0.00
           2.0             0.00
           3.0             0.00
           4.0             0.00
342.105263 1.0             0.00
           2.0             0.00
           3.0             0.00
           4.0             0.00
372.631579 1.0             0.00
           2.0             0.00
           3.0             0.00
           4.0             0.00
403.157895 1.0             0.00
           2.0             0.00
           3.0             0.00
           4.0             0.00
433.684211 1.0             0.00
           2.0             0.00
...                         ...
586.315789 3.0             0.00
           4.0             0.00
616.842105 1.0             0.40
           2.0             0.00
           3.0             0.00
           4.0             0.00
647.368421 1.0             0.50
           2.0             0.00
           3.0             0.00
           4.0             0.00
677.894737 1.0             0.60
           2.0             0.00
           3.0             0.00
           4.0             0.00
708.421053 1.0             0.65
           2.0             0.00
           3.0             0.00
           4.0             0.00
738.947368 1.0             0.75
           2.0             0.00
           3.0             0.00
           4.0             0.00
769.473684 1.0             0.85
           2.0             0.00
           3.0             0.00
           4.0             0.00
800.000000 1.0             0.95
           2.0             0.00
           3.0             0.00
           4.0             0.00

[80 rows x 1 columns]
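
The table above is the mean of `predict_admit` for each (`gre`, `ranks`) pair. On a small made-up frame (values invented for illustration), the same aggregation can be written with either `pivot_table` or `groupby`, and one `ranks` level can then be selected through the index, as the plotting function does:

```python
import pandas as pd

# Hypothetical toy data: two gre values, two ranks, two rows per pair
toy = pd.DataFrame({
    'gre':   [220.0, 220.0, 220.0, 220.0, 800.0, 800.0, 800.0, 800.0],
    'ranks': [1.0, 1.0, 2.0, 2.0, 1.0, 1.0, 2.0, 2.0],
    'predict_admit': [0, 0, 0, 0, 1, 0, 0, 0],
})

by_pivot = pd.pivot_table(toy, values=['predict_admit'], index=['gre', 'ranks'], aggfunc='mean')
by_group = toy.groupby(['gre', 'ranks'])[['predict_admit']].mean()

# Keep only the rows whose second index level (ranks) equals 1
rank1 = by_pivot.loc[by_pivot.index.get_level_values(1) == 1.0]

print(by_pivot)
print(by_group)
print(rank1)
```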
*********9
Float64Index([1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0, 1.0,
              2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0, 1.0, 2.0,
              3.0, 4.0, 1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0,
              4.0, 1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0,
              1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0, 1.0,
              2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0, 1.0, 2.0,
              3.0, 4.0],
             dtype='float64', name=u'ranks')
Float64Index([        220.0, 250.526315789, 281.052631579, 311.578947368,
              342.105263158, 372.631578947, 403.157894737, 433.684210526,
              464.210526316, 494.736842105, 525.263157895, 555.789473684,
              586.315789474, 616.842105263, 647.368421053, 677.894736842,
              708.421052632, 738.947368421, 769.473684211,         800.0],
             dtype='float64', name=u'gre')

 
