R: Randomly Splitting a Data Set into Equal Parts for K-Fold Cross-Validation
Posted by 小肥羊的博客
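Today, while reading Professor Wu Xizhi's (吴喜之) 《复杂数据统计方法》 (Statistical Methods for Complex Data), I ran into the following problem: split a data set into several subsets according to a factor, then randomly divide each subset into n equal parts. Professor Wu's approach is easy enough to follow, but I still found it a bit tedious, so I wrote a function of my own. From now on, whenever this problem comes up, I only need to call the function.
The example uses the iris data set that ships with R: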
> str(iris)
'data.frame': 150 obs. of 5 variables:
 $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
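The structure of the iris data set is shown above. Species is a factor with three levels, so the data can be split by Species into three subsets. To run five-fold cross-validation within each subset, every subset has to be divided into five equal parts. The R code is as follows: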
fiveDivide <- function(col, data, n = 5) {
  # col:  name (a string) of a factor column; the data frame is grouped by its levels
  # data: a data.frame
  # n:    number of parts each group is divided into, default 5
  # Returns a list with one element per factor level; each element is itself
  # a list of n data frames.
  # sample(x) returns the numbers 1..x in random order; that permutation is
  # then cut into n parts.
  lvls <- levels(data[, col])
  group_num <- length(lvls)
  lst1 <- list()  # the original data split by factor level into group_num subsets
  lst2 <- list()  # one group divided into n data frames of equal size
  lst3 <- list()  # the result: one lst2 per group
  for (i in 1:group_num) {
    # first split the original data set into one subset per factor level
    lst1[[i]] <- data[data[, col] == lvls[i], ]
  }
  for (k in 1:group_num) {
    # divide each subset randomly into n equal parts; sample() does the shuffling
    od <- sample(nrow(lst1[[k]]))
    newdata <- lst1[[k]][od, ]
    len <- length(od)
    if (len < n) stop("each group needs at least n rows")
    cutpoint <- floor(len / n)
    for (j in 1:n) {
      if (j < n) {
        lst2[[j]] <- newdata[(cutpoint * (j - 1) + 1):(cutpoint * j), ]
      } else {
        # the last part also takes the leftover rows when len is not a multiple of n
        lst2[[j]] <- newdata[(cutpoint * (j - 1) + 1):len, ]
      }
    }
    lst3[[k]] <- lst2
  }
  return(lst3)
}
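For comparison, roughly the same stratified split can be sketched with base R's split() and sample(). This is only an illustration, not the method from the book or from the function above: the name stratified_folds is made up here, and it spreads leftover rows across the folds instead of putting them all in the last one.

```r
# Hypothetical helper (not from the original post): within each level of the
# factor column `col`, assign the rows of `data` at random to n folds.
stratified_folds <- function(col, data, n = 5) {
  groups <- split(data, data[[col]], drop = TRUE)  # one data frame per level
  lapply(groups, function(g) {
    # Fold ids 1..n repeated to the group size, then shuffled.
    fold_id <- sample(rep(seq_len(n), length.out = nrow(g)))
    # factor(..., levels) keeps all n folds even if some end up empty.
    unname(split(g, factor(fold_id, levels = seq_len(n))))
  })
}

set.seed(1)
folds <- stratified_folds("Species", iris, 5)
# Rows per fold (rows of the matrix) for each species (columns).
print(sapply(folds, function(g) sapply(g, nrow)))
```

With 50 rows per species and n = 5, every fold holds 10 rows. When a group size is not a multiple of n, the fold sizes in this sketch differ by at most one.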
Applying fiveDivide to iris:
> rep=fiveDivide("Species",iris,5)
> str(rep)
List of 3
 $ :List of 5
  ..$ :'data.frame': 10 obs. of 5 variables:
  .. ..$ Sepal.Length: num [1:10] 4.8 5.2 4.8 4.7 5.5 5.1 4.8 4.4 4.8 4.9
  .. ..$ Sepal.Width : num [1:10] 3 3.5 3.4 3.2 3.5 3.7 3.1 3 3.4 3
  .. ..$ Petal.Length: num [1:10] 1.4 1.5 1.6 1.6 1.3 1.5 1.6 1.3 1.9 1.4
  .. ..$ Petal.Width : num [1:10] 0.3 0.2 0.2 0.2 0.2 0.4 0.2 0.2 0.2 0.2
  .. ..$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1
  ..$ :'data.frame': 10 obs. of 5 variables:
  .. ..$ Sepal.Length: num [1:10] 5 4.7 4.8 5.2 5.1 5.1 4.9 5.4 5 5.5
  .. ..$ Sepal.Width : num [1:10] 3.5 3.2 3 3.4 3.5 3.8 3.1 3.4 3.5 4.2
  .. ..$ Petal.Length: num [1:10] 1.3 1.3 1.4 1.4 1.4 1.5 1.5 1.7 1.6 1.4
  .. ..$ Petal.Width : num [1:10] 0.3 0.2 0.1 0.2 0.2 0.3 0.1 0.2 0.6 0.2
  .. ..$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1
  ..$ :'data.frame': 10 obs. of 5 variables:
  .. ..$ Sepal.Length: num [1:10] 5.4 4.3 4.9 5.4 4.4 4.6 5.1 5 5.1 5.1
  .. ..$ Sepal.Width : num [1:10] 3.9 3 3.6 3.9 3.2 3.6 3.4 3.4 3.8 3.8
  .. ..$ Petal.Length: num [1:10] 1.3 1.1 1.4 1.7 1.3 1 1.5 1.6 1.9 1.6
  .. ..$ Petal.Width : num [1:10] 0.4 0.1 0.1 0.4 0.2 0.2 0.2 0.4 0.4 0.2
  .. ..$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1
  ..$ :'data.frame': 10 obs. of 5 variables:
  .. ..$ Sepal.Length: num [1:10] 4.4 4.5 5.3 5 5 5.1 5.4 5.2 5.1 5.4
  .. ..$ Sepal.Width : num [1:10] 2.9 2.3 3.7 3.3 3.4 3.3 3.7 4.1 3.5 3.4
  .. ..$ Petal.Length: num [1:10] 1.4 1.3 1.5 1.4 1.5 1.7 1.5 1.5 1.4 1.5
  .. ..$ Petal.Width : num [1:10] 0.2 0.3 0.2 0.2 0.2 0.5 0.2 0.1 0.3 0.4
  .. ..$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1
  ..$ :'data.frame': 10 obs. of 5 variables:
  .. ..$ Sepal.Length: num [1:10] 4.6 5.8 5 5 5 4.6 5.7 4.9 5.7 4.6
  .. ..$ Sepal.Width : num [1:10] 3.4 4 3.6 3.2 3 3.2 4.4 3.1 3.8 3.1
  .. ..$ Petal.Length: num [1:10] 1.4 1.2 1.4 1.2 1.6 1.4 1.5 1.5 1.7 1.5
  .. ..$ Petal.Width : num [1:10] 0.3 0.2 0.2 0.2 0.2 0.2 0.4 0.2 0.3 0.2
  .. ..$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1
 $ :List of 5
  ..$ :'data.frame': 10 obs. of 5 variables:
  .. ..$ Sepal.Length: num [1:10] 6.2 6 5.8 6.3 5.5 5.8 5.8 6.1 6.2 5.6
  .. ..$ Sepal.Width : num [1:10] 2.9 3.4 2.7 3.3 2.6 2.6 2.7 3 2.2 3
  .. ..$ Petal.Length: num [1:10] 4.3 4.5 3.9 4.7 4.4 4 4.1 4.6 4.5 4.1
  .. ..$ Petal.Width : num [1:10] 1.3 1.6 1.2 1.6 1.2 1.2 1 1.4 1.5 1.3
  .. ..$ Species : Factor w/ 3 levels "setosa","versicolor",..: 2 2 2 2 2 2 2 2 2 2
  ..$ :'data.frame': 10 obs. of 5 variables:
  .. ..$ Sepal.Length: num [1:10] 6.4 5.6 5.7 6.6 6 6.4 5.9 6.9 6.7 5.5
  .. ..$ Sepal.Width : num [1:10] 3.2 2.5 2.8 3 2.2 2.9 3 3.1 3.1 2.5
  .. ..$ Petal.Length: num [1:10] 4.5 3.9 4.5 4.4 4 4.3 4.2 4.9 4.4 4
  .. ..$ Petal.Width : num [1:10] 1.5 1.1 1.3 1.4 1 1.3 1.5 1.5 1.4 1.3
  .. ..$ Species : Factor w/ 3 levels "setosa","versicolor",..: 2 2 2 2 2 2 2 2 2 2
  ..$ :'data.frame': 10 obs. of 5 variables:
  .. ..$ Sepal.Length: num [1:10] 6.5 5.2 6.8 6 5.7 5 6.3 5.7 5.5 5.6
  .. ..$ Sepal.Width : num [1:10] 2.8 2.7 2.8 2.9 2.9 2.3 2.5 2.8 2.3 3
  .. ..$ Petal.Length: num [1:10] 4.6 3.9 4.8 4.5 4.2 3.3 4.9 4.1 4 4.5
  .. ..$ Petal.Width : num [1:10] 1.5 1.4 1.4 1.5 1.3 1 1.5 1.3 1.3 1.5
  .. ..$ Species : Factor w/ 3 levels "setosa","versicolor",..: 2 2 2 2 2 2 2 2 2 2
  ..$ :'data.frame': 10 obs. of 5 variables:
  .. ..$ Sepal.Length: num [1:10] 6.6 6.7 5 6.7 5.9 6.1 5.7 5.4 6 5.1
  .. ..$ Sepal.Width : num [1:10] 2.9 3 2 3.1 3.2 2.8 2.6 3 2.7 2.5
  .. ..$ Petal.Length: num [1:10] 4.6 5 3.5 4.7 4.8 4 3.5 4.5 5.1 3
  .. ..$ Petal.Width : num [1:10] 1.3 1.7 1 1.5 1.8 1.3 1 1.5 1.6 1.1
  .. ..$ Species : Factor w/ 3 levels "setosa","versicolor",..: 2 2 2 2 2 2 2 2 2 2
  ..$ :'data.frame': 10 obs. of 5 variables:
  .. ..$ Sepal.Length: num [1:10] 5.6 6.1 6.3 7 4.9 5.7 5.5 5.5 6.1 5.6
  .. ..$ Sepal.Width : num [1:10] 2.7 2.9 2.3 3.2 2.4 3 2.4 2.4 2.8 2.9
  .. ..$ Petal.Length: num [1:10] 4.2 4.7 4.4 4.7 3.3 4.2 3.8 3.7 4.7 3.6
  .. ..$ Petal.Width : num [1:10] 1.3 1.4 1.3 1.4 1 1.2 1.1 1 1.2 1.3
  .. ..$ Species : Factor w/ 3 levels "setosa","versicolor",..: 2 2 2 2 2 2 2 2 2 2
 $ :List of 5
  ..$ :'data.frame': 10 obs. of 5 variables:
  .. ..$ Sepal.Length: num [1:10] 6.9 6.7 6.1 6.4 6.4 6.7 5.7 6.5 6.4 6.3
  .. ..$ Sepal.Width : num [1:10] 3.2 2.5 2.6 2.8 3.1 3.3 2.5 3 2.7 2.9
  .. ..$ Petal.Length: num [1:10] 5.7 5.8 5.6 5.6 5.5 5.7 5 5.5 5.3 5.6
  .. ..$ Petal.Width : num [1:10] 2.3 1.8 1.4 2.1 1.8 2.1 2 1.8 1.9 1.8
  .. ..$ Species : Factor w/ 3 levels "setosa","versicolor",..: 3 3 3 3 3 3 3 3 3 3
  ..$ :'data.frame': 10 obs. of 5 variables:
  .. ..$ Sepal.Length: num [1:10] 5.8 7.7 6.5 6.4 7.4 6.3 6.8 6 6.7 6.8
  .. ..$ Sepal.Width : num [1:10] 2.8 2.8 3.2 3.2 2.8 3.3 3 2.2 3.3 3.2
  .. ..$ Petal.Length: num [1:10] 5.1 6.7 5.1 5.3 6.1 6 5.5 5 5.7 5.9
  .. ..$ Petal.Width : num [1:10] 2.4 2 2 2.3 1.9 2.5 2.1 1.5 2.5 2.3
  .. ..$ Species : Factor w/ 3 levels "setosa","versicolor",..: 3 3 3 3 3 3 3 3 3 3
  ..$ :'data.frame': 10 obs. of 5 variables:
  .. ..$ Sepal.Length: num [1:10] 5.8 6.2 6 6.1 7.7 5.6 6.3 7.3 7.2 6.9
  .. ..$ Sepal.Width : num [1:10] 2.7 2.8 3 3 2.6 2.8 2.8 2.9 3 3.1
  .. ..$ Petal.Length: num [1:10] 5.1 4.8 4.8 4.9 6.9 4.9 5.1 6.3 5.8 5.4
  .. ..$ Petal.Width : num [1:10] 1.9 1.8 1.8 1.8 2.3 2 1.5 1.8 1.6 2.1
  .. ..$ Species : Factor w/ 3 levels "setosa","versicolor",..: 3 3 3 3 3 3 3 3 3 3
  ..$ :'data.frame': 10 obs. of 5 variables:
  .. ..$ Sepal.Length: num [1:10] 6.7 7.2 7.2 6.3 6.3 6.5 6.3 7.7 7.9 6.5
  .. ..$ Sepal.Width : num [1:10] 3 3.2 3.6 2.7 2.5 3 3.4 3.8 3.8 3
  .. ..$ Petal.Length: num [1:10] 5.2 6 6.1 4.9 5 5.8 5.6 6.7 6.4 5.2
  .. ..$ Petal.Width : num [1:10] 2.3 1.8 2.5 1.8 1.9 2.2 2.4 2.2 2 2
  .. ..$ Species : Factor w/ 3 levels "setosa","versicolor",..: 3 3 3 3 3 3 3 3 3 3
  ..$ :'data.frame': 10 obs. of 5 variables:
  .. ..$ Sepal.Length: num [1:10] 7.7 6.4 6.2 6.9 6.7 7.1 5.8 4.9 5.9 7.6
  .. ..$ Sepal.Width : num [1:10] 3 2.8 3.4 3.1 3.1 3 2.7 2.5 3 3
  .. ..$ Petal.Length: num [1:10] 6.1 5.6 5.4 5.1 5.6 5.9 5.1 4.5 5.1 6.6
  .. ..$ Petal.Width : num [1:10] 2.3 2.2 2.3 2.3 2.4 2.1 1.9 1.7 1.8 2.1
  .. ..$ Species : Factor w/ 3 levels "setosa","versicolor",..: 3 3 3 3 3 3 3 3 3 3
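The str() listing is long to read through. A shorter check, sketched here under the assumption that the result is a list of lists of data frames as shown above, tabulates the fold sizes and confirms that no row lands in two folds. The names check_folds and demo are hypothetical; demo only stands in for the output of fiveDivide so that the snippet runs on its own.

```r
# Hypothetical checker (not from the original post). `folds` is a list with one
# element per factor level, each a list of data frames, as fiveDivide() returns.
check_folds <- function(folds, data) {
  ids <- unlist(lapply(folds, function(g) lapply(g, rownames)), use.names = FALSE)
  list(
    sizes      = sapply(folds, function(g) sapply(g, nrow)),  # folds x groups
    no_overlap = anyDuplicated(ids) == 0,       # no row appears in two folds
    complete   = setequal(ids, rownames(data))  # every row was assigned
  )
}

# Stand-in for the output of fiveDivide("Species", iris, 5).
set.seed(42)
demo <- lapply(unname(split(iris, iris$Species)), function(g) {
  g <- g[sample(nrow(g)), ]                       # shuffle within a level
  unname(split(g, rep(1:5, each = nrow(g) / 5)))  # five blocks of 10 rows
})

res <- check_folds(demo, iris)
print(res$sizes)
print(c(no_overlap = res$no_overlap, complete = res$complete))
```

For iris with n = 5 the size matrix should contain only 10s, and both flags should be TRUE.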
After the split, the data look like this:
> rep
[[1]]
[[1]][[1]]
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
46          4.8         3.0          1.4         0.3  setosa
28          5.2         3.5          1.5         0.2  setosa
12          4.8         3.4          1.6         0.2  setosa
30          4.7         3.2          1.6         0.2  setosa
37          5.5         3.5          1.3         0.2  setosa
22          5.1         3.7          1.5         0.4  setosa
31          4.8         3.1          1.6         0.2  setosa
39          4.4         3.0          1.3         0.2  setosa
25          4.8         3.4          1.9         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa

[[1]][[2]]
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
41          5.0         3.5          1.3         0.3  setosa
3           4.7         3.2          1.3         0.2  setosa
13          4.8         3.0          1.4         0.1  setosa
29          5.2         3.4          1.4         0.2  setosa
1           5.1         3.5          1.4         0.2  setosa
20          5.1         3.8          1.5         0.3  setosa
10          4.9         3.1          1.5         0.1  setosa
21          5.4         3.4          1.7         0.2  setosa
44          5.0         3.5          1.6         0.6  setosa
34          5.5         4.2          1.4         0.2  setosa

[[1]][[3]]
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
17          5.4         3.9          1.3         0.4  setosa
14          4.3         3.0          1.1         0.1  setosa
38          4.9         3.6          1.4         0.1  setosa
6           5.4         3.9          1.7         0.4  setosa
43          4.4         3.2          1.3         0.2  setosa
23          4.6         3.6          1.0         0.2  setosa
40          5.1         3.4          1.5         0.2  setosa
27          5.0         3.4          1.6         0.4  setosa
45          5.1         3.8          1.9         0.4  setosa
47          5.1         3.8          1.6         0.2  setosa

[[1]][[4]]
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
9           4.4         2.9          1.4         0.2  setosa
42          4.5         2.3          1.3         0.3  setosa
49          5.3         3.7          1.5         0.2  setosa
50          5.0         3.3          1.4         0.2  setosa
8           5.0         3.4          1.5         0.2  setosa
24          5.1         3.3          1.7         0.5  setosa
11          5.4         3.7          1.5         0.2  setosa
33          5.2         4.1          1.5         0.1  setosa
18          5.1         3.5          1.4         0.3  setosa
32          5.4         3.4          1.5         0.4  setosa

[[1]][[5]]
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
7           4.6         3.4          1.4         0.3  setosa
15          5.8         4.0          1.2         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
36          5.0         3.2          1.2         0.2  setosa
26          5.0         3.0          1.6         0.2  setosa
48          4.6         3.2          1.4         0.2  setosa
16          5.7         4.4          1.5         0.4  setosa
35          4.9         3.1          1.5         0.2  setosa
19          5.7         3.8          1.7         0.3  setosa
4           4.6         3.1          1.5         0.2  setosa


[[2]]
[[2]][[1]]
   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
98          6.2         2.9          4.3         1.3 versicolor
86          6.0         3.4          4.5         1.6 versicolor
83          5.8         2.7          3.9         1.2 versicolor
57          6.3         3.3          4.7         1.6 versicolor
91          5.5         2.6          4.4         1.2 versicolor
93          5.8         2.6          4.0         1.2 versicolor
68          5.8         2.7          4.1         1.0 versicolor
92          6.1         3.0          4.6         1.4 versicolor
69          6.2         2.2          4.5         1.5 versicolor
89          5.6         3.0          4.1         1.3 versicolor

[[2]][[2]]
   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
52          6.4         3.2          4.5         1.5 versicolor
70          5.6         2.5          3.9         1.1 versicolor
56          5.7         2.8          4.5         1.3 versicolor
76          6.6         3.0          4.4         1.4 versicolor
63          6.0         2.2          4.0         1.0 versicolor
75          6.4         2.9          4.3         1.3 versicolor
62          5.9         3.0          4.2         1.5 versicolor
53          6.9         3.1          4.9         1.5 versicolor
66          6.7         3.1          4.4         1.4 versicolor
90          5.5         2.5          4.0         1.3 versicolor

[[2]][[3]]
    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
55           6.5         2.8          4.6         1.5 versicolor
60           5.2         2.7          3.9         1.4 versicolor
77           6.8         2.8          4.8         1.4 versicolor
79           6.0         2.9          4.5         1.5 versicolor
97           5.7         2.9          4.2         1.3 versicolor
94           5.0         2.3          3.3         1.0 versicolor
73           6.3         2.5          4.9         1.5 versicolor
100          5.7         2.8          4.1         1.3 versicolor
54           5.5         2.3          4.0         1.3 versicolor
67           5.6         3.0          4.5         1.5 versicolor

[[2]][[4]]
   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
59          6.6         2.9          4.6         1.3 versicolor
78          6.7         3.0          5.0         1.7 versicolor
61          5.0         2.0          3.5         1.0 versicolor
87          6.7         3.1          4.7         1.5 versicolor
71          5.9         3.2          4.8         1.8 versicolor
72          6.1         2.8          4.0         1.3 versicolor
80          5.7         2.6          3.5         1.0 versicolor
85          5.4         3.0          4.5         1.5 versicolor
84          6.0         2.7          5.1         1.6 versicolor
99          5.1         2.5          3.0         1.1 versicolor

[[2]][[5]]
   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
95          5.6         2.7          4.2         1.3 versicolor
64          6.1         2.9          4.7         1.4 versicolor
88          6.3         2.3          4.4         1.3 versicolor
51          7.0         3.2          4.7         1.4 versicolor
58          4.9         2.4          3.3         1.0 versicolor
96          5.7         3.0          4.2         1.2 versicolor
81          5.5         2.4          3.8         1.1 versicolor
82          5.5         2.4          3.7         1.0 versicolor
74          6.1         2.8          4.7         1.2 versicolor
65          5.6         2.9          3.6         1.3 versicolor


[[3]]
[[3]][[1]]
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
121          6.9         3.2          5.7         2.3 virginica
109          6.7         2.5          5.8         1.8 virginica
135          6.1         2.6          5.6         1.4 virginica
129          6.4         2.8          5.6         2.1 virginica
138          6.4         3.1          5.5         1.8 virginica
125          6.7         3.3          5.7         2.1 virginica
114          5.7         2.5          5.0         2.0 virginica
117          6.5         3.0          5.5         1.8 virginica
112          6.4         2.7          5.3         1.9 virginica
104          6.3         2.9          5.6         1.8 virginica

[[3]][[2]]
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
115          5.8         2.8          5.1         2.4 virginica
123          7.7         2.8          6.7         2.0 virginica
111          6.5         3.2          5.1         2.0 virginica
116          6.4         3.2          5.3         2.3 virginica
131          7.4         2.8          6.1         1.9 virginica
101          6.3         3.3          6.0         2.5 virginica
113          6.8         3.0          5.5         2.1 virginica
120          6.0         2.2          5.0         1.5 virginica
145          6.7         3.3          5.7         2.5 virginica
144          6.8         3.2          5.9         2.3 virginica

[[3]][[3]]
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
143          5.8         2.7          5.1         1.9 virginica
127          6.2         2.8          4.8         1.8 virginica
139          6.0         3.0          4.8         1.8 virginica
128          6.1         3.0          4.9         1.8 virginica
119          7.7         2.6          6.9         2.3 virginica
122          5.6         2.8          4.9         2.0 virginica
134          6.3         2.8          5.1         1.5 virginica
108          7.3         2.9          6.3         1.8 virginica
130          7.2         3.0          5.8         1.6 virginica
140          6.9         3.1          5.4         2.1 virginica

[[3]][[4]]
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
146          6.7         3.0          5.2         2.3 virginica
126          7.2         3.2          6.0         1.8 virginica
110          7.2         3.6          6.1         2.5 virginica
124          6.3         2.7          4.9         1.8 virginica
147          6.3         2.5          5.0         1.9 virginica
105          6.5         3.0          5.8         2.2 virginica
137          6.3         3.4          5.6         2.4 virginica
118          7.7         3.8          6.7         2.2 virginica
132          7.9         3.8          6.4         2.0 virginica
148          6.5         3.0          5.2         2.0 virginica

[[3]][[5]]
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
136          7.7         3.0          6.1         2.3 virginica
133          6.4         2.8          5.6         2.2 virginica
149          6.2         3.4          5.4         2.3 virginica
142          6.9         3.1          5.1         2.3 virginica
141          6.7         3.1          5.6         2.4 virginica
103          7.1         3.0          5.9         2.1 virginica
102          5.8         2.7          5.1         1.9 virginica
107          4.9         2.5          4.5         1.7 virginica
150          5.9         3.0          5.1         1.8 virginica
106          7.6         3.0          6.6         2.1 virginica
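To go from these parts to an actual cross-validation round, one common arrangement (an assumption here, the post itself stops at the split) is to hold out part i of every species as the test set and stack the remaining parts into the training set. The sketch below uses the hypothetical names cv_split and demo; demo again only imitates the nested list returned by fiveDivide so the snippet is self-contained.

```r
# Hypothetical helper: fold i of every group becomes the test set,
# the remaining folds of every group are stacked into the training set.
cv_split <- function(folds, i) {
  test  <- do.call(rbind, lapply(folds, function(g) g[[i]]))
  train <- do.call(rbind, lapply(folds, function(g) do.call(rbind, g[-i])))
  list(train = train, test = test)
}

# Stand-in for the output of fiveDivide("Species", iris, 5).
set.seed(7)
demo <- lapply(unname(split(iris, iris$Species)), function(g) {
  g <- g[sample(nrow(g)), ]                       # shuffle within a level
  unname(split(g, rep(1:5, each = nrow(g) / 5)))  # five blocks of 10 rows
})

# Row counts of the training and test sets for each of the five rounds.
sizes <- t(sapply(1:5, function(i) {
  s <- cv_split(demo, i)
  c(fold = i, train = nrow(s$train), test = nrow(s$test))
}))
print(sizes)
```

Each round leaves 120 rows for training and 30 for testing, with 10 test rows per species. A model would be fitted on s$train and scored on s$test inside the loop.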