Replacing values of a computer based test result dataset using answer key
【Posted】2018-02-17 03:12:33
【Question】My dataset comes from a computer-based test. A sample is given below.
x<-data.frame(rbind(c("A","C","A","B","A"),
c("M","M","M","M","M"),
c("M","M","M","M","M"),
c("C","C","A","C","A"),
c("C","C","B","C","A"),
c("A","C","A","C","B")))
colnames(x)<-c("q1","q2","q3","q4","q5")
rownames(x)<-c("key","c1","c2","c3","c4","c5")
q1 q2 q3 q4 q5
key A C A B A
c1 M M M M M
c2 M M M M M
c3 C C A C A
c4 C C B C A
c5 A C A C B
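Columns represent questions and rows represent candidates. The first row is the answer key, and M means the question was not answered. I need to replace the values so that each M becomes NA, each correct answer becomes 1 and each wrong answer becomes 0. For example, the correct answer to q1 is "A", so candidate c3's value "C" has to be replaced with 0 because that answer is wrong.
The final dataset should look like this: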
q1 q2 q3 q4 q5
key A C A B A
c1 <NA> <NA> <NA> <NA> <NA>
c2 <NA> <NA> <NA> <NA> <NA>
c3 0 1 1 0 1
c4 0 1 0 0 1
c5 1 1 1 0 0
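The scoring rule above, applied to a single question, can be sketched as follows. The helper name `score_column` is hypothetical. The sketch assumes the answers are stored as character strings, with "M" marking an unanswered item.

```r
# Hypothetical helper: score one question's answers against its key.
# "M" (unanswered) becomes NA, a match becomes 1L, a mismatch 0L.
score_column <- function(answers, key) {
  scored <- as.integer(answers == key)
  scored[answers == "M"] <- NA_integer_
  scored
}

# q1 from the sample: key "A", candidates c1..c5
score_column(c("M", "M", "C", "C", "A"), "A")
# [1] NA NA  0  0  1
```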
Replacing the Ms is fairly simple:
x[x=="M"]<-NA
But I am finding it hard to replace the other values in one step.
x<-as.matrix(x)
I converted it to a matrix because the data frame threw the error "Error in Ops.factor(left, right) : level sets of factors are different".
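That error most likely comes from comparing factor columns whose level sets differ. Before R 4.0.0, `data.frame()` turned character columns into factors by default, and `==` between two factors requires identical levels. Below is a minimal reproduction with made-up vectors `a` and `b`, along with two common ways around it.

```r
# Two factors with different level sets, as data.frame() created by default before R 4.0.0
a <- factor(c("A", "C"))  # levels: A, C
b <- factor(c("C", "B"))  # levels: B, C

# Comparing the factors directly fails with "level sets of factors are different"
res <- try(a == b, silent = TRUE)
inherits(res, "try-error")
# [1] TRUE

# Comparing the labels as character vectors works
as.character(a) == as.character(b)
# [1] FALSE FALSE

# Or keep the columns as character when building the data frame
df <- data.frame(q1 = c("A", "C"), q2 = c("C", "C"), stringsAsFactors = FALSE)
is.character(df$q1)
# [1] TRUE
```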
for(i in 2:nrow(x))
for( j in 1:ncol(x))
ifelse(x[i][j]==x[1][j],x[i][j]<-1,x[i][j]<-0)
This for loop only replaces the values in the first column:
q1 q2 q3 q4 q5
key "A" "C" "A" "B" "A"
c1 NA NA NA NA NA
c2 NA NA NA NA NA
c3 "0" "C" "A" "C" "A"
c4 "0" "C" "B" "C" "A"
c5 "1" "C" "A" "C" "B"
How can I replace the values in the whole dataset?
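The loop above only touches the first column because of how it indexes. On a matrix, `x[i]` uses linear (column-major) indexing, so for i in 2:6 it picks an element of column 1, and `x[i][j]` is NA for any j > 1. Writing `x[i, j]` addresses row i and column j instead.

Here is a sketch of the corrected loop. It rebuilds the sample as a character matrix with "M" already set to NA, so it runs on its own. It writes into a separate integer matrix (`scores`, a name chosen here), so the results stay numbers rather than the strings "0"/"1".

```r
# Sample data as a character matrix: key in row 1, "M" already replaced by NA
x <- rbind(key = c("A", "C", "A", "B", "A"),
           c1 = rep(NA, 5), c2 = rep(NA, 5),
           c3 = c("C", "C", "A", "C", "A"),
           c4 = c("C", "C", "B", "C", "A"),
           c5 = c("A", "C", "A", "C", "B"))
colnames(x) <- paste0("q", 1:5)

# One row per candidate, NA until a question is scored
scores <- matrix(NA_integer_, nrow = nrow(x) - 1, ncol = ncol(x),
                 dimnames = list(rownames(x)[-1], colnames(x)))
for (i in 2:nrow(x)) {
  for (j in 1:ncol(x)) {
    # x[i, j] is row i, column j; x[i][j] is not
    if (!is.na(x[i, j])) {
      scores[i - 1, j] <- if (x[i, j] == x[1, j]) 1L else 0L
    }
  }
}
scores
# rows c3, c4, c5 read 0 1 1 0 1 / 0 1 0 0 1 / 1 1 1 0 0
```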
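【Answer 1】You should not keep the key in your data structure as an observation (row); conceptually it doesn't belong there. You should also use a matrix instead of a data.frame.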
x <- as.matrix(x)
key <- x[1,]
x <- x[-1,]
x[x == "M"] <- NA
# `==` recycles `key` down the columns (column-major order),
# so transpose first to compare each candidate's row with the key
# unary plus turns the logical matrix into an integer matrix
y <- +(t(t(x) == key))
# q1 q2 q3 q4 q5
#c1 NA NA NA NA NA
#c2 NA NA NA NA NA
#c3 0 1 1 0 1
#c4 0 1 0 0 1
#c5 1 1 1 0 0
Note that I corrected the typo in your data.
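The same transpose-and-compare idea can be wrapped in a function that also returns a total score per candidate. This is a sketch with some assumptions:

- `score_test` is a hypothetical name.
- The key is passed separately from a character response matrix in which "M" marks an unanswered question.
- Unanswered items count as 0 in the total (via `na.rm = TRUE`), which is an assumption about how you want to score them.

```r
# Hypothetical wrapper around the transpose-and-compare approach
score_test <- function(responses, key) {
  stopifnot(ncol(responses) == length(key))
  responses[responses == "M"] <- NA
  # t(responses) has one column per candidate, so `key` (one entry per
  # question) is recycled down each column and lines up question by question
  scored <- +(t(t(responses) == key))
  # assumption: unanswered (NA) items contribute 0 to the total
  list(items = scored, total = rowSums(scored, na.rm = TRUE))
}

responses <- rbind(c1 = rep("M", 5), c2 = rep("M", 5),
                   c3 = c("C", "C", "A", "C", "A"),
                   c4 = c("C", "C", "B", "C", "A"),
                   c5 = c("A", "C", "A", "C", "B"))
colnames(responses) <- paste0("q", 1:5)
key <- c(q1 = "A", q2 = "C", q3 = "A", q4 = "B", q5 = "A")

res <- score_test(responses, key)
res$total
# c1 c2 c3 c4 c5
#  0  0  3  2  3
```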
【Answer 2】Using dplyr to mutate all columns:
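library(dplyr)
# after the step that replaces "M" with NA
# row_number() with no argument is the row position (row_number(.) would rank the values instead)
x %>%
  mutate(across(everything(), ~ ifelse(row_number() == 1,
                                       as.character(.x),                       # leave the first row (key) unchanged
                                       as.numeric(toupper(.x) == first(.x))))) # compare later rows with the first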
q1 q2 q3 q4 q5
1 A C A B A
2 <NA> <NA> <NA> <NA> <NA>
3 <NA> <NA> <NA> <NA> <NA>
4 0 1 1 0 1
5 0 1 0 0 1
6 1 1 1 0 0
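(Note: the sample data contain both upper- and lowercase answers, so I assumed the computer accepts both. If that is not the case and all answers are uppercase, the toupper() part can be skipped.)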
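If you would rather stay in base R and get numeric results, the same column-by-column idea can be written with `Map()` over the columns. This sketch rebuilds the question's data frame after the NA replacement step, with `stringsAsFactors = FALSE`, so the block is self-contained. Unlike the dplyr version, it drops the key row and returns integers.

```r
x <- data.frame(q1 = c("A", NA, NA, "C", "C", "A"),
                q2 = c("C", NA, NA, "C", "C", "C"),
                q3 = c("A", NA, NA, "A", "B", "A"),
                q4 = c("B", NA, NA, "C", "C", "C"),
                q5 = c("A", NA, NA, "A", "A", "B"),
                row.names = c("key", "c1", "c2", "c3", "c4", "c5"),
                stringsAsFactors = FALSE)

# For each column, compare every candidate (col[-1]) with that column's key (col[1]);
# NA answers stay NA, matches become 1L and mismatches 0L
scored <- as.data.frame(
  Map(function(col) as.integer(toupper(col[-1]) == col[1]), x),
  row.names = rownames(x)[-1]
)
scored
```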
【Comments】
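That was a typo; they are all uppercase. This works well, thanks!
Note that all values in the result are character. This may matter for later steps of the analysis.
【Answer 3】Using the ifelse function, you can: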
# When working with character data, note the stringsAsFactors=FALSE option
# Candidate c4's answers contain a lowercase "c"; toupper() below corrects that
x = data.frame(rbind(c("A","C","A","B","A"),
c("M","M","M","M","M"),
c("M","M","M","M","M"),
c("C","C","A","C","A"),
c("c","c","B","C","A"),
c("A","C","A","C","B")),stringsAsFactors=FALSE)
#all upper case
x = sapply(x,toupper)
colnames(x) = c("q1","q2","q3","q4","q5")
rownames(x) = c("key","c1","c2","c3","c4","c5")
#replace M's
x[x == "M"] = NA
# Match each row with the key vector x[1,], repeated 5 times to match the number of candidate rows
x[-1,] = ifelse(x[-1,] == matrix(rep(as.matrix(x[1,]),5),nrow=5,byrow=TRUE),1,0)
x
# q1 q2 q3 q4 q5
#key "A" "C" "A" "B" "A"
#c1 NA NA NA NA NA
#c2 NA NA NA NA NA
#c3 "0" "1" "1" "0" "1"
#c4 "0" "1" "0" "0" "1"
#c5 "1" "1" "1" "0" "0"
【Comments】
In the real scenario the size of the dataset is unknown; it could be 10,000 candidates and 1,000 questions. So the 5 must be replaced with nrow(x)-1: x[-1,] = ifelse(x[-1,] == matrix(rep(as.matrix(x[1,]), nrow(x)-1), nrow=nrow(x)-1, byrow=TRUE), 1, 0)
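Building the full key matrix with `rep()` works at any size. Alternatively, base R's `sweep()` can compare each column against its key entry without constructing that matrix explicitly. This sketch assumes a character matrix with the key in the first row and NA for unanswered questions. The helper name `score_against_key` is hypothetical. The random data are synthetic and only show that no dimension is hard-coded.

```r
# Hypothetical helper: x is a character matrix with the key in row 1
# and NA for unanswered questions
score_against_key <- function(x) {
  # sweep() compares column j of the candidate rows with x[1, j],
  # whatever nrow(x) and ncol(x) are; unary + turns TRUE/FALSE into 1L/0L
  +sweep(x[-1, , drop = FALSE], 2, x[1, ], FUN = "==")
}

# Synthetic data, only to show that no dimension is hard-coded
set.seed(42)
n_cand <- 2000
n_q <- 100
key <- sample(c("A", "B", "C", "D"), n_q, replace = TRUE)
resp <- matrix(sample(c("A", "B", "C", "D", NA), n_cand * n_q, replace = TRUE),
               nrow = n_cand)
scores <- score_against_key(rbind(key, resp))
dim(scores)
# [1] 2000  100
```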