Return values from a set of columns based on a position found in another column
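I'm trying to extract values from a set of columns based on another column. Taking the first row as an example:
- get the value of CodeToMatch, which is 1
- search the columns Code.1, Code.2, Code.3 for the position of the value 1. In this case it is in column 3,
- so return the value in column 3 of pCode.1, pCode.2, pCode.3, which is "p4"
The expected_output column in my example below shows what I'm after.
Any help is much appreciated!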
c1 <- c("1","2","3")
c2 <- c("8","1","3")
c3 <- c("4","2","4")
c4 <- c("1","3","5")
c5 <- c("p1","p2","p3")
c6 <- c("p8","p1","p3")
c7 <- c("p4","p2","p4")
c8 <- c("p4","p1","p3")
df <- data.frame(c1,c2,c3,c4,c5,c6,c7,c8)
colnames(df)[c(1:8)] <- c("CodeToMatch","Code.1","Code.2","Code.3","pCode.1","pCode.2","pCode.3","expected_output")
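For a single row, the lookup described above comes down to a match() followed by ordinary indexing. The sketch below rebuilds the example with character columns (stringsAsFactors = FALSE is set explicitly so it behaves the same on R versions older than 4.0) and walks through row 1 by hand:

```r
# Example data from the question, kept as character columns
df <- data.frame(
  CodeToMatch = c("1", "2", "3"),
  Code.1  = c("8", "1", "3"),
  Code.2  = c("4", "2", "4"),
  Code.3  = c("1", "3", "5"),
  pCode.1 = c("p1", "p2", "p3"),
  pCode.2 = c("p8", "p1", "p3"),
  pCode.3 = c("p4", "p2", "p4"),
  stringsAsFactors = FALSE
)

# Row 1: where does CodeToMatch appear among Code.1..Code.3?
codes <- unlist(df[1, c("Code.1", "Code.2", "Code.3")])
pos   <- match(df$CodeToMatch[1], codes)   # 3

# Take the pCode column at that same position
pcodes <- unlist(df[1, c("pCode.1", "pCode.2", "pCode.3")])
pcodes[[pos]]                              # "p4"
```

The answers below are different ways of repeating this step for every row.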
Answer
data.table solution
Sample data
df <- structure(list(CodeToMatch = structure(1:3, .Label = c("1", "2",
"3"), class = "factor"), Code.1 = structure(c(3L, 1L, 2L), .Label = c("1",
"3", "8"), class = "factor"), Code.2 = structure(c(2L, 1L, 2L
), .Label = c("2", "4"), class = "factor"), Code.3 = structure(1:3, .Label = c("1",
"3", "5"), class = "factor"), pCode.1 = structure(1:3, .Label = c("p1",
"p2", "p3"), class = "factor"), pCode.2 = structure(c(3L, 1L,
2L), .Label = c("p1", "p3", "p8"), class = "factor"), pCode.3 = structure(c(2L,
1L, 2L), .Label = c("p2", "p4"), class = "factor")), class = "data.frame", row.names = c(NA,
-3L))
Code
library(data.table)
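# First, melt the wide table to long format: Code.1..Code.3 are stacked into Code, pCode.1..pCode.3 into pCode
df.melt <- melt(setDT(df), id.vars = "CodeToMatch",
                measure.vars = patterns("^Code\\.", "^pCode\\."),
                value.name = c("Code", "pCode"))
# Each long row now holds one Code/pCode pair, so finding the match is a simple filter
df.melt[Code == CodeToMatch, .(CodeToMatch, pCode)]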
Output
#    CodeToMatch pCode
# 1:           3    p3
# 2:           2    p1
# 3:           1    p4
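The rows above come out grouped by Code/pCode column pair (the Code.1 match first), not in the original row order. One way to get them back in input order, sketched here with a helper column named `row` (a name chosen only for illustration), is to carry each row's index through the melt:

```r
library(data.table)

dt <- data.table(
  CodeToMatch = c("1", "2", "3"),
  Code.1  = c("8", "1", "3"),
  Code.2  = c("4", "2", "4"),
  Code.3  = c("1", "3", "5"),
  pCode.1 = c("p1", "p2", "p3"),
  pCode.2 = c("p8", "p1", "p3"),
  pCode.3 = c("p4", "p2", "p4")
)

# Remember each row's original position before reshaping
dt[, row := .I]

# One long row per (row, column pair): Code.k goes to Code, pCode.k to pCode
long <- melt(dt, id.vars = c("row", "CodeToMatch"),
             measure.vars = patterns("^Code\\.", "^pCode\\."),
             value.name = c("Code", "pCode"))

# Keep the matching pair, then restore the input order
res <- long[Code == CodeToMatch][order(row)]
res$pCode   # "p4" "p1" "p3"
```

A row whose CodeToMatch appears in none of the Code columns simply drops out of `res`, so this assumes every row has exactly one match.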
Another answer
I'm not sure how well this generalizes, but here is one option:
nCode <- 3
df$expected_output <- apply(df, 1, function(x) x[nCode + 1 + which(x[2:(nCode + 1)] == x[1])])
df$expected_output
#[1] "p4" "p1" "p3"
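Note that the number of "Code" columns is hard-coded. In your case you have 3 "Code" columns with matching "pCode" columns; adjust as needed. This also assumes that the first column always contains the code to be matched.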
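If the number of Code columns varies, nCode can be counted from the column names instead of being typed in. A sketch, assuming the columns are literally named Code.1, Code.2, and so on, and that the answer's other layout assumptions still hold (CodeToMatch first, the pCode columns directly after the Code columns and in the same order):

```r
df <- data.frame(
  CodeToMatch = c("1", "2", "3"),
  Code.1  = c("8", "1", "3"),
  Code.2  = c("4", "2", "4"),
  Code.3  = c("1", "3", "5"),
  pCode.1 = c("p1", "p2", "p3"),
  pCode.2 = c("p8", "p1", "p3"),
  pCode.3 = c("p4", "p2", "p4"),
  stringsAsFactors = FALSE
)

# Count the Code.<number> columns rather than hard-coding nCode <- 3
nCode <- length(grep("^Code\\.[0-9]+$", names(df)))

# Same row-wise lookup as in the answer: position of the match among the
# Code columns, offset by 1 (CodeToMatch) + nCode (the Code columns)
df$expected_output <- apply(df, 1, function(x) x[nCode + 1 + which(x[2:(nCode + 1)] == x[1])])
df$expected_output   # "p4" "p1" "p3"
```

Because apply() turns the data frame into a matrix first, this works best when all columns are character; numeric columns would be formatted as text before the comparison.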
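Another answer
Separate the Code and pCode columns based on the pattern in their names. Then, with mapply, find the index at which CodeToMatch occurs in each row of code_columns and extract the corresponding value from pcode_columns.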
code_columns <- grep("^Code\\.[0-9]+", names(df))
pcode_columns <- grep("^pCode", names(df))
# For row x, keep the pCode cell whose Code column equals that row's CodeToMatch (y)
mapply(function(x, y) df[x, pcode_columns][df[x, code_columns] == y],
       1:nrow(df), df$CodeToMatch)
#[1] "p4" "p1" "p3"
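The per-row mapply call can also be replaced by a vectorised lookup: compare all Code cells with CodeToMatch at once, locate the hits with which(arr.ind = TRUE), and pull the results out of the pCode columns with a two-column (row, column) index matrix. This is a base R sketch of that alternative technique, not part of the answer above; it assumes character columns, that Code.k pairs with pCode.k, and that each CodeToMatch occurs exactly once per row (rows with no match drop out, rows with two matches appear twice):

```r
df <- data.frame(
  CodeToMatch = c("1", "2", "3"),
  Code.1  = c("8", "1", "3"),
  Code.2  = c("4", "2", "4"),
  Code.3  = c("1", "3", "5"),
  pCode.1 = c("p1", "p2", "p3"),
  pCode.2 = c("p8", "p1", "p3"),
  pCode.3 = c("p4", "p2", "p4"),
  stringsAsFactors = FALSE
)

# Split the Code and pCode columns into two character matrices of the same shape
code_m  <- as.matrix(df[grep("^Code\\.[0-9]+$", names(df))])
pcode_m <- as.matrix(df[grep("^pCode\\.[0-9]+$", names(df))])

# TRUE where a Code cell equals its row's CodeToMatch: the length-nrow vector
# recycles down each column, so cell [i, j] is compared with CodeToMatch[i]
hits <- code_m == df$CodeToMatch

# (row, col) positions of the hits, sorted back into row order
idx <- which(hits, arr.ind = TRUE)
idx <- idx[order(idx[, "row"]), , drop = FALSE]

# A two-column index matrix picks one pCode cell per (row, col) pair
res <- pcode_m[idx]
res   # "p4" "p1" "p3"
```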
Before calling mapply, I ran
df[1:4] <- lapply(df[1:4], function(x) as.numeric(as.character(x)))
to keep the numeric columns as numbers rather than factors.
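The as.character() step matters: calling as.numeric() directly on a factor returns its internal level codes, not the numbers that are printed. A quick check using the Code.1 values from the sample data:

```r
f <- factor(c("8", "1", "3"))   # levels are sorted: "1", "3", "8"

as.numeric(f)                   # level codes: 3 1 2
as.numeric(as.character(f))     # the actual values: 8 1 3
```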