Return a value from a set of columns based on the position of another column's value


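I'm trying to extract values from a set of columns based on another column. Taking the first row as an example:

- take the value of CodeToMatch, which is 1
- search the columns Code.1, Code.2, Code.3 for the position of the value 1; here it is in the 3rd column
- so return the value from the 3rd of the columns pCode.1, pCode.2, pCode.3, which is "p4"

The expected_output column in my example below shows what I'm after.

Any help is much appreciated!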

c1 <- c("1","2","3")
c2 <- c("8","1","3")
c3 <- c("4","2","4")
c4 <- c("1","3","5")
c5 <- c("p1","p2","p3")
c6 <- c("p8","p1","p3")
c7 <- c("p4","p2","p4")
c8 <- c("p4","p1","p3")
df <- data.frame(c1,c2,c3,c4,c5,c6,c7,c8)
colnames(df)[c(1:8)] <- c("CodeToMatch","Code.1","Code.2","Code.3","pCode.1","pCode.2","pCode.3","expected_output")
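To make the lookup described above concrete, the first row can be traced by hand with base R's match(), which returns the position of the first match (or NA if there is none). This is only an illustrative sketch: the variable names code_to_match, codes, pcodes, pos and answer are hypothetical and simply hold the first-row values copied from the example data.

```r
# Illustrative names: values copied from the first row of the example data
code_to_match <- "1"
codes  <- c("8", "4", "1")     # Code.1, Code.2, Code.3
pcodes <- c("p1", "p8", "p4")  # pCode.1, pCode.2, pCode.3

# Position of code_to_match among the Code.* values (first match, NA if absent)
pos <- match(code_to_match, codes)

# The pCode.* value at that same position
answer <- pcodes[pos]
print(answer)
# [1] "p4"
```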

Answer

data.table solution

Sample data

df <- structure(list(CodeToMatch = structure(1:3, .Label = c("1", "2", 
"3"), class = "factor"), Code.1 = structure(c(3L, 1L, 2L), .Label = c("1", 
"3", "8"), class = "factor"), Code.2 = structure(c(2L, 1L, 2L
), .Label = c("2", "4"), class = "factor"), Code.3 = structure(1:3, .Label = c("1", 
"3", "5"), class = "factor"), pCode.1 = structure(1:3, .Label = c("p1", 
"p2", "p3"), class = "factor"), pCode.2 = structure(c(3L, 1L, 
2L), .Label = c("p1", "p3", "p8"), class = "factor"), pCode.3 = structure(c(2L, 
1L, 2L), .Label = c("p2", "p4"), class = "factor")), class = "data.frame", row.names = c(NA, 
-3L))

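library(data.table)
# first, melt the wide table into long format: one Code/pCode pair per row
df.melt <- melt(setDT(df), id.vars = "CodeToMatch",
                measure.vars = patterns("^Code\\.", "^pCode\\."),
                value.name = c("Code", "pCode"))
# now finding everything is easy: keep the pairs whose Code equals CodeToMatch
# (rows come out grouped by which Code.* column matched, not in the original row order)
df.melt[Code == CodeToMatch, .(CodeToMatch, pCode)]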

Output

#    CodeToMatch pCode
# 1:           3    p3
# 2:           2    p1
# 3:           1    p4
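The same wide-to-long idea can be sketched without data.table using base R's reshape(). This is a swapped-in base R technique, not the answer's code. The helper column row and the names long, hits and result are hypothetical; row is added so the matches can be returned in the original row order, since the data.table output above lists them grouped by which Code.* column matched.

```r
# Example data from the question, kept as character columns
df <- data.frame(
  CodeToMatch = c("1", "2", "3"),
  Code.1  = c("8", "1", "3"),
  Code.2  = c("4", "2", "4"),
  Code.3  = c("1", "3", "5"),
  pCode.1 = c("p1", "p2", "p3"),
  pCode.2 = c("p8", "p1", "p3"),
  pCode.3 = c("p4", "p2", "p4"),
  stringsAsFactors = FALSE
)

# Hypothetical helper column recording each row's original position
df$row <- seq_len(nrow(df))

# Stack Code.k and pCode.k pairwise: one output row per (row, position k)
long <- reshape(
  df,
  direction = "long",
  varying   = list(c("Code.1", "Code.2", "Code.3"),
                   c("pCode.1", "pCode.2", "pCode.3")),
  v.names   = c("Code", "pCode"),
  timevar   = "pos",
  idvar     = "row"
)

# Keep the pairs whose Code equals CodeToMatch, then restore the original row order
hits <- long[long$Code == long$CodeToMatch, ]
hits <- hits[order(hits$row), ]

result <- hits$pCode
print(result)
# [1] "p4" "p1" "p3"
```

Sorting on the helper column is what lines the results up with the rows of df, so they can be assigned back as a new column if every row has exactly one match.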

Another answer

I'm not sure how elegant this is, but here is one option:

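nCode <- 3  # number of Code.*/pCode.* column pairs
# apply() turns each row into a character vector x: x[1] is CodeToMatch,
# x[2:(nCode + 1)] are the Code.* values, and the pCode.* values follow them
df$expected_output <- apply(df, 1, function(x) x[nCode + 1 + which(x[2:(nCode + 1)] == x[1])])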
df$expected_output
#[1] "p4" "p1" "p3"

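Note that the number of "Code" columns is hard-coded: in your case there are 3 "Code" columns with 3 matching "pCode" columns, so adjust nCode as needed. This also assumes that the first column always contains the code to be matched and that the "pCode" columns directly follow the "Code" columns.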
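One way to avoid hard-coding nCode is to find the two column groups by name. The sketch below is an illustrative variation on the apply() answer, not the answerer's code. It assumes the columns are named Code.<n> and pCode.<n> and appear in the same order within each group; the helper lookup_row and the names code_cols, pcode_cols and result are hypothetical. It returns NA for a row with no match, whereas in the apply() version which() would return an empty vector.

```r
# Example data from the question, kept as character columns
df <- data.frame(
  CodeToMatch = c("1", "2", "3"),
  Code.1  = c("8", "1", "3"),
  Code.2  = c("4", "2", "4"),
  Code.3  = c("1", "3", "5"),
  pCode.1 = c("p1", "p2", "p3"),
  pCode.2 = c("p8", "p1", "p3"),
  pCode.3 = c("p4", "p2", "p4"),
  stringsAsFactors = FALSE
)

# Find each column group by name instead of hard-coding nCode
code_cols  <- grep("^Code\\.[0-9]+$", names(df), value = TRUE)
pcode_cols <- grep("^pCode\\.[0-9]+$", names(df), value = TRUE)
stopifnot(length(code_cols) == length(pcode_cols))

# Hypothetical helper: pCode.* value for row i, or NA if CodeToMatch is absent
lookup_row <- function(i) {
  codes <- unlist(df[i, code_cols], use.names = FALSE)
  k <- match(df$CodeToMatch[i], codes)  # first matching position, NA if none
  if (is.na(k)) NA_character_ else df[i, pcode_cols[k]]
}

result <- vapply(seq_len(nrow(df)), lookup_row, character(1))
print(result)
# [1] "p4" "p1" "p3"
```

Using vapply() with character(1) guarantees a plain character vector, one element per row, which is safe to assign back as a column.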

Another answer

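Separate the Code and pCode columns based on the pattern in their names. Then, for each row, find where CodeToMatch occurs among code_columns and use mapply to extract the corresponding value from pcode_columns.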

code_columns <- grep("^Code\\.[0-9]+", names(df))
pcode_columns <- grep("^pCode", names(df))

# for row x, keep the pCode.* value(s) whose Code.* column equals CodeToMatch y
mapply(function(x, y) df[x, pcode_columns][df[x, code_columns] == y],
       seq_len(nrow(df)), df$CodeToMatch)

#[1] "p4" "p1" "p3"

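Note: before running the code above, I ran

df[1:4] <- lapply(df[1:4], function(x) as.numeric(as.character(x)))

to keep the numeric columns as numbers rather than factors (comparing two factors whose level sets differ with == is an error in R).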
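The per-row mapply() call can also be replaced by a single vectorised lookup. The sketch below swaps in a different technique (a logical match matrix, max.col() and matrix indexing) and is not the answer's code. It builds its own copy of the example data with character columns, so the factor conversion above is not needed; the names code_cols, pcode_cols, hit, pos and result are hypothetical.

```r
# Example data from the question, kept as character columns
df <- data.frame(
  CodeToMatch = c("1", "2", "3"),
  Code.1  = c("8", "1", "3"),
  Code.2  = c("4", "2", "4"),
  Code.3  = c("1", "3", "5"),
  pCode.1 = c("p1", "p2", "p3"),
  pCode.2 = c("p8", "p1", "p3"),
  pCode.3 = c("p4", "p2", "p4"),
  stringsAsFactors = FALSE
)

code_cols  <- grep("^Code\\.[0-9]+$", names(df))
pcode_cols <- grep("^pCode\\.[0-9]+$", names(df))

# hit[i, j] is TRUE when Code.j in row i equals CodeToMatch in row i;
# the length-nrow vector is recycled down each column, so the rows line up
hit <- as.matrix(df[code_cols]) == df$CodeToMatch

# Column position of the first match in each row; NA where nothing matches
pos <- max.col(hit, ties.method = "first")
pos[rowSums(hit) == 0] <- NA

# Pick one (row, column) cell per row from the pCode.* block
result <- as.matrix(df[pcode_cols])[cbind(seq_len(nrow(df)), pos)]
print(result)
# [1] "p4" "p1" "p3"
```

max.col() returns 1 for a row that is all FALSE, which is why rows without any match are reset to NA before indexing; an NA position then yields NA in the result.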
