Get only the most often occurring values from a list of vectors
Posted: 2021-12-23 05:00:10
Question:
My data looks as follows:
dat <- list(nr1 = list(list_of_account_numbers = " 0000000000",
" NL11BANKO0111111111", " NL11BANKO0111111111", " NL11BANKO0111111111",
" NL11BANKO0111111111", " NL11BANKO0111111111", " NL11BANKO0111111113",
" NL11BANKO0111111111", " NL11BANKO0111111112", " NL11BANKO0111111113",
" NL11BANKO0111111111", " NL11BANKO0111111112", " NL11BANKO0111111113",
" NL11BANKO0111111111", " NL11BANKO0111111111", " 0000000000",
" 0000000000"), nr2 = list(list_of_account_numbers = " NL30ABNA0111111111",
" NL31RABO0111111111", " NL30ABNA0111111111", " NL30ABNA0111111111",
" NL30ABNA0111111111", " NL31RABO0111111111", " NL31RABO0111111111",
" NL52RABO0111111111", " NL74INGB0111111111", " NL74INGB0111111111",
" NL30ABNA0111111111", " NL30ABNA0111111111", " NL30ABNA0111111111",
" NL74INGB0111111111", " NL74INGB0111111111", " NL74INGB0111111111",
" NL74INGB0111111111", " NL74INGB0111111111", " NL74INGB0111111111",
" NL16DEUT0111111111"), nr3 = list(
list_of_account_numbers = " NL11BANKO0111111111", " NL11BANKO0111111111",
" NL11BANKO0111111111", " NL11BANKO0111111111", " NL11BANKO0111111113",
" NL11BANKO0111111111", " NL11BANKO0111111111", " NL11BANKO0111111113",
" NL11BANKO0111111111", " NL11BANKO0111111111", " NL11BANKO0111111113",
" NL11BANKO0111111111", " NL11BANKO0111111111"))
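For each list item (nr1, nr2, nr3), I am trying to write code that gets the 3 most frequently occurring values. There are two additional complications:
- Some list items contain the value 0000000000, which should be excluded.
- Some list items do not have 3 distinct values, but only 1 or 2.
I think the first step is to unlist the items and remove the occurrences of 0000000000: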
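IBAN_numbers <- list()
y <- " 0000000000"  # placeholder value to drop (note the leading space)
for (i in seq_along(dat)) {
  # flatten the nested list into a character vector
  IBAN_numbers[[i]] <- unlist(dat[i])
  # keep everything that is not the placeholder
  IBAN_numbers[[i]] <- IBAN_numbers[[i]][!IBAN_numbers[[i]] %in% y]
}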
But I am not sure how to get to the last step.
table(IBAN_numbers[[1]])
# NL11BANKO0111111111 NL11BANKO0111111112 NL11BANKO0111111113
# 9 2 3
table(IBAN_numbers[[2]])
# NL16DEUT0111111111 NL30ABNA0111111111 NL31RABO0111111111 NL52RABO0111111111 NL74INGB0111111111
# 1 7 3 1 8
table(IBAN_numbers[[3]])
# NL11BANKO0111111111 NL11BANKO0111111113
# 10 3
I can do this:
IBAN_numbers <- list()
y <- " 0000000000"
for (i in seq_along(dat)) {
  IBAN_numbers[[i]] <- unlist(dat[i])
  IBAN_numbers[[i]] <- IBAN_numbers[[i]][!IBAN_numbers[[i]] %in% y]
  # count how often each remaining value occurs
  IBAN_numbers[[i]] <- table(IBAN_numbers[[i]])
}
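So for the middle table (nr2), I want only three entries (I do not care which option it picks, as long as it does not crash).
Can anyone help me with the last step?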
Answer 1:
You can do this with lapply:
y <- " 0000000000"
lapply(dat, function(x) {
  x <- unlist(x)
  # drop the placeholder, count, sort by frequency, keep at most 3
  head(sort(table(x[x != y]), decreasing = TRUE), 3)
})
#$nr1
#NL11BANKO0111111111 NL11BANKO0111111113 NL11BANKO0111111112
# 9 3 2
#$nr2
# NL74INGB0111111111 NL30ABNA0111111111 NL31RABO0111111111
# 8 7 3
#$nr3
# NL11BANKO0111111111 NL11BANKO0111111113
# 10 3
If you are only interested in the names, you can use names(head(sort(table(x[x != y]), decreasing = TRUE), 3)).
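The names-only variant can be wrapped in a small function so that the excluded value and the number of entries become parameters. This is only a sketch: top_values, its arguments, and the toy input are made up for illustration and are not part of the original answer.

```r
# Hypothetical helper: names of the n most frequent values in x,
# ignoring anything listed in `exclude`.
top_values <- function(x, exclude = " 0000000000", n = 3) {
  x <- unlist(x, use.names = FALSE)
  counts <- sort(table(x[!x %in% exclude]), decreasing = TRUE)
  names(head(counts, n))  # head() returns fewer than n if fewer exist
}

# Small made-up input (not the data from the question)
toy <- list(
  a = list("x", "x", "y", " 0000000000", "z", "z", "z"),
  b = list("p", " 0000000000")
)
res <- lapply(toy, top_values)
res
```

Because head() simply returns what is available, list items with only 1 or 2 distinct values need no special handling.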
Answer 2:
Using tidyverse:
library(dplyr)
library(purrr)
# y is the placeholder value defined in the question: y <- " 0000000000"
map(dat, ~ tibble(col1 = flatten_chr(.x)) %>%
  filter(col1 != y) %>%
  count(col1) %>%
  slice_max(n = 3, order_by = n))
Output:
$nr1
# A tibble: 3 × 2
col1 n
<chr> <int>
1 " NL11BANKO0111111111" 9
2 " NL11BANKO0111111113" 3
3 " NL11BANKO0111111112" 2
$nr2
# A tibble: 3 × 2
col1 n
<chr> <int>
1 " NL74INGB0111111111" 8
2 " NL30ABNA0111111111" 7
3 " NL31RABO0111111111" 3
$nr3
# A tibble: 2 × 2
col1 n
<chr> <int>
1 " NL11BANKO0111111111" 10
2 " NL11BANKO0111111113" 3
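One point worth checking if exactly three rows matter: slice_max() keeps tied rows by default (with_ties = TRUE), so a tie for third place can return more than three rows. The question's data has no such tie, so the sketch below uses made-up counts to show the difference.

```r
library(dplyr)

# Made-up counts with a tie for third place (not the question's data)
counts <- tibble(col1 = c("a", "b", "c", "d"), n = c(5L, 4L, 2L, 2L))

# Default behaviour: both tied rows are kept
with_ties_kept <- slice_max(counts, order_by = n, n = 3)

# with_ties = FALSE cuts off after n rows
ties_dropped <- slice_max(counts, order_by = n, n = 3, with_ties = FALSE)

nrow(with_ties_kept)
nrow(ties_dropped)
```

The base R approach with head(sort(table(...), decreasing = TRUE), 3) never returns more than three entries, which matches the asker's remark that any choice among ties is acceptable.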