Get only the most often occurring values from a list of vectors



【Posted】: 2021-12-23 05:00:10

【Question】:

My data looks like this:

dat <- list(nr1 = list(list_of_account_numbers = " 0000000000", 
    " NL11BANKO0111111111", " NL11BANKO0111111111", " NL11BANKO0111111111", 
    " NL11BANKO0111111111", " NL11BANKO0111111111", " NL11BANKO0111111113", 
    " NL11BANKO0111111111", " NL11BANKO0111111112", " NL11BANKO0111111113", 
    " NL11BANKO0111111111", " NL11BANKO0111111112", " NL11BANKO0111111113", 
    " NL11BANKO0111111111", " NL11BANKO0111111111", " 0000000000", 
    " 0000000000"), nr2 = list(list_of_account_numbers = " NL30ABNA0111111111", 
    " NL31RABO0111111111", " NL30ABNA0111111111", " NL30ABNA0111111111", 
    " NL30ABNA0111111111", " NL31RABO0111111111", " NL31RABO0111111111", 
    " NL52RABO0111111111", " NL74INGB0111111111", " NL74INGB0111111111", 
    " NL30ABNA0111111111", " NL30ABNA0111111111", " NL30ABNA0111111111", 
    " NL74INGB0111111111", " NL74INGB0111111111", " NL74INGB0111111111", 
    " NL74INGB0111111111", " NL74INGB0111111111", " NL74INGB0111111111", 
    " NL16DEUT0111111111"), nr3 = list(
        list_of_account_numbers = " NL11BANKO0111111111", " NL11BANKO0111111111", 
        " NL11BANKO0111111111", " NL11BANKO0111111111", " NL11BANKO0111111113", 
        " NL11BANKO0111111111", " NL11BANKO0111111111", " NL11BANKO0111111113", 
        " NL11BANKO0111111111", " NL11BANKO0111111111", " NL11BANKO0111111113", 
        " NL11BANKO0111111111", " NL11BANKO0111111111"))

For each list item (nr1, nr2, nr3), I am trying to write code that gets the 3 most frequently occurring values. There are two additional complications:

1. Some list items contain the value 0000000000, which should be excluded.
2. Some list items do not have 3 distinct values, only 1 or 2.

I think the first step is to unlist the items and remove the occurrences of 0000000000:

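IBAN_numbers <- list()
y <- " 0000000000"  # placeholder account number to exclude
for (i in seq_along(dat)) {
  # flatten item i into a character vector
  IBAN_numbers[[i]] <- unlist(dat[i])
  # drop the placeholder values
  IBAN_numbers[[i]] <- IBAN_numbers[[i]][!IBAN_numbers[[i]] %in% y]
}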
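The same filtering step can also be sketched with lapply() instead of the for loop. Unlike the loop, which stores results by position only, lapply() keeps the item names (nr1, nr2, ...), and unlist(use.names = FALSE) returns plain unnamed vectors. dat_example below is a shortened, made-up stand-in for dat, not the real data:

```r
# Made-up, shortened input with the same shape as `dat`
dat_example <- list(
  nr1 = list(list_of_account_numbers = " 0000000000", " NL11BANKO0111111111",
             " NL11BANKO0111111112", " NL11BANKO0111111111", " 0000000000"),
  nr2 = list(list_of_account_numbers = " NL30ABNA0111111111",
             " NL74INGB0111111111", " NL30ABNA0111111111")
)

y <- " 0000000000"  # placeholder account number to drop

# Flatten each item to a plain character vector and drop the placeholder;
# lapply() carries the names nr1, nr2, ... over to the result
IBAN_numbers <- lapply(dat_example, function(x) {
  v <- unlist(x, use.names = FALSE)
  v[!v %in% y]
})

str(IBAN_numbers)
```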
 

But I am not sure how to get to the final step.

table(IBAN_numbers[[1]])

#  NL11BANKO0111111111  NL11BANKO0111111112  NL11BANKO0111111113 
#                    9                    2                    3 

table(IBAN_numbers[[2]])
#  NL16DEUT0111111111  NL30ABNA0111111111  NL31RABO0111111111  NL52RABO0111111111  NL74INGB0111111111 
#                   1                   7                   3                   1                   8 

table(IBAN_numbers[[3]])
#  NL11BANKO0111111111  NL11BANKO0111111113 
#                   10                    3 

I can get as far as this:

IBAN_numbers <- list()
y <- " 0000000000"  # placeholder account number to exclude
for (i in seq_along(dat)) {
  IBAN_numbers[[i]] <- unlist(dat[i])
  IBAN_numbers[[i]] <- IBAN_numbers[[i]][!IBAN_numbers[[i]] %in% y]
  # count how often each account number occurs
  IBAN_numbers[[i]] <- table(IBAN_numbers[[i]])
}
 

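So from each of these intermediate tables I only want three entries (I don't care which ones it picks, as long as it doesn't crash when fewer than three are available).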

Can anyone help me with this last step?


【Answer 1】:

You can do this with lapply:

y <- " 0000000000"
lapply(dat, function(x) {
  x <- unlist(x)
  # count, sort by frequency, keep at most the first 3
  head(sort(table(x[x != y]), decreasing = TRUE), 3)
})

#$nr1

#NL11BANKO0111111111  NL11BANKO0111111113  NL11BANKO0111111112 
#                  9                    3                    2 

#$nr2

# NL74INGB0111111111  NL30ABNA0111111111  NL31RABO0111111111 
#                  8                   7                   3 

#$nr3

# NL11BANKO0111111111  NL11BANKO0111111113 
#                  10                    3 
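The nr3 result has only two entries because head() returns at most n elements and simply returns fewer when fewer exist, so items with one or two distinct account numbers do not cause an error. A small check on a made-up vector (illustrative values, not taken from dat):

```r
# Made-up vector with only two distinct account numbers, like nr3
accounts <- c(" NL11BANKO0111111111", " NL11BANKO0111111113",
              " NL11BANKO0111111111")

counts <- sort(table(accounts), decreasing = TRUE)

# Asking for 3 entries when only 2 exist returns the 2 that are there
top3 <- head(counts, 3)
print(top3)
length(top3)  # 2
```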

If you are only interested in the names, use names(head(sort(table(x[x != y]), decreasing = TRUE), 3)) as the last line of the function instead.
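For comparison, a sketch of the same head(sort(table(...))) step placed inside the for loop from the question. The loop body needs braces so that every statement runs on each iteration; results are stored under each item's name, and trimws() (an optional extra, not part of either answer) strips the leading space from the account numbers. dat_example is a made-up stand-in for dat:

```r
# Made-up input with the same shape as `dat`
dat_example <- list(
  nr1 = list(list_of_account_numbers = " 0000000000", " NL11BANKO0111111111",
             " NL11BANKO0111111113", " NL11BANKO0111111111",
             " NL11BANKO0111111112", " NL11BANKO0111111113",
             " NL11BANKO0111111111"),
  nr3 = list(list_of_account_numbers = " NL11BANKO0111111111",
             " NL11BANKO0111111113", " NL11BANKO0111111111")
)
y <- " 0000000000"

top_accounts <- list()
for (i in seq_along(dat_example)) {
  v <- unlist(dat_example[[i]], use.names = FALSE)
  v <- v[!v %in% y]                          # drop the placeholder
  counts <- sort(table(v), decreasing = TRUE)
  # names of (at most) the three most frequent values, leading space removed
  top_accounts[[names(dat_example)[i]]] <- trimws(names(head(counts, 3)))
}

str(top_accounts)
```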


【Answer 2】:

Using the tidyverse:

library(dplyr)
library(purrr)

y <- " 0000000000"  # placeholder account number to exclude
map(dat, ~ tibble(col1 = flatten_chr(.x)) %>%
     filter(col1 != y) %>% 
     count(col1) %>%
     slice_max(n = 3, order_by = n))

-output

$nr1
# A tibble: 3 × 2
  col1                       n
  <chr>                  <int>
1 " NL11BANKO0111111111"     9
2 " NL11BANKO0111111113"     3
3 " NL11BANKO0111111112"     2

$nr2
# A tibble: 3 × 2
  col1                      n
  <chr>                 <int>
1 " NL74INGB0111111111"     8
2 " NL30ABNA0111111111"     7
3 " NL31RABO0111111111"     3

$nr3
# A tibble: 2 × 2
  col1                       n
  <chr>                  <int>
1 " NL11BANKO0111111111"    10
2 " NL11BANKO0111111113"     3
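slice_max() keeps tied rows by default (with_ties = TRUE), so if the third-highest count is tied, a group can come back with more than three rows; passing with_ties = FALSE to slice_max() caps it at n rows. If a reproducible choice among tied values matters, ties can be broken explicitly. The base R sketch below breaks them alphabetically by account number, an arbitrary rule chosen only for illustration; top_n_values() is a hypothetical helper name:

```r
# Hypothetical helper: the n most frequent values, ties broken alphabetically
top_n_values <- function(v, n = 3, drop = " 0000000000") {
  v <- v[!v %in% drop]                 # remove placeholder account numbers
  counts <- table(v)
  # order by count (descending), then by account number (ascending)
  ord <- order(-as.integer(counts), names(counts))
  head(counts[ord], n)
}

# Made-up input where third place is tied (two accounts seen once each)
v <- c(rep(" NL74INGB0111111111", 3), rep(" NL30ABNA0111111111", 2),
       " NL52RABO0111111111", " NL16DEUT0111111111", " 0000000000")

top <- top_n_values(v)
print(top)
```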

