Comparing strings in dplyr
My data frame looks like this:
df <- tibble::tribble(
~order_id, ~user_id, ~comp_order, ~comp_rec,
1164320, 32924, "4-6-22-11-37-5", "4-5-6-11-22-36-37",
1169182, 33128, "9-4-15-28-8-7", "4-7-8-9-28-37-38",
1166014, 33003, "27-22-4-6-5", "4-5-6-22-27-36-37",
1166019, 32996, "27-22-4-6-8", "4-6-8-22-27-36-38"
)
I want to know which numbers are present in the comp_order column but not in comp_rec.
The final output should look something like this:
order_id user_id comp_rec comp_order is_equal elements_removed_from_rec elements_added_to_order
<dbl> <dbl> <chr> <chr> <chr> <chr> <chr>
1 1164320 32924 4-5-6-11-22-36-37 4-6-22-11-37-5 no 36 none
2 1169182 33128 4-7-8-9-28-37-38 9-4-15-28-8-7 no 37 none
3 1166014 33003 4-5-6-22-27-36-37 27-22-4-6-5 no 36-37 none
4 1166019 32996 4-6-8-22-27-36-38 27-22-4-6-8 no 36-38 none
5 1166012 32922 27-22-4-6-8 27-22-4-6-8 yes none none
6 1166033 32911 27-22-4-6-8 27-22-4-6-8-33 no none 33
df_output <- tibble::tribble(
~order_id, ~user_id, ~comp_rec, ~comp_order, ~is_equal, ~elements_removed_from_rec, ~elements_added_to_order,
1164320, 32924, "4-5-6-11-22-36-37", "4-6-22-11-37-5", "no", "36", "none",
1169182, 33128, "4-7-8-9-28-37-38", "9-4-15-28-8-7", "no", "37", "none",
1166014, 33003, "4-5-6-22-27-36-37", "27-22-4-6-5", "no", "36-37", "none",
1166019, 32996, "4-6-8-22-27-36-38", "27-22-4-6-8", "no", "36-38", "none",
1166012, 32922, "27-22-4-6-8", "27-22-4-6-8", "yes", "none", "none",
1166033, 32911, "27-22-4-6-8", "27-22-4-6-8-33", "no", "none", "33"
)
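I need to know, based on the numbers in the strings:
- what has been removed from the rec
- what has been added to the order
The problem is that the numbers in the two strings are not necessarily in the same order.
How can I compare these two strings?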
Answer
First attempt:
library(stringr)
# Split each string into its numbers, then keep those in comp_rec that are missing from comp_order
df$digits_order <- str_split(df$comp_order, "-")
df$digits_rec <- str_split(df$comp_rec, "-")
df$in_rec_but_not_order <- mapply(function(rec, ord) paste(setdiff(rec, ord), collapse = "-"), df$digits_rec, df$digits_order)
> df
# A tibble: 4 x 7
order_id user_id comp_order comp_rec digits_order digits_rec in_rec_but_not_order
<dbl> <dbl> <chr> <chr> <list> <list> <chr>
1 1164320 32924 4-6-22-11-37-5 4-5-6-11-22-36-37 <chr [6]> <chr [7]> 36
2 1169182 33128 9-4-15-28-8-7 4-7-8-9-28-37-38 <chr [6]> <chr [7]> 37-38
3 1166014 33003 27-22-4-6-5 4-5-6-22-27-36-37 <chr [5]> <chr [7]> 36-37
4 1166019 32996 27-22-4-6-8 4-6-8-22-27-36-38 <chr [5]> <chr [7]> 36-38
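The same set-difference idea can be pushed a bit further to produce all three requested columns. The following is only a sketch, not the original answerer's code: the helper diff_or_none is a hypothetical name introduced here, and the sample rows are two rows from the question's input plus the two extra rows that appear only in its expected output. Because the comparison is done on sets, the order of the numbers inside each string does not matter.

```r
library(dplyr)
library(stringr)

df <- tibble::tribble(
  ~order_id, ~user_id, ~comp_order,      ~comp_rec,
  1164320,   32924,    "4-6-22-11-37-5", "4-5-6-11-22-36-37",
  1169182,   33128,    "9-4-15-28-8-7",  "4-7-8-9-28-37-38",
  1166012,   32922,    "27-22-4-6-8",    "27-22-4-6-8",
  1166033,   32911,    "27-22-4-6-8-33", "27-22-4-6-8"
)

# Hypothetical helper: elements of `a` that are missing from `b`,
# joined with "-", or "none" when there is no difference.
diff_or_none <- function(a, b) {
  d <- setdiff(a, b)
  if (length(d) == 0) "none" else paste(d, collapse = "-")
}

result <- df %>%
  mutate(
    # One character vector of numbers per row (list columns)
    digits_order = str_split(comp_order, "-"),
    digits_rec   = str_split(comp_rec, "-"),
    # setequal() ignores order, so "4-6-5" and "5-4-6" count as equal
    is_equal = if_else(mapply(setequal, digits_order, digits_rec), "yes", "no"),
    elements_removed_from_rec = mapply(diff_or_none, digits_rec, digits_order,
                                       USE.NAMES = FALSE),
    elements_added_to_order   = mapply(diff_or_none, digits_order, digits_rec,
                                       USE.NAMES = FALSE)
  ) %>%
  select(-digits_order, -digits_rec)

print(result)
```

For order 1169182 this sketch reports 37-38 as removed and 15 as added, which agrees with the first attempt above rather than with the hand-written expected output in the question. The numbers are compared as strings, so "04" and "4" would be treated as different elements.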