Django's group_by


Reference answer A

Django's ORM has no dedicated group_by method. Grouping is expressed by combining values() with annotate().

For example, suppose we have a visit_record table that stores each page's daily visit records.
id | page_url | domain | pv | uv | date
The pair (page_url, domain) uniquely identifies a system.

When values() and annotate() are used together, the fields passed to values() automatically serve as the GROUP BY columns. This statement is equivalent to:
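The statement itself is not shown here, so what follows is only a minimal sketch of the pattern, not the author's original query. The model name VisitRecord, the alias total_pv, and the sample rows are assumptions made for illustration. The Django call appears as a comment, and the runnable part uses the standard-library sqlite3 module to execute the kind of SQL such a query produces.

```python
import sqlite3

# Assumed Django query (VisitRecord and total_pv are hypothetical names):
#   from django.db.models import Sum
#   VisitRecord.objects.values("page_url", "domain").annotate(total_pv=Sum("pv"))
# The fields listed in values() become the GROUP BY columns, roughly:
GROUP_BY_SQL = """
    SELECT page_url, domain, SUM(pv) AS total_pv
    FROM visit_record
    GROUP BY page_url, domain
"""


def build_demo_db():
    """Create an in-memory visit_record table with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE visit_record ("
        "id INTEGER PRIMARY KEY, page_url TEXT, domain TEXT, "
        "pv INTEGER, uv INTEGER, date TEXT)"
    )
    rows = [
        ("/home", "a.example", 10, 4, "2021-01-01"),
        ("/home", "a.example", 20, 6, "2021-01-02"),
        ("/docs", "b.example", 5, 2, "2021-01-01"),
    ]
    conn.executemany(
        "INSERT INTO visit_record (page_url, domain, pv, uv, date) "
        "VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    return conn


def grouped_pv(conn):
    """Return one (page_url, domain, total_pv) row per group, sorted."""
    return sorted(conn.execute(GROUP_BY_SQL).fetchall())


if __name__ == "__main__":
    print(grouped_pv(build_demo_db()))
```

The two "/home" rows collapse into a single row whose total_pv is 30, which is the grouping behaviour that values() followed by annotate() is meant to express.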

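Be careful when combining this with order_by: if a field in order_by is not among the fields selected by values(), or is otherwise invalid for the grouped query, the grouping no longer works as intended.
For example, changing the order_by of the statement above to order_by('-pv', 'id') turns the generated SQL into ... group by id.
This makes sense on reflection: after aggregating by page_url and domain, each result row holds the summed pv of several records and corresponds to no single id, so there is nothing to sort by id. To honor the ordering, id has to enter the GROUP BY clause, which defeats the intended grouping.
See the official Django documentation on aggregation, in particular the section on interaction with order_by().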
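The effect of an extra ordering field can be sketched at the SQL level. This is an illustration under assumptions, not output captured from Django: the table contents are made up, and the second query simply adds id to the GROUP BY clause by hand, which is what ordering by a field outside values() is understood to cause.

```python
import sqlite3


def make_demo_conn():
    """Build an in-memory visit_record table holding three made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE visit_record ("
        "id INTEGER PRIMARY KEY, page_url TEXT, domain TEXT, pv INTEGER)"
    )
    conn.executemany(
        "INSERT INTO visit_record (page_url, domain, pv) VALUES (?, ?, ?)",
        [
            ("/home", "a.example", 10),
            ("/home", "a.example", 20),
            ("/docs", "b.example", 5),
        ],
    )
    return conn


def count_groups(conn, group_columns):
    """Count the result rows of SUM(pv) grouped by the given column list.

    group_columns is interpolated into the SQL, so pass trusted literals only.
    """
    sql = f"SELECT SUM(pv) FROM visit_record GROUP BY {group_columns}"
    return len(conn.execute(sql).fetchall())


if __name__ == "__main__":
    conn = make_demo_conn()
    # Intended grouping: one row per (page_url, domain) pair
    print(count_groups(conn, "page_url, domain"))
    # With id added to GROUP BY, every record becomes its own group
    print(count_groups(conn, "page_url, domain, id"))
```

With these three rows the first query yields 2 groups and the second yields 3, so no aggregation takes place once id is part of the grouping.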

Conditional group_by with example

Posted: 2021-10-31 12:27:43

Question:

My task is to identify the unique trials (1, 2, 3, ...) in a dataset. Here is an example:

"source","ID","cultivar","design"
"PDMR_vol_12","CF027","Ambassador","RCBD"
"PDMR_vol_12","CF027","Ambassador","RCBD"
"PDMR_vol_12","CF027","Ambassador","RCBD"
"PDMR_vol_12","CF027","Ambassador","RCBD"
"PDMR_vol_7","CF026","ASG2000","RCBD"
"PDMR_vol_7","CF026","ASG2000","RCBD"
"PDMR_vol_7","CF026","ASG2000","RCBD"
"PDMR_vol_7","CF026","P26R61","RCBD"
"PDMR_vol_7","CF026","P26R61","RCBD"
"PDMR_vol_7","CF026","P26R61","RCBD"
"PDMR_vol_4","CF011","Roane","SP"
"PDMR_vol_4","CF011","Roane","SP"
"PDMR_vol_4","CF011","Tomahawk","SP"
"PDMR_vol_4","CF011","Tomahawk","SP"
"PDMR_vol_4","CF011","Everest","SP"
"PDMR_vol_4","CF011","Everest","SP"

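The grouping columns are:

unique_trials_RCBD <- c("source", "ID", "cultivar", "design")

unique_trials_SP <- unique_trials_RCBD[-3]

Using a conditional group_by based on several columns, we get an almost correct result, except that it does not correctly identify (PDMR_vol_7 CF026) as two trials.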

doAGroupBy <- function(data, some_condition) {

  if (some_condition == TRUE) {

    group_args <- unique_trials_RCBD

  } else {

    group_args <- unique_trials_SP

  }

  data %>%
    group_by_at(vars(group_args))

}

a <- doAGroupBy(data, FALSE) %>%
  mutate(trial_number = cur_group_id())

There should be 4 trials in total. Any ideas on how to improve this code? Thanks.

Comments:

Why should PDMR_vol_7 CF026 be identified as 2 trials? Also, in unique_trials_SP you are removing "cultivar". Is that right?

Answer 1:

If I understood the question correctly, this should work:

Data

df <-
tibble::tribble(~`source`, ~`ID`,~`cultivar`,~`design`,
  "PDMR_vol_12", "CF027", "Ambassador",  "RCBD",
  "PDMR_vol_12", "CF027", "Ambassador",  "RCBD",
  "PDMR_vol_12", "CF027", "Ambassador",  "RCBD",
  "PDMR_vol_12", "CF027", "Ambassador",  "RCBD",
   "PDMR_vol_7", "CF026",    "ASG2000",  "RCBD",
   "PDMR_vol_7", "CF026",    "ASG2000",  "RCBD",
   "PDMR_vol_7", "CF026",    "ASG2000",  "RCBD",
   "PDMR_vol_7", "CF026",     "P26R61",  "RCBD",
   "PDMR_vol_7", "CF026",     "P26R61",  "RCBD",
   "PDMR_vol_7", "CF026",     "P26R61",  "RCBD",
   "PDMR_vol_4", "CF011",      "Roane",    "SP",
   "PDMR_vol_4", "CF011",      "Roane",    "SP",
   "PDMR_vol_4", "CF011",   "Tomahawk",    "SP",
   "PDMR_vol_4", "CF011",   "Tomahawk",    "SP",
   "PDMR_vol_4", "CF011",    "Everest",    "SP",
   "PDMR_vol_4", "CF011",    "Everest",    "SP"
  ) 

Code

library(dplyr)

df %>% 
  # Create an auxiliary variable that carries cultivar only for the RCBD design
  mutate(aux = if_else(design == "RCBD", cultivar, NA_character_)) %>%
  # Group by source, ID, design and aux
  group_by(source, ID, design, aux) %>% 
  # Number each group; cur_group_id() replaces the deprecated no-argument group_indices()
  mutate(trial = cur_group_id())

Result

# A tibble: 16 x 6
# Groups:   source, ID, design, aux [4]
   source      ID    cultivar   design aux        trial
   <chr>       <chr> <chr>      <chr>  <chr>      <int>
 1 PDMR_vol_12 CF027 Ambassador RCBD   Ambassador     1
 2 PDMR_vol_12 CF027 Ambassador RCBD   Ambassador     1
 3 PDMR_vol_12 CF027 Ambassador RCBD   Ambassador     1
 4 PDMR_vol_12 CF027 Ambassador RCBD   Ambassador     1
 5 PDMR_vol_7  CF026 ASG2000    RCBD   ASG2000        3
 6 PDMR_vol_7  CF026 ASG2000    RCBD   ASG2000        3
 7 PDMR_vol_7  CF026 ASG2000    RCBD   ASG2000        3
 8 PDMR_vol_7  CF026 P26R61     RCBD   P26R61         4
 9 PDMR_vol_7  CF026 P26R61     RCBD   P26R61         4
10 PDMR_vol_7  CF026 P26R61     RCBD   P26R61         4
11 PDMR_vol_4  CF011 Roane      SP     NA             2
12 PDMR_vol_4  CF011 Roane      SP     NA             2
13 PDMR_vol_4  CF011 Tomahawk   SP     NA             2
14 PDMR_vol_4  CF011 Tomahawk   SP     NA             2
15 PDMR_vol_4  CF011 Everest    SP     NA             2
16 PDMR_vol_4  CF011 Everest    SP     NA             2
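The idea behind the answer, a grouping key that includes cultivar only when the design is RCBD, can be cross-checked outside R. The following is a minimal Python sketch rather than a port of the dplyr code: the function name number_trials and the shortened row list are assumptions, and groups are numbered in order of first appearance, whereas dplyr numbers them by sorted key. The trial labels therefore differ from the table above, but the number of trials does not.

```python
def number_trials(rows):
    """Assign a trial number to each (source, ID, cultivar, design) row.

    cultivar is part of the grouping key only for RCBD designs, mirroring
    the auxiliary column built in the R answer.
    """
    keys = []
    for source, trial_id, cultivar, design in rows:
        aux = cultivar if design == "RCBD" else None
        keys.append((source, trial_id, design, aux))

    # Number groups by first appearance (dplyr would sort the keys instead)
    numbering = {}
    for key in keys:
        numbering.setdefault(key, len(numbering) + 1)
    return [numbering[key] for key in keys]


# Shortened sample taken from the question's data (duplicate rows trimmed)
ROWS = [
    ("PDMR_vol_12", "CF027", "Ambassador", "RCBD"),
    ("PDMR_vol_12", "CF027", "Ambassador", "RCBD"),
    ("PDMR_vol_7", "CF026", "ASG2000", "RCBD"),
    ("PDMR_vol_7", "CF026", "P26R61", "RCBD"),
    ("PDMR_vol_4", "CF011", "Roane", "SP"),
    ("PDMR_vol_4", "CF011", "Tomahawk", "SP"),
    ("PDMR_vol_4", "CF011", "Everest", "SP"),
]

if __name__ == "__main__":
    trials = number_trials(ROWS)
    print(trials)
    print("number of trials:", max(trials))
```

On this sample the two RCBD cultivars under PDMR_vol_7 CF026 receive separate numbers while the three SP cultivars share one, giving the 4 trials the question expects.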
