Find max of count on group by


[Title] Find max of count on group by [Posted] 2017-06-30 21:58:27 [Question]:

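I am having trouble answering the following question.

For each age value that appears in the student table, find the level value that appears most often. For example, if there are more FR level students aged 18 than SR, JR, or SO students aged 18, you should print the pair (18, FR).

It is based on this dataset.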

create database university;
use university;

create table student(
    snum decimal(9) primary key,
    sname varchar(30),
    major varchar(25),
    level varchar(2),
    age int
    );
create table faculty(
    fid decimal(9) primary key,
    fname varchar(30),
    deptid decimal(2)
    );
create table class(
    cname varchar(40) primary key,
    meets_at varchar(20),
    room varchar(10),
    fid decimal(9),
    foreign key(fid) references faculty(fid)
    );
create table enrolled(
    snum decimal(9),
    cname varchar(40),
    primary key(snum,cname),
    foreign key(snum) references student(snum),
    foreign key(cname) references class(cname)
    );

insert into student values(051135593,'Maria White','English','SR',21);
insert into student values(060839453,'Charles Harris','Architecture','SR',22);
insert into student values(099354543,'Susan Martin','Law','JR',20);
insert into student values(112348546,'Joseph Thompson','Computer Science','SO',19);
insert into student values(115987938,'Christopher Garcia','Computer Science','JR',20);
insert into student values(132977562,'Angela Martinez','History','SR',20);
insert into student values(269734834,'Thomas Robinson','Psychology','SO',18);
insert into student values(280158572,'Margaret Clark','Animal Science','FR',18);
insert into student values(301221823,'Juan Rodriguez','Psychology','JR',20);
insert into student values(318548912,'Dorthy Lewis','Finance','FR',18);
insert into student values(320874981,'Daniel Lee','Electrical Engineering','FR',17);
insert into student values(322654189,'Lisa Walker','Computer Science','SO',17);
insert into student values(348121549,'Paul Hall','Computer Science','JR',18);
insert into student values(351565322,'Nancy Allen','Accounting','JR',19);
insert into student values(451519864,'Mark Young','Finance','FR',18);
insert into student values(455798411,'Luis Hernandez','Electrical Engineering','FR',17);
insert into student values(462156489,'Donald King','Mechanical Engineering','SO',19);
insert into student values(550156548,'George Wright','Education','SR',21);
insert into student values(552455318,'Ana Lopez','Computer Engineering','SR',19);
insert into student values(556784565,'Kenneth Hill','Civil Engineering','SR',21);
insert into student values(567354612,'Karen Scott','Computer Engineering','FR',18);
insert into student values(573284895,'Steven Green','Kinesiology','SO',19);
insert into student values(574489456,'Betty Adams','Economics','JR',20);
insert into student values(578875478,'Edward Baker','Veterinary Medicine','SR',21);
insert into faculty values(142519864,'Ivana Teach',20);
insert into faculty values(242518965,'James Smith',68);
insert into faculty values(141582651,'Mary Johnson',20);
insert into faculty values(011564812,'John Williams',68);
insert into faculty values(254099823,'Patricia Jones',68);
insert into faculty values(356187925,'Robert Brown',12);
insert into faculty values(489456522,'Linda Davis',20);
insert into faculty values(287321212,'Michael Miller',12);
insert into faculty values(248965255,'Barbara Wilson',12);
insert into faculty values(159542516,'William Moore',33);
insert into faculty values(090873519,'Elizabeth Taylor',11);
insert into faculty values(486512566,'David Anderson',20);
insert into faculty values(619023588,'Jennifer Thomas',11);
insert into faculty values(489221823,'Richard Jackson',33);
insert into faculty values(548977562,'Ulysses Teach',20);
insert into class values('Data Structures','MWF 10','R128',489456522);
insert into class values('Database Systems','MWF 12:30-1:45','1320 DCL',142519864);
insert into class values('Operating System Design','TuTh 12-1:20','20 AVW',489456522);
insert into class values('Archaeology of the Incas','MWF 3-4:15','R128',248965255);
insert into class values('Aviation Accident Investigation','TuTh 1-2:50','Q3',011564812);
insert into class values('Air Quality Engineering','TuTh 10:30-11:45','R15',011564812);
insert into class values('Introductory Latin','MWF 3-4:15','R12',248965255);
insert into class values('American Political Parties','TuTh 2-3:15','20 AVW',619023588);
insert into class values('Social Cognition','Tu 6:30-8:40','R15',159542516);
insert into class values('Perception','MTuWTh 3','Q3',489221823);
insert into class values('Multivariate Analysis','TuTh 2-3:15','R15',090873519);
insert into class values('Patent Law','F 1-2:50','R128',090873519);
insert into class values('Urban Economics','MWF 11','20 AVW',489221823);
insert into class values('Organic Chemistry','TuTh 12:30-1:45','R12',489221823);
insert into class values('Marketing Research','MW 10-11:15','1320 DCL',489221823);
insert into class values('Seminar in American Art','M 4','R15',489221823);
insert into class values('Orbital Mechanics','MWF 8','1320 DCL',011564812);
insert into class values('Dairy Herd Management','TuTh 12:30-1:45','R128',356187925);
insert into class values('Communication Networks','MW 9:30-10:45','20 AVW',141582651);
insert into class values('Optical Electronics','TuTh 12:30-1:45','R15',254099823);
insert into class values('Intoduction to Math','TuTh 8-9:30','R128',489221823);
insert into enrolled values(112348546,'Database Systems');
insert into enrolled values(115987938,'Database Systems');
insert into enrolled values(348121549,'Database Systems');
insert into enrolled values(322654189,'Database Systems');
insert into enrolled values(552455318,'Database Systems');
insert into enrolled values(455798411,'Operating System Design');
insert into enrolled values(552455318,'Operating System Design');
insert into enrolled values(567354612,'Operating System Design');
insert into enrolled values(112348546,'Operating System Design');
insert into enrolled values(115987938,'Operating System Design');
insert into enrolled values(322654189,'Operating System Design');
insert into enrolled values(567354612,'Data Structures');
insert into enrolled values(552455318,'Communication Networks');
insert into enrolled values(455798411,'Optical Electronics');
insert into enrolled values(301221823,'Perception');
insert into enrolled values(301221823,'Social Cognition');
insert into enrolled values(301221823,'American Political Parties');
insert into enrolled values(556784565,'Air Quality Engineering');
insert into enrolled values(099354543,'Patent Law');
insert into enrolled values(574489456,'Urban Economics');

My best attempt, which does return the correct result, is:

select age, level from (select age, level, count(level) as levelCount from student group by age, level order by age, levelCount desc) as counts group by age;

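But it breaks the standard SQL rule that every selected column must either be grouped or aggregated: the outermost select returns level, which is neither. I am relying on MySQL's ability to return non-aggregated columns.

The question is how to return the max count for each group created by the group by while still following standard SQL best practices.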
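The MySQL behavior described here can be made visible with the ONLY_FULL_GROUP_BY SQL mode. The following is a minimal sketch, assuming a MySQL session in which you are free to change sql_mode. With that mode set, the server should reject the query instead of silently picking a level for each age. Even with the mode off, MySQL documents that it may return any value from the group for a non-aggregated column, so the inner order by does not guarantee which level comes back.

```sql
-- Assumption: a throwaway session. This replaces the session's whole
-- sql_mode with just ONLY_FULL_GROUP_BY, dropping any other modes.
set session sql_mode = 'ONLY_FULL_GROUP_BY';

-- Same query as the attempt above. "level" is neither grouped nor
-- aggregated in the outer select, so the server should refuse to run it.
select age, level
from (select age, level, count(level) as levelCount
      from student
      group by age, level
      order by age, levelCount desc
     ) as counts
group by age;
```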

select age, level, count(level) as levelCount from student group by age, level order by age, levelCount desc;

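Please and thank you.

[Comments]:

Is this homework?
Out of curiosity, does it matter?
Yes, because the answer differs depending on that. See meta.***.com/questions/334822/…
Yes, it's homework. Pseudocode works fine for me, if that's what you're concerned about.

[Answer 1]: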

This is a pain in MySQL. One method is to use variables. However, the simplest method is the group_concat()/substring_index() trick:

select age,
       substring_index(group_concat(level order by levelCount desc), ',', 1) as mode_level
from (select age, level, count(level) as levelCount
      from student
      group by age, level
      order by age, levelCount desc
     ) as counts
group by age;

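(Statistically, what you are looking for is called the mode.)

Note: this is a hack. The intermediate buffer for group_concat() defaults to 1,024 characters, so a long list of levels can overflow it, in which case MySQL truncates the result. The limit is easy to raise.

[Comments]: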
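The limit mentioned in the note is the group_concat_max_len system variable. As a small sketch, it can be raised for the current session like this; the value 1000000 is an arbitrary choice for illustration, not a recommendation.

```sql
-- Raise the group_concat() result limit for this session only.
-- 1000000 is an illustrative value; pick one that fits your data.
set session group_concat_max_len = 1000000;

-- Check the value now in effect for the session.
select @@session.group_concat_max_len;
```

With only four distinct level values per age in this dataset, the default of 1,024 is already more than enough; the setting matters when a group can hold many distinct values.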

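I looked for a built-in function or something similar, but this is good to know. It is kind of like a limit 1 on max(count/sum/etc.) within a group by.
So in general, what is another way to answer this question? I feel like there may be a simpler way overall, but I went down one rabbit hole and stuck with it.
@BryanBirchmeier . . . The next simplest method is to use variables. The most correct but most painful method is to run the aggregation twice.
Out of curiosity, how is that different from what I did initially? I aggregated twice, so I assume you mean a different structure from what I did.
@BryanBirchmeier . . . Your second approach uses group by incorrectly, because there are unaggregated columns in the select.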
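The "run the aggregation twice" approach from the comments is not spelled out in the thread. One possible reading, as a sketch rather than the answerer's exact query, computes the per-(age, level) counts, computes the maximum count per age from the same counts, and joins the two. A second variant uses a window function and assumes MySQL 8.0 or later. The aliases c, t, m, ranked, maxCount and rnk are made up for illustration.

```sql
-- Variant 1: aggregate twice, then join. Every selected column is
-- either grouped or aggregated at each level of the query.
select c.age, c.level
from (select age, level, count(*) as levelCount
      from student
      group by age, level
     ) as c
join (select age, max(levelCount) as maxCount
      from (select age, level, count(*) as levelCount
            from student
            group by age, level
           ) as t
      group by age
     ) as m
  on m.age = c.age
 and m.maxCount = c.levelCount
order by c.age;

-- Variant 2 (assumes MySQL 8.0+ for window functions):
-- rank the levels within each age by their count and keep rank 1.
select age, level
from (select age, level,
             rank() over (partition by age order by count(*) desc) as rnk
      from student
      group by age, level
     ) as ranked
where rnk = 1
order by age;
```

On the dataset in the question, both variants should return (17, FR), (18, FR), (19, SO), (20, JR), (21, SR), (22, SR). Unlike the group_concat() trick, they return every tied level when two levels share the top count for an age; replace rank() with row_number() if exactly one row per age is wanted, accepting that the tie is then broken arbitrarily.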
