文章目录
  1. 1. 题目
  2. 2. 解题思路
  3. 3. 注意事项
  4. 4. 构造测试数据
  5. 5. 一种答案
  6. 6. 另一种答案

题目

#182 Duplicate Emails

解题思路

可以想到类似与 WHERE count(Email) > 1 之类的条件,但实际上 count 是一个聚合函数需要和 HAVING 配合使用;此外还需要 GROUP BY 分组。

注意事项

如果没有 GROUP BY 那么查询语句是 SELECT COUNT(Email), Email FROM Person,其结果是:

COUNT(Email) Email
3 a@b.com

因为聚合函数 COUNT 只会返回一行,所以结果只有一行。如果需要每组的 Email 个数,则需要 GROUP BY 先分组,而事实上使用聚合函数一般都会 GROUP BY

还需要注意的是标准 SQL 中,含有 GROUP BY 的语句中 SELECT 的列只能包含聚合函数(例如 MAX/MIN/SUM/COUNT) 和在 GROUP BY 中的列。由于 MySQL 对这个规则进行了扩展不会报错,但可能得到意想不到的结果。MySQL Handling of GROUP BY 中是这么说的:

MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. This means that the preceding query is legal in MySQL. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate.

构造测试数据

1
2
3
4
5
6
7
8
9
10
11
12
CREATE TABLE IF NOT EXISTS Person (
Id INT,
Email VARCHAR(50)
);

DELETE FROM Person;

INSERT INTO Person VALUES
(1, 'a@b.com'),
(2, 'c@d.com'),
(3, 'a@b.com'),
(4, 'a@b.com');

预期结果:

1
2
3
4
5
+---------+
| Email |
+---------+
| a@b.com |
+---------+

一种答案

完整答案

1
2
-- Runtime: 984 ms
SELECT Email FROM Person GROUP BY Email HAVING count(Email) > 1

另一种答案

换个思路,第一步先找到某个 Email 中 Id 最小的一个。 查询语句及结果是:

1
2
3
4
5
SELECT 
MIN(Id) AS Id, Email
FROM
Person
GROUP BY Email
1
2
3
4
5
6
7
+----+---------+  
| Id | Email |
+----+---------+
| 1 | a@b.com |
+----+---------+
| 2 | c@d.com |
+----+---------+

第二步原来的 Person 表和第一步得到的临时表 LEFT JOIN,相当于从原理的 Person 表里减去所有 DISTINCT。如果 Email 是非重复的在 LEFT JOIN 之后就不存在了;如果是重复的,那么其重复值会少一个。得到结果的过程是:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
+----+---------+   减去   +----+---------+  
| Id | Email | | Id | Email |
+----+---------+ +----+---------+
| 1 | a@b.com | | 1 | a@b.com |
+----+---------+ +----+---------+
| 2 | c@d.com | | 2 | c@d.com |
+----+---------+ +----+---------+
| 3 | a@b.com |
+----+---------+
| 4 | a@b.com |
+----+---------+

等于

+----+---------+
| Id | Email |
+----+---------+
| 3 | a@b.com |
+----+---------+
| 4 | a@b.com |
+----+---------+

最后加一个 DISTINCT a.Email,把多个重复值去重。最后答案是

1
2
3
4
5
6
7
8
9
10
11
12
13
-- Runtime: 1128 ms
SELECT DISTINCT
a.Email
FROM
Person a
LEFT JOIN
(SELECT
MIN(Id) AS Id, Email
FROM
Person
GROUP BY Email) tmp ON a.Id = tmp.Id
WHERE
tmp.Id IS NULL

本博客微信公众号

文章目录
  1. 1. 题目
  2. 2. 解题思路
  3. 3. 注意事项
  4. 4. 构造测试数据
  5. 5. 一种答案
  6. 6. 另一种答案