SQL核心四剑客:WHERE、ORDER BY、GROUP BY、HAVING 实战全解析
在数据库操作中,SQL查询是最基础也是最重要的技能。无论是数据分析师、后端开发工程师还是数据库管理员,都需要熟练掌握SQL的各种查询语法。本文将深入解析SQL中最常用的四种查询语法:WHERE、ORDER BY、GROUP BY和HAVING,通过实际例子帮助大家彻底掌握这些核心技能。
数据准备
在开始详细讲解之前,我们先创建一个示例数据表,用于后续的演示:
-- 创建员工表
CREATE TABLE employees (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
department VARCHAR(50),
salary DECIMAL(10, 2),
hire_date DATE,
performance_rating INTEGER
);
-- 插入示例数据
INSERT INTO employees (name, department, salary, hire_date, performance_rating) VALUES
('张三', '技术部', 15000.00, '2020-03-15', 4),
('李四', '技术部', 18000.00, '2019-07-20', 5),
('王五', '销售部', 12000.00, '2021-01-10', 3),
('赵六', '销售部', 14000.00, '2020-11-05', 4),
('钱七', '人事部', 10000.00, '2022-03-01', 3),
('孙八', '技术部', 16000.00, '2020-08-12', 4),
('周九', '财务部', 13000.00, '2021-06-18', 4),
('吴十', '销售部', 11000.00, '2022-02-28', 2);
WHERE:数据筛选的艺术
WHERE子句是SQL中最基础也是使用最频繁的筛选工具,它允许我们从表中筛选出满足特定条件的记录。
基本用法
-- 查询技术部的所有员工
SELECT * FROM employees
WHERE department = '技术部';
-- 查询薪资超过15000的员工
SELECT name, salary FROM employees
WHERE salary > 15000;
-- 查询绩效评级为4或5的员工
SELECT name, performance_rating FROM employees
WHERE performance_rating IN (4, 5);
复杂条件组合
WHERE子句支持多种逻辑运算符,可以构建复杂的查询条件:
-- 查询技术部且薪资大于15000的员工
SELECT name, department, salary
FROM employees
WHERE department = '技术部' AND salary > 15000;
-- 查询2020年之后入职且绩效评级在3以上的员工
SELECT name, hire_date, performance_rating
FROM employees
WHERE hire_date >= '2020-01-01' AND performance_rating >= 3;
-- 查询销售部或人事部的员工
SELECT name, department
FROM employees
WHERE department IN ('销售部', '人事部');
-- 使用LIKE进行模糊查询
SELECT name, department
FROM employees
WHERE name LIKE '张%'; -- 查询姓张的员工
范围查询和空值处理
-- 查询薪资在12000到16000之间的员工
SELECT name, salary
FROM employees
WHERE salary BETWEEN 12000 AND 16000;
-- 查询没有绩效评级的员工(假设有些员工performance_rating为NULL)
SELECT name
FROM employees
WHERE performance_rating IS NULL;
ORDER BY:让数据有序排列
ORDER BY子句用于对查询结果进行排序,可以按照一个或多个字段进行升序或降序排列。
单字段排序
-- 按薪资从低到高排序
SELECT name, salary
FROM employees
ORDER BY salary ASC;
-- 按入职日期从新到旧排序
SELECT name, hire_date
FROM employees
ORDER BY hire_date DESC;
-- 默认是升序排序,ASC可以省略
SELECT name, salary
FROM employees
ORDER BY salary;
多字段排序
在实际应用中,经常需要按照多个字段进行排序:
-- 先按部门升序,再按薪资降序
SELECT name, department, salary
FROM employees
ORDER BY department ASC, salary DESC;
-- 按绩效评级降序,相同评级按入职日期升序
SELECT name, performance_rating, hire_date
FROM employees
ORDER BY performance_rating DESC, hire_date ASC;
结合WHERE使用
ORDER BY经常与WHERE子句结合使用,先筛选再排序:
-- 查询技术部员工,按薪资降序排列
SELECT name, salary
FROM employees
WHERE department = '技术部'
ORDER BY salary DESC;
-- 查询薪资超过13000的员工,按部门和薪资排序
SELECT name, department, salary
FROM employees
WHERE salary > 13000
ORDER BY department, salary DESC;
GROUP BY:数据聚合分析
GROUP BY子句用于将数据按照一个或多个字段进行分组,通常与聚合函数一起使用。
基本分组统计
-- 统计每个部门的员工数量
SELECT department, COUNT(*) as employee_count
FROM employees
GROUP BY department;
-- 计算每个部门的平均薪资
SELECT department, AVG(salary) as avg_salary
FROM employees
GROUP BY department;
-- 统计每个部门的最高和最低薪资
SELECT department,
MAX(salary) as max_salary,
MIN(salary) as min_salary
FROM employees
GROUP BY department;
多字段分组
-- 按部门和绩效评级分组统计
SELECT department, performance_rating, COUNT(*) as count
FROM employees
GROUP BY department, performance_rating
ORDER BY department, performance_rating;
结合WHERE使用
可以在分组前先进行数据筛选:
-- 统计2020年之后入职的员工在每个部门的数量
SELECT department, COUNT(*) as count
FROM employees
WHERE hire_date >= '2020-01-01'
GROUP BY department;
HAVING:分组后的筛选
HAVING子句用于对分组后的结果进行筛选,它与WHERE的区别在于:WHERE在分组前筛选记录,HAVING在分组后筛选分组。
基本用法
-- 查询员工数量超过2人的部门
SELECT department, COUNT(*) as employee_count
FROM employees
GROUP BY department
HAVING COUNT(*) > 2;
-- 查询平均薪资超过14000的部门
SELECT department, AVG(salary) as avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 14000;
复杂条件筛选
-- 查询平均薪资超过13000且员工数量至少2人的部门
SELECT department,
AVG(salary) as avg_salary,
COUNT(*) as employee_count
FROM employees
GROUP BY department
HAVING AVG(salary) > 13000 AND COUNT(*) >= 2;
-- 查询绩效评级平均值在3.5以上的部门
SELECT department, AVG(performance_rating) as avg_rating
FROM employees
GROUP BY department
HAVING AVG(performance_rating) > 3.5;
综合实战案例
现在让我们把这些语法组合起来,解决一些实际的业务问题。
案例1:部门绩效分析
-- 分析每个部门的绩效情况,只显示有2名以上员工且平均绩效在3.5以上的部门
SELECT
department,
COUNT(*) as total_employees,
AVG(performance_rating) as avg_performance,
AVG(salary) as avg_salary
FROM employees
GROUP BY department
HAVING COUNT(*) > 2 AND AVG(performance_rating) > 3.5
ORDER BY avg_performance DESC;
案例2:员工薪资分布分析
-- 分析不同绩效评级下的薪资分布
SELECT
performance_rating,
COUNT(*) as employee_count,
ROUND(AVG(salary), 2) as avg_salary,
MIN(salary) as min_salary,
MAX(salary) as max_salary
FROM employees
WHERE performance_rating IS NOT NULL
GROUP BY performance_rating
HAVING COUNT(*) >= 1
ORDER BY performance_rating DESC;
案例3:年度招聘分析
-- 分析每年的招聘情况和员工绩效
SELECT
EXTRACT(YEAR FROM hire_date) as hire_year,
COUNT(*) as hired_count,
AVG(performance_rating) as avg_performance,
AVG(salary) as avg_salary
FROM employees
GROUP BY EXTRACT(YEAR FROM hire_date)
HAVING COUNT(*) >= 1
ORDER BY hire_year;
性能优化建议
在使用这些查询语法时,需要注意性能优化:
- 索引优化:为WHERE、ORDER BY、GROUP BY中经常使用的字段创建索引
- 避免在WHERE子句中对字段进行函数操作:这会导致索引失效
- 合理使用HAVING:尽量在WHERE中完成筛选,减少HAVING的处理数据量
- 选择合适的分组字段:分组字段越多,查询性能越低
-- 不推荐的写法(在WHERE中使用函数)
SELECT * FROM employees
WHERE YEAR(hire_date) = 2020;
-- 推荐的写法
SELECT * FROM employees
WHERE hire_date >= '2020-01-01' AND hire_date < '2021-01-01';
总结
WHERE、ORDER BY、GROUP BY和HAVING是SQL查询中的四个核心语法,它们各自承担着不同的功能:
- WHERE:数据筛选,在分组前过滤记录
- ORDER BY:结果排序,决定数据的显示顺序
- GROUP BY:数据分组,用于聚合分析
- HAVING:分组后筛选,对分组结果进行过滤
掌握这些语法的正确使用方法和适用场景,能够大大提高我们处理数据的效率和准确性。在实际工作中,这些语法往往会组合使用,形成复杂的查询语句来解决各种业务问题。
希望通过本文的详细讲解和丰富示例,大家能够对这些核心SQL语法有更深入的理解,并在实际工作中灵活运用。