插件窝 干货文章 掌握 SQL GROUP BY:组织和汇总数据

掌握 SQL GROUP BY:组织和汇总数据

department 分组 GROUP SELECT 274    来源:    2025-03-29

SQL GROUP BY 详解:数据分组与汇总

GROUP BY 基础概念

GROUP BY 是 SQL 中用于将数据按照一个或多个列进行分组的子句,通常与聚合函数(如 COUNT, SUM, AVG 等)一起使用。

基本语法

SELECT column1, aggregate_function(column2)
FROM table_name
WHERE condition
GROUP BY column1;

GROUP BY 核心功能

1. 数据分组

按照指定列将数据分成多个组:

-- 按部门分组
SELECT department, COUNT(*) as employee_count
FROM employees
GROUP BY department;

2. 多列分组

可以按多个列进行分组:

-- 按部门和职位分组
SELECT department, job_title, COUNT(*) as count
FROM employees
GROUP BY department, job_title;

3. 与聚合函数结合

GROUP BY 常与以下聚合函数配合使用:

  • COUNT() - 计数
  • SUM() - 求和
  • AVG() - 平均值
  • MAX() - 最大值
  • MIN() - 最小值
-- 计算每个部门的平均薪资
SELECT department, AVG(salary) as avg_salary
FROM employees
GROUP BY department;

高级 GROUP BY 用法

1. HAVING 子句

HAVING 用于过滤分组后的结果:

-- 筛选员工数超过5人的部门
SELECT department, COUNT(*) as employee_count
FROM employees
GROUP BY department
HAVING COUNT(*) > 5;

2. GROUP BY 与 ORDER BY 结合

-- 按部门平均薪资降序排列
SELECT department, AVG(salary) as avg_salary
FROM employees
GROUP BY department
ORDER BY avg_salary DESC;

3. ROLLUP 和 CUBE

  • ROLLUP 生成分组的小计和总计行
  • CUBE 生成所有可能的分组组合
-- 使用 ROLLUP 生成小计
SELECT department, job_title, SUM(salary) as total_salary
FROM employees
GROUP BY ROLLUP(department, job_title);

常见问题与解决方案

问题1:SELECT 列表中的非聚合列

错误示例:

-- 错误:name 不在 GROUP BY 子句中
SELECT department, name, AVG(salary)
FROM employees
GROUP BY department;

解决方案:

-- 正确:要么包含在 GROUP BY 中,要么使用聚合函数
SELECT department, STRING_AGG(name, ', ') as employees, AVG(salary)
FROM employees
GROUP BY department;

问题2:WHERE 与 HAVING 混淆

-- WHERE 在分组前过滤行
SELECT department, AVG(salary)
FROM employees
WHERE hire_date > '2020-01-01'
GROUP BY department;

-- HAVING 在分组后过滤组
SELECT department, AVG(salary)
FROM employees
GROUP BY department
HAVING AVG(salary) > 50000;

问题3:处理 NULL 值分组

GROUP BY 会将 NULL 值视为相同的分组:

-- NULL 部门会被分到同一组
SELECT department, COUNT(*)
FROM employees
GROUP BY department;

性能优化建议

  1. 索引优化:为 GROUP BY 使用的列创建索引
  2. 减少分组列:只选择必要的分组列
  3. 过滤数据:先使用 WHERE 减少数据量再进行分组
  4. 避免复杂计算:不要在 GROUP BY 中使用复杂表达式

实际应用示例

销售数据分析

-- 按产品和年份分析销售额
SELECT 
    product_id,
    EXTRACT(YEAR FROM sale_date) as year,
    SUM(amount) as total_sales,
    COUNT(*) as transaction_count
FROM sales
GROUP BY product_id, EXTRACT(YEAR FROM sale_date)
ORDER BY product_id, year;

网站访问统计

-- 按日期和页面统计访问量
SELECT 
    visit_date,
    page_url,
    COUNT(DISTINCT user_id) as unique_visitors,
    COUNT(*) as page_views
FROM web_visits
GROUP BY visit_date, page_url;

GROUP BY 是 SQL 中强大的数据汇总工具,掌握它能显著提高数据分析的效率和质量。