Grouping and aggregation in SQL are used to group rows and perform calculations on them to summarize data. These operations are often performed together, especially when you're interested in summarizing large sets of data by certain categories, such as total sales per department or average salary per employee.
SQL provides several important functions and clauses to perform these operations, such as GROUP BY, aggregation functions, and filtering with HAVING.
The GROUP BY clause in SQL is used to group rows that have the same values in specified columns into summary rows, like finding the total number of employees in each department or calculating the average salary per department.
Syntax:
SELECT column1, column2, aggregation_function(column3)
FROM table_name
GROUP BY column1, column2;
Example:
Suppose we have an Employees table:
| EmployeeID | Name | DepartmentID | Salary |
|---|---|---|---|
| 1 | Alice | 1 | 50000 |
| 2 | Bob | 1 | 60000 |
| 3 | Charlie | 2 | 70000 |
| 4 | David | 2 | 80000 |
To find the total salary for each department, we would use the GROUP BY clause:
SELECT DepartmentID, SUM(Salary) AS TotalSalary
FROM Employees
GROUP BY DepartmentID;
Result:
| DepartmentID | TotalSalary |
|---|---|
| 1 | 110000 |
| 2 | 150000 |
Explanation:
DepartmentID.SUM(Salary) function calculates the total salary for each department.SQL provides several aggregate functions to perform calculations on data. These are often used in conjunction with GROUP BY to summarize data.
COUNT():
SELECT DepartmentID, COUNT(EmployeeID) AS NumberOfEmployees
FROM Employees
GROUP BY DepartmentID;
SUM():
SELECT DepartmentID, SUM(Salary) AS TotalSalary
FROM Employees
GROUP BY DepartmentID;
AVG():
SELECT DepartmentID, AVG(Salary) AS AverageSalary
FROM Employees
GROUP BY DepartmentID;
MIN():
SELECT DepartmentID, MIN(Salary) AS MinimumSalary
FROM Employees
GROUP BY DepartmentID;
MAX():
SELECT DepartmentID, MAX(Salary) AS MaximumSalary
FROM Employees
GROUP BY DepartmentID;
While the WHERE clause is used to filter rows before the aggregation, the HAVING clause is used to filter the results after aggregation. This allows you to apply conditions to the aggregated results.
Syntax:
SELECT column1, aggregation_function(column2)
FROM table_name
GROUP BY column1
HAVING aggregation_function(column2) condition;
Example: If you want to find departments where the total salary exceeds 100,000:
SELECT DepartmentID, SUM(Salary) AS TotalSalary
FROM Employees
GROUP BY DepartmentID
HAVING SUM(Salary) > 100000;
Result:
| DepartmentID | TotalSalary |
|---|---|
| 2 | 150000 |
Explanation:
HAVING clause filters out departments where the total salary is not greater than 100,000 after aggregation.You can group data by multiple columns. In such cases, the result will be grouped by the combination of the values in those columns.
Example:
Suppose we add a JobTitle column to the Employees table:
| EmployeeID | Name | DepartmentID | JobTitle | Salary |
|---|---|---|---|---|
| 1 | Alice | 1 | Manager | 50000 |
| 2 | Bob | 1 | Developer | 60000 |
| 3 | Charlie | 2 | Manager | 70000 |
| 4 | David | 2 | Developer | 80000 |
To find the total salary in each department for each job title:
SELECT DepartmentID, JobTitle, SUM(Salary) AS TotalSalary
FROM Employees
GROUP BY DepartmentID, JobTitle;
Result:
| DepartmentID | JobTitle | TotalSalary |
|---|---|---|
| 1 | Manager | 50000 |
| 1 | Developer | 60000 |
| 2 | Manager | 70000 |
| 2 | Developer | 80000 |
Explanation:
DepartmentID and JobTitle, so you get a summary of salary for each job title within each department.You can use the ORDER BY clause to sort the results after applying GROUP BY and aggregation functions.
Example: To get departments ordered by their total salary in descending order:
SELECT DepartmentID, SUM(Salary) AS TotalSalary
FROM Employees
GROUP BY DepartmentID
ORDER BY TotalSalary DESC;
Result:
| DepartmentID | TotalSalary |
|---|---|
| 2 | 150000 |
| 1 | 110000 |
Explanation:
ORDER BY clause sorts the results based on the aggregated TotalSalary in descending order.You can also use the DISTINCT keyword to remove duplicates from the results before performing aggregation. This is particularly useful if you want to count unique items.
Example: To find the number of distinct job titles in each department:
SELECT DepartmentID, COUNT(DISTINCT JobTitle) AS NumberOfDistinctJobTitles
FROM Employees
GROUP BY DepartmentID;
Result:
| DepartmentID | NumberOfDistinctJobTitles |
|---|---|
| 1 | 2 |
| 2 | 2 |
Explanation:
COUNT(DISTINCT JobTitle) counts the unique job titles in each department.In addition to GROUP BY, SQL also supports ROLLUP and CUBE operators, which allow you to compute multiple levels of aggregation.
SELECT DepartmentID, JobTitle, SUM(Salary) AS TotalSalary
FROM Employees
GROUP BY DepartmentID, JobTitle
WITH ROLLUP;
Result:
| DepartmentID | JobTitle | TotalSalary |
|---|---|---|
| 1 | Manager | 50000 |
| 1 | Developer | 60000 |
| 1 | NULL | 110000 |
| 2 | Manager | 70000 |
| 2 | Developer | 80000 |
| 2 | NULL | 150000 |
| NULL | NULL | 260000 |
Explanation:
ROLLUP adds subtotals for each department and a grand total at the bottom.SELECT DepartmentID, JobTitle, SUM(Salary) AS TotalSalary
FROM Employees
GROUP BY DepartmentID, JobTitle
WITH CUBE;
Result:
| DepartmentID | JobTitle | TotalSalary |
|---|---|---|
| 1 | Manager | 50000 |
| 1 | Developer | 60000 |
| 1 | NULL | 110000 |
| 2 | Manager | 70000 |
| 2 | Developer | 80000 |
| 2 | NULL | 150000 |
| NULL | Manager | 120000 |
| NULL | Developer | 140000 |
| NULL | NULL | 260000 |
Explanation:
CUBE produces all possible combinations of aggregations across both DepartmentID and JobTitle, including subtotals for each job title and each department, as well as the grand total.SQL grouping and aggregation are essential for summarizing and analyzing large sets of data. By using the GROUP BY clause with aggregation functions such as COUNT(), SUM(), AVG(), MIN(), and MAX(), you can generate meaningful insights from your data. Additionally, filtering grouped results with HAVING, ordering results with ORDER BY, and performing advanced operations like ROLLUP and CUBE enhance your ability to analyze and report data efficiently.
Open this section to load past papers