COMP2114›Grouping and Aggregation in SQL

Database SystemsTopic 30 of 34

Grouping and Aggregation in SQL

6 minread

1,070words

Intermediatelevel

Grouping and Aggregation in SQL

Grouping and aggregation in SQL are used to group rows and perform calculations on them to summarize data. These operations are often performed together, especially when you're interested in summarizing large sets of data by certain categories, such as total sales per department or average salary per employee.

SQL provides several important functions and clauses to perform these operations, such as GROUP BY, aggregation functions, and filtering with HAVING.

1. The GROUP BY Clause

The GROUP BY clause in SQL is used to group rows that have the same values in specified columns into summary rows, like finding the total number of employees in each department or calculating the average salary per department.

Syntax:

SELECT column1, column2, aggregation_function(column3)
FROM table_name
GROUP BY column1, column2;

Example: Suppose we have an Employees table:

EmployeeID	Name	DepartmentID	Salary
1	Alice	1	50000
2	Bob	1	60000
3	Charlie	2	70000
4	David	2	80000

To find the total salary for each department, we would use the GROUP BY clause:

SELECT DepartmentID, SUM(Salary) AS TotalSalary
FROM Employees
GROUP BY DepartmentID;

Result:

DepartmentID	TotalSalary
1	110000
2	150000

Explanation:

The query groups rows by DepartmentID.
The SUM(Salary) function calculates the total salary for each department.

2. Aggregation Functions

SQL provides several aggregate functions to perform calculations on data. These are often used in conjunction with GROUP BY to summarize data.

Common Aggregate Functions:

COUNT():
- Purpose: Returns the number of rows in a group.
- Example: To count how many employees are in each department:
```
SELECT DepartmentID, COUNT(EmployeeID) AS NumberOfEmployees
FROM Employees
GROUP BY DepartmentID;
```
SUM():
- Purpose: Returns the sum of values in a specified column.
- Example: To calculate the total salary in each department:
```
SELECT DepartmentID, SUM(Salary) AS TotalSalary
FROM Employees
GROUP BY DepartmentID;
```
AVG():
- Purpose: Returns the average value of a specified column.
- Example: To find the average salary per department:
```
SELECT DepartmentID, AVG(Salary) AS AverageSalary
FROM Employees
GROUP BY DepartmentID;
```
MIN():
- Purpose: Returns the minimum value in a specified column.
- Example: To find the lowest salary in each department:
```
SELECT DepartmentID, MIN(Salary) AS MinimumSalary
FROM Employees
GROUP BY DepartmentID;
```
MAX():
- Purpose: Returns the maximum value in a specified column.
- Example: To find the highest salary in each department:
```
SELECT DepartmentID, MAX(Salary) AS MaximumSalary
FROM Employees
GROUP BY DepartmentID;
```

3. Filtering Groups with HAVING Clause

While the WHERE clause is used to filter rows before the aggregation, the HAVING clause is used to filter the results after aggregation. This allows you to apply conditions to the aggregated results.

Syntax:

SELECT column1, aggregation_function(column2)
FROM table_name
GROUP BY column1
HAVING aggregation_function(column2) condition;

Example: If you want to find departments where the total salary exceeds 100,000:

SELECT DepartmentID, SUM(Salary) AS TotalSalary
FROM Employees
GROUP BY DepartmentID
HAVING SUM(Salary) > 100000;

Result:

DepartmentID	TotalSalary
2	150000

Explanation:

The HAVING clause filters out departments where the total salary is not greater than 100,000 after aggregation.

4. GROUP BY with Multiple Columns

You can group data by multiple columns. In such cases, the result will be grouped by the combination of the values in those columns.

Example: Suppose we add a JobTitle column to the Employees table:

EmployeeID	Name	DepartmentID	JobTitle	Salary
1	Alice	1	Manager	50000
2	Bob	1	Developer	60000
3	Charlie	2	Manager	70000
4	David	2	Developer	80000

To find the total salary in each department for each job title:

SELECT DepartmentID, JobTitle, SUM(Salary) AS TotalSalary
FROM Employees
GROUP BY DepartmentID, JobTitle;

Result:

DepartmentID	JobTitle	TotalSalary
1	Manager	50000
1	Developer	60000
2	Manager	70000
2	Developer	80000

Explanation:

The query groups the data by both DepartmentID and JobTitle, so you get a summary of salary for each job title within each department.

5. Ordering Grouped Results

You can use the ORDER BY clause to sort the results after applying GROUP BY and aggregation functions.

Example: To get departments ordered by their total salary in descending order:

SELECT DepartmentID, SUM(Salary) AS TotalSalary
FROM Employees
GROUP BY DepartmentID
ORDER BY TotalSalary DESC;

Result:

DepartmentID	TotalSalary
2	150000
1	110000

Explanation:

The ORDER BY clause sorts the results based on the aggregated TotalSalary in descending order.

6. Combining GROUP BY with DISTINCT

You can also use the DISTINCT keyword to remove duplicates from the results before performing aggregation. This is particularly useful if you want to count unique items.

Example: To find the number of distinct job titles in each department:

SELECT DepartmentID, COUNT(DISTINCT JobTitle) AS NumberOfDistinctJobTitles
FROM Employees
GROUP BY DepartmentID;

Result:

DepartmentID	NumberOfDistinctJobTitles
1	2
2	2

Explanation:

The COUNT(DISTINCT JobTitle) counts the unique job titles in each department.

7. The ROLLUP and CUBE Operators

In addition to GROUP BY, SQL also supports ROLLUP and CUBE operators, which allow you to compute multiple levels of aggregation.

ROLLUP: Provides subtotals and a grand total by grouping values progressively from the most detailed to the least detailed.
CUBE: Computes all possible combinations of aggregates for the grouped columns, producing a more exhaustive set of summaries.

ROLLUP Example:

SELECT DepartmentID, JobTitle, SUM(Salary) AS TotalSalary
FROM Employees
GROUP BY DepartmentID, JobTitle
WITH ROLLUP;

Result:

DepartmentID	JobTitle	TotalSalary
1	Manager	50000
1	Developer	60000
1	NULL	110000
2	Manager	70000
2	Developer	80000
2	NULL	150000
NULL	NULL	260000

Explanation:

The ROLLUP adds subtotals for each department and a grand total at the bottom.

CUBE Example:

SELECT DepartmentID, JobTitle, SUM(Salary) AS TotalSalary
FROM Employees
GROUP BY DepartmentID, JobTitle
WITH CUBE;

Result:

DepartmentID	JobTitle	TotalSalary
1	Manager	50000
1	Developer	60000
1	NULL	110000
2	Manager	70000
2	Developer	80000
2	NULL	150000
NULL	Manager	120000
NULL	Developer	140000
NULL	NULL	260000

Explanation:

The CUBE produces all possible combinations of aggregations across both DepartmentID and JobTitle, including subtotals for each job title and each department, as well as the grand total.

Conclusion

SQL grouping and aggregation are essential for summarizing and analyzing large sets of data. By using the GROUP BY clause with aggregation functions such as COUNT(), SUM(), AVG(), MIN(), and MAX(), you can generate meaningful insights from your data. Additionally, filtering grouped results with HAVING, ordering results with ORDER BY, and performing advanced operations like ROLLUP and CUBE enhance your ability to analyze and report data efficiently.

Past Papers

Open this section to load past papers

Click on Show Past Papers to see past papers.

EmployeeID

Name

DepartmentID

Salary

Alice

50000

Bob

60000

Charlie

70000

David

80000

DepartmentID

TotalSalary

110000

150000

DepartmentID

TotalSalary

150000

EmployeeID

Name

DepartmentID

JobTitle

Salary

Alice

Manager

50000

Bob

Developer

60000

Charlie

Manager

70000

David

Developer

80000

DepartmentID

JobTitle

TotalSalary

Manager

50000

Developer

60000

Manager

70000

Developer

80000

DepartmentID

TotalSalary

150000

110000

DepartmentID

NumberOfDistinctJobTitles

DepartmentID

JobTitle

TotalSalary

Manager

50000

Developer

60000

NULL

110000

Manager

70000

Developer

80000

NULL

150000

NULL

260000

DepartmentID

JobTitle

TotalSalary

Manager

50000

Developer

60000

NULL

110000

Manager

70000

Developer

80000

NULL

150000

NULL

Manager

120000

NULL

Developer

140000

NULL

260000