PostgreSQL GROUP BY

When you need to summarize your data you will often need to use the GROUP BY clause in your SQL statement. The GROUP BY clause simply divide your data into groups and allows you to alow apply aggregates. Let's take a look at a very simple SQL statement with a GROUP BY.

SELECT dept FROM employee GROUP BY dept;

+-------------+
|    dept     |
+-------------+
| Finance     |
| Engineering |
| Sales       |
+-------------+

The same results can also be obtained by using the DISTINCT clause. In this case the GROUP BY is simply giving you a distinct list of departments. What's probably more helpful is counting the number of employees per department. Adding a count(*) will provide this information.

SELECT COUNT(*) AS cnt, dept 
FROM employee GROUP BY dept;

+-------+-------------+
| count |    dept     |
+-------+-------------+
|     4 | Finance     |
|     2 | Engineering |
|     3 | Sales       |
+-------+-------------+

You can combine as many aggregate functions as you like without needing to change your GROUP BY columns.

SELECT COUNT(*) AS cnt, 
SUM(salary) as total_salary, 
MAX(salary) as highest_salary, 
dept 
FROM employee GROUP BY dept;

+-------+--------------+----------------+-------------+
| count | totla_salary | highest_salary |    dept     |
+-------+--------------+----------------+-------------+
|     4 |      8697.30 |        3638.42 | Finance     |
|     2 |      4666.10 |        2496.78 | Engineering |
|     3 |      3388.55 |        1195.98 | Sales       |
+-------+--------------+----------------+-------------+

If you add additional columns (without an aggregate) to your SELECT statement you will also need to add them to your GROUP BY clause.

SELECT COUNT(*), 
SUM(salary) as total_salary, 
MAX(salary) as highest_salary, 
dept, 
last_name 
FROM employee 
GROUP BY dept;
ERROR:  42803: column "employee.last_name" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: ...otla_salary, MAX(salary) as highest_salary, dept, last_name ...
                                                             ^
LOCATION:  check_ungrouped_columns_walker, parse_agg.c:1388
Time: 2.162 ms

SELECT COUNT(*), 
SUM(salary) as total_salary, 
MAX(salary) as highest_salary, 
dept, 
last_name 
FROM employee 
GROUP BY dept, 
last_name;

+-------+--------------+----------------+-------------+-----------+
| count | totla_salary | highest_salary |    dept     | last_name |
+-------+--------------+----------------+-------------+-----------+
|     1 |      1195.98 |        1195.98 | Sales       | Smith     |
|     1 |      1034.23 |        1034.23 | Sales       | Jones     |
|     2 |      3113.46 |        1658.00 | Finance     | Lopez     |
|     1 |      2169.32 |        2169.32 | Engineering | Greenburg |
|     1 |      3638.42 |        3638.42 | Finance     | Thompson  |
|     1 |      1945.42 |        1945.42 | Finance     | Doe       |
|     1 |      1158.34 |        1158.34 | Sales       | Anderson  |
|     1 |      2496.78 |        2496.78 | Engineering | Teal      |
+-------+--------------+----------------+-------------+-----------+

Next: PostgreSQL HAVING