Log In

Don't have an account? Sign up now

Lost Password?

Sign Up

Prev Next

Grouping Data in SQL

When working with large datasets, it is often more useful to analyze data in groups rather than viewing individual records. SQL provides the GROUP BY clause to group rows that share common values and apply aggregate functions such as COUNT, SUM, AVG, MIN, and MAX to each group. Grouping is a core concept in SQL reporting, analytics, and business intelligence.


GROUP BY Clause

The GROUP BY clause is used to group rows based on one or more columns. After grouping, aggregate functions operate on each group separately and return one summary row per group.


Example: Count Employees per Department

SELECT 
    department,
    COUNT(*) AS employee_count
FROM employees
GROUP BY department;

Explanation

  • The employees table is grouped by the department column.
  • All employees in the same department form one group.
  • COUNT(*) counts how many employees exist in each department.
  • The result shows one row per department, not individual employees.

Important GROUP BY Rule (Very Important for Interviews)

👉 Every column in the SELECT list must either:

  • Be included in the GROUP BY clause
    OR
  • Be used inside an aggregate function

Incorrect example:

SELECT department, salary
FROM employees
GROUP BY department;  -- ❌ ERROR

Correct example:

SELECT department, AVG(salary)
FROM employees
GROUP BY department;

GROUP BY with Multiple Columns

GROUP BY can use more than one column. In such cases, SQL creates groups based on unique combinations of the specified columns.


Example: Average Salary by Department and Job Title

SELECT 
    department,
    job_title,
    AVG(salary) AS avg_salary
FROM employees
GROUP BY department, job_title;

Explanation

  • Employees are grouped first by department
  • Then further grouped by job title within each department
  • Each department–job title combination produces one row
  • This gives more detailed insights than single-column grouping

HAVING Clause

The HAVING clause is used to filter groups, not individual rows. It is always used after GROUP BY and works with aggregate functions.


Example: Departments with More Than 5 Employees

SELECT 
    department,
    COUNT(*) AS employee_count
FROM employees
GROUP BY department
HAVING COUNT(*) > 5;

Explanation

  • GROUP BY creates department-wise groups
  • COUNT(*) calculates employees in each group
  • HAVING removes groups with 5 or fewer employees
  • Final output shows only departments with more than 5 employees

WHERE vs HAVING (Critical Concept)

Although WHERE and HAVING both filter data, they operate at different stages of query execution.


Example: Combined WHERE and HAVING

SELECT 
    department,
    AVG(salary) AS avg_salary
FROM employees
WHERE hire_date > '2019-01-01'
GROUP BY department
HAVING AVG(salary) > 55000;

Explanation

  • WHERE filters rows first (only employees hired after 2019)
  • GROUP BY groups the remaining rows by department
  • HAVING filters grouped results based on average salary

Key Differences: WHERE vs HAVING

FeatureWHEREHAVING
FiltersRowsGroups
Used withIndividual columnsAggregate functions
Execution timeBefore GROUP BYAfter GROUP BY
Can use aggregates❌ No✔ Yes

Logical Execution Order of SQL Query (Important)

SQL does not execute queries in the order written.

Actual execution order:

  1. FROM
  2. WHERE
  3. GROUP BY
  4. HAVING
  5. SELECT
  6. ORDER BY
  7. LIMIT

Understanding this helps avoid logical mistakes.


GROUP BY with ORDER BY

You can sort grouped results using ORDER BY.

Example:

SELECT department, COUNT(*) AS emp_count
FROM employees
GROUP BY department
ORDER BY emp_count DESC;

GROUP BY with DISTINCT

  • DISTINCT is usually unnecessary with GROUP BY
  • GROUP BY already creates unique groups

Real-World Use Cases

  • Department-wise employee count
  • Average salary by role
  • Sales analysis by region and product
  • Monthly revenue reports
  • Job portal analytics
  • HR and payroll dashboards

Common Beginner Mistakes

  • Forgetting GROUP BY for non-aggregated columns
  • Using WHERE instead of HAVING for aggregates
  • Using column aliases in HAVING (DB-dependent)
  • Grouping too many columns unnecessarily
  • Expecting row-level output from grouped queries

Best Practices

  • Use clear and meaningful aliases
  • Filter rows early using WHERE
  • Use HAVING only for aggregated conditions
  • Avoid unnecessary columns in GROUP BY
  • Format queries for readability

Leave a Comment

    🚀 Join Common Jobs Pro — Referrals & Profile Visibility Join Now ×
    🔥