Monday, February 9, 2015

SQL: GROUP BY And HAVING














SQL GROUPING allows you to segregate data into groups so that you can work on it separately from the rest of the table records. Let's say you want to get the number of products in a category in the Northwind database Products table. You would write the following query:
 SELECT COUNT(*) NumberOfProductsByCategory
FROM Products
GROUP BY CategoryID
The query above gives you the following results:

The query gives you the number of products in each category, however it's not very useful. You don't really know what category the count is for in each record. You might want to try to change the query into something like this:
SELECT CategoryID,COUNT(*) NumberOfProductsByCategory
FROM Products
GROUP BY CategoryID

The above query is more useful the preceding one, however it only gives you the CategoryID number not the CategoryName in the Categories table. Being the perfectionist that you are you say to yourself, I can do better. "Yes, I can". I think that was a campaign slogan. So you try a join like this:
SELECT c.CategoryName,c.CategoryID
RIGHT JOIN Categories AS c ON c.CategoryID = p.CategoryID
The above query joins the Categories table with the Products table to be able to select from both tables therefore giving the ability to select the CategoryName field.
You probably think to yourself, hey if I can get the CategoryName with the RIGHT JOIN I can just add the GROUP BY filter and be done with it. Yah, my Shaolin Master from the SQL Wing will be proud. So you write the following query:
SELECT c.CategoryName,p.CategoryID,COUNT(*) AS NumberOfProductsByCategory
FROM Products p
RIGHT JOIN Categories AS c ON p.CategoryID = c.CategoryID
GROUP BY c.CategoryID
But behold you get an error! OMG! !@@##$%##$% Luckily your Shaolin Master walked into your room and saw that you were distraught. Your master ask you why you so distraught? You tell the Master, I don't understand, the join was working by itself but when I add the GROUP BY it throws me this stupid SQL error:

Column 'Categories.CategoryName' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

Your master says, if the expression is used in the SELECT statement then the same expression must be used in the GROUP BY part of the statement. You think ah ha! I can just add the columns in the SELECT list of columns to the GROUP BY filtering in the query. So you write:
SELECT c.CategoryID,c.CategoryName,COUNT(*) AS NumberOfProductsByCategory 
FROM Products p
RIGHT JOIN Categories AS c ON c.CategoryID = p.CategoryID
GROUP BY c.CategoryID,c.CategoryName
And like magic the result came to your screen. Tears of joy were streaming down your face because you've gotten the results that you were looking for.


Option 2: You can select what you want from the Products table and then use the GROUP BY in another select in a RIGHT JOIN to the get the COUNT() for each category. You will get the correct results with this query as well.
SELECT c.CategoryID,c.CategoryName,NumberOfProductsByCategory
FROM Categories AS c
RIGHT JOIN (SELECT CategoryID, COUNT(*) AS NumberOfProductsByCategory
FROM Products GROUP BY CategoryID) AS p ON p.CategoryID = c.CategoryID


As you have noticed the GROUP BY queries are missing the WHERE clause. What happened to the powerful WHERE clause? The reason we don't have a where clause is because WHERE does not work with GROUP BY results. That's where HAVING comes in. HAVING takes care of the filtering of the GROUP BY, while the WHERE clause takes care of the non GROUP BY results. For example let's say you want to get the categories with the most expensive products you can write a query that filters the product price with the WHERE clause and then filter the group data with the HAVING clause. You can write the following query to get the expensive products and their categories:
SELECT c.CategoryID,c.CategoryName,COUNT(*) AS NumberOfProductsByCategory 
FROM Products p
RIGHT JOIN Categories AS c ON c.CategoryID = p.CategoryID
WHERE p.UnitPrice > 10
GROUP BY c.CategoryID,c.CategoryName
HAVING COUNT(*) >=5
ORDER BY NumberOfProductsByCategory DESC
The query above gets products that are more than $10, and gets the categories that has more than 9 products. The two filters are separate but they work together nicely with the WHERE and HAVING clause.

No comments:

Post a Comment