Selecting Top N Records per Group

When you have multiple rows for let’s say a productID, there will probably be times when you need to select the top n subset of each those productid’s based upon a sort criteria. This is a relatively common TSQL need and there are a couple different methods to do this.

First let’s create some dummy data.

CREATE TABLE [#productzabub](
    [SalesOrderID] [INT] NOT NULL,
    [ProductID] [INT] NOT NULL
)
INSERT [#productzabub] ([SalesOrderID], [ProductID]) VALUES (71782, 707)
INSERT [#productzabub] ([SalesOrderID], [ProductID]) VALUES (71783, 707)
INSERT [#productzabub] ([SalesOrderID], [ProductID]) VALUES (71784, 707)
INSERT [#productzabub] ([SalesOrderID], [ProductID]) VALUES (71782, 708)
INSERT [#productzabub] ([SalesOrderID], [ProductID]) VALUES (71783, 708)
INSERT [#productzabub] ([SalesOrderID], [ProductID]) VALUES (71784, 708)
INSERT [#productzabub] ([SalesOrderID], [ProductID]) VALUES (71782, 711)
INSERT [#productzabub] ([SalesOrderID], [ProductID]) VALUES (71783, 711)
INSERT [#productzabub] ([SalesOrderID], [ProductID]) VALUES (71784, 711)
INSERT [#productzabub] ([SalesOrderID], [ProductID]) VALUES (71782, 712)
INSERT [#productzabub] ([SalesOrderID], [ProductID]) VALUES (71783, 712)
INSERT [#productzabub] ([SalesOrderID], [ProductID]) VALUES (71784, 712)
INSERT [#productzabub] ([SalesOrderID], [ProductID]) VALUES (71782, 714)
INSERT [#productzabub] ([SalesOrderID], [ProductID]) VALUES (71783, 714)
INSERT [#productzabub] ([SalesOrderID], [ProductID]) VALUES (71784, 714)
INSERT [#productzabub] ([SalesOrderID], [ProductID]) VALUES (71782, 715)
INSERT [#productzabub] ([SalesOrderID], [ProductID]) VALUES (71783, 715)
INSERT [#productzabub] ([SalesOrderID], [ProductID]) VALUES (71784, 715)
INSERT [#productzabub] ([SalesOrderID], [ProductID]) VALUES (71783, 716)
INSERT [#productzabub] ([SalesOrderID], [ProductID]) VALUES (71784, 716)

Now that we have our data created, let’s select the top 2 rows for each productid based upon the earliest salesorderid. This salesorderid could actually represent any sequential column in the table, be it a date, or a primary key.

The first method we’ll use is to select using the new window function ROW_NUMBER() introduced in SQL 2005. What we’re doing here is simply creating a numbering system that enumerates each instance of the productid ordered by the salesorderid. We then do another select on this query.

    SELECT TOP 20
        t.SalesOrderID,
        t.ProductID
    FROM (
      SELECT
        sod.SalesOrderID,
        sod.ProductID,
        rowid = ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY SalesOrderID)
      FROM #productzabub sod
    ) t
    WHERE rowid <= 2

The next method we’ll use is compatible with all versions of SQL. Here we are doing a self-join using a correlated subquery. Meaning we are correlating the productid from the outer query, with the productid from the inner query. We then select the top two salesorderid’s from each instance, constraining our outer result set to the top 2 rows for product.

    SELECT
        sod.SalesOrderID,
        sod.ProductID
    FROM #productzabub sod
    WHERE SalesOrderID IN
    (
        SELECT TOP 2 SalesOrderID
        FROM #productzabub
        WHERE ProductID = sod.ProductID
        ORDER BY SalesOrderID
    )

That’s pretty much it. Personally I prefer using the window functions when I can, they typically have a speed advantage over other methods because they don’t join anything, just sort by doing an order by internally. You may see other methods using a CTE. This is not actually another method, it’s simply the same as the windowed function execution-wise.

Featured Articles

 Site Author