관리 메뉴

nalaolla

MySQL - How To Get Top N Rows per Each Group 본문

MYSQL

MySQL - How To Get Top N Rows per Each Group

날아올라↗↗ 2018. 2. 7. 13:14
728x90

Sample Data

Assume we have the following table definition and data:

   CREATE TABLE cities
   (
      city VARCHAR(80),
      country VARCHAR(80),
      population INT
   );
 
   INSERT INTO cities VALUES ('New York', 'United States', 8175133); 
   INSERT INTO cities VALUES ('Los Angeles', 'United States', 3792621); 
   INSERT INTO cities VALUES ('Chicago', 'United States', 2695598); 
 
   INSERT INTO cities VALUES ('Paris', 'France', 2181000);
   INSERT INTO cities VALUES ('Marseille', 'France', 808000);
   INSERT INTO cities VALUES ('Lyon', 'France', 422000);
 
   INSERT INTO cities VALUES ('London', 'United Kingdom',	7825300);
   INSERT INTO cities VALUES ('Birmingham', 'United Kingdom', 1016800);
   INSERT INTO cities VALUES ('Leeds', 'United Kingdom', 770800);

MySQL Query to Get Top 2

To get the 2 largest cities for each country, you can use the following query in MySQL:

   SELECT city, country, population
   FROM
     (SELECT city, country, population, 
                  @country_rank := IF(@current_country = country, @country_rank + 1, 1) AS country_rank,
                  @current_country := country 
       FROM cities
       ORDER BY country, population DESC
     ) ranked
   WHERE country_rank <= 2;

The query returns:

citycountrypopulation
ParisFrance2181000
MarseilleFrance808000
LondonUnited Kingdom7825300
BirminghamUnited Kingdom1016800
New YorkUnited States8175133
Los AngelesUnited States3792621

How It Works

Explanation of the MySQL query:

  • Session Variables

Currently MySQL does not support ROW_NUMBER() function that can assign a sequence number within a group, but as a workaround we can use MySQL session variables.

These variables do not require declaration, and can be used in a query to do calculations and to store intermediate results.

@current_country := country

This code is executed for each row and stores the value of country column to @current_country variable.

@country_rank := IF(@current_country = country, @country_rank + 1, 1)

In this code, if @current_country is the same we increment rank, otherwise set it to 1. For the first row @current_country is NULL, so rank is also set to 1.

For correct ranking, we need to have ORDER BY country, population DESC

So if we just execute the subquery:

   SELECT city, country, population, 
       @country_rank := IF(@current_country = country, @country_rank + 1, 1) AS country_rank,
       @current_country := country 
  FROM cities
  ORDER BY country, population DESC

We get the list of cities ranked by population within the country:

citycountrypopulationcountry_rankcurrent_country
ParisFrance21810001France
MarseilleFrance8080002France
LyonFrance4220003France
LondonUnited Kingdom78253001United Kingdom
BirminghamUnited Kingdom10168002United Kingdom
LeedsUnited Kingdom7708003United Kingdom
New YorkUnited States81751331United States
Los AngelesUnited States37926212United States
ChicagoUnited States26955983United States
  • Selecting Range

When we have a rank assigned to each city within its country, we can retrieve the required range:

   -- Get top 2 for each country
   SELECT city, country, population
   FROM (/*subquery above*/) ranked
   WHERE country_rank <= 2;
 
   -- Get the city with 3rd population for each country
   SELECT city, country, population
   FROM (/*subquery above*/) ranked
   WHERE country_rank = 3;

ROW_NUMBER() - Oracle, SQL Server and PostgreSQL

In Oracle, SQL Server and PostgreSQL you can achieve the same functionality using ROW_NUMBER function:

   SELECT city, country, population
   FROM
    (SELECT city, country, population, 
                  ROW_NUMBER() OVER (PARTITION BY country ORDER BY population DESC) as country_rank
      FROM cities) ranked
   WHERE country_rank <= 2;

This query works in Oracle, SQL Server and PostgreSQL without any changes and returns:

citycountrypopulation
ParisFrance2181000
MarseilleFrance808000
LondonUnited Kingdom7825300
BirminghamUnited Kingdom1016800
New YorkUnited States8175133
Los AngelesUnited States3792621

Resources


728x90