Joining Data in SQL

Dr. Chester Ismay - DataCamp


Course Description

Now that you’ve learned the basics of SQL in our Introduction to SQL course, it’s time to supercharge your queries using joins and relational set theory. In this course, you’ll learn all about the power of joining tables while exploring interesting features of countries and their cities throughout the world. You will master inner and outer joins, as well as self joins, semi joins, anti joins and cross joins—fundamental tools in any PostgreSQL wizard’s toolbox. Never fear set theory again after learning all about unions, intersections, and except clauses through easy-to-understand diagrams and examples. Lastly, you’ll be introduced to the challenging topic of subqueries. You will be able to visually grasp these ideas by using Venn diagrams and other linking illustrations. Should there be further discussion, please contact us via email: .

1 Introduction to joins

In this chapter, you’ll be introduced to the concept of joining tables, and will explore the different ways you can enrich your queries using inner joins and self joins. You’ll also see how to use the case statement to split up a field into different categories.

1.1 Introduction to INNER JOIN

1.1.1 Inner join

Although this course focuses on PostgreSQL, you’ll find that these joins and the material here applies to different forms of SQL as well.

Throughout this course, you’ll be working with the countries database containing information about the most populous world cities as well as country-level economic data, population data, and geographic data. This countries database also contains information on languages spoken in each country.

You can see the different tables in this database by clicking on the corresponding tabs. Click through them to get a sense for the types of data that each table contains before you continue with the course! Take note of the fields that appear to be shared across the tables.

Recall from the video the basic syntax for an INNER JOIN, here including all columns in both tables:

SELECT *
FROM left_table
INNER JOIN right_table
ON left_table.id = right_table.id;

You’ll start off with a SELECT statement and then build up to an INNER JOIN with the cities and countries tables. Let’s get to it!

Begin by selecting all columns from the cities table.

-- Select all columns from cities
SELECT *
FROM cities;
Table 1.1: Displaying records 1 - 10
name country_code city_proper_pop metroarea_pop urbanarea_pop
Abidjan CIV 4765000 NA 4765000
Abu Dhabi ARE 1145000 NA 1145000
Abuja NGA 1235880 6000000 1235880
Accra GHA 2070463 4010054 2070463
Addis Ababa ETH 3103673 4567857 3103673
Ahmedabad IND 5570585 NA 5570585
Alexandria EGY 4616625 NA 4616625
Algiers DZA 3415811 5000000 3415811
Almaty KAZ 1703481 NA 1703481
Ankara TUR 5271000 4585000 5271000
  • Inner join the cities table on the left to the countries table on the right, keeping all of the fields in both tables.
  • You should match the tables on the country_code field in cities and the code field in countries.
  • Do not alias your tables here or in the next step. Using cities and countries is fine for now.
  • SELECT * 
    FROM cities
      -- Inner join to countries
      INNER JOIN countries
        -- Match on the country codes
        ON cities.country_code = countries.code;
    Table 1.2: Displaying records 1 - 10
    name country_code city_proper_pop metroarea_pop urbanarea_pop code country_name continent region surface_area indep_year local_name gov_form capital cap_long cap_lat name
    Kabul AFG 3414100 NA 3414100 AFG Afghanistan Asia Southern and Central Asia 652090 1919 Afganistan/Afqanestan Islamic Emirate Kabul 69.17610 34.52280 Afghanistan
    Algiers DZA 3415811 5000000 3415811 DZA Algeria Africa Northern Africa 2381740 1962 Al-Jaza’ir/Algerie Republic Algiers 3.05097 36.73970 Algeria
    Oran DZA 1560329 3454078 1560329 DZA Algeria Africa Northern Africa 2381740 1962 Al-Jaza’ir/Algerie Republic Algiers 3.05097 36.73970 Algeria
    Luanda AGO 2825311 NA 2825311 AGO Angola Africa Central Africa 1246700 1975 Angola Republic Luanda 13.24200 -8.81155 Angola
    Abu Dhabi ARE 1145000 NA 1145000 ARE United Arab Emirates Asia Middle East 83600 1971 Al-Imarat al-´Arabiya al-Muttahida Emirate Federation Abu Dhabi 54.37050 24.47640 United Arab Emirates
    Dubai ARE 2643410 NA 2643410 ARE United Arab Emirates Asia Middle East 83600 1971 Al-Imarat al-´Arabiya al-Muttahida Emirate Federation Abu Dhabi 54.37050 24.47640 United Arab Emirates
    Buenos Aires ARG 3054300 14122000 3054300 ARG Argentina South America South America 2780400 1816 Argentina Federal Republic Buenos Aires -58.41730 -34.61180 Argentina
    Cordoba ARG 1330023 1528000 1330023 ARG Argentina South America South America 2780400 1816 Argentina Federal Republic Buenos Aires -58.41730 -34.61180 Argentina
    Rosario ARG 1193605 1276000 1193605 ARG Argentina South America South America 2780400 1816 Argentina Federal Republic Buenos Aires -58.41730 -34.61180 Argentina
    Yerevan ARM 1060138 NA 1060138 ARM Armenia Asia Middle East 29800 1991 Hajastan Republic Yerevan 44.50900 40.15960 Armenia
    • Modify the SELECT statement to keep only the name of the city, the name of the country, and the name of the region the country resides in.
    • Alias the name of the city AS city and the name of the country AS country.
    -- Select name fields (with alias) and region 
    SELECT cities.name AS city, countries.name AS country, region
    FROM cities
      INNER JOIN countries
        ON cities.country_code = countries.code;
    Table 1.3: Displaying records 1 - 10
    city country region
    Kabul Afghanistan Southern and Central Asia
    Algiers Algeria Northern Africa
    Oran Algeria Northern Africa
    Luanda Angola Central Africa
    Abu Dhabi United Arab Emirates Middle East
    Dubai United Arab Emirates Middle East
    Buenos Aires Argentina South America
    Cordoba Argentina South America
    Rosario Argentina South America
    Yerevan Armenia Middle East
    select * from countries
    Table 1.4: Displaying records 1 - 10
    code country_name continent region surface_area indep_year local_name gov_form capital cap_long cap_lat name
    AFG Afghanistan Asia Southern and Central Asia 652090 1919 Afganistan/Afqanestan Islamic Emirate Kabul 69.17610 34.52280 Afghanistan
    NLD Netherlands Europe Western Europe 41526 1581 Nederland Constitutional Monarchy Amsterdam 4.89095 52.37380 Netherlands
    ALB Albania Europe Southern Europe 28748 1912 Shqiperia Republic Tirane 19.81720 41.33170 Albania
    DZA Algeria Africa Northern Africa 2381740 1962 Al-Jaza’ir/Algerie Republic Algiers 3.05097 36.73970 Algeria
    ASM American Samoa Oceania Polynesia 199 NA Amerika Samoa US Territory Pago Pago -170.69100 -14.28460 American Samoa
    AND Andorra Europe Southern Europe 468 1278 Andorra Parliamentary Coprincipality Andorra la Vella 1.52180 42.50750 Andorra
    AGO Angola Africa Central Africa 1246700 1975 Angola Republic Luanda 13.24200 -8.81155 Angola
    ATG Antigua and Barbuda North America Caribbean 442 1981 Antigua and Barbuda Constitutional Monarchy Saint John’s -61.84560 17.11750 Antigua and Barbuda
    ARE United Arab Emirates Asia Middle East 83600 1971 Al-Imarat al-´Arabiya al-Muttahida Emirate Federation Abu Dhabi 54.37050 24.47640 United Arab Emirates
    ARG Argentina South America South America 2780400 1816 Argentina Federal Republic Buenos Aires -58.41730 -34.61180 Argentina

    Great work! In the next exercise you’ll explore how you can do more aliasing to limit the amount of writing.

    1.1.2 Inner join (2)

    Instead of writing the full table name, you can use table aliasing as a shortcut. For tables you also use AS to add the alias immediately after the table name with a space. Check out the aliasing of cities and countries below.

    SELECT c1.name AS city, c2.name AS country
    FROM cities AS c1
    INNER JOIN countries AS c2
    ON c1.country_code = c2.code;
    

    Notice that to select a field in your query that appears in multiple tables, you’ll need to identify which table/table alias you’re referring to by using a . in your SELECT statement.

    You’ll now explore a way to get data from both the countries and economies tables to examine the inflation rate for both 2010 and 2015.

    Sometimes it’s easier to write SQL code out of order: you write the SELECT statement after you’ve done the JOIN.

    • Join the tables countries (left) and economies (right) aliasing countries AS c and economies AS e.
    • Specify the field to match the tables ON.
    • From this join, SELECT:
      • c.code, aliased as country_code.
      • name, year, and inflation_rate, not aliased.
    -- Select fields with aliases
    SELECT c.code AS country_code, name, year, inflation_rate
    FROM countries AS c
      -- Join to economies (alias e)
      INNER JOIN economies AS e
        -- Match on code
        ON c.code = e.code;
    Table 1.5: Displaying records 1 - 10
    country_code name year inflation_rate
    AFG Afghanistan 2010 2.179
    AFG Afghanistan 2015 -1.549
    AGO Angola 2010 14.480
    AGO Angola 2015 10.287
    ALB Albania 2010 3.605
    ALB Albania 2015 1.896
    ARE United Arab Emirates 2010 0.878
    ARE United Arab Emirates 2015 4.070
    ARG Argentina 2010 10.461
    ARG Argentina 2015 NA

    Nicely done! Using this short aliases takes some getting used to, but it will save you a lot of typing.

    1.1.3 Inner join (3)

    The ability to combine multiple joins in a single query is a powerful feature of SQL, e.g:

    SELECT *
    FROM left_table
      INNER JOIN right_table
        ON left_table.id = right_table.id
      INNER JOIN another_table
        ON left_table.id = another_table.id;
    

    As you can see here it becomes tedious to continually write long table names in joins. This is when it becomes useful to alias each table using the first letter of its name (e.g. countries AS c)! It is standard practice to alias in this way and, if you choose to alias tables or are asked to specifically for an exercise in this course, you should follow this protocol.

    Now, for each country, you want to get the country name, its region, the fertility rate, and the unemployment rate for both 2010 and 2015.

    Note that results should work throughout this course with or without table aliasing unless specified differently.

    • Inner join countries (left) and populations (right) on the code and country_code fields respectively.
    • Alias countries AS c and populations AS p.
    • Select code, name, and region from countries and also select year and fertility_rate from populations (5 fields in total).
    -- Select fields
    SELECT c.code, name, region, year, fertility_rate
      -- From countries (alias as c)
      FROM countries AS c
      -- Join with populations (as p)
      INNER JOIN populations AS p
        -- Match on country code
        ON c.code = p.country_code;
    Table 1.6: Displaying records 1 - 10
    code name region year fertility_rate
    ABW Aruba Caribbean 2010 1.704
    ABW Aruba Caribbean 2015 1.647
    AFG Afghanistan Southern and Central Asia 2010 5.746
    AFG Afghanistan Southern and Central Asia 2015 4.653
    AGO Angola Central Africa 2010 6.416
    AGO Angola Central Africa 2015 5.996
    ALB Albania Southern Europe 2010 1.663
    ALB Albania Southern Europe 2015 1.793
    AND Andorra Southern Europe 2010 1.270
    AND Andorra Southern Europe 2015 NA
    • Add an additional INNER JOIN with economies to your previous query by joining on code.
    • Include the unemployment_rate column that became available through joining with economies.
    • Note that year appears in both populations and economies, so you have to explicitly use e.year instead of year as you did before.
    -- Select fields
    SELECT c.code, name, region, e.year, fertility_rate, unemployment_rate
      -- From countries (alias as c)
      FROM countries AS c
      -- Join to populations (as p)
      INNER JOIN populations AS p
        -- Match on country code
        ON c.code = p.country_code
      -- Join to economies (as e)
      INNER JOIN economies AS e
        -- Match on country code
        ON c.code = e.code;
    Table 1.7: Displaying records 1 - 10
    code name region year fertility_rate unemployment_rate
    AFG Afghanistan Southern and Central Asia 2010 4.653 NA
    AFG Afghanistan Southern and Central Asia 2010 5.746 NA
    AFG Afghanistan Southern and Central Asia 2015 4.653 NA
    AFG Afghanistan Southern and Central Asia 2015 5.746 NA
    AGO Angola Central Africa 2010 5.996 NA
    AGO Angola Central Africa 2010 6.416 NA
    AGO Angola Central Africa 2015 5.996 NA
    AGO Angola Central Africa 2015 6.416 NA
    ALB Albania Southern Europe 2010 1.663 14
    ALB Albania Southern Europe 2010 1.793 14
    • Scroll down the query result and take a look at the results for Albania from your previous query. Does something seem off to you?
    • The trouble with doing your last join on c.code = e.code and not also including year is that e.g. the 2010 value for fertility_rate is also paired with the 2015 value for unemployment_rate.
    • Fix your previous query: in your last ON clause, use AND to add an additional joining condition. In addition to joining on code in c and e, also join on year in e and p.
    -- Select fields
    SELECT c.code, name, region, e.year, fertility_rate, unemployment_rate
      -- From countries (alias as c)
      FROM countries AS c
      -- Join to populations (as p)
      INNER JOIN populations AS p
        -- Match on country code
        ON c.code = p.country_code
      -- Join to economies (as e)
      INNER JOIN economies AS e
        -- Match on country code and year
        ON c.code = e.code AND e.year = p.year;
    Table 1.8: Displaying records 1 - 10
    code name region year fertility_rate unemployment_rate
    AFG Afghanistan Southern and Central Asia 2010 5.746 NA
    AFG Afghanistan Southern and Central Asia 2015 4.653 NA
    AGO Angola Central Africa 2010 6.416 NA
    AGO Angola Central Africa 2015 5.996 NA
    ALB Albania Southern Europe 2010 1.663 14.00
    ALB Albania Southern Europe 2015 1.793 17.10
    ARE United Arab Emirates Middle East 2010 1.868 NA
    ARE United Arab Emirates Middle East 2015 1.767 NA
    ARG Argentina South America 2010 2.370 7.75
    ARG Argentina South America 2015 2.308 NA

    Good work! Time to learn something new!

    1.2 INNER JOIN via USING

    1.2.1 Review inner join using on

    Why does the following code result in an error?

    SELECT c.name AS country, l.name AS language
    FROM countries AS c
      INNER JOIN languages AS l;
    
  • The languages table has more rows than the countries table.
  • There are multiple languages spoken in many countries.
  • INNER JOIN requires a specification of the key field (or fields) in each table.
  • Join queries may not be followed by a semi-colon.
  • Correct!

    1.2.2 Inner join with using

    When joining tables with a common field name, e.g.

    SELECT *
    FROM countries
      INNER JOIN economies
        ON countries.code = economies.code
    

    You can use USING as a shortcut:

    SELECT *
    FROM countries
      INNER JOIN economies
        USING(code)
    

    You’ll now explore how this can be done with the countries and languages tables.

    • Inner join countries on the left and languages on the right with USING(code).
    • Select the fields corresponding to:
      • country name AS country,
      • continent name,
      • language name AS language, and
      • whether or not the language is official.

    Remember to alias your tables using the first letter of their names.

    -- Select fields
    SELECT c.name AS country, continent, l.name AS language, official
      -- From countries (alias as c)
      FROM countries AS c
      -- Join to languages (as l)
      INNER JOIN languages AS l
        -- Match using code
        USING(code);
    Table 1.9: Displaying records 1 - 10
    country continent language official
    Afghanistan Asia Dari 1
    Afghanistan Asia Pashto 1
    Afghanistan Asia Turkic 0
    Afghanistan Asia Other 0
    Albania Europe Albanian 1
    Albania Europe Greek 0
    Albania Europe Other 0
    Albania Europe unspecified 0
    Algeria Africa Arabic 1
    Algeria Africa French 0

    Well done! Another technique to save you some typing!

    1.3 Self-ish joins, just in CASE

    1.3.1 Self-join

    In this exercise, you’ll use the populations table to perform a self-join to calculate the percentage increase in population from 2010 to 2015 for each country code!

    Since you’ll be joining the populations table to itself, you can alias populations as p1 and also populations as p2. This is good practice whenever you are aliasing and your tables have the same first letter. Note that you are required to alias the tables with self-joins.

    • Join populations with itself ON country_code.
    • Select the country_code from p1 and the size field from both p1 and p2. SQL won’t allow same-named fields, so alias p1.size as size2010 and p2.size as size2015.
    -- Select fields with aliases
    SELECT p1.country_code,
           p1.size AS size2010,
           p2.size AS size2015
    -- From populations (alias as p1)
    FROM populations AS p1
      -- Join to itself (alias as p2)
      INNER JOIN populations AS p2
        -- Match on country code
        ON  p1.country_code = p2.country_code;
    Table 1.10: Displaying records 1 - 10
    country_code size2010 size2015
    ABW 101597 101597
    ABW 101597 103889
    ABW 103889 101597
    ABW 103889 103889
    AFG 27962207 27962207
    AFG 27962207 32526562
    AFG 32526562 27962207
    AFG 32526562 32526562
    AGO 21219954 21219954
    AGO 21219954 25021974
    • Notice from the result that for each country_code you have four entries laying out all combinations of 2010 and 2015.
    • Extend the ON in your query to include only those records where the p1.year (2010) matches with p2.year - 5 (2015 - 5 = 2010). This will omit the three entries per country_code that you aren’t interested in.
    -- Select fields with aliases
    SELECT p1.country_code,
           p1.size AS size2010,
           p2.size AS size2015
    -- From populations (alias as p1)
    FROM populations AS p1
      -- Join to itself (alias as p2)
      INNER JOIN populations AS p2
        -- Match on country code
        ON p1.country_code = p2.country_code
            -- and year (with calculation)
            AND p1.year = p2.year - 5;
    Table 1.11: Displaying records 1 - 10
    country_code size2010 size2015
    ABW 101597 103889
    AFG 27962207 32526562
    AGO 21219954 25021974
    ALB 2913021 2889167
    AND 84419 70473
    ARE 8329453 9156963
    ARG 41222875 43416755
    ARM 2963496 3017712
    ASM 55636 55538
    ATG 87233 91818

    As you just saw, you can also use SQL to calculate values like p2.year - 5 for you. With two fields like size2010 and size2015, you may want to determine the percentage increase from one field to the next:

    With two numeric fields A and B, the percentage growth from A to B can be calculated as (BA)/A100.0.

    Add a new field to SELECT, aliased as growth_perc, that calculates the percentage population growth from 2010 to 2015 for each country, using p2.size and p1.size.

    -- Select fields with aliases
    SELECT p1.country_code,
           p1.size AS size2010, 
           p2.size AS size2015,
           -- Calculate growth_perc
           ((p2.size - p1.size)/p1.size * 100.0) AS growth_perc
    -- From populations (alias as p1)
    FROM populations AS p1
      -- Join to itself (alias as p2)
      INNER JOIN populations AS p2
        -- Match on country code
        ON p1.country_code = p2.country_code
            -- and year (with calculation)
            AND p1.year = p2.year - 5;
    Table 1.12: Displaying records 1 - 10
    country_code size2010 size2015 growth_perc
    ABW 101597 103889 0
    AFG 27962207 32526562 0
    AGO 21219954 25021974 0
    ALB 2913021 2889167 0
    AND 84419 70473 0
    ARE 8329453 9156963 0
    ARG 41222875 43416755 0
    ARM 2963496 3017712 0
    ASM 55636 55538 0
    ATG 87233 91818 0

    Nice!

    1.3.2 Case when and then

    Often it’s useful to look at a numerical field not as raw data, but instead as being in different categories or groups.

    You can use CASE with WHEN, THEN, ELSE, and END to define a new grouping field.

    Using the countries table, create a new field AS geosize_group that groups the countries into three groups:

    • If surface_area is greater than 2 million, geosize_group is ‘large’.
    • If surface_area is greater than 350 thousand but not larger than 2 million, geosize_group is ‘medium’.
    • Otherwise, geosize_group is ‘small’.
    SELECT name, continent, code, surface_area,
        -- First case
        CASE WHEN surface_area > 2000000 THEN 'large'
            -- Second case
            WHEN surface_area > 350000 THEN 'medium'
            -- Else clause + end
            ELSE 'small' END
            -- Alias name
            AS geosize_group
    -- From table
    FROM countries;
    Table 1.13: Displaying records 1 - 10
    name continent code surface_area geosize_group
    Afghanistan Asia AFG 652090 medium
    Netherlands Europe NLD 41526 small
    Albania Europe ALB 28748 small
    Algeria Africa DZA 2381740 large
    American Samoa Oceania ASM 199 small
    Andorra Europe AND 468 small
    Angola Africa AGO 1246700 medium
    Antigua and Barbuda North America ATG 442 small
    United Arab Emirates Asia ARE 83600 small
    Argentina South America ARG 2780400 large

    Well done! Time for the last exercise of this chapter!

    1.3.3 Inner challenge

    The table you created with the added geosize_group field has been loaded for you here with the name countries_plus. Observe the use of (and the placement of) the INTO command to create this countries_plus table:

    SELECT name, continent, code, surface_area,
        CASE WHEN surface_area > 2000000
                THEN 'large'
           WHEN surface_area > 350000
                THEN 'medium'
           ELSE 'small' END
           AS geosize_group
    INTO countries_plus
    FROM countries;
    

    You will now explore the relationship between the size of a country in terms of surface area and in terms of population using grouping fields created with CASE.

    By the end of this exercise, you’ll be writing two queries back-to-back in a single script. You got this!

    Using the populations table focused only for the year 2015, create a new field aliased as popsize_group to organize population size into

    • ‘large’ (> 50 million),
    • ‘medium’ (> 1 million), and
    • ‘small’ groups.

    Select only the country code, population size, and this new popsize_group as fields.

    SELECT country_code, size,
        -- First case
        CASE WHEN size > 50000000 THEN 'large'
            -- Second case
            WHEN size > 1000000 THEN 'medium'
            -- Else clause + end
            ELSE 'small' END
            -- Alias name (popsize_group)
            AS popsize_group
    -- From table
    FROM populations
    -- Focus on 2015
    WHERE year = 2015;
    Table 1.14: Displaying records 1 - 10
    country_code size popsize_group
    ABW 103889 small
    AFG 32526562 medium
    AGO 25021974 medium
    ALB 2889167 medium
    AND 70473 small
    ARE 9156963 medium
    ARG 43416755 medium
    ARM 3017712 medium
    ASM 55538 small
    ATG 91818 small
    • Use INTO to save the result of the previous query as pop_plus. You can see an example of this in the countries_plus code in the assignment text. Make sure to include a ; at the end of your WHERE clause!

    • Then, include another query below your first query to display all the records in pop_plus using SELECT * FROM pop_plus; so that you generate results and this will display pop_plus in the query result.

    SELECT country_code, size,
        CASE WHEN size > 50000000 THEN 'large'
            WHEN size > 1000000 THEN 'medium'
            ELSE 'small' END
            AS popsize_group
    FROM populations
    WHERE year = 2015;
    copy_to(conn, pop_plus, overwrite = TRUE)
    -- Select all columns of pop_plus
    SELECT * FROM pop_plus;
    Table 1.15: Displaying records 1 - 10
    country_code size popsize_group
    ABW 103889 small
    AFG 32526562 medium
    AGO 25021974 medium
    ALB 2889167 medium
    AND 70473 small
    ARE 9156963 medium
    ARG 43416755 medium
    ARM 3017712 medium
    ASM 55538 small
    ATG 91818 small
    • Keep the first query intact that creates pop_plus using INTO.
    • Write a query to join countries_plus AS c on the left with pop_plus AS p on the right matching on the country code fields.
    • Sort the data based on geosize_group, in ascending order so that large appears on top.
    • Select the name, continent, geosize_group, and popsize_group fields.
    -- Select fields
    SELECT name, continent, geosize_group, popsize_group
    -- From countries_plus (alias as c)
    FROM countries_plus AS c
      -- Join to pop_plus (alias as p)
      INNER JOIN pop_plus AS p
        -- Match on country code
        ON c.code = p.country_code
    -- Order the table    
    ORDER BY geosize_group;
    Table 1.16: Displaying records 1 - 10
    name continent geosize_group popsize_group
    Algeria Africa large medium
    Argentina South America large medium
    Australia Oceania large medium
    Brazil South America large large
    Greenland North America large small
    India Asia large large
    Canada North America large medium
    Kazakhstan Asia large medium
    China Asia large large
    Congo, The Democratic Republic of the Africa large large

    This concludes chapter 1 and you now know the INs of JOINs. Off to chapter 2 to learn the OUTs!

    2 Outer joins and cross joins

    In this chapter, you’ll come to grips with different kinds of outer joins. You’ll learn how to gain further insights into your data through left joins, right joins, and full joins. In addition to outer joins, you’ll also work with cross joins.

    2.1 LEFT and RIGHT JOINs

    2.1.1 Left Join

    Now you’ll explore the differences between performing an inner join and a left join using the cities and countries tables.

    You’ll begin by performing an inner join with the cities table on the left and the countries table on the right. Remember to alias the name of the city field as city and the name of the country field as country.

    You will then change the query to a left join. Take note of how many records are in each query here!

    Fill in the code based on the instructions in the code comments to complete the inner join. Note how many records are in the result of the join in the query result.

    -- Select the city name (with alias), the country code,
    -- the country name (with alias), the region,
    -- and the city proper population
    SELECT c1.name AS city, code, c2.name AS country,
           region, city_proper_pop
    -- From left table (with alias)
    FROM cities AS c1
      -- Join to right table (with alias)
      INNER JOIN countries AS c2
        -- Match on country code?
        ON c1.country_code = c2.code
    -- Order based on descending country code
    ORDER BY code DESC;
    Table 2.1: Displaying records 1 - 10
    city code country region city_proper_pop
    Harare ZWE Zimbabwe Eastern Africa 1606000
    Lusaka ZMB Zambia Eastern Africa 1742979
    Cape Town ZAF South Africa Southern Africa 3740026
    Durban ZAF South Africa Southern Africa 3442361
    Ekurhuleni ZAF South Africa Southern Africa 3178470
    Johannesburg ZAF South Africa Southern Africa 4434827
    Sana’a YEM Yemen Middle East 1937451
    Hanoi VNM Vietnam Southeast Asia 6844100
    Ho Chi Minh City VNM Vietnam Southeast Asia 7681700
    Caracas VEN Venezuela South America 1943901

    Change the code to perform a LEFT JOIN instead of an INNER JOIN. After executing this query, note how many records the query result contains.

    SELECT c1.name AS city, code, c2.name AS country,
           region, city_proper_pop
    FROM cities AS c1
      -- Join right table (with alias)
      LEFT JOIN countries AS c2
        -- Match on country code
        ON c1.country_code = c2.code
    -- Order by descending country code
    ORDER BY code DESC;
    Table 2.2: Displaying records 1 - 10
    city code country region city_proper_pop
    Harare ZWE Zimbabwe Eastern Africa 1606000
    Lusaka ZMB Zambia Eastern Africa 1742979
    Cape Town ZAF South Africa Southern Africa 3740026
    Durban ZAF South Africa Southern Africa 3442361
    Ekurhuleni ZAF South Africa Southern Africa 3178470
    Johannesburg ZAF South Africa Southern Africa 4434827
    Sana’a YEM Yemen Middle East 1937451
    Hanoi VNM Vietnam Southeast Asia 6844100
    Ho Chi Minh City VNM Vietnam Southeast Asia 7681700
    Caracas VEN Venezuela South America 1943901

    Great work!

    2.1.2 Left join (2)

    Next, you’ll try out another example comparing an inner join to its corresponding left join. Before you begin though, take note of how many records are in both the countries and languages tables below.

    You will begin with an inner join on the countries table on the left with the languages table on the right. Then you’ll change the code to a left join in the next bullet.

    Note the use of multi-line comments here using / and /.

    • Perform an inner join and alias the name of the country field as country and the name of the language field as language.
    • Sort based on descending country name.
    /*
    Select country name AS country, the country's local name,
    the language name AS language, and
    the percent of the language spoken in the country
    */
    SELECT c.name AS country, local_name, l.name AS language, percent
    -- From left table (alias as c)
    FROM countries AS c
      -- Join to right table (alias as l)
      INNER JOIN languages AS l
        -- Match on fields
        ON c.code = l.code
    -- Order by descending country
    ORDER BY country DESC;
    Table 2.3: Displaying records 1 - 10
    country local_name language percent
    Zimbabwe Zimbabwe Chewa NA
    Zimbabwe Zimbabwe Chibarwe NA
    Zimbabwe Zimbabwe English NA
    Zimbabwe Zimbabwe Kalanga NA
    Zimbabwe Zimbabwe Koisan NA
    Zimbabwe Zimbabwe Nambya NA
    Zimbabwe Zimbabwe Ndau NA
    Zimbabwe Zimbabwe Ndebele NA
    Zimbabwe Zimbabwe Shangani NA
    Zimbabwe Zimbabwe Shona NA
    • Perform a left join instead of an inner join. Observe the result, and also note the change in the number of records in the result.
    • Carefully review which records appear in the left join result, but not in the inner join result.
    /*
    Select country name AS country, the country's local name,
    the language name AS language, and
    the percent of the language spoken in the country
    */
    SELECT c.name AS country, local_name, l.name AS language, percent
    -- From left table (alias as c)
    FROM countries AS c
      -- Join to right table (alias as l)
      LEFT JOIN languages AS l
        -- Match on fields
        ON c.code = l.code
    -- Order by descending country
    ORDER BY country DESC;
    Table 2.4: Displaying records 1 - 10
    country local_name language percent
    Zimbabwe Zimbabwe Chewa NA
    Zimbabwe Zimbabwe Chibarwe NA
    Zimbabwe Zimbabwe English NA
    Zimbabwe Zimbabwe Kalanga NA
    Zimbabwe Zimbabwe Koisan NA
    Zimbabwe Zimbabwe Nambya NA
    Zimbabwe Zimbabwe Ndau NA
    Zimbabwe Zimbabwe Ndebele NA
    Zimbabwe Zimbabwe Shangani NA
    Zimbabwe Zimbabwe Shona NA

    Perfect! Notice that the INNER JOIN version resulted in 909 records. The LEFT JOIN version returned 916 rows.

    2.1.3 Left join (3)

    You’ll now revisit the use of the AVG() function introduced in our introductory SQL course. You will use it in combination with left join to determine the average gross domestic product (GDP) per capita by region in 2010.

    • Begin with a left join with the countries table on the left and the economies table on the right.
    • Focus only on records with 2010 as the year.
    -- Select name, region, and gdp_percapita
    SELECT name, region, gdp_percapita
    -- From countries (alias as c)
    FROM countries AS c
      -- Left join with economies (alias as e)
      LEFT JOIN economies AS e
        -- Match on code fields
        ON c.code = e.code
    -- Focus on 2010
    WHERE year = 2010;
    Table 2.5: Displaying records 1 - 10
    name region gdp_percapita
    Afghanistan Southern and Central Asia 539.667
    Angola Central Africa 3599.270
    Albania Southern Europe 4098.130
    United Arab Emirates Middle East 34628.630
    Argentina South America 10412.950
    Armenia Middle East 3121.780
    Antigua and Barbuda Caribbean 13531.780
    Australia Australia and New Zealand 56362.840
    Austria Western Europe 46757.130
    Azerbaijan Middle East 5847.260
    • Modify your code to calculate the average GDP per capita AS avg_gdp for each region in 2010.
    • Select the region and avg_gdp fields.
    -- Select fields
    SELECT region, AVG(gdp_percapita) AS avg_gdp
    -- From countries (alias as c)
    FROM countries AS c
      -- Left join with economies (alias as e)
      LEFT JOIN economies AS e
        -- Match on code fields
        ON c.code = e.code
    -- Focus on 2010
    WHERE year = 2010
    -- Group by region
    GROUP BY region;
    Table 2.6: Displaying records 1 - 10
    region avg_gdp
    Australia and New Zealand 44792.385
    Baltic Countries 12631.030
    British Islands 43588.330
    Caribbean 11413.339
    Central Africa 4797.240
    Central America 4969.970
    Eastern Africa 1757.348
    Eastern Asia 26205.852
    Eastern Europe 10095.457
    Melanesia 2532.610
    • Arrange this data on average GDP per capita for each region in 2010 from highest to lowest average GDP per capita.
    -- Select fields
    SELECT region, AVG(gdp_percapita) AS avg_gdp
    -- From countries (alias as c)
    FROM countries AS c
      -- Left join with economies (alias as e)
      LEFT JOIN economies AS e
        -- Match on code fields
        ON c.code = e.code
    -- Focus on 2010
    WHERE year = 2010
    -- Group by region
    GROUP BY region
    -- Order by descending avg_gdp
    ORDER BY avg_gdp DESC;
    Table 2.7: Displaying records 1 - 10
    region avg_gdp
    Western Europe 58130.96
    Nordic Countries 57074.00
    North America 47911.51
    Australia and New Zealand 44792.39
    British Islands 43588.33
    Eastern Asia 26205.85
    Southern Europe 22926.41
    Middle East 18204.64
    Baltic Countries 12631.03
    Caribbean 11413.34

    Well done. Notice how gradually you’re adding more and more building blocks to your SQL vocabulary. This enables you to answer questions of ever-increasing complexity!

    2.1.4 Right join

    Right joins aren’t as common as left joins. One reason why is that you can always write a right join as a left join.

    The left join code is commented out here. Your task is to write a new query using rights joins that produces the same result as what the query using left joins produces. Keep this left joins code commented as you write your own query just below it using right joins to solve the problem.

    Note the order of the joins matters in your conversion to using right joins!

    -- convert this code to use RIGHT JOINs instead of LEFT JOINs
    SELECT cities.name AS city, urbanarea_pop, countries.name AS country,
           indep_year, languages.name AS language, percent
    FROM cities
      LEFT JOIN countries
        ON cities.country_code = countries.code
      LEFT JOIN languages
        ON countries.code = languages.code
    ORDER BY city, language;
    
    -- edited by cliex159
    --SELECT cities.name AS city, urbanarea_pop, countries.name AS country,
    --       indep_year, languages.name AS language, percent
    --FROM languages
    --  RIGHT JOIN countries
    --    ON languages.code = countries.code
    --  RIGHT JOIN cities
    --    ON countries.code = cities.country_code
    --ORDER BY city, language;
    Table 2.8: Displaying records 1 - 10
    city urbanarea_pop country indep_year language percent
    Abidjan 4765000 Cote d’Ivoire 1960 French NA
    Abidjan 4765000 Cote d’Ivoire 1960 Other NA
    Abu Dhabi 1145000 United Arab Emirates 1971 Arabic NA
    Abu Dhabi 1145000 United Arab Emirates 1971 English NA
    Abu Dhabi 1145000 United Arab Emirates 1971 Hindi NA
    Abu Dhabi 1145000 United Arab Emirates 1971 Persian NA
    Abu Dhabi 1145000 United Arab Emirates 1971 Urdu NA
    Abuja 1235880 Nigeria 1960 English NA
    Abuja 1235880 Nigeria 1960 Fulani NA
    Abuja 1235880 Nigeria 1960 Hausa NA

    Correct; everything should be reversed!

    2.2 FULL JOINs

    2.2.1 Full join

    In this exercise, you’ll examine how your results differ when using a full join versus using a left join versus using an inner join with the countries and currencies tables.

    You will focus on the North American region and also where the name of the country is missing. Dig in to see what we mean!

    Begin with a full join with countries on the left and currencies on the right. The fields of interest have been SELECTed for you throughout this exercise.

    Then complete a similar left join and conclude with an inner join.

    Choose records in which region corresponds to North America or is NULL.

    -- edited by cliex159
    --SELECT name AS country, code, region, basic_unit
    ---- From countries
    --FROM countries
    --  -- Join to currencies
    --  FULL JOIN currencies
    --    -- Match on code
    --    USING (code)
    ---- Where region is North America or null
    --WHERE region = 'North America' OR region IS NULL
    ---- Order by region
    --ORDER BY region;
    
    SELECT name AS country, code, region, basic_unit
    FROM countries
    LEFT JOIN currencies USING (code)
    UNION ALL
    SELECT name AS country, code, region, basic_unit
    FROM currencies
    LEFT JOIN countries USING (code)
    -- Where region is North America or null
    WHERE region = 'North America' OR region IS NULL
    -- Order by region
    ORDER BY region;
    Table 2.9: Displaying records 1 - 10
    country code region basic_unit
    NA AIA NA East Caribbean dollar
    NA IOT NA United States dollar
    NA CCK NA Australian dollar
    NA COK NA New Zealand dollar
    NA TMP NA United States dollar
    NA FLK NA Falkland Islands pound
    NA MSR NA East Caribbean dollar
    NA NIU NA New Zealand dollar
    NA ROM NA Romanian leu
    NA SHN NA Saint Helena pound

    Repeat the same query as before, using a LEFT JOIN instead of a FULL JOIN. Note what has changed compared to the FULL JOIN result!

    SELECT name AS country, code, region, basic_unit
    -- From countries
    FROM countries
      -- Join to currencies
      LEFT JOIN currencies
        -- Match on code
        USING (code)
    -- Where region is North America or null
    WHERE region = 'North America' OR region IS NULL
    -- Order by region
    ORDER BY region;
    Table 2.10: 4 records
    country code region basic_unit
    Bermuda BMU North America Bermudian dollar
    Greenland GRL North America NA
    Canada CAN North America Canadian dollar
    United States USA North America United States dollar

    Repeat the same query again but use an INNER JOIN instead of a FULL JOIN. Note what has changed compared to the FULL JOIN and LEFT JOIN results!

    SELECT name AS country, code, region, basic_unit
    -- From countries
    FROM countries
      -- Join to currencies
      INNER JOIN currencies
        -- Match on code
        USING (code)
    -- Where region is North America or null
    WHERE region = 'North America' OR region IS NULL
    -- Order by region
    ORDER BY region;
    Table 2.11: 3 records
    country code region basic_unit
    Bermuda BMU North America Bermudian dollar
    Canada CAN North America Canadian dollar
    United States USA North America United States dollar

    Have you kept an eye out on the different numbers of records these queries returned? The FULL JOIN query returned 18 rows, the OUTER JOIN returned 4 rows, and the INNER JOIN only returned 3 rows. Do these results make sense to you?

    2.2.2 Full join (2)

    You’ll now investigate a similar exercise to the last one, but this time focused on using a table with more records on the left than the right. You’ll work with the languages and countries tables.

    Begin with a full join with languages on the left and countries on the right. Appropriate fields have been selected for you again here.

    • Choose records in which countries.name starts with the capital letter ‘V’ or is NULL.
    • Arrange by countries.name in ascending order to more clearly see the results.
    -- edited by cliex159
    -- SELECT countries.name, code, languages.name AS language
    -- -- From languages
    -- FROM languages
    --   -- Join to countries
    --   FULL JOIN countries
    --     -- Match on code
    --     USING (code)
    -- -- Where countries.name starts with V or is null
    -- WHERE countries.name LIKE 'V%' OR countries.name IS NULL
    -- -- Order by ascending countries.name
    -- ORDER BY countries.name;
    
    SELECT countries.name, code, languages.name AS language
    FROM languages
    LEFT JOIN countries USING (code)
    UNION ALL
    SELECT countries.name, code, languages.name AS language
    FROM countries
    LEFT JOIN languages USING (code)
    -- Where countries.name starts with V or is null
    WHERE countries.name LIKE 'V%' OR countries.name IS NULL
    -- Order by ascending countries.name
    ORDER BY countries.name;
    Table 2.12: Displaying records 1 - 10
    name code language
    NA AIA English
    NA CXR English
    NA CXR Chinese
    NA CXR Malay
    NA CCK Malay
    NA CCK English
    NA COK English
    NA COK Rarotongan
    NA COK Other
    NA MSR English
    • Repeat the same query as before, using a LEFT JOIN instead of a FULL JOIN. Note what has changed compared to the FULL JOIN result!
    • SELECT countries.name, code, languages.name AS language
      -- From languages
      FROM languages
        -- Join to countries
        LEFT JOIN countries
          -- Match using code
          USING (code)
      -- Where countries.name starts with V or is null
      WHERE countries.name LIKE 'V%' OR countries.name IS NULL
      -- Order by ascending countries.name
      ORDER BY countries.name;
      Table 2.13: Displaying records 1 - 10
      name code language
      NA AIA English
      NA CXR English
      NA CXR Chinese
      NA CXR Malay
      NA CCK Malay
      NA CCK English
      NA COK English
      NA COK Rarotongan
      NA COK Other
      NA MSR English
    • Repeat once more, but use an INNER JOIN instead of a LEFT JOIN. Note what has changed compared to the FULL JOIN and LEFT JOIN results.
    SELECT countries.name, code, languages.name AS language
    -- From languages
    FROM languages
      -- Join to countries
      INNER JOIN countries
      -- Match using code
        USING (code)
    -- Where countries.name starts with V or is null
    WHERE countries.name LIKE 'V%' OR countries.name IS NULL
    -- Order by ascending countries.name
    ORDER BY countries.name;
    Table 2.14: Displaying records 1 - 10
    name code language
    Vanuatu VUT Bislama
    Vanuatu VUT English
    Vanuatu VUT French
    Vanuatu VUT Other
    Vanuatu VUT Tribal Languages
    Venezuela VEN Spanish
    Venezuela VEN indigenous
    Vietnam VNM English
    Vietnam VNM Other
    Vietnam VNM Vietnamese

    Well done. Again, make sure to compare the number of records the different types of joins return and try to verify whether the results make sense.

    2.2.3 Full join (3)

    You’ll now explore using two consecutive full joins on the three tables you worked with in the previous two exercises.

    • Complete a full join with countries on the left and languages on the right.
    • Next, full join this result with currencies on the right.
    • Use LIKE to choose the Melanesia and Micronesia regions (Hint: ‘M%esia’).
    • Select the fields corresponding to the country name AS country, region, language name AS language, and basic and fractional units of currency.
    # edited by cliex159
    # Select fields (with aliases)
    #SELECT c1.name AS country, region, l.name AS language,
    #       basic_unit, frac_unit
    # From countries (alias as c1)
    #FROM countries AS c1
    # Join with languages (alias as l)
    #  FULL JOIN languages AS l
    # Match on code
    #    USING (code)
    # Join with currencies (alias as c2)
    #  FULL JOIN currencies AS c2
    # Match on code
    #    USING (code)
    # Where region like Melanesia and Micronesia
    #WHERE region LIKE 'M%esia';

    Great work!

    2.2.4 Review outer joins

    A(n) ___ join is a join combining the results of a ___ join and a ___ join.

  • left, full, right
  • right, full, left
  • inner, left, right
  • None of the above are true
  • Correct!

    2.3 CROSSing the rubicon

    2.3.1 A table of two cities

    This exercise looks to explore languages potentially and most frequently spoken in the cities of Hyderabad, India and Hyderabad, Pakistan.

    • Create a CROSS JOIN with cities AS c on the left and languages AS l on the right.
    • Make use of LIKE and Hyder% to choose Hyderabad in both countries.
    • Select only the city name AS city and language name AS language.
    -- Select fields
    SELECT c.name AS city, l.name AS language
    -- From cities (alias as c)
    FROM cities AS c        
      -- Join to languages (alias as l)
      CROSS JOIN languages AS l
    -- Where c.name like Hyderabad
    WHERE c.name LIKE 'Hyder%';
    Table 2.15: Displaying records 1 - 10
    city language
    Hyderabad (India) Dari
    Hyderabad (India) Pashto
    Hyderabad (India) Turkic
    Hyderabad (India) Other
    Hyderabad (India) Albanian
    Hyderabad (India) Greek
    Hyderabad (India) Other
    Hyderabad (India) unspecified
    Hyderabad (India) Arabic
    Hyderabad (India) French
    • Use an INNER JOIN instead of a CROSS JOIN. Think about what the difference will be in the results for this INNER JOIN result and the one for the CROSS JOIN.
    -- Select fields
    SELECT c.name AS city, l.name AS language
    -- From cities (alias as c)
    FROM cities AS c        
      -- Join to languages (alias as l)
      INNER JOIN languages AS l
        -- Match on country code
        ON c.country_code = l.code
    -- Where c.name like Hyderabad
    WHERE c.name LIKE 'Hyder%';
    Table 2.16: Displaying records 1 - 10
    city language
    Hyderabad (India) Hindi
    Hyderabad (India) Bengali
    Hyderabad (India) Telugu
    Hyderabad (India) Marathi
    Hyderabad (India) Tamil
    Hyderabad (India) Urdu
    Hyderabad (India) Gujarati
    Hyderabad (India) Kannada
    Hyderabad (India) Malayalam
    Hyderabad (India) Oriya

    Good one! Can you see the difference between a CROSS JOIN and a INNER JOIN?

    2.3.2 Outer challenge

    Now that you’re fully equipped to use OUTER JOINs, try a challenge problem to test your knowledge!

    In terms of life expectancy for 2010, determine the names of the lowest five countries and their regions.

    • Select country name AS country, region, and life expectancy AS life_exp.
    • Make sure to use LEFT JOIN, WHERE, ORDER BY, and LIMIT.
    -- Select fields
    SELECT c.name AS country,
           region,
           life_expectancy AS life_exp
    -- From countries (alias as c)
    FROM countries AS c
      -- Join to populations (alias as p)
      LEFT JOIN populations AS p
        -- Match on country code
        ON c.code = p.country_code
    -- Focus on 2010
    WHERE year = 2010
    -- Order by life_exp
    ORDER BY life_exp
    -- Limit to 5 records
    LIMIT 5;
    Table 2.17: 5 records
    country region life_exp
    Andorra Southern Europe NA
    American Samoa Polynesia NA
    Cayman Islands Caribbean NA
    Dominica Caribbean NA
    Gibraltar Southern Europe NA

    This was the last exercise of this chapter on outer joins and cross joins. In the next chapter, you’ll learn about set theory clauses!

    3 Set theory clauses

    In this chapter, you’ll learn more about set theory using Venn diagrams and get an introduction to union, union all, intersect, and except clauses. You’ll finish by investigating semi joins and anti joins, which provide a nice introduction to subqueries.

    3.1 State of the UNION

    3.1.1 Union

    You have two new tables, economies2010 and economies2015, available to you. The economies table is also included for reference.

    • Combine the two new tables into one table containing all of the fields in economies2010.
    • Sort this resulting single table by country code and then by year, both in ascending order.
    -- Select fields from 2010 table
    SELECT *
      -- From 2010 table
      FROM economies2010
        -- Set theory clause
        UNION
    -- Select fields from 2015 table
    SELECT *
      -- From 2015 table
      FROM economies2015
    -- Order by code and year
    ORDER BY code, year;
    Table 3.1: Displaying records 1 - 10
    code year income_group gross_savings
    AFG 2010 Low income 37.133
    AFG 2015 Low income 21.466
    AGO 2010 Upper middle income 23.534
    AGO 2015 Upper middle income -0.425
    ALB 2010 Upper middle income 20.011
    ALB 2015 Upper middle income 13.840
    ARE 2010 High income 27.073
    ARE 2015 High income 34.106
    ARG 2010 Upper middle income 17.361
    ARG 2015 Upper middle income 14.111

    What a beauty!

    3.1.2 Union (2)

    UNION can also be used to determine all occurrences of a field across multiple tables. Try out this exercise with no starter code.

    • Determine all (non-duplicated) country codes in either the cities or the currencies table. The result should be a table with only one field called country_code.
    • Sort by country_code in alphabetical order.
    -- Select field
    SELECT country_code
      -- From cities
      FROM cities
        -- Set theory clause
        UNION
    -- Select field
    SELECT code
      -- From currencies
      FROM currencies
    -- Order by country_code
    ORDER BY country_code;
    Table 3.2: Displaying records 1 - 10
    country_code
    ABW
    AFG
    AGO
    AIA
    ALB
    AND
    ARE
    ARG
    ARM
    ATG

    Well done! Let’s take it up a notch!

    3.1.3 Union all

    As you saw, duplicates were removed from the previous two exercises by using UNION.

    To include duplicates, you can use UNION ALL.

    • Determine all combinations (include duplicates) of country code and year that exist in either the economies or the populations tables. Order by code then year.
    • The result of the query should only have two columns/fields. Think about how many records this query should result in.
    • You’ll use code very similar to this in your next exercise after the video. Make note of this code after completing it.
    -- Select fields
    SELECT code, year
      -- From economies
      FROM economies
        -- Set theory clause
        UNION ALL
    -- Select fields
    SELECT country_code, year
      -- From populations
      FROM populations
    -- Order by code, year
    ORDER BY code, year;
    Table 3.3: Displaying records 1 - 10
    code year
    ABW 2010
    ABW 2015
    AFG 2010
    AFG 2010
    AFG 2015
    AFG 2015
    AGO 2010
    AGO 2010
    AGO 2015
    AGO 2015

    Can you spot some duplicates in the query result?

    3.2 INTERSECTional data science

    3.2.1 Intersect

    UNION ALL will extract all records from two tables, while INTERSECT will only return records that both tables have in common. In this exercise, you will create a similar query as before, however, this time you will look at the records in common for country code and year for the economies and populations tables.

    Note the number of records from the result of this query compared to the similar UNION ALL query result of 814 records.

    • Use INTERSECT to determine the records in common for country code and year for the economies and populations tables.
    • Again, order by code and then by year, both in ascending order.
    -- Select fields
    SELECT code, year
      -- From economies
      FROM economies
        -- Set theory clause
        INTERSECT
    -- Select fields
    SELECT country_code, year
      -- From populations
      FROM populations
    -- Order by code and year
    ORDER BY code, year;
    Table 3.4: Displaying records 1 - 10
    code year
    AFG 2010
    AFG 2015
    AGO 2010
    AGO 2015
    ALB 2010
    ALB 2015
    ARE 2010
    ARE 2015
    ARG 2010
    ARG 2015

    Boom!

    3.2.2 Intersect (2)

    As you think about major world cities and their corresponding country, you may ask which countries also have a city with the same name as their country name?

    Use INTERSECT to answer this question with countries and cities!

    -- Select fields
    SELECT name
      -- From countries
      FROM countries
        -- Set theory clause
        INTERSECT
    -- Select fields
    SELECT name
      -- From cities
      FROM cities;
    Table 3.5: 2 records
    name
    Hong Kong
    Singapore

    Nice one! It looks as though Singapore is the only country that has a city with the same name!

    3.2.3 Review union and intersect

    Which of the following combinations of terms and definitions is correct?

  • UNION: returns all records (potentially duplicates) in both tables
  • UNION ALL: returns only unique records
  • INTERSECT: returns only records appearing in both tables
  • None of the above are matched correctly
  • Correct!

    3.3 EXCEPTional

    3.3.1 Except

    Get the names of cities in cities which are not noted as capital cities in countries as a single field result.

    Note that there are some countries in the world that are not included in the countries table, which will result in some cities not being labeled as capital cities when in fact they are.

    • Order the resulting field in ascending order.
    • Can you spot the city/cities that are actually capital cities which this query misses?
    -- Select field
    SELECT name
      -- From cities
      FROM cities
        -- Set theory clause
        EXCEPT
    -- Select field
    SELECT capital
      -- From countries
      FROM countries
    -- Order by result
    ORDER BY name;
    Table 3.6: Displaying records 1 - 10
    name
    Abidjan
    Ahmedabad
    Alexandria
    Almaty
    Auckland
    Bandung
    Barcelona
    Barranquilla
    Basra
    Belo Horizonte

    EXCEPTional!

    3.3.2 Except (2)

    Now you will complete the previous query in reverse!

    Determine the names of capital cities that are not listed in the cities table.

    • Order by capital in ascending order.
    • The cities table contains information about 236 of the world’s most populous cities. The result of your query may surprise you in terms of the number of capital cities that do not appear in this list!
    -- Select field
    SELECT capital
      -- From countries
      FROM countries
        -- Set theory clause
        EXCEPT
    -- Select field
    SELECT name
      -- From cities
      FROM cities
    -- Order by ascending capital
    ORDER BY capital;
    Table 3.7: Displaying records 1 - 10
    capital
    Agana
    Amman
    Amsterdam
    Andorra la Vella
    Antananarivo
    Apia
    Ashgabat
    Asmara
    Astana

    Well done. Is this query surprising, as the instructions suggested?

    3.4 Semi-joins and Anti-joins

    3.4.1 Semi-join

    You are now going to use the concept of a semi-join to identify languages spoken in the Middle East.

    Begin by selecting all country codes in the Middle East as a single field result using SELECT, FROM, and WHERE.

    -- Select code
    SELECT code
      -- From countries
      FROM countries
    -- Where region is Middle East
    WHERE region = 'Middle East';
    Table 3.8: Displaying records 1 - 10
    code
    ARE
    ARM
    AZE
    BHR
    GEO
    IRQ
    ISR
    YEM
    JOR
    KWT
    • Below the commented code, select only unique languages by name appearing in the languages table.
    • Order the resulting single field table by name in ascending order.
    -- Query from step 1:
    /*
    SELECT code
      FROM countries
    WHERE region = 'Middle East';
    */
    
    -- Select field
    SELECT DISTINCT name
      -- From languages
      FROM languages
    -- Order by name
    ORDER BY name;
    Table 3.9: Displaying records 1 - 10
    name
    Afar
    Afrikaans
    Akyem
    Albanian
    Alsatian
    Amerindian
    Amharic
    Angolar
    Antiguan creole
    Arabic
    • Combine the previous two queries into one query by adding a WHERE IN statement to the SELECT DISTINCT query to determine the unique languages spoken in the Middle East.
    • Order the result by name in ascending order.
    -- Query from step 2
    SELECT DISTINCT name
      FROM languages
    -- Where in statement
    WHERE code IN
      -- Query from step 1
      -- Subquery
      (SELECT code
       FROM countries
       WHERE region = 'Middle East')
    -- Order by name
    ORDER BY name;
    Table 3.10: Displaying records 1 - 10
    name
    Arabic
    Aramaic
    Armenian
    Azerbaijani
    Azeri
    Baluchi
    Bulgarian
    Circassian
    English
    Farsi

    Your first subquery is a fact! Let’s dive a little deeper into the concept.

    3.4.2 Relating semi-join to a tweaked inner join

    Let’s revisit the code from the previous exercise, which retrieves languages spoken in the Middle East.

    SELECT DISTINCT name
    FROM languages
    WHERE code IN
      (SELECT code
       FROM countries
       WHERE region = 'Middle East')
    ORDER BY name;
    

    Sometimes problems solved with semi-joins can also be solved using an inner join.

    SELECT languages.name AS language
    FROM languages
    INNER JOIN countries
    ON languages.code = countries.code
    WHERE region = 'Middle East'
    ORDER BY language;
    

    This inner join isn’t quite right. What is missing from this second code block to get it to match with the correct answer produced by the first block?

  • HAVING instead of WHERE
  • DISTINCT
  • UNIQUE
  • Correct! There’s no use on retrieving ‘Arabic’ multiple times; you only care about DISTINCT languages here.

    3.4.3 Diagnosing problems using anti-join

    Another powerful join in SQL is the anti-join. It is particularly useful in identifying which records are causing an incorrect number of records to appear in join queries.

    You will also see another example of a subquery here, as you saw in the first exercise on semi-joins. Your goal is to identify the currencies used in Oceanian countries!

    Begin by determining the number of countries in countries that are listed in Oceania using SELECT, FROM, and WHERE.

    -- Select statement
    SELECT COUNT(*) as count_number
      -- From countries
      FROM countries
    -- Where continent is Oceania
    WHERE continent = 'Oceania';
    Table 3.11: 1 records
    count_number
    19
    • Complete an inner join with countries AS c1 on the left and currencies AS c2 on the right to get the different currencies used in the countries of Oceania.
    • Match ON the code field in the two tables.
    • Include the country code, country name, and basic_unit AS currency.

    Observe the query result and make note of how many different countries are listed here.

    -- Select fields (with aliases)
    SELECT c1.code, name, basic_unit AS currency
      -- From countries (alias as c1)
      FROM countries AS c1
        -- Join with currencies (alias as c2)
        INNER JOIN currencies AS c2
        -- Match on code
        ON c1.code = c2.code
    -- Where continent is Oceania
    WHERE c1.continent = 'Oceania';
    Table 3.12: Displaying records 1 - 10
    code name currency
    AUS Australia Australian dollar
    KIR Kiribati Australian dollar
    MHL Marshall Islands United States dollar
    NRU Nauru Australian dollar
    PLW Palau United States dollar
    PNG Papua New Guinea Papua New Guinean kina
    PYF French Polynesia CFP franc
    SLB Solomon Islands Solomon Islands dollar
    WSM Samoa Samoan tala
    TON Tonga Tongan paʻanga

    Note that not all countries in Oceania were listed in the resulting inner join with currencies. Use an anti-join to determine which countries were not included!

    • Use NOT IN and (SELECT code FROM currencies) as a subquery to get the country code and country name for the Oceanian countries that are not included in the currencies table.
    -- Select fields
    SELECT code, name
      -- From Countries
      FROM countries
      -- Where continent is Oceania
      WHERE continent = 'Oceania'
        -- And code not in
        AND code NOT IN
        -- Subquery
        (SELECT code
         FROM currencies);
    Table 3.13: 5 records
    code name
    ASM American Samoa
    FJI Fiji Islands
    GUM Guam
    FSM Micronesia, Federated States of
    MNP Northern Mariana Islands

    Nice! Can you tell which countries were not included now?

    3.4.4 Set theory challenge

    Congratulations! You’ve now made your way to the challenge problem for this third chapter. Your task here will be to incorporate two of UNION/UNION ALL/INTERSECT/EXCEPT to solve a challenge involving three tables.

    In addition, you will use a subquery as you have in the last two exercises! This will be great practice as you hop into subqueries more in Chapter 4!

    • Identify the country codes that are included in either economies or currencies but not in populations.
    • Use that result to determine the names of cities in the countries that match the specification in the previous instruction.
    -- Select the city name
    SELECT name
      -- Alias the table where city name resides
      FROM cities AS c1
      -- Choose only records matching the result of multiple set theory clauses
      WHERE country_code IN
    (
        -- Select appropriate field from economies AS e
        SELECT e.code
        FROM economies AS e
        -- Get all additional (unique) values of the field from currencies AS c2   
        UNION
        SELECT c2.code
        FROM currencies AS c2
        -- Exclude those appearing in populations AS p  
        EXCEPT
        SELECT p.country_code
        FROM populations AS p
    );
    Table 3.14: 6 records
    name
    Bucharest
    Kaohsiung
    New Taipei City
    Taichung
    Tainan
    Taipei

    Success! Head over to the final chapter of this course to feel the power of subqueries at your fingertips!

    4 Subqueries

    In this closing chapter, you’ll learn how to use nested queries and you’ll use what you’ve learned in this course to solve three challenge problems.

    4.1 Subqueries inside WHERE and SELECT clauses

    4.1.1 Subquery inside where

    You’ll now try to figure out which countries had high average life expectancies (at the country level) in 2015.

    Begin by calculating the average life expectancy across all countries for 2015.

    -- Select average life_expectancy
    SELECT AVG(life_expectancy) as avg_life_expectancy
      -- From populations
      FROM populations
    -- Where year is 2015
    WHERE year = 2015;
    Table 4.1: 1 records
    avg_life_expectancy
    71.67634

    Recall that you can use SQL to do calculations for you. Suppose we wanted only records that were above 1.15 * 100 in terms of life expectancy for 2015:

    SELECT *
      FROM populations
    WHERE life_expectancy > 1.15 * 100
      AND year = 2015;
    

    Select all fields from populations with records corresponding to larger than 1.15 times the average you calculated in the first task for 2015. In other words, change the 100 in the example above with a subquery.

    -- Select fields
    SELECT *
      -- From populations
      FROM populations
    -- Where life_expectancy is greater than
    WHERE life_expectancy >
      -- 1.15 * subquery
      1.15 * (SELECT AVG(life_expectancy)
       FROM populations
       WHERE year = 2015) AND
      year = 2015;
    Table 4.2: Displaying records 1 - 10
    pop_id country_code year fertility_rate life_expectancy size
    21 AUS 2015 1.833 82.45122 23789752
    376 CHE 2015 1.540 83.19756 8281430
    356 ESP 2015 1.320 83.38049 46443994
    134 FRA 2015 2.010 82.67073 66538391
    170 HKG 2015 1.195 84.27805 7305700
    174 ISL 2015 1.930 82.86098 330815
    190 ITA 2015 1.370 83.49024 60730582
    194 JPN 2015 1.460 83.84366 126958472
    340 SGP 2015 1.240 82.59512 5535002
    374 SWE 2015 1.880 82.55122 9799186

    Good work! Let’s see how you do on a more high-level question in one go.

    4.1.2 Subquery inside where (2)

    Use your knowledge of subqueries in WHERE to get the urban area population for only capital cities.

    • Make use of the capital field in the countries table in your subquery.
    • Select the city name, country code, and urban area population fields.
    -- Select fields
    SELECT name, country_code, urbanarea_pop
      -- From cities
      FROM cities
    -- Where city name in the field of capital cities
    WHERE name IN
      -- Subquery
      (SELECT capital
       FROM countries)
    ORDER BY urbanarea_pop DESC;
    Table 4.3: Displaying records 1 - 10
    name country_code urbanarea_pop
    Beijing CHN 21516000
    Dhaka BGD 14543124
    Tokyo JPN 13513734
    Moscow RUS 12197596
    Cairo EGY 10230350
    Kinshasa COD 10130000
    Jakarta IDN 10075310
    Seoul KOR 9995784
    Mexico City MEX 8974724
    Lima PER 8852000

    Alright. You’ve got some practice on subqueries inside WHERE now. Time to see how you do when these subqueries are in the SELECT statement!

    4.1.3 Subquery inside select

    In this exercise, you’ll see how some queries can be written using either a join or a subquery.

    You have seen previously how to use GROUP BY with aggregate functions and an inner join to get summarized information from multiple tables.

    The code given in the first query selects the top nine countries in terms of number of cities appearing in the cities table. Recall that this corresponds to the most populous cities in the world. Your task will be to convert the second query to get the same result as the provided code.

    • Submit the code to view the result of the provided query.
    SELECT countries.name AS country, COUNT(*) AS cities_num
      FROM cities
          INNER JOIN countries
          ON countries.code = cities.country_code
    GROUP BY country
    ORDER BY cities_num DESC, country
    LIMIT 9;
    
    /* 
    SELECT ___ AS ___,
      (SELECT ___
       FROM ___
       WHERE countries.code = cities.country_code) AS cities_num
    FROM ___
    ORDER BY ___ ___, ___
    LIMIT 9;
    */
    Table 4.4: 9 records
    country cities_num
    China 36
    India 18
    Japan 11
    Brazil 10
    Pakistan 9
    United States 9
    Indonesia 7
    Russian Federation 7
    South Korea 7
    • Convert the GROUP BY code to use a subquery inside of SELECT by filling in the blanks to get a result that matches the one given using the GROUP BY code in the first query.
    • Again, sort the result by cities_num descending and then by country ascending.
    /*
    SELECT countries.name AS country, COUNT(*) AS cities_num
      FROM cities
          INNER JOIN countries
          ON countries.code = cities.country_code
    GROUP BY country
    ORDER BY cities_num DESC, country
    LIMIT 9;
    */
    
    SELECT countries.name AS country,
      -- Subquery
      (SELECT COUNT(*)
       FROM cities
       WHERE countries.code = cities.country_code) AS cities_num
    FROM countries
    ORDER BY cities_num DESC, country
    LIMIT 9;
    Table 4.5: 9 records
    country cities_num
    China 36
    India 18
    Japan 11
    Brazil 10
    Pakistan 9
    United States 9
    Indonesia 7
    Russian Federation 7
    South Korea 7

    Great! The next video will introduce you to using subqueries in the FROM clause. Exciting stuff!

    4.2 Subquery inside FROM clause

    4.2.1 Subquery inside from

    The last type of subquery you will work with is one inside of FROM.

    You will use this to determine the number of languages spoken for each country, identified by the country’s local name! (Note this may be different than the name field and is stored in the local_name field.)

    • Begin by determining for each country code how many languages are listed in the languages table using SELECT, FROM, and GROUP BY.
    • Alias the aggregated field as lang_num.
    -- Select fields (with aliases)
    SELECT code, COUNT(*) AS lang_num
      -- From languages
      FROM languages
    -- Group by code
    GROUP BY code;
    Table 4.6: Displaying records 1 - 10
    code lang_num
    ABW 7
    AFG 4
    AGO 12
    AIA 1
    ALB 4
    AND 4
    ARE 5
    ARG 6
    ARM 3
    ASM 5
    • Include the previous query (aliased as subquery) as a subquery in the FROM clause of a new query.
    • Select the local name of the country from countries.
    • Also, select lang_num from subquery.
    • Make sure to use WHERE appropriately to match code in countries and in subquery.
    • Sort by lang_num in descending order.
    SELECT local_name, subquery.lang_num
      FROM countries,
        (SELECT code, COUNT(*) AS lang_num
         FROM languages
         GROUP BY code) AS subquery
      WHERE countries.code = subquery.code
    ORDER BY lang_num DESC;
    Table 4.7: Displaying records 1 - 10
    local_name lang_num
    Zambia 19
    YeItyop´iya 16
    Zimbabwe 16
    Bharat/India 14
    Nepal 14
    France 13
    Mali 13
    South Africa 13
    Angola 12
    Malawi 12

    This one wasn’t easy!

    4.2.2 Advanced subquery

    You can also nest multiple subqueries to answer even more specific questions.

    In this exercise, for each of the six continents listed in 2015, you’ll identify which country had the maximum inflation rate, and how high it was, using multiple subqueries. The table result of your final query should look something like the following, where anything between < > will be filled in with appropriate values:

    +------------+---------------+-------------------+
    | name       | continent     | inflation_rate    |
    |------------+---------------+-------------------|
    | <country1> | North America | <max_inflation1>  |
    | <country2> | Africa        | <max_inflation2>  |
    | <country3> | Oceania       | <max_inflation3>  |
    | <country4> | Europe        | <max_inflation4>  |
    | <country5> | South America | <max_inflation5>  |
    | <country6> | Asia          | <max_inflation6>  |
    +------------+---------------+-------------------+
    

    Again, there are multiple ways to get to this solution using only joins, but the focus here is on showing you an introduction into advanced subqueries.

    • Create an INNER JOIN with countries on the left and economies on the right with USING, without aliasing your tables or columns.
    • Retrieve the country’s name, continent, and inflation rate for 2015.
    -- Select fields
    SELECT name, continent, inflation_rate
      -- From countries
      FROM countries
        -- Join to economies
        INNER JOIN economies
        -- Match on code
        USING (code)
    -- Where year is 2015
    WHERE year = 2015;
    Table 4.8: Displaying records 1 - 10
    name continent inflation_rate
    Afghanistan Asia -1.549
    Angola Africa 10.287
    Albania Europe 1.896
    United Arab Emirates Asia 4.070
    Argentina South America NA
    Armenia Asia 3.731
    Antigua and Barbuda North America 0.969
    Australia Oceania 1.461
    Austria Europe 0.810
    Azerbaijan Asia 4.049

    Select the maximum inflation rate in 2015 AS max_inf grouped by continent using the previous step’s query as a subquery in the FROM clause.

    • Thus, in your subquery you should:
      • Create an inner join with countries on the left and economies on the right with USING (without aliasing your tables or columns).
      • Retrieve the country name, continent, and inflation rate for 2015.
      • Alias the subquery as subquery.

    This will result in the six maximum inflation rates in 2015 for the six continents as one field table. Make sure to not include continent in the outer SELECT statement.

    -- Select the maximum inflation rate as max_inf
    SELECT MAX(inflation_rate) AS max_inf
      -- Subquery using FROM (alias as subquery)
      FROM (
          SELECT name, continent, inflation_rate
          FROM countries
          INNER JOIN economies
          USING (code)
          WHERE year = 2015) AS subquery
    -- Group by continent
    GROUP BY continent;
    Table 4.9: 6 records
    max_inf
    21.858
    39.403
    48.684
    7.524
    9.784
    121.738
    • Now it’s time to append your second query to your first query using AND and IN to obtain the name of the country, its continent, and the maximum inflation rate for each continent in 2015.
    • For the sake of practice, change all joining conditions to use ON instead of USING.
    -- Select fields
    SELECT name, continent, inflation_rate
      -- From countries
      FROM countries
        -- Join to economies
        INNER JOIN economies
        -- Match on code
        ON countries.code = economies.code
      -- Where year is 2015
      WHERE year = 2015
        -- And inflation rate in subquery (alias as subquery)
        AND inflation_rate IN (
            SELECT MAX(inflation_rate) AS max_inf
            FROM (
                 SELECT name, continent, inflation_rate
                 FROM countries
                 INNER JOIN economies
                 ON countries.code = economies.code
                 WHERE year = 2015) AS subquery
          -- Group by continent
            GROUP BY continent);
    Table 4.10: 6 records
    name continent inflation_rate
    Haiti North America 7.524
    Malawi Africa 21.858
    Nauru Oceania 9.784
    Ukraine Europe 48.684
    Venezuela South America 121.738
    Yemen Asia 39.403

    Wow! Well done! This code works since each of the six maximum inflation rate values occur only once in the 2015 data. Think about whether this particular code involving subqueries would work in cases where there are ties for the maximum inflation rate values.

    4.2.3 Subquery challenge

    Let’s test your understanding of the subqueries with a challenge problem! Use a subquery to get 2015 economic data for countries that do not have

    • gov_form of ‘Constitutional Monarchy’ or
    • ‘Republic’ in their gov_form.

    Here, gov_form stands for the form of the government for each country. Review the different entries for gov_form in the countries table.

    • Select the country code, inflation rate, and unemployment rate.
    • Order by inflation rate ascending.
    • Do not use table aliasing in this exercise.
    -- Select fields
    SELECT code, inflation_rate, unemployment_rate
      -- From economies
      FROM economies
      -- Where year is 2015 and code is not in
      WHERE year = 2015 AND code NOT IN
        -- Subquery
        (SELECT code
         FROM countries
         WHERE (gov_form = 'Constitutional Monarchy' OR gov_form LIKE '%Republic%'))
    -- Order by inflation rate
    ORDER BY inflation_rate;
    Table 4.11: Displaying records 1 - 10
    code inflation_rate unemployment_rate
    AFG -1.549 NA
    CHE -1.140 3.178
    PRI -0.751 12.000
    ROU -0.596 6.812
    BRN -0.423 6.900
    TON -0.283 NA
    OMN 0.065 NA
    TLS 0.553 NA
    BEL 0.620 8.492
    CAN 1.132 6.900

    Superb! Let’s review subqueries before you head off to the last video of this course!

    4.2.4 Subquery review

    Within which SQL clause are subqueries most frequently found?

  • WHERE
  • FROM
  • SELECT
  • IN
  • Correct!

    4.3 Course review

    4.3.1 Final challenge

    Welcome to the end of the course! The next three exercises will test your knowledge of the content covered in this course and apply many of the ideas you’ve seen to difficult problems. Good luck!

    Read carefully over the instructions and solve them step-by-step, thinking about how the different clauses work together.

    In this exercise, you’ll need to get the country names and other 2015 data in the economies table and the countries table for Central American countries with an official language.

    • Select unique country names. Also select the total investment and imports fields.
    • Use a left join with countries on the left. (An inner join would also work, but please use a left join here.)
    • Match on code in the two tables AND use a subquery inside of ON to choose the appropriate languages records.
    • Order by country name ascending.
    • Use table aliasing but not field aliasing in this exercise.
    -- Select fields
    SELECT DISTINCT name, total_investment, imports
      -- From table (with alias)
      FROM countries AS c
        -- Join with table (with alias)
        LEFT JOIN economies AS e
          -- Match on code
          ON (c.code = e.code
            -- and code in Subquery
            AND c.code IN (
              SELECT l.code
              FROM languages AS l
              WHERE official = TRUE
            ) )
      -- Where region and year are correct
      WHERE region = 'Central America' AND year = 2015
    -- Order by field
    ORDER BY name;
    Table 4.12: 7 records
    name total_investment imports
    Belize 22.014 6.743
    Costa Rica 20.218 4.629
    El Salvador 13.983 8.193
    Guatemala 13.433 15.124
    Honduras 24.633 9.353
    Nicaragua 31.862 11.665
    Panama 46.557 5.898

    One down, two to go!

    4.3.2 Final challenge (2)

    Whoofta! That was challenging, huh?

    Let’s ease up a bit and calculate the average fertility rate for each region in 2015.

    • Include the name of region, its continent, and average fertility rate aliased as avg_fert_rate.
    • Sort based on avg_fert_rate ascending.
    • Remember that you’ll need to GROUP BY all fields that aren’t included in the aggregate function of SELECT.
    -- Select fields
    SELECT region, continent, AVG(fertility_rate) AS avg_fert_rate
      -- From left table
      FROM countries AS c
        -- Join to right table
        INNER JOIN populations AS p
          -- Match on join condition
          ON c.code = p.country_code
      -- Where specific records matching some condition
      WHERE year = 2015
    -- Group appropriately?
    GROUP BY region, continent
    -- Order appropriately
    ORDER BY avg_fert_rate;
    Table 4.13: Displaying records 1 - 10
    region continent avg_fert_rate
    Southern Europe Europe 1.426100
    Eastern Europe Europe 1.490889
    Baltic Countries Europe 1.603333
    Eastern Asia Asia 1.620714
    Western Europe Europe 1.632500
    North America North America 1.765750
    British Islands Europe 1.875000
    Nordic Countries Europe 1.893333
    Australia and New Zealand Oceania 1.911500
    Caribbean North America 1.950571

    Interesting. It seems that the average fertility rate is lowest in Southern Europe and highest in Central Africa. Two down, one to go!

    4.3.3 Final challenge (3)

    Welcome to the last challenge problem. By now you’re a query warrior! Remember that these challenges are designed to take you to the limit to solidify your SQL knowledge! Take a deep breath and solve this step-by-step.

    You are now tasked with determining the top 10 capital cities in Europe and the Americas in terms of a calculated percentage using city_proper_pop and metroarea_pop in cities.

    Do not use table aliasing in this exercise.

    • Select the city name, country code, city proper population, and metro area population.
    • Calculate the percentage of metro area population composed of city proper population for each city in cities, aliased as city_perc.
    • Focus only on capital cities in Europe and the Americas in a subquery.
    • Make sure to exclude records with missing data on metro area population.
    • Order the result by city_perc descending.
    • Then determine the top 10 capital cities in Europe and the Americas in terms of this city_perc percentage.
    -- Select fields
    SELECT name, country_code, city_proper_pop, metroarea_pop,
          -- Calculate city_perc
          city_proper_pop / metroarea_pop * 100 AS city_perc
      -- From appropriate table    
      FROM cities
      -- Where
      WHERE name IN
        -- Subquery
        (SELECT capital
         FROM countries
         WHERE (continent = 'Europe'
            OR continent LIKE '%America'))
           AND metroarea_pop IS NOT NULL
    -- Order appropriately
    ORDER BY city_perc DESC
    -- Limit amount
    LIMIT 10;
    Table 4.14: Displaying records 1 - 10
    name country_code city_proper_pop metroarea_pop city_perc
    Berlin DEU 3517424 5871022 0
    Bogota COL 7878783 9800000 0
    Brasilia BRA 2556149 3919864 0
    Budapest HUN 1759407 2927944 0
    Buenos Aires ARG 3054300 14122000 0
    Caracas VEN 1943901 2923959 0
    Guatemala City GTM 2110100 4500000 0
    Lima PER 8852000 10750000 0
    London GBR 8673713 13879757 0
    Mexico City MEX 8974724 20063000 0

    That’s a wrap! Check out the excellent follow-up course entitled Intermediate SQL by Mona Khalil too!