Intermediate SQL

Mona Khalil


Course Description

So you’ve learned how to aggregate and join data from tables in your database—now what? How do you manipulate, transform, and make the most sense of your data? This intermediate-level course will teach you several key functions necessary to wrangle, filter, and categorize information in a relational database, expand your SQL toolkit, and answer complex questions. You will learn the robust use of CASE statements, subqueries, and window functions—all while discovering some interesting facts about soccer using the European Soccer Database. Should there be further discussion, please contact us via email: .

1 We’ll take the CASE

In this chapter, you will learn how to use the CASE WHEN statement to create categorical variables, aggregate data into a single column with multiple filtering conditions, and calculate counts and percentages.

1.1 We’ll take the CASE

1.1.1 Basic CASE statements

What is your favorite team?

The European Soccer Database contains data about 12,800 matches from 11 countries played between 2011-2015! Throughout this course, you will be shown filtered versions of the tables in this database in order to better explore their contents.

In this exercise, you will identify matches played between FC Schalke 04 and FC Bayern Munich. There are 2 teams identified in each match in the hometeam_id and awayteam_id columns, available to you in the filtered matches table. ID can join to the team_api_id column in the teams table, but you cannot perform a join on both at the same time.

However, you can perform this operation using a CASE statement once you’ve identified the team_api_id associated with each team!

  • Select the team’s long name and API id from the teams table.
  • Filter the query for FC Schalke 04 and FC Bayern Munich using IN, giving you the team_api_IDs needed for the next step.
  • SELECT
        -- Select the team long name and team API id
        team_long_name,
        team_api_id
    FROM teams
    -- Only include FC Schalke 04 and FC Bayern Munich
    WHERE team_long_name IN ('FC Schalke 04', 'FC Bayern Munich');
    Table 1.1: 2 records
    team_long_name team_api_id
    FC Bayern Munich 9823
    FC Schalke 04 10189
  • Create a CASE statement that identifies whether a match in Germany included FC Bayern Munich, FC Schalke 04, or neither as the home team.
  • Group the query by the CASE statement alias, home_team.
  • -- Identify the home team as Bayern Munich, Schalke 04, or neither
    SELECT 
        CASE WHEN hometeam_id = 10189 THEN 'FC Schalke 04'
             WHEN hometeam_id = 9823 THEN 'FC Bayern Munich'
             ELSE 'Other' END AS home_team,
        COUNT(id) AS total_matches
    FROM matches
    -- Group by the CASE statement alias
    GROUP BY home_team;
    Table 1.2: 3 records
    home_team total_matches
    FC Bayern Munich 136
    FC Schalke 04 136
    Other 25707

    Great job! You were able to identify two teams using CASE statement instead of a join, which could only identify one at a time. Let’s learn some more about key teams in Europe!

    1.1.2 CASE statements comparing column values

    Barcelona is considered one of the strongest teams in Spain’s soccer league.

    In this exercise, you will be creating a list of matches in the 2011/2012 season where Barcelona was the home team. You will do this using a CASE statement that compares the values of two columns to create a new group – wins, losses, and ties.

    In 3 steps, you will build a query that identifies a match’s winner, identifies the identity of the opponent, and finally filters for Barcelona as the home team. Completing a query in this order will allow you to watch your results take shape with each new piece of information.

    The matches table currently contains Barcelona’s matches from the 2011/2012 season, and has two key columns, hometeam_id and awayteam_id, that can be joined with the teams table. However, you can only join teams to one column at a time.

  • Select the date of the match and create a CASE statement to identify matches as home wins, home losses, or ties.
  • SELECT 
        -- Select the date of the match
        date,
        -- Identify home wins, losses, or ties
        CASE WHEN home_goal > away_goal THEN 'Home win!'
             WHEN home_goal < away_goal THEN 'Home loss :(' 
             ELSE 'Tie' END AS outcome
    FROM matches;
    Table 1.3: Displaying records 1 - 10
    date outcome
    2022-08-17 Tie
    2022-08-16 Tie
    2022-08-16 Home loss :(
    2022-08-17 Home win!
    2022-08-16 Home loss :(
    2022-09-24 Tie
    2022-08-16 Tie
    2022-08-16 Home loss :(
    2022-08-16 Home win!
    2022-11-01 Home win!
  • Left join the teams table team_api_id column to the matches table awayteam_id. This allows us to retrieve the away team’s identity.
  • Select team_long_name from teams as opponent and complete the CASE statement from Step 1.
  • SELECT 
        m.date,
        --Select the team long name column and call it 'opponent'
        t.team_long_name AS opponent, 
        -- Complete the CASE statement with an alias
        CASE WHEN m.home_goal > m.away_goal THEN 'Home win!'
             WHEN m.home_goal < m.away_goal THEN 'Home loss :('
             ELSE 'Tie' END AS outcome
    FROM matches AS m
    -- Left join teams onto matches
    LEFT JOIN teams AS t
    ON m.awayteam_id = t.team_api_id;
    Table 1.4: Displaying records 1 - 10
    date opponent outcome
    2022-08-17 Beerschot AC Tie
    2022-08-16 Sporting Lokeren Tie
    2022-08-16 RSC Anderlecht Home loss :(
    2022-08-17 RAEC Mons Home win!
    2022-08-16 Standard de Liège Home loss :(
    2022-09-24 Club Brugge KV Tie
    2022-08-16 KV Kortrijk Tie
    2022-08-16 Royal Excel Mouscron Home loss :(
    2022-08-16 Sporting Charleroi Home win!
    2022-11-01 KV Kortrijk Home win!
  • Complete the same CASE statement as the previous steps.
  • Filter for matches where the home team is FC Barcelona (id = 8634).
  • SELECT 
        m.date,
        t.team_long_name AS opponent,
        -- Complete the CASE statement with an alias
        CASE WHEN m.home_goal > m.away_goal THEN 'Barcelona win!'
             WHEN m.home_goal < m.away_goal THEN 'Barcelona loss :(' 
             ELSE 'Tie' END AS outcome 
    FROM matches AS m
    LEFT JOIN teams AS t 
    ON m.awayteam_id = t.team_api_id
    -- Filter for Barcelona as the home team
    WHERE m.hometeam_id = 8634; 
    Table 1.5: Displaying records 1 - 10
    date opponent outcome
    2022-11-08 Real Valladolid Barcelona win!
    2022-11-23 Getafe CF Tie
    2022-12-06 Valencia CF Barcelona win!
    2022-12-13 Real Madrid CF Barcelona win!
    2022-01-03 RCD Mallorca Barcelona win!
    2022-01-17 RC Deportivo de La Coruña Barcelona win!
    2022-09-13 Racing Santander Tie
    2022-01-24 CD Numancia Barcelona win!
    2022-02-08 Real Sporting de Gijón Barcelona win!
    2022-02-21 RCD Espanyol Barcelona loss :(

    Great job! Did you notice how often Barcelona tends to win matches?

    1.1.3 CASE statements comparing two column values part 2

    Similar to the previous exercise, you will construct a query to determine the outcome of Barcelona’s matches where they played as the away team. You will learn how to combine these two queries in chapters 2 and 3.

    Did their performance differ from the matches where they were the home team?

  • Complete the CASE statement to identify Barcelona’s away team games (id = 8634) as wins, losses, or ties.
  • Left join the teams table team_api_id column on the matches table hometeam_id column. This retrieves the identity of the home team opponent.
  • Filter the query to only include matches where Barcelona was the away team.
  • -- Select matches where Barcelona was the away team
    SELECT
        m.date,
        t.team_long_name AS opponent,
        CASE WHEN m.home_goal < m.away_goal THEN 'Barcelona win!'
             WHEN m.home_goal > m.away_goal THEN 'Barcelona loss :('
             ELSE 'Tie' END AS outcome
    FROM matches AS m
    -- Join teams to matches
    LEFT JOIN teams AS t
    ON m.hometeam_id = t.team_api_id
    WHERE m.awayteam_id = 8634;
    Table 1.6: Displaying records 1 - 10
    date opponent outcome
    2022-08-31 CD Numancia Barcelona loss :(
    2022-11-16 RC Recreativo Barcelona win!
    2022-11-29 Sevilla FC Barcelona win!
    2022-12-21 Villarreal CF Barcelona win!
    2022-01-11 CA Osasuna Barcelona win!
    2022-02-01 Racing Santander Barcelona win!
    2022-02-14 Real Betis Balompié Tie
    2022-03-01 Atlético Madrid Barcelona loss :(
    2022-03-15 UD Almería Barcelona win!
    2022-04-04 Real Valladolid Barcelona win!

    Fantastic! Barcelona’s performance seems to be worse when they are the away team. We’ll explore this further in the next lesson.

    1.2 More complex CASE WHEN

    1.2.1 In CASE of rivalry

    Barcelona and Real Madrid have been rival teams for more than 80 years. Matches between these two teams are given the name El Clásico (The Classic). In this exercise, you will query a list of matches played between these two rivals.

    You will notice in Step 2 that when you have multiple logical conditions in a CASE statement, you may quickly end up with a large number of WHEN clauses to logically test every outcome you are interested in. It’s important to make sure you don’t accidentally exclude key information in your ELSE clause.

    In this exercise, you will retrieve information about matches played between Barcelona (id = 8634) and Real Madrid (id = 8633). Note that the query you are provided with already identifies the Clásico matches using a filter in the WHERE clause.

  • Complete the first CASE statement, identifying Barcelona or Real Madrid as the home team using the hometeam_id column.
  • Complete the second CASE statement in the same way, using awayteam_id.
  • SELECT 
        date,
        -- Identify the home team as Barcelona or Real Madrid
        CASE WHEN hometeam_id = 8634 THEN 'FC Barcelona' 
             ELSE 'Real Madrid CF' END AS home,
        -- Identify the away team as Barcelona or Real Madrid
        CASE WHEN awayteam_id = 8634 THEN 'FC Barcelona' 
             ELSE 'Real Madrid CF' END AS away
    FROM matches
    WHERE (awayteam_id = 8634 OR hometeam_id = 8634)
          AND (awayteam_id = 8633 OR hometeam_id = 8633);
    Table 1.7: Displaying records 1 - 10
    date home away
    2022-12-13 FC Barcelona Real Madrid CF
    2022-05-02 Real Madrid CF FC Barcelona
    2022-11-29 FC Barcelona Real Madrid CF
    2022-04-10 Real Madrid CF FC Barcelona
    2022-11-29 FC Barcelona Real Madrid CF
    2022-04-16 Real Madrid CF FC Barcelona
    2022-12-10 Real Madrid CF FC Barcelona
    2022-04-21 FC Barcelona Real Madrid CF
    2022-03-02 Real Madrid CF FC Barcelona
    2022-10-07 FC Barcelona Real Madrid CF
  • Construct the final CASE statement identifying who won each match. Note there are 3 possible outcomes, but 5 conditions that you need to identify.
  • Fill in the logical operators to identify Barcelona or Real Madrid as the winner.
  • SELECT 
        date,
        CASE WHEN hometeam_id = 8634 THEN 'FC Barcelona' 
             ELSE 'Real Madrid CF' END as home,
        CASE WHEN awayteam_id = 8634 THEN 'FC Barcelona' 
             ELSE 'Real Madrid CF' END as away,
        -- Identify all possible match outcomes
        CASE WHEN home_goal > away_goal AND hometeam_id = 8634 THEN 'Barcelona win!'
            WHEN home_goal > away_goal AND hometeam_id = 8633 THEN 'Real Madrid win!'
            WHEN home_goal < away_goal AND awayteam_id = 8634 THEN 'Barcelona win!'
            WHEN home_goal < away_goal AND awayteam_id = 8633 THEN 'Real Madrid win!'
            ELSE 'Tie!' END as outcome
    FROM matches
    WHERE (awayteam_id = 8634 OR hometeam_id = 8634)
          AND (awayteam_id = 8633 OR hometeam_id = 8633);
    Table 1.8: Displaying records 1 - 10
    date home away outcome
    2022-12-13 FC Barcelona Real Madrid CF Barcelona win!
    2022-05-02 Real Madrid CF FC Barcelona Barcelona win!
    2022-11-29 FC Barcelona Real Madrid CF Barcelona win!
    2022-04-10 Real Madrid CF FC Barcelona Barcelona win!
    2022-11-29 FC Barcelona Real Madrid CF Barcelona win!
    2022-04-16 Real Madrid CF FC Barcelona Tie!
    2022-12-10 Real Madrid CF FC Barcelona Barcelona win!
    2022-04-21 FC Barcelona Real Madrid CF Real Madrid win!
    2022-03-02 Real Madrid CF FC Barcelona Real Madrid win!
    2022-10-07 FC Barcelona Real Madrid CF Tie!

    Fantastic! These two teams are fairly evenly matched!

    1.2.2 Filtering your CASE statement

    Let’s generate a list of matches won by Italy’s Bologna team! There are quite a few additional teams in the two tables, so a key part of generating a usable query will be using your CASE statement as a filter in the WHERE clause.

    CASE statements allow you to categorize data that you’re interested in – and exclude data you’re not interested in. In order to do this, you can use a CASE statement as a filter in the WHERE statement to remove output you don’t want to see.

    Here is how you might set that up:

    SELECT *
    FROM table
    WHERE 
        CASE WHEN a > 5 THEN 'Keep'
             WHEN a <= 5 THEN 'Exclude' END = 'Keep';
    

    In essence, you can use the CASE statement as a filtering column like any other column in your database. The only difference is that you don’t alias the statement in WHERE.

  • Identify Bologna’s team ID listed in the teams table by selecting the team_long_name and team_api_id.
  • -- Select team_long_name and team_api_id from team
    SELECT
        team_long_name,
        team_api_id
    FROM teams
    -- Filter for team long name
    WHERE team_long_name = 'Bologna';
    Table 1.9: 1 records
    team_long_name team_api_id
    Bologna 9857
  • Select the season and date that a match was played.
  • Complete the CASE statement so that only Bologna’s home and away wins are identified.
  • -- Select the season and date columns
    SELECT 
        season,
        date,
        -- Identify when Bologna won a match
        CASE WHEN hometeam_id = 9857 
              AND home_goal > away_goal 
             THEN 'Bologna Win'
             WHEN awayteam_id = 9857 
              AND away_goal > home_goal 
             THEN 'Bologna Win' 
              END AS outcome
    FROM matches;
    Table 1.10: Displaying records 1 - 10
    season date outcome
    2008/2009 2022-08-17 NA
    2008/2009 2022-08-16 NA
    2008/2009 2022-08-16 NA
    2008/2009 2022-08-17 NA
    2008/2009 2022-08-16 NA
    2008/2009 2022-09-24 NA
    2008/2009 2022-08-16 NA
    2008/2009 2022-08-16 NA
    2008/2009 2022-08-16 NA
    2008/2009 2022-11-01 NA
  • Select the home_goal and away_goal for each match.
  • Use the CASE statement in the WHERE clause to filter all NULL values generated by the statement in the previous step.
  • -- Select the season, date, home_goal, and away_goal columns
    SELECT 
        season,
        date,
        home_goal,
        away_goal
    FROM matches
    WHERE
    -- Exclude games not won by Bologna
        CASE WHEN hometeam_id = 9857 AND home_goal > away_goal THEN 'Bologna Win'
             WHEN awayteam_id = 9857 AND away_goal > home_goal THEN 'Bologna Win' 
             END IS NOT NULL;
    Table 1.11: Displaying records 1 - 10
    season date home_goal away_goal
    2008/2009 2022-08-31 1 2
    2008/2009 2022-12-13 5 2
    2008/2009 2022-01-18 1 2
    2008/2009 2022-01-28 0 1
    2008/2009 2022-03-08 3 0
    2008/2009 2022-04-26 2 0
    2008/2009 2022-05-17 2 1
    2008/2009 2022-05-31 3 1
    2008/2009 2022-10-19 3 1
    2009/2010 2022-10-28 2 1

    Good work! You now have details on every match in this database where Bologna won. We will work on adding more useful information to your queries later in this course.

    1.3 Aggregate CASE WHEN

    1.3.1 COUNT using CASE WHEN

    Do the number of soccer matches played in a given European country differ across seasons? We will use the European Soccer Database to answer this question.

    You will examine the number of matches played in 3 seasons within each country listed in the database. This is much easier to explore with each season’s matches in separate columns. Using the country and unfiltered match table, you will count the number of matches played in each country during the 2012/2013, 2013/2014, and 2014/2015 match seasons.

  • Create a CASE statement that identifies the id of matches played in the 2012/2013 season. Specify that you want ELSE values to be NULL.
  • Wrap the CASE statement in a COUNT function and group the query by the country alias.
  • SELECT 
        c.name AS country,
        -- Count games from the 2012/2013 season
        COUNT(CASE WHEN m.season = '2012/2013' 
                   THEN m.id ELSE NULL END) AS matches
    FROM country AS c
    LEFT JOIN match AS m
    ON c.id = m.country_id
    -- Group by country name alias
    GROUP BY country;
    Table 1.12: Displaying records 1 - 10
    country matches
    Belgium 240
    England 380
    France 380
    Germany 306
    Italy 380
    Netherlands 306
    Poland 240
    Portugal 240
    Scotland 228
    Spain 380
  • Create 3 CASE WHEN statements counting the matches played in each country across the 3 seasons.
  • END your CASE statement without an ELSE clause.
  • SELECT 
        c.name AS country,
        -- Count matches in each of the 3 seasons
        COUNT(CASE WHEN m.season = '2012/2013' THEN m.id END) AS matches,
        COUNT(CASE WHEN m.season = '2013/2014' THEN m.id END) AS matches,
        COUNT(CASE WHEN m.season = '2014/2015' THEN m.id END) AS matches
    FROM country AS c
    LEFT JOIN match AS m
    ON c.id = m.country_id
    -- Group by country name alias
    GROUP BY country;
    Table 1.13: Displaying records 1 - 10
    country matches matches matches
    Belgium 240 12 240
    England 380 380 380
    France 380 380 380
    Germany 306 306 306
    Italy 380 380 379
    Netherlands 306 306 306
    Poland 240 240 240
    Portugal 240 240 306
    Scotland 228 228 228
    Spain 380 380 380

    Fantastic! The number of matches played in each season seems relatively consistent across countries. Where do you see the largest difference?

    1.3.2 COUNT and CASE WHEN with multiple conditions

    In R or Python, you have the ability to calculate a SUM of logical values (i.e., TRUE/FALSE) directly. In SQL, you have to convert these values into 1 and 0 before calculating a sum. This can be done using a CASE statement.

    There’s one key difference when using SUM to aggregate logical values compared to using COUNT in the previous exercise –

    Your goal here is to use the country and match table to determine the total number of matches won by the home team in each country during the 2012/2013, 2013/2014, and 2014/2015 seasons.

  • Create 3 CASE statements to “count” matches in the ‘2012/2013’, ‘2013/2014’, and ‘2014/2015’ seasons, respectively.
  • Have each CASE statement return a 1 for every match you want to include, and a 0 for every match to exclude.
  • Wrap the CASE statement in a SUM to return the total matches played in each season.
  • Group the query by the country name alias.
  • SELECT 
        c.name AS country,
        -- Sum the total records in each season where the home team won
        SUM(CASE WHEN m.season = '2012/2013' AND m.home_goal > m.away_goal 
            THEN 1 ELSE 0 END) AS matches,
        SUM(CASE WHEN m.season = '2013/2014' AND m.home_goal > m.away_goal 
            THEN 1 ELSE 0 END) AS matches,
        SUM(CASE WHEN m.season = '2014/2015' AND m.home_goal > m.away_goal 
            THEN 1 ELSE 0 END) AS matches
    FROM country AS c
    LEFT JOIN match AS m
    ON c.id = m.country_id
    -- Group by country name alias
    GROUP BY country;
    Table 1.14: Displaying records 1 - 10
    country matches matches matches
    Belgium 102 6 106
    England 166 179 172
    France 170 168 181
    Germany 130 145 145
    Italy 177 181 152
    Netherlands 137 144 138
    Poland 97 110 114
    Portugal 103 108 137
    Scotland 89 102 102
    Spain 189 179 171

    Good job! You now have a second way of calculating the results from the previous exercise.

    1.3.3 Calculating percent with CASE and AVG

    CASE statements will return any value you specify in your THEN clause. This is an incredibly powerful tool for robust calculations and data manipulation when used in conjunction with an aggregate statement. One key task you can perform is using CASE inside an AVG function to calculate a percentage of information in your database.

    Here’s an example of how you set that up:

    AVG(CASE WHEN condition_is_met THEN 1
             WHEN condition_is_not_met THEN 0 END)
    

    With this approach, it’s important to accurately specify which records count as 0, otherwise your calculations may not be correct!

    Your task is to examine the number of wins, losses, and ties in each country. The matches table is filtered to include all matches from the 2013/2014 and 2014/2015 seasons.

  • Create 3 CASE statements to COUNT the total number of home team wins, away team wins, and ties, which will allow you to examine the total number of records.
  • SELECT 
        c.name AS country,
        -- Sum the home wins, away wins, and ties in each country
        COUNT(CASE WHEN m.home_goal > m.away_goal THEN m.id          
            END) AS home_wins,
        COUNT(CASE WHEN m.home_goal < m.away_goal THEN m.id 
            END) AS away_wins,
        COUNT(CASE WHEN m.home_goal = m.away_goal THEN m.id 
            END) AS ties
    FROM country AS c
    LEFT JOIN matches AS m
    ON c.id = m.country_id
    GROUP BY country;
    Table 1.15: Displaying records 1 - 10
    country home_wins away_wins ties
    Belgium 810 493 425
    England 1390 867 783
    France 1359 822 859
    Germany 1107 744 597
    Italy 1407 814 796
    Netherlands 1171 696 581
    Poland 870 525 525
    Portugal 908 611 533
    Scotland 760 617 447
    Spain 1485 851 704
  • Calculate the percentage of matches tied using a CASE statement inside AVG.
  • Fill in the logical operators for each statement. Alias your columns as ties_2013_2014 and ties_2014_2015, respectively.
  • SELECT 
        c.name AS country,
        -- Calculate the percentage of tied games in each season
        AVG(CASE WHEN m.season= '2013/2014' AND m.home_goal = m.away_goal THEN 1
                 WHEN m.season= '2013/2014' AND m.home_goal != m.away_goal THEN 0
                 END) AS ties_2013_2014,
        AVG(CASE WHEN m.season= '2014/2015' AND m.home_goal = m.away_goal THEN 1
                 WHEN m.season= '2014/2015' AND m.home_goal != m.away_goal THEN 0
                 END) AS ties_2014_2015
    FROM country AS c
    LEFT JOIN matches AS m
    ON c.id = m.country_id
    GROUP BY country;
    Table 1.16: Displaying records 1 - 10
    country ties_2013_2014 ties_2014_2015
    Belgium 0.1666667 0.2500000
    England 0.2052632 0.2447368
    France 0.2842105 0.2315789
    Germany 0.2091503 0.2679739
    Italy 0.2368421 0.3166227
    Netherlands 0.2745098 0.2385621
    Poland 0.3041667 0.2750000
    Portugal 0.2500000 0.2777778
    Scotland 0.2192982 0.1929825
    Spain 0.2263158 0.2394737
  • The previous “ties” columns returned values with 14 decimal points, which is not easy to interpret. Use the ROUND function to round to 2 decimal points.
  • SELECT 
        c.name AS country,
        -- Round the percentage of tied games to 2 decimal points
        ROUND(AVG(CASE WHEN m.season='2013/2014' AND m.home_goal = m.away_goal THEN 1
                 WHEN m.season='2013/2014' AND m.home_goal != m.away_goal THEN 0
                 END),2) AS pct_ties_2013_2014,
        ROUND(AVG(CASE WHEN m.season='2014/2015' AND m.home_goal = m.away_goal THEN 1
                 WHEN m.season='2014/2015' AND m.home_goal != m.away_goal THEN 0
                 END),2) AS pct_ties_2014_2015
    FROM country AS c
    LEFT JOIN matches AS m
    ON c.id = m.country_id
    GROUP BY country;
    Table 1.17: Displaying records 1 - 10
    country pct_ties_2013_2014 pct_ties_2014_2015
    Belgium 0.17 0.25
    England 0.21 0.24
    France 0.28 0.23
    Germany 0.21 0.27
    Italy 0.24 0.32
    Netherlands 0.27 0.24
    Poland 0.30 0.28
    Portugal 0.25 0.28
    Scotland 0.22 0.19
    Spain 0.23 0.24

    Fantastic! That’s much easier to read. What patterns/differences are you seeing in the results? Who tends to score ties more often?

    2 Short and Simple Subqueries

    In this chapter, you will learn about subqueries in the SELECT, FROM, and WHERE clauses. You will gain an understanding of when subqueries are necessary to construct your dataset and where to best include them in your queries.

    2.1 WHERE are the Subqueries?

    2.1.1 Filtering using scalar subqueries

    Subqueries are incredibly powerful for performing complex filters and transformations. You can filter data based on single, scalar values using a subquery in ways you cannot by using WHERE statements or joins. Subqueries can also be used for more advanced manipulation of your data set. You will likely encounter subqueries in any real-world setting that uses relational databases.

    In this exercise, you will generate a list of matches where the total goals scored (for both teams in total) is more than 3 times the average for games in the matches table, which includes all games played in the 2013/2014 season.

  • Calculate triple the average home + away goals scored across all matches. This will become your subquery in the next step. Note that this column does not have an alias, so it will be called ?column? in your results.
  • -- Select the average of home + away goals, multiplied by 3
    SELECT 
        3 * avg(home_goal + away_goal)
    FROM matches;
    Table 2.1: 1 records
    3 * avg(home_goal + away_goal)
    8.116594
  • Select the date, home goals, and away goals in the main query.
  • Filter the main query for matches where the total goals scored exceed the value in the subquery.
  • SELECT 
        -- Select the date, home goals, and away goals scored
        date,
        home_goal,
        away_goal
    FROM  matches
    -- Filter for matches where total goals exceeds 3x the average
    WHERE (home_goal + away_goal) > 
           (SELECT 3 * AVG(home_goal + away_goal)
            FROM matches); 
    Table 2.2: Displaying records 1 - 10
    date home_goal away_goal
    2022-10-29 4 5
    2022-11-22 9 1
    2022-01-16 7 2
    2022-08-28 8 2
    2022-12-29 7 3
    2022-05-19 5 5
    2022-12-14 6 3
    2022-03-22 3 6
    2022-08-30 3 6
    2022-01-23 4 5

    Look at those scores! This calculation would have been much more difficult without a subquery.

    2.1.2 Filtering using a subquery with a list

    Your goal in this exercise is to generate a list of teams that never played a game in their home city. Using a subquery, you will generate a list of unique hometeam_ID values from the unfiltered match table to exclude in the team table’s team_api_ID column.

    In addition to filtering using a single-value (scalar) subquery, you can create a list of values in a subquery to filter data based on a complex set of conditions. This type of subquery generates a one column reference list for the main query. As long as the values in your list match a column in your main query’s table, you don’t need to use a join – even if the list is from a separate table.

  • Create a subquery in the WHERE clause that retrieves all unique hometeam_ID values from the match table.
  • Select the team_long_name and team_short_name from the team table. Exclude all values from the subquery in the main query.
  • SELECT 
        -- Select the team long and short names
        team_long_name,
        team_short_name
    FROM team
    -- Exclude all values from the subquery
    WHERE team_api_id NOT IN
         (SELECT DISTINCT hometeam_id FROM match);
    Table 2.3: 0 records
    team_long_name team_short_name

    Great job! There are 52 teams without any home games recorded in this database. Why do you think that is?

    2.1.3 Filtering with more complex subquery conditions

    In the previous exercise, you generated a list of teams that have no home matches listed in the soccer database using a subquery in WHERE. Let’s do some further exploration in this database by creating a list of teams that scored 8 or more goals in a home match.

    In order to do this, you will construct a subquery in the WHERE statement with its own filtering condition.

  • Create a subquery in WHERE clause that retrieves all hometeam_ID values from match with a home_goal score greater than or equal to 8.
  • Select the team_long_name and team_short_name from the team table. Include all values from the subquery in the main query.
  • SELECT
        -- Select the team long and short names
        team_long_name,
        team_short_name
    FROM team
    -- Filter for teams with 8 or more home goals
    WHERE team_api_id IN
          (SELECT hometeam_id 
           FROM match
           WHERE home_goal >= 8);
    Table 2.4: Displaying records 1 - 10
    team_long_name team_short_name
    Manchester United MUN
    Tottenham Hotspur TOT
    Chelsea CHE
    Southampton SOU
    FC Bayern Munich BMU
    PSV PSV
    SL Benfica BEN
    Celtic CEL
    Real Madrid CF REA
    FC Barcelona BAR

    Excellent job! This is an interesting list of teams!

    2.2 Subqueries in FROM

    2.2.1 Joining Subqueries in FROM

    The match table in the European Soccer Database does not contain country or team names. You can get this information by joining it to the country table, and use this to aggregate information, such as the number of matches played in each country.

    If you’re interested in filtering data from one of these tables, you can also create a subquery from one of the tables, and then join it to an existing table in the database. A subquery in FROM is an effective way of answering detailed questions that requires filtering or transforming data before including it in your final results.

    Your goal in this exercise is to generate a subquery using the match table, and then join that subquery to the country table to calculate information about matches with 10 or more goals in total!

  • Create the subquery to be used in the next step, which selects the country ID and match ID (id) from the match table.
  • Filter the query for matches with greater than or equal to 10 goals.
  • SELECT 
        -- Select the country ID and match ID
        country_id, 
        id 
    FROM match
    -- Filter for matches with 10 or more goals in total
    WHERE (home_goal + away_goal) >= 10;
    Table 2.5: Displaying records 1 - 10
    country_id id
    1729 2157
    1729 3093
    1729 3369
    1729 3566
    4769 5192
    7809 9211
    13274 13899
    13274 14224
    19694 20107
    21518 23444
  • Construct a subquery that selects only matches with 10 or more total goals.
  • Inner join the subquery onto country in the main query.
  • Select name from country and count the id column from match.
  • SELECT
        -- Select country name and the count match IDs
        c.name AS country_name,
        COUNT(sub.id) AS matches
    FROM country AS c
    -- Inner join the subquery onto country
    -- Select the country id and match id columns
    INNER JOIN (SELECT country_id, id 
                FROM match
                -- Filter the subquery by matches with 10+ goals
                WHERE (home_goal + away_goal) >= 10) AS sub
    ON c.id = sub.country_id
    GROUP BY country_name;
    Table 2.6: 6 records
    country_name matches
    England 4
    France 1
    Germany 1
    Netherlands 2
    Scotland 1
    Spain 5

    Good job! You generated a count of matches in each country where the total goals was higher than 10. That’s a lot of goals!

    2.2.2 Building on Subqueries in FROM

    In the previous exercise, you found that England, Netherlands, Germany and Spain were the only countries that had matches in the database where 10 or more goals were scored overall. Let’s find out some more details about those matches – when they were played, during which seasons, and how many of the goals were home versus away goals.

    You’ll notice that in this exercise, the table alias is excluded for every column selected in the main query. This is because the main query is extracting data from the subquery, which is treated as a single table.

  • Complete the subquery inside the FROM clause. Select the country name from the country table, along with the date, the home goal, the away goal, and the total goals columns from the match table.
  • Create a column in the subquery that adds home and away goals, called total_goals. This will be used to filter the main query.
  • Select the country, date, home goals, and away goals in the main query.
  • Filter the main query for games with 10 or more total goals.
  • SELECT
        -- Select country, date, home, and away goals from the subquery
        country,
        date,
        home_goal,
        away_goal
    FROM
        -- Select country name, date, home_goal, away_goal, and total goals in the subquery
        (SELECT c.name AS country, 
                m.date, 
                m.home_goal, 
                m.away_goal,
               (m.home_goal + m.away_goal) AS total_goals
        FROM match AS m
        LEFT JOIN country AS c
        ON m.country_id = c.id) AS subquery
    -- Filter by total goals scored in the main query
    WHERE total_goals >= 10;
    Table 2.7: Displaying records 1 - 10
    country date home_goal away_goal
    England 2022-11-22 9 1
    England 2022-08-28 8 2
    England 2022-12-29 7 3
    England 2022-05-19 5 5
    France 2022-11-08 5 5
    Germany 2022-03-30 9 2
    Netherlands 2022-10-24 10 0
    Netherlands 2022-11-06 6 4
    Scotland 2022-05-05 6 6
    Spain 2022-10-30 7 3

    Great job! You can manipulate the information you extract by placing joins in or using a subquery. These must have been exciting matches!

    2.3 Subqueries in SELECT

    2.3.1 Add a subquery to the SELECT clause

    Subqueries in SELECT statements generate a single value that allow you to pass an aggregate value down a data frame. This is useful for performing calculations on data within your database.

    In the following exercise, you will construct a query that calculates the average number of goals per match in each country’s league.

  • In the subquery, select the average total goals by adding home_goal and away_goal.
  • Filter the results so that only the average of goals in the 2013/2014 season is calculated.
  • In the main query, select the average total goals by adding home_goal and away_goal. This calculates the average goals for each league.
  • Filter the results in the main query the same way you filtered the subquery. Group the query by the league name.
  • SELECT 
        l.name AS league,
        -- Select and round the league's total goals
        ROUND(AVG(m.home_goal + m.away_goal),2) AS avg_goals,
        -- Select and round the average total goals
        (SELECT ROUND(AVG(home_goal + away_goal),2) 
         FROM match
         WHERE season = '2013/2014') AS overall_avg
    FROM league AS l
    LEFT JOIN match AS m
    ON l.country_id = m.country_id
    -- Filter for the 2013/2014 season
    WHERE m.season = '2013/2014'
    GROUP BY l.name;
    Table 2.8: Displaying records 1 - 10
    league avg_goals overall_avg
    Belgium Jupiler League 2.50 2.77
    England Premier League 2.77 2.77
    France Ligue 1 2.46 2.77
    Germany 1. Bundesliga 3.16 2.77
    Italy Serie A 2.72 2.77
    Netherlands Eredivisie 3.20 2.77
    Poland Ekstraklasa 2.64 2.77
    Portugal Liga ZON Sagres 2.37 2.77
    Scotland Premier League 2.75 2.77
    Spain LIGA BBVA 2.75 2.77

    Good job! With a subquery in select, you can compare each league’s average to the overall average. Which leagues tend to score higher than average?

    2.3.2 Subqueries in Select for Calculations

    Subqueries in SELECT are a useful way to create calculated columns in a query. A subquery in SELECT can be treated as a single numeric value to use in your calculations. When writing queries in SELECT, it’s important to remember that filtering the main query does not filter the subquery – and vice versa.

    In the previous exercise, you created a column to compare each league’s average total goals to the overall average goals in the 2013/2014 season. In this exercise, you will add a column that directly compares these values by subtracting the overall average from the subquery.

  • Select the average goals scored in a match for each league in the main query.
  • Select the average goals scored in a match overall for the 2013/2014 season in the subquery.
  • Subtract the subquery from the average number of goals calculated for each league.
  • Filter the main query so that only games from the 2013/2014 season are included.
  • SELECT 
        -- Select the league name and average goals scored
        l.name AS league,
        ROUND(AVG(m.home_goal + m.away_goal),2) AS avg_goals,
        -- Subtract the overall average from the league average
        ROUND(AVG(m.home_goal + m.away_goal) - 
              (SELECT AVG(home_goal + away_goal)
               FROM match 
               WHERE season = '2013/2014'),2) AS diff
    FROM league AS l
    LEFT JOIN match AS m
    ON l.country_id = m.country_id
    -- Only include 2013/2014 results
    WHERE m.season = '2013/2014'
    GROUP BY l.name;
    Table 2.9: Displaying records 1 - 10
    league avg_goals diff
    Belgium Jupiler League 2.50 -0.27
    England Premier League 2.77 0.00
    France Ligue 1 2.46 -0.31
    Germany 1. Bundesliga 3.16 0.39
    Italy Serie A 2.72 -0.04
    Netherlands Eredivisie 3.20 0.43
    Poland Ekstraklasa 2.64 -0.13
    Portugal Liga ZON Sagres 2.37 -0.40
    Scotland Premier League 2.75 -0.02
    Spain LIGA BBVA 2.75 -0.02

    Good job! Games in the Netherlands tend to score the highest number of goals on average in this season. Why do you think that is?

    2.4 Subqueries best practices

    2.4.1 ALL the subqueries EVERYWHERE

    In soccer leagues, games are played at different stages. Winning teams progress from one stage to the next, until they reach the final stage. In each stage, the stakes become higher than the previous one. The match table includes data about the different stages that each match took place in.

    In this lesson, you will build a final query across 3 exercises that will contain three subqueries – one in the SELECT clause, one in the FROM clause, and one in the WHERE clause. In the final exercise, your query will extract data examining the average goals scored in each stage of a match. Does the average number of goals scored change as the stakes get higher from one stage to the next?

  • Extract the average number of home and away team goals in two SELECT subqueries.
  • Calculate the average home and away goals for the specific stage in the main query.
  • Filter both subqueries and the main query so that only data from the 2012/2013 season is included.
  • Group the query by the m.stage column.
  • SELECT 
        -- Select the stage and average goals for each stage
        m.stage,
        ROUND(AVG(m.home_goal + m.away_goal),2) AS avg_goals,
        -- Select the average overall goals for the 2012/2013 season
        ROUND((SELECT AVG(home_goal + away_goal) 
               FROM match 
               WHERE season = '2012/2013'),2) AS overall
    FROM match AS m
    -- Filter for the 2012/2013 season
    WHERE m.season = '2012/2013'
    -- Group by stage
    GROUP BY m.stage;
    Table 2.10: Displaying records 1 - 10
    stage avg_goals overall
    1 2.68 2.77
    2 2.65 2.77
    3 2.83 2.77
    4 2.80 2.77
    5 2.61 2.77
    6 2.78 2.77
    7 2.69 2.77
    8 3.09 2.77
    9 2.70 2.77
    10 2.96 2.77

    Excellent job! You will continue building on this data set in the next exercises.

    2.4.2 Add a subquery in FROM

    In the previous exercise, you created a data set listing the average home and away goals in each match stage of the 2012/2013 match season.

    In this next step, you will turn the main query into a subquery to extract a list of stages where the average home goals in a stage is higher than the overall average for home goals in a match.

  • Calculate the average home goals and average away goals from the match table for each stage in the FROM clause subquery.
  • Add a subquery to the WHERE clause that calculates the overall average home goals.
  • Filter the main query for stages where the average home goals is higher than the overall average.
  • Select the stage and avg_goals columns from the s subquery into the main query.
  • SELECT 
        -- Select the stage and average goals from the subquery
        s.stage,
        ROUND(s.avg_goals,2) AS avg_goals
    FROM 
        -- Select the stage and average goals in 2012/2013
        (SELECT
             stage,
             AVG(home_goal + away_goal) AS avg_goals
         FROM match
         WHERE season = '2012/2013'
         GROUP BY stage) AS s
    WHERE 
        -- Filter the main query using the subquery
        s.avg_goals > (SELECT AVG(home_goal + away_goal) 
                       FROM match WHERE season = '2012/2013');
    Table 2.11: Displaying records 1 - 10
    stage avg_goals
    3 2.83
    4 2.80
    6 2.78
    8 3.09
    10 2.96
    11 2.92
    12 3.23
    17 2.85
    20 2.96
    21 2.90

    Excellent job! Quite a few stages were eliminated with this filter. Do you notice anything interesting?

    2.4.3 Add a subquery in SELECT

    In the previous exercise, you added a subquery to the FROM statement and selected the stages where the number of average goals in a stage exceeded the overall average number of goals in the 2012/2013 match season. In this final step, you will add a subquery in SELECT to compare the average number of goals scored in each stage to the total.

  • Create a subquery in SELECT that yields the average goals scored in the 2012/2013 season. Name the new column overall_avg.
  • Create a subquery in FROM that calculates the average goals scored in each stage during the 2012/2013 season.
  • Filter the main query for stages where the average goals exceeds the overall average in 2012/2013.
  • SELECT 
        -- Select the stage and average goals from s
        s.stage,
        ROUND(s.avg_goals,2) AS avg_goal,
        -- Select the overall average for 2012/2013
        (SELECT AVG(home_goal + away_goal) FROM match WHERE season = '2012/2013') AS overall_avg
    FROM 
        -- Select the stage and average goals in 2012/2013 from match
        (SELECT
             stage,
             AVG(home_goal + away_goal) AS avg_goals
         FROM match
         WHERE season = '2012/2013'
         GROUP BY stage) AS s
    WHERE 
        -- Filter the main query using the subquery
        s.avg_goals > (SELECT AVG(home_goal + away_goal) 
                       FROM match WHERE season = '2012/2013');
    Table 2.12: Displaying records 1 - 10
    stage avg_goal overall_avg
    3 2.83 2.772699
    4 2.80 2.772699
    6 2.78 2.772699
    8 3.09 2.772699
    10 2.96 2.772699
    11 2.92 2.772699
    12 3.23 2.772699
    17 2.85 2.772699
    20 2.96 2.772699
    21 2.90 2.772699

    Amazing job! You used 3 subqueries to generate a list of stages that have higher than average goals scored in matches.

    3 Correlated Queries, Nested Queries, and Common Table Expressions

    In this chapter, you will learn how to use nested and correlated subqueries to extract more complex data from a relational database. You will also learn about common table expressions and how to best construct queries using multiple common table expressions.

    3.1 Correlated subqueries

    3.1.1 Basic Correlated Subqueries

    Correlated subqueries are subqueries that reference one or more columns in the main query. Correlated subqueries depend on information in the main query to run, and thus, cannot be executed on their own.

    Correlated subqueries are evaluated in SQL once per row of data retrieved – a process that takes a lot more computing power and time than a simple subquery.

    In this exercise, you will practice using correlated subqueries to examine matches with scores that are extreme outliers for each country – above 3 times the average score!

  • Select the country_id, date, home_goal, and away_goal columns in the main query.
  • Complete the AVG value in the subquery.
  • Complete the subquery column references, so that country_id is matched in the main and subquery.
  • SELECT 
        -- Select country ID, date, home, and away goals from match
        main.country_id,
        main.date,
        main.home_goal,
        main.away_goal
    FROM match AS main
    WHERE 
        -- Filter the main query by the subquery
        (home_goal + away_goal) > 
            (SELECT AVG((sub.home_goal + sub.away_goal) * 3)
             FROM match AS sub
             -- Join the main query to the subquery in WHERE
             WHERE main.country_id = sub.country_id);
    Table 3.1: Displaying records 1 - 10
    country_id date home_goal away_goal
    1 2022-10-29 4 5
    1729 2022-11-22 9 1
    1729 2022-01-16 7 2
    1729 2022-08-28 8 2
    1729 2022-12-29 7 3
    1729 2022-05-19 5 5
    1729 2022-12-14 6 3
    1729 2022-03-22 3 6
    1729 2022-08-30 3 6
    1729 2022-01-23 4 5

    Great job! Correlated subqueries take longer to produce results, but they often prevent you from having to create multiple subqueries.

    3.1.2 Correlated subquery with multiple conditions

    Correlated subqueries are useful for matching data across multiple columns. In the previous exercise, you generated a list of matches with extremely high scores for each country. In this exercise, you’re going to add an additional column for matching to answer the question – what was the highest scoring match for each country, in each season?

    *Note: this query may take a while to load.

  • Select the country_id, date, home_goal, and away_goal columns in the main query.
  • Complete the subquery: Select the matches with the highest number of total goals.
  • Match the subquery to the main query using country_id and season.
  • Fill in the correct logical operator so that total goals equals the max goals recorded in the subquery.
  • SELECT 
        -- Select country ID, date, home, and away goals from match
        main.country_id,
        main.date,
        main.home_goal,
        main.away_goal
    FROM match AS main
    WHERE 
        -- Filter for matches with the highest number of goals scored
        (home_goal + away_goal) =
            (SELECT MAX(sub.home_goal + sub.away_goal)
             FROM match AS sub
             WHERE main.country_id = sub.country_id
                   AND main.season = sub.season);
    Table 3.2: Displaying records 1 - 10
    country_id date home_goal away_goal
    1 2022-10-25 7 1
    1 2022-12-04 2 5
    1 2022-12-26 5 2
    1 2022-11-20 4 4
    1 2022-09-19 5 3
    1 2022-10-29 4 5
    1 2022-11-17 2 6
    1 2022-12-09 1 7
    1 2022-01-19 2 6
    1 2022-08-19 2 6

    Great job! Subqueries can be very useful for selecting data broken into multiple categories – though you may have to wait for the data to load!

    3.2 Nested subqueries

    3.2.1 Nested simple subqueries

    Nested subqueries can be either simple or correlated.

    Just like an unnested subquery, a nested subquery’s components can be executed independently of the outer query, while a correlated subquery requires both the outer and inner subquery to run and produce results.

    In this exercise, you will practice creating a nested subquery to examine the highest total number of goals in each season, overall, and during July across all seasons.

  • Complete the main query to select the season and the max total goals in a match for each season. Name this max_goals.
  • Complete the first simple subquery to select the max total goals in a match across all seasons. Name this overall_max_goals.
  • Complete the nested subquery to select the maximum total goals in a match played in July across all seasons.
  • Select the maximum total goals in the outer subquery. Name this entire subquery july_max_goals.
  • SELECT 
        -- Select the season and max goals scored in a match
        season,
        MAX(home_goal + away_goal) AS max_goals,
        -- Select the overall max goals scored in a match
       (SELECT MAX(home_goal + away_goal) FROM match) AS overall_max_goals,
        -- Select the max number of goals scored in any match in July
       (SELECT MAX(home_goal + away_goal) 
            FROM match
            WHERE id IN (
                  SELECT id FROM match WHERE cast(strftime('%m', date) as int) = 07)) AS july_max_goals
    FROM match
    GROUP BY season;
    Table 3.3: 8 records
    season max_goals overall_max_goals july_max_goals
    2008/2009 9 12 8
    2009/2010 12 12 8
    2010/2011 10 12 8
    2011/2012 10 12 8
    2012/2013 11 12 8
    2013/2014 10 12 8
    2014/2015 10 12 8
    2015/2016 12 12 8

    Good job! You just set up a nested subquery inside your main query to get some interesting data.

    3.2.2 Nest a subquery in FROM

    What’s the average number of matches per season where a team scored 5 or more goals? How does this differ by country?

    Let’s use a nested, correlated subquery to perform this operation. In the real world, you will probably find that nesting multiple subqueries is a task you don’t have to perform often. In some cases, however, you may find yourself struggling to properly group by the column you want, or to calculate information requiring multiple mathematical transformations (i.e., an AVG of a COUNT).

    Nesting subqueries and performing your transformations one step at a time, adding it to a subquery, and then performing the next set of transformations is often the easiest way to yield accurate information about your data. Let’s get to it!

  • Generate a list of matches where at least one team scored 5 or more goals.
  • -- Select matches where a team scored 5+ goals
    SELECT
        country_id,
        season,
        id
    FROM match
    WHERE home_goal >= 5 OR away_goal >= 5;
    Table 3.4: Displaying records 1 - 10
    country_id season id
    1 2008/2009 4
    1 2008/2009 55
    1 2008/2009 57
    1 2008/2009 79
    1 2008/2009 112
    1 2008/2009 124
    1 2008/2009 139
    1 2008/2009 215
    1 2008/2009 298
    1 2009/2010 369
  • Turn the query from the previous step into a subquery in the FROM statement.
  • COUNT the match ids generated in the previous step, and group the query by country_id and season.
  • -- Count match ids
    SELECT
        country_id,
        season,
        COUNT(id) AS matches
    -- Set up and alias the subquery
    FROM (
        SELECT
            country_id,
            season,
            id
        FROM match
        WHERE home_goal >= 5 OR away_goal >= 5) 
        AS subquery
    -- Group by country_id and season
    GROUP BY country_id, season;
    Table 3.5: Displaying records 1 - 10
    country_id season matches
    1 2008/2009 9
    1 2009/2010 5
    1 2010/2011 11
    1 2011/2012 11
    1 2012/2013 12
    1 2014/2015 11
    1 2015/2016 8
    1729 2008/2009 10
    1729 2009/2010 22
    1729 2010/2011 13
  • Finally, declare the same query from step 2 as a subquery in FROM with the alias outer_s.
  • Left join it to the country table using the outer query’s country_id column.
  • Calculate an AVG of high scoring matches per country in the main query.
  • SELECT
        c.name AS country,
        -- Calculate the average matches per season
        AVG(outer_s.matches) AS avg_seasonal_high_scores
    FROM country AS c
    -- Left join outer_s to country
    LEFT JOIN (
      SELECT country_id, season,
             COUNT(id) AS matches
      FROM (
        SELECT country_id, season, id
        FROM match
        WHERE home_goal >= 5 OR away_goal >= 5) AS inner_s
      -- Close parentheses and alias the subquery
      GROUP BY country_id, season) AS outer_s
    ON c.id = outer_s.country_id
    GROUP BY country;
    Table 3.6: Displaying records 1 - 10
    country avg_seasonal_high_scores
    Belgium 9.571429
    England 14.500000
    France 8.000000
    Germany 13.750000
    Italy 8.500000
    Netherlands 20.125000
    Poland 5.857143
    Portugal 8.625000
    Scotland 7.125000
    Spain 19.125000

    Great job!! This required quite a few steps, but you’ve got the hang of steps necessary to nest subqueries in FROM that you can apply to other parts of your query as well!

    3.3 Common Table Expressions

    3.3.1 Clean up with CTEs

    In chapter 2, you generated a list of countries and the number of matches in each country with more than 10 total goals. The query in that exercise utilized a subquery in the FROM statement in order to filter the matches before counting them in the main query. Below is the query you created:

    SELECT
      c.name AS country,
      COUNT(sub.id) AS matches
    FROM country AS c
    INNER JOIN (
      SELECT country_id, id 
      FROM match
      WHERE (home_goal + away_goal) >= 10) AS sub
    ON c.id = sub.country_id
    GROUP BY country;
    

    You can list one (or more) subqueries as common table expressions (CTEs) by declaring them ahead of your main query, which is an excellent tool for organizing information and placing it in a logical order.

    In this exercise, let’s rewrite a similar query using a CTE.

  • Complete the syntax to declare your CTE.
  • Select the country_id and match id from the match table in your CTE.
  • Left join the CTE to the league table using country_id.
  • -- Set up your CTE
    WITH match_list AS (
        SELECT 
            country_id, 
            id
        FROM match
        WHERE (home_goal + away_goal) >= 10)
    -- Select league and count of matches from the CTE
    SELECT
        l.name AS league,
        COUNT(match_list.id) AS matches
    FROM league AS l
    -- Join the CTE to the league table
    LEFT JOIN match_list 
    ON l.id = match_list.country_id
    GROUP BY l.name;
    Table 3.7: Displaying records 1 - 10
    league matches
    Belgium Jupiler League 0
    England Premier League 4
    France Ligue 1 1
    Germany 1. Bundesliga 1
    Italy Serie A 0
    Netherlands Eredivisie 2
    Poland Ekstraklasa 0
    Portugal Liga ZON Sagres 0
    Scotland Premier League 1
    Spain LIGA BBVA 5

    Nice job! You just rearranged a query to produce similar results to a subquery in FROM! Isn’t this much easier to read?

    3.3.2 Organizing with CTEs

    Previously, you modified a query based on a statement you completed in chapter 2 using common table expressions.

    This time, let’s expand on the exercise by looking at details about matches with very high scores using CTEs. Just like a subquery in FROM, you can join tables inside a CTE.

  • Declare your CTE, where you create a list of all matches with the league name.
  • Select the league, date, home, and away goals from the CTE.
  • Filter the main query for matches with 10 or more goals.
  • -- Set up your CTE
    WITH match_list AS (
      -- Select the league, date, home, and away goals
        SELECT 
            l.name AS league, 
            m.date, 
            m.home_goal, 
            m.away_goal,
           (m.home_goal + m.away_goal) AS total_goals
        FROM match AS m
        LEFT JOIN league as l ON m.country_id = l.id)
    -- Select the league, date, home, and away goals from the CTE
    SELECT league, date, home_goal, away_goal
    FROM match_list
    -- Filter by total goals
    WHERE total_goals >= 10;
    Table 3.8: Displaying records 1 - 10
    league date home_goal away_goal
    England Premier League 2022-11-22 9 1
    England Premier League 2022-08-28 8 2
    England Premier League 2022-12-29 7 3
    England Premier League 2022-05-19 5 5
    France Ligue 1 2022-11-08 5 5
    Germany 1. Bundesliga 2022-03-30 9 2
    Netherlands Eredivisie 2022-10-24 10 0
    Netherlands Eredivisie 2022-11-06 6 4
    Scotland Premier League 2022-05-05 6 6
    Spain LIGA BBVA 2022-10-30 7 3

    Nice job! CTEs are much easier to read than multiple subqueries!

    3.3.3 CTEs with nested subqueries

    If you find yourself listing multiple subqueries in the FROM clause with nested statement, your query will likely become long, complex, and difficult to read.

    Since many queries are written with the intention of being saved and re-run in the future, proper organization is key to a seamless workflow. Arranging subqueries as CTEs will save you time, space, and confusion in the long run!

  • Declare a CTE that calculates the total goals from matches in August of the 2013/2014 season.
  • Left join the CTE onto the league table using country_id from the match_list CTE.
  • Filter the list on the inner subquery to only select matches in August of the 2013/2014 season.
  • SELECT 
            country_id, 
           (home_goal + away_goal) AS goals
        FROM match
        -- Create a list of match IDs to filter data in the CTE
        WHERE id IN (
           SELECT id
           FROM match
           WHERE season = '2013/2014' AND cast(cast(strftime('%m', date) as int) as int) = 08)
    Table 3.9: Displaying records 1 - 10
    country_id goals
    1729 4
    1729 2
    1729 1
    1729 1
    1729 4
    1729 4
    1729 1
    1729 5
    1729 1
    1729 2
    -- Set up your CTE
    WITH match_list AS (
        SELECT 
            country_id, 
           (home_goal + away_goal) AS goals
        FROM match
        -- Create a list of match IDs to filter data in the CTE
        WHERE id IN (
           SELECT id
           FROM match
           WHERE season = '2013/2014' AND cast(cast(strftime('%m', date) as int) as int) = 08))
    -- Select the league name and average of goals in the CTE
    SELECT
        l.name,
        AVG(match_list.goals)
    FROM league AS l
    -- Join the CTE onto the league table
    LEFT JOIN match_list ON l.id = match_list.country_id
    GROUP BY l.name;
    Table 3.10: Displaying records 1 - 10
    name AVG(match_list.goals)
    Belgium Jupiler League NA
    England Premier League 2.000000
    France Ligue 1 2.027027
    Germany 1. Bundesliga 3.235294
    Italy Serie A 2.750000
    Netherlands Eredivisie 3.414634
    Poland Ekstraklasa 2.310345
    Portugal Liga ZON Sagres 3.000000
    Scotland Premier League 2.137931
    Spain LIGA BBVA 2.920000

    Fantastic work! Your CTE with a nested subquery is easy to read – and produces an interesting set of data!

    3.4 Deciding on techniques to use

    3.4.1 Get team names with a subquery

    Let’s solve a problem we’ve encountered a few times in this course so far – How do you get both the home and away team names into one final query result?

    Out of the 4 techniques we just discussed, this can be performed using subqueries, correlated subqueries, and CTEs. Let’s practice creating similar result sets using each of these 3 methods over the next 3 exercises, starting with subqueries in FROM.

  • Create a query that left joins team to match in order to get the identity of the home team. This becomes the subquery in the next step.
  • SELECT 
        m.id, 
        t.team_long_name AS hometeam
    -- Left join team to match
    FROM match AS m
    LEFT JOIN team as t
    ON m.hometeam_id = team_api_id;
    Table 3.11: Displaying records 1 - 10
    id hometeam
    1 KRC Genk
    2 SV Zulte-Waregem
    3 KSV Cercle Brugge
    4 KAA Gent
    5 FCV Dender EH
    6 KV Mechelen
    7 KSV Roeselare
    8 Tubize
    9 KVC Westerlo
    10 Club Brugge KV
    • Add a second subquery to the FROM statement to get the away team name, changing only the hometeam_id. Left join both subqueries to the match table on the id column.

    Warning: if your code is timing out, you have probably made a mistake in the JOIN and tried to join on the wrong fields which caused the table to be too big! Read the provided code and comments carefully, and check your ON conditions!

    SELECT
        m.date,
        -- Get the home and away team names
        hometeam,
        awayteam,
        m.home_goal,
        m.away_goal
    FROM match AS m
    
    -- Join the home subquery to the match table
    LEFT JOIN (
      SELECT match.id, team.team_long_name AS hometeam
      FROM match
      LEFT JOIN team
      ON match.hometeam_id = team.team_api_id) AS home
    ON home.id = m.id
    
    -- Join the away subquery to the match table
    LEFT JOIN (
      SELECT match.id, team.team_long_name AS awayteam
      FROM match
      LEFT JOIN team
      -- Get the away team ID in the subquery
      ON match.awayteam_id = team.team_api_id) AS away
    ON away.id = m.id;
    Table 3.12: Displaying records 1 - 10
    date hometeam awayteam home_goal away_goal
    2022-08-17 KRC Genk Beerschot AC 1 1
    2022-08-16 SV Zulte-Waregem Sporting Lokeren 0 0
    2022-08-16 KSV Cercle Brugge RSC Anderlecht 0 3
    2022-08-17 KAA Gent RAEC Mons 5 0
    2022-08-16 FCV Dender EH Standard de Liège 1 3
    2022-09-24 KV Mechelen Club Brugge KV 1 1
    2022-08-16 KSV Roeselare KV Kortrijk 2 2
    2022-08-16 Tubize Royal Excel Mouscron 1 2
    2022-08-16 KVC Westerlo Sporting Charleroi 1 0
    2022-11-01 Club Brugge KV KV Kortrijk 4 1

    Great job!! This is one clear way to generate a list of team names who played in a match. This is much easier to read than team ID numbers!

    3.4.2 Get team names with correlated subqueries

    Let’s solve the same problem using correlated subqueries – How do you get both the home and away team names into one final query result?

    This can easily be performed using correlated subqueries. But how might that impact the performance of your query? Complete the following steps and let’s find out!

    Please note that your query will run more slowly than the previous exercise!

  • Using a correlated subquery in the SELECT statement, match the team_api_id column from team to the hometeam_id from match.
  • SELECT
        m.date,
       (SELECT team_long_name
        FROM team AS t
        -- Connect the team to the match table
        WHERE t.team_api_id = m.hometeam_id) AS hometeam
    FROM match AS m;
    Table 3.13: Displaying records 1 - 10
    date hometeam
    2022-08-17 KRC Genk
    2022-08-16 SV Zulte-Waregem
    2022-08-16 KSV Cercle Brugge
    2022-08-17 KAA Gent
    2022-08-16 FCV Dender EH
    2022-09-24 KV Mechelen
    2022-08-16 KSV Roeselare
    2022-08-16 Tubize
    2022-08-16 KVC Westerlo
    2022-11-01 Club Brugge KV
  • Create a second correlated subquery in SELECT, yielding the away team’s name.
  • Select the home and away goal columns from match in the main query.
  • SELECT
        m.date,
       (SELECT team_long_name
        FROM team AS t
        WHERE t.team_api_id = m.hometeam_id) AS hometeam,
        -- Connect the team to the match table
       (SELECT team_long_name
        FROM team AS t
        WHERE t.team_api_id = m.awayteam_id) AS awayteam,
       -- Select home and away goals
        m.home_goal,
        m.away_goal
    FROM match AS m;
    Table 3.14: Displaying records 1 - 10
    date hometeam awayteam home_goal away_goal
    2022-08-17 KRC Genk Beerschot AC 1 1
    2022-08-16 SV Zulte-Waregem Sporting Lokeren 0 0
    2022-08-16 KSV Cercle Brugge RSC Anderlecht 0 3
    2022-08-17 KAA Gent RAEC Mons 5 0
    2022-08-16 FCV Dender EH Standard de Liège 1 3
    2022-09-24 KV Mechelen Club Brugge KV 1 1
    2022-08-16 KSV Roeselare KV Kortrijk 2 2
    2022-08-16 Tubize Royal Excel Mouscron 1 2
    2022-08-16 KVC Westerlo Sporting Charleroi 1 0
    2022-11-01 Club Brugge KV KV Kortrijk 4 1

    Fantastic work! The syntax here is much simpler, but if you’re working with a large dataset your query may take a while to run!

    3.4.3 Get team names with CTEs

    You’ve now explored two methods for answering the question, How do you get both the home and away team names into one final query result?

    Let’s explore the final method - common table expressions. Common table expressions are similar to the subquery method for generating results, mainly differing in syntax and the order in which information is processed.

  • Select id from match and team_long_name from team. Join these two tables together on hometeam_id in match and team_api_id in team.
  • SELECT 
        -- Select match id and team long name
        m.id, 
        t.team_long_name AS hometeam
    FROM match AS m
    -- Join team to match using team_api_id and hometeam_id
    LEFT JOIN team AS t 
    ON m.hometeam_id = t.team_api_id;
    Table 3.15: Displaying records 1 - 10
    id hometeam
    1 KRC Genk
    2 SV Zulte-Waregem
    3 KSV Cercle Brugge
    4 KAA Gent
    5 FCV Dender EH
    6 KV Mechelen
    7 KSV Roeselare
    8 Tubize
    9 KVC Westerlo
    10 Club Brugge KV
  • Declare the query from the previous step as a common table expression. SELECT everything from the CTE into the main query. Your results will not change at this step!
  • -- Declare the home CTE
    WITH home AS (
      SELECT m.id, t.team_long_name AS hometeam
      FROM match AS m
      LEFT JOIN team AS t 
      ON m.hometeam_id = t.team_api_id)
    -- Select everything from home
    SELECT *
    FROM home;
    Table 3.16: Displaying records 1 - 10
    id hometeam
    1 KRC Genk
    2 SV Zulte-Waregem
    3 KSV Cercle Brugge
    4 KAA Gent
    5 FCV Dender EH
    6 KV Mechelen
    7 KSV Roeselare
    8 Tubize
    9 KVC Westerlo
    10 Club Brugge KV
  • Let’s declare the second CTE, away. Join it to the first CTE on the id column.
  • The date, home_goal, and away_goal columns have been added to the CTEs. SELECT them into the main query.
  • WITH home AS (
      SELECT m.id, m.date, 
             t.team_long_name AS hometeam, m.home_goal
      FROM match AS m
      LEFT JOIN team AS t 
      ON m.hometeam_id = t.team_api_id),
    -- Declare and set up the away CTE
    away AS (
      SELECT m.id, m.date, 
             t.team_long_name AS awayteam, m.away_goal
      FROM match AS m
      LEFT JOIN team AS t 
      ON m.awayteam_id = t.team_api_id)
    -- Select date, home_goal, and away_goal
    SELECT 
        home.date,
        home.hometeam,
        away.awayteam,
        home.home_goal,
        away.away_goal
    -- Join away and home on the id column
    FROM home
    INNER JOIN away
    ON home.id = away.id;
    Table 3.17: Displaying records 1 - 10
    date hometeam awayteam home_goal away_goal
    2022-08-17 KRC Genk Beerschot AC 1 1
    2022-08-16 SV Zulte-Waregem Sporting Lokeren 0 0
    2022-08-16 KSV Cercle Brugge RSC Anderlecht 0 3
    2022-08-17 KAA Gent RAEC Mons 5 0
    2022-08-16 FCV Dender EH Standard de Liège 1 3
    2022-09-24 KV Mechelen Club Brugge KV 1 1
    2022-08-16 KSV Roeselare KV Kortrijk 2 2
    2022-08-16 Tubize Royal Excel Mouscron 1 2
    2022-08-16 KVC Westerlo Sporting Charleroi 1 0
    2022-11-01 Club Brugge KV KV Kortrijk 4 1

    And you’re done!! You now know three separate ways to manipulate and transform data to produce a complex query result! This is a great set of skills to have when working with complex relational databases!

    3.4.4 Which technique to use?

    The previous three exercises demonstrated that, in many cases, you can use multiple techniques in SQL to answer the same question.

    Based on what you learned, which of the following statements is false regarding differences in the use and performance of multiple/nested subqueries, correlated subqueries, and common table expressions?

  • Correlated subqueries can allow you to circumvent multiple, complex joins.
  • Common table expressions are declared first, improving query run time.
  • Correlated subqueries can reduce the length of your query, which improves query run time.
  • Multiple or nested subqueries are processed first, before your main query.
  • “Correct! Although correlated subqueries can reduce the length of your query, they also significantly increase query run time.”

    4 Window Functions

    You will learn about window functions and how to pass aggregate functions along a dataset. You will also learn how to calculate running totals and partitioned averages.

    4.1 It’s OVER

    4.1.1 The match is OVER

    The OVER() clause allows you to pass an aggregate function down a data set, similar to subqueries in SELECT. The OVER() clause offers significant benefits over subqueries in select – namely, your queries will run faster, and the OVER() clause has a wide range of additional functions and clauses you can include with it that we will cover later on in this chapter.

    In this exercise, you will revise some queries from previous chapters using the OVER() clause.

  • Select the match ID, country name, season, home, and away goals from the match and country tables.
  • Complete the query that calculates the average number of goals scored overall and then includes the aggregate value in each row using a window function.
  • SELECT 
        -- Select the id, country name, season, home, and away goals
        m.id,
        c.name AS country,
        m.season,
        m.home_goal,
        m.away_goal,
        -- Use a window to include the aggregate average in each row
        AVG(m.home_goal + m.away_goal) OVER() AS overall_avg
    FROM match AS m
    LEFT JOIN country AS c ON m.country_id = c.id;
    Table 4.1: Displaying records 1 - 10
    id country season home_goal away_goal overall_avg
    1 Belgium 2008/2009 1 1 2.705531
    2 Belgium 2008/2009 0 0 2.705531
    3 Belgium 2008/2009 0 3 2.705531
    4 Belgium 2008/2009 5 0 2.705531
    5 Belgium 2008/2009 1 3 2.705531
    6 Belgium 2008/2009 1 1 2.705531
    7 Belgium 2008/2009 2 2 2.705531
    8 Belgium 2008/2009 1 2 2.705531
    9 Belgium 2008/2009 1 0 2.705531
    10 Belgium 2008/2009 4 1 2.705531

    Good job! The basic window function with an OVER clause functions just like a subquery in SELECT.

    4.1.2 What’s OVER here?

    Window functions allow you to create a RANK of information according to any variable you want to use to sort your data. When setting this up, you will need to specify what column/calculation you want to use to calculate your rank. This is done by including an ORDER BY clause inside the OVER() clause. Below is an example:

    SELECT 
        id,
        RANK() OVER(ORDER BY home_goal) AS rank
    FROM match;
    

    In this exercise, you will create a data set of ranked matches according to which leagues, on average, score the most goals in a match.

  • Select the league name and average total goals scored from league and match.
  • Complete the window function so it calculates the rank of average goals scored across all leagues in the database.
  • Order the rank by the average total of home and away goals scored.
  • SELECT 
        -- Select the league name and average goals scored
        l.name AS league,
        AVG(m.home_goal + m.away_goal) AS avg_goals,
        -- Rank each league according to the average goals
        RANK() OVER(ORDER BY AVG(m.home_goal + m.away_goal)) AS league_rank
    FROM league AS l
    LEFT JOIN match AS m 
    ON l.id = m.country_id
    WHERE m.season = '2011/2012'
    GROUP BY l.name
    -- Order the query by the rank you created
    ORDER BY league_rank;
    Table 4.2: Displaying records 1 - 10
    league avg_goals league_rank
    Poland Ekstraklasa 2.195833 1
    France Ligue 1 2.515789 2
    Italy Serie A 2.583799 3
    Switzerland Super League 2.623457 4
    Scotland Premier League 2.635965 5
    Portugal Liga ZON Sagres 2.641667 6
    Spain LIGA BBVA 2.763158 7
    England Premier League 2.805263 8
    Germany 1. Bundesliga 2.859477 9
    Belgium Jupiler League 2.879167 10

    Good job! You’ll notice that the default order is from smallest to largest value – in this case, the Polish league tends to have the fewest goals scored on average.

    4.1.3 Flip OVER your results

    In the last exercise, the rank generated in your query was organized from smallest to largest. By adding DESC to your window function, you can create a rank sorted from largest to smallest.

    SELECT 
        id,
        RANK() OVER(ORDER BY home_goal DESC) AS rank
    FROM match;
    
  • Complete the same parts of the query as the previous exercise.
  • Complete the window function to rank each league from highest to lowest average goals scored.
  • Order the main query by the rank you just created.
  • SELECT 
        -- Select the league name and average goals scored
        l.name AS league,
        AVG(m.home_goal + m.away_goal) AS avg_goals,
        -- Rank leagues in descending order by average goals
        RANK() OVER(ORDER BY AVG(m.home_goal + m.away_goal) DESC) AS league_rank
    FROM league AS l
    LEFT JOIN match AS m 
    ON l.id = m.country_id
    WHERE m.season = '2011/2012'
    GROUP BY l.name
    -- Order the query by the rank you created
    ORDER BY league_rank;
    Table 4.3: Displaying records 1 - 10
    league avg_goals league_rank
    Netherlands Eredivisie 3.258170 1
    Belgium Jupiler League 2.879167 2
    Germany 1. Bundesliga 2.859477 3
    England Premier League 2.805263 4
    Spain LIGA BBVA 2.763158 5
    Portugal Liga ZON Sagres 2.641667 6
    Scotland Premier League 2.635965 7
    Switzerland Super League 2.623457 8
    Italy Serie A 2.583799 9
    France Ligue 1 2.515789 10

    Great job! Unlike a subquery in SELECT, your window function will apply the filter that you include in your WHERE clause.

    4.2 OVER with a PARTITION

    4.2.1 PARTITION BY a column

    The PARTITION BY clause allows you to calculate separate “windows” based on columns you want to divide your results. For example, you can create a single column that calculates an overall average of goals scored for each season.

    In this exercise, you will be creating a data set of games played by Legia Warszawa (Warsaw League), the top ranked team in Poland, and comparing their individual game performance to the overall average for that season.

    Where do you see more outliers? Are they Legia Warszawa’s home or away games?

  • Complete the two window functions that calculate the home and away goal averages. Partition the window functions by season to calculate separate averages for each season.
  • Filter the query to only include matches played by Legia Warszawa, id = 8673.
  • SELECT 
        date,
        season,
        home_goal,
        away_goal,
        CASE WHEN hometeam_id = 8673 THEN 'home' 
             ELSE 'away' END AS warsaw_location,
        -- Calculate the average goals scored partitioned by season
        AVG(home_goal) OVER(PARTITION BY season) AS season_homeavg,
        AVG(away_goal) OVER(PARTITION BY season) AS season_awayavg
    FROM match
    -- Filter the data set for Legia Warszawa matches only
    WHERE 
        hometeam_id = 8673 
        OR awayteam_id = 8673
    ORDER BY (home_goal + away_goal) DESC;
    Table 4.4: Displaying records 1 - 10
    date season home_goal away_goal warsaw_location season_homeavg season_awayavg
    2022-09-14 2013/2014 3 5 away 1.766667 1.233333
    2022-10-24 2009/2010 5 2 home 1.233333 0.700000
    2022-05-25 2010/2011 2 5 away 1.633333 1.133333
    2022-09-13 2014/2015 4 3 home 1.566667 1.333333
    2022-02-25 2010/2011 3 3 away 1.633333 1.133333
    2022-07-20 2013/2014 5 1 home 1.766667 1.233333
    2022-11-28 2008/2009 2 3 away 1.600000 0.700000
    2022-05-30 2008/2009 4 1 home 1.600000 0.700000
    2022-03-26 2009/2010 2 3 away 1.233333 0.700000
    2022-10-22 2010/2011 1 4 away 1.633333 1.133333

    Good job! You’ve generated an ordered data set with window functions to compare each game’s score to averages for that season.

    4.2.2 PARTITION BY multiple columns

    The PARTITION BY clause can be used to break out window averages by multiple data points (columns). You can even calculate the information you want to use to partition your data! For example, you can calculate average goals scored by season and by country, or by the calendar year (taken from the date column).

    In this exercise, you will calculate the average number home and away goals scored Legia Warszawa, and their opponents, partitioned by the month in each season.

  • Construct two window functions partitioning the average of home and away goals by season and month.
  • Filter the dataset by Legia Warszawa’s team ID (8673) so that the window calculation only includes matches involving them.
  • SELECT 
        date,
        season,
        home_goal,
        away_goal,
        CASE WHEN hometeam_id = 8673 THEN 'home' 
             ELSE 'away' END AS warsaw_location,
        -- Calculate average goals partitioned by season and month
        AVG(home_goal) OVER(PARTITION BY season, 
                cast(strftime('%m', date) as int)) AS season_mo_home,
        AVG(away_goal) OVER(PARTITION BY season, 
                cast(strftime('%m', date) as int)) AS season_mo_away
    FROM match
    WHERE 
        hometeam_id = 8673 
        OR awayteam_id = 8673
    ORDER BY (home_goal + away_goal) DESC;
    Table 4.5: Displaying records 1 - 10
    date season home_goal away_goal warsaw_location season_mo_home season_mo_away
    2022-09-14 2013/2014 3 5 away 2.250000 2.5000000
    2022-10-24 2009/2010 5 2 home 2.500000 0.7500000
    2022-05-25 2010/2011 2 5 away 2.000000 1.1666667
    2022-09-13 2014/2015 4 3 home 2.000000 2.6666667
    2022-02-25 2010/2011 3 3 away 3.000000 3.0000000
    2022-07-20 2013/2014 5 1 home 2.500000 2.0000000
    2022-05-30 2008/2009 4 1 home 2.000000 0.6000000
    2022-11-28 2008/2009 2 3 away 1.833333 0.6666667
    2022-03-26 2009/2010 2 3 away 1.000000 1.2500000
    2022-04-02 2010/2011 2 3 home 1.600000 1.2000000

    Excellent! You successfully partitioned your averages by each month in the season. In the next lesson, we’ll expand on the usefulness of information generated with window functions.

    4.3 Sliding windows

    4.3.1 Slide to the left

    Sliding windows allow you to create running calculations between any two points in a window using functions such as PRECEDING, FOLLOWING, and CURRENT ROW. You can calculate running counts, sums, averages, and other aggregate functions between any two points you specify in the data set.

    In this exercise, you will expand on the examples discussed in the video, calculating the running total of goals scored by the FC Utrecht when they were the home team during the 2011/2012 season. Do they score more goals at the end of the season as the home or away team?

  • Complete the window function by:
    • Assessing the running total of home goals scored by FC Utrecht.
    • Assessing the running average of home goals scored.
    • Ordering both the running average and running total by date.
  • SELECT 
        date,
        home_goal,
        away_goal,
        -- Create a running total and running average of home goals
        SUM(home_goal) OVER(ORDER BY date 
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total,
        AVG(home_goal) OVER(ORDER BY date 
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_avg
    FROM match
    WHERE 
        hometeam_id = 9908 
        AND season = '2011/2012';
    Table 4.6: Displaying records 1 - 10
    date home_goal away_goal running_total running_avg
    2022-01-22 1 1 1 1.000000
    2022-02-12 1 1 2 1.000000
    2022-02-19 3 0 5 1.666667
    2022-03-04 0 0 5 1.250000
    2022-03-18 3 1 8 1.600000
    2022-03-30 3 2 11 1.833333
    2022-04-15 4 2 15 2.142857
    2022-04-28 1 3 16 2.000000
    2022-05-02 2 2 18 2.000000
    2022-08-14 2 2 20 2.000000

    Good job! Sliding windows allow for a wide variety of calculations with your database.

    4.3.2 Slide to the right

    Now let’s see how FC Utrecht performs when they’re the away team. You’ll notice that the total for the season is at the bottom of the data set you queried. Depending on your results, this could be pretty long, and scrolling down is not very helpful.

    In this exercise, you will slightly modify the query from the previous exercise by sorting the data set in reverse order and calculating a backward running total from the CURRENT ROW to the end of the data set (earliest record).

  • Complete the window function by:
    • Assessing the running total of home goals scored by FC Utrecht.
    • Assessing the running average of home goals scored.
    • Ordering both the running average and running total by date, descending.
  • SELECT 
        -- Select the date, home goal, and away goals
        date,
        home_goal,
        away_goal,
        -- Create a running total and running average of home goals
        SUM(home_goal) OVER(ORDER BY date DESC
            ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS running_total,
        AVG(home_goal) OVER(ORDER BY date DESC
            ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS running_avg
    FROM match
    WHERE 
        awayteam_id = 9908 
        AND season = '2011/2012';
    Table 4.7: Displaying records 1 - 10
    date home_goal away_goal running_total running_avg
    2022-12-17 1 0 25 1.470588
    2022-11-25 2 0 24 1.500000
    2022-11-20 2 2 22 1.466667
    2022-10-30 3 1 20 1.428571
    2022-10-15 1 0 17 1.307692
    2022-09-24 1 0 16 1.333333
    2022-09-11 2 3 15 1.363636
    2022-08-20 2 1 13 1.300000
    2022-08-06 0 0 11 1.222222
    2022-05-06 1 3 11 1.375000

    Good job! You can reverse your calculations with your window functions to best suit your needs in an analysis.

    4.4 Bringing it all together

    4.4.1 Setting up the home team CTE

    In this course, we’ve covered ways in which you can use CASE statements, subqueries, common table expressions, and window functions in your queries to structure a data set that best meets your needs. For this exercise, you will be using all of these concepts to generate a list of matches in which Manchester United was defeated during the 2014/2015 English Premier League season.

    Your first task is to create the first query that filters for matches where Manchester United played as the home team. This will become a common table expression in a later exercise.

  • Create a CASE statement that identifies each match as a win, lose, or tie for Manchester United.
  • Fill out the logical operators for each WHEN clause in the CASE statement (equals, greater than, less than).
  • Join the tables on home team ID from match, and team_api_id from team.
  • Filter the query to only include games from the 2014/2015 season where Manchester United was the home team.
  • SELECT 
        m.id, 
        t.team_long_name,
        -- Identify matches as home/away wins or ties
        CASE WHEN m.home_goal > m.away_goal THEN 'MU Win'
             WHEN m.home_goal < m.away_goal THEN 'MU Loss' 
             ELSE 'Tie' END AS outcome
    FROM match AS m
    -- Left join team on the home team ID and team API id
    LEFT JOIN team AS t 
    ON m.hometeam_id = t.team_api_id
    WHERE 
        -- Filter for 2014/2015 and Manchester United as the home team
        m.season = '2014/2015'
        AND t.team_long_name = 'Manchester United';
    Table 4.8: Displaying records 1 - 10
    id team_long_name outcome
    4013 Manchester United MU Loss
    4031 Manchester United MU Win
    4051 Manchester United MU Win
    4062 Manchester United MU Win
    4085 Manchester United MU Win
    4105 Manchester United MU Win
    4145 Manchester United MU Loss
    4164 Manchester United MU Win
    4181 Manchester United MU Win
    4203 Manchester United MU Win

    Good job! This query you wrote will become the first common table expression.

    4.4.2 Setting up the away team CTE

    Great job! Now that you have a query identifying the home team in a match, you will perform a similar set of steps to identify the away team. Just like the previous step, you will join the match and team tables. Each of these two queries will be declared as a Common Table Expression in the following step.

    The primary difference in this query is that you will be joining the tables on awayteam_id, and reversing the match outcomes in the CASE statement.

    When altering CASE statement logic in your own work, you can reverse either the logical condition (i.e., home_goal > away_goal) or the outcome in THEN – just make sure you only reverse one of the two!

  • Complete the CASE statement syntax.
  • Fill out the logical operators identifying each match as a win, loss, or tie for Manchester United.
  • Join the table on awayteam_id, and team_api_id.
  • SELECT 
        m.id, 
        t.team_long_name,
        -- Identify matches as home/away wins or ties
        CASE WHEN m.home_goal > m.away_goal THEN 'MU Loss'
             WHEN m.home_goal < m.away_goal THEN 'MU Win' 
             ELSE 'Tie' END AS outcome
    -- Join team table to the match table
    FROM match AS m
    LEFT JOIN team AS t 
    ON m.awayteam_id = t.team_api_id
    WHERE 
        -- Filter for 2014/2015 and Manchester United as the away team
        m.season = '2014/2015'
        AND t.team_long_name = 'Manchester United';
    Table 4.9: Displaying records 1 - 10
    id team_long_name outcome
    4026 Manchester United MU Loss
    4039 Manchester United MU Win
    4075 Manchester United MU Win
    4089 Manchester United Tie
    4117 Manchester United Tie
    4126 Manchester United Tie
    4136 Manchester United Tie
    4155 Manchester United MU Win
    4178 Manchester United Tie
    4197 Manchester United MU Loss

    Great job! You correctly constructed a CASE statement to accurately identify matches as a win or loss for Manchester United. This will become your second common table expression.

    4.4.3 Putting the CTEs together

    Now that you’ve created the two subqueries identifying the home and away team opponents, it’s time to rearrange your query with the home and away subqueries as Common Table Expressions (CTEs). You’ll notice that the main query includes the phrase, SELECT DISTINCT. Without identifying only DISTINCT matches, you will return a duplicate record for each game played.

    Continue building the query to extract all matches played by Manchester United in the 2014/2015 season.

  • Declare the home and away CTEs before your main query.
  • Join your CTEs to the match table using a LEFT JOIN.
  • Select the relevant data from the CTEs into the main query.
  • Select the date from match, team names from the CTEs, and home/ away goals from match in the main query.
  • -- Set up the home team CTE
    WITH home AS (
      SELECT m.id, t.team_long_name,
          CASE WHEN m.home_goal > m.away_goal THEN 'MU Win'
               WHEN m.home_goal < m.away_goal THEN 'MU Loss' 
               ELSE 'Tie' END AS outcome
      FROM match AS m
      LEFT JOIN team AS t ON m.hometeam_id = t.team_api_id),
    -- Set up the away team CTE
    away AS (
      SELECT m.id, t.team_long_name,
          CASE WHEN m.home_goal > m.away_goal THEN 'MU Loss'
               WHEN m.home_goal < m.away_goal THEN 'MU Win' 
               ELSE 'Tie' END AS outcome
      FROM match AS m
      LEFT JOIN team AS t ON m.awayteam_id = t.team_api_id)
    -- Select team names, the date and goals
    SELECT DISTINCT
        m.date,
        home.team_long_name AS home_team,
        away.team_long_name AS away_team,
        m.home_goal, m.away_goal
    -- Join the CTEs onto the match table
    FROM match AS m
    LEFT JOIN home ON m.id = home.id
    LEFT JOIN away ON m.id = away.id
    WHERE m.season = '2014/2015'
          AND (home.team_long_name = 'Manchester United' 
               OR away.team_long_name = 'Manchester United');
    Table 4.10: Displaying records 1 - 10
    date home_team away_team home_goal away_goal
    2022-08-16 Manchester United Swansea City 1 2
    2022-11-02 Manchester City Manchester United 1 0
    2022-11-08 Manchester United Crystal Palace 1 0
    2022-11-22 Arsenal Manchester United 1 2
    2022-11-29 Manchester United Hull City 3 0
    2022-12-02 Manchester United Stoke City 2 1
    2022-12-08 Southampton Manchester United 1 2
    2022-12-14 Manchester United Liverpool 3 0
    2022-12-20 Aston Villa Manchester United 1 1
    2022-12-26 Manchester United Newcastle United 3 1

    Fantastic! You wrote a query that extracted data from two common table expressions. You can scale up the number of CTEs as necessary, depending on your data.

    4.4.4 Add a window function

    Fantastic! You now have a result set that retrieves the match date, home team, away team, and the goals scored by each team. You have one final component of the question left – how badly did Manchester United lose in each match?

    In order to determine this, let’s add a window function to the main query that ranks matches by the absolute value of the difference between home_goal and away_goal. This allows us to directly compare the difference in scores without having to consider whether Manchester United played as the home or away team!

    The equation is complete for you – all you need to do is properly complete the window function!

  • Set up the CTEs so that the home and away teams each have a name, ID, and score associated with them.
  • Select the date, home team name, away team name, home goal, and away goals scored in the main query.
  • Rank the matches and order by the difference in scores in descending order.
  • -- Set up the home team CTE
    WITH home AS (
      SELECT m.id, t.team_long_name,
          CASE WHEN m.home_goal > m.away_goal THEN 'MU Win'
               WHEN m.home_goal < m.away_goal THEN 'MU Loss' 
               ELSE 'Tie' END AS outcome
      FROM match AS m
      LEFT JOIN team AS t ON m.hometeam_id = t.team_api_id),
    -- Set up the away team CTE
    away AS (
      SELECT m.id, t.team_long_name,
          CASE WHEN m.home_goal > m.away_goal THEN 'MU Loss'
               WHEN m.home_goal < m.away_goal THEN 'MU Win' 
               ELSE 'Tie' END AS outcome
      FROM match AS m
      LEFT JOIN team AS t ON m.awayteam_id = t.team_api_id)
    -- Select columns and and rank the matches by goal difference
    SELECT DISTINCT
        m.date,
        home.team_long_name AS home_team,
        away.team_long_name AS away_team,
        m.home_goal, m.away_goal,
        RANK() OVER(ORDER BY ABS(home_goal - away_goal) DESC) as match_rank
    -- Join the CTEs onto the match table
    FROM match AS m
    LEFT JOIN home ON m.id = home.id
    LEFT JOIN AWAY ON m.id = away.id
    WHERE m.season = '2014/2015'
          AND ((home.team_long_name = 'Manchester United' AND home.outcome = 'MU Loss')
          OR (away.team_long_name = 'Manchester United' AND away.outcome = 'MU Loss'));
    Table 4.11: 8 records
    date home_team away_team home_goal away_goal match_rank
    2022-04-26 Everton Manchester United 3 0 1
    2022-09-21 Leicester City Manchester United 5 3 2
    2022-08-16 Manchester United Swansea City 1 2 3
    2022-11-02 Manchester City Manchester United 1 0 3
    2022-01-11 Manchester United Southampton 0 1 3
    2022-02-21 Swansea City Manchester United 2 1 3
    2022-04-18 Chelsea Manchester United 1 0 3
    2022-05-02 Manchester United West Bromwich Albion 0 1 3

    Awesome! This is an interesting list of matches. You can easily modify this query for any team of your choice. Finally, if you liked this course, let your instructor know by tweeting them at @mona kay !