Data Management in SQL

Leonid Shpaner

Databases

Database Description
FaceX Fabricated dataset composed of a variety of tables meant to mimic the membership profiles of users on a social networking site like Facebook.
Billboard This is a small dataset consisting of Billboard rankings of songs, information about artists, their songs, and an extract of their tweets from Twitter. This data can be used to answer different kinds of questions related to the music industry.
Movies This is a small dataset that comprises data on gross revenue collection for a few movies and Oscar nominations. It can be used to answer different kinds of questions regarding filtering, summarizing, and joining of data tables.

Exploring Data

Code: Extracting Data Using SELECT DISTINCT

Database: FaceX

  1. Write the query to get the unique concentrations from the concentration table (Using the concentration table, your output table should have one column: Concentration).

  2. First, let’s query the entire table to get a better visual sense of what is in it:

    Click here to view the entire concentration table.

    1
    2
    3
    4
    SELECT
        DISTINCT Concentration
    FROM
        concentration
    

    Result: Click here to view the table for the executed query.






















  3. Modify your query above to sort the output in increasing alphabetical order (Using the concentration table, your output table should have one column: Concentration).
  4. 1
    2
    3
    4
    5
    6
    SELECT
        DISTINCT Concentration
    FROM
        concentration
    ORDER BY
        Concentration;
    

    Result: Click here to view the table for the executed query.






















  5. Write a query to get all the distinct combinations of Status, Sex, and Political Views from the profiles table in the FaceX database (Using the profiles table, your output table should have three columns: Status, Sex, and PoliticalViews).

  6. First, let’s query the entire table to get a better visual sense of what is in it:

    Click here to view the entire profiles table.

    1
    2
    3
    4
    5
    6
    SELECT
        DISTINCT Status,
        Sex,
        PoliticalViews
    FROM
        profiles;
    

    Result: Click here to view the table for the executed query.






















  7. Modify your query above to sort the output by Political Views, Status, and Sex (Using the profiles table, your output table should have three columns: Status, Sex, and PoliticalViews).
  8.  1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    SELECT
        DISTINCT Status,
        Sex,
        PoliticalViews
    FROM
        profiles
    ORDER BY
        PoliticalViews,
        Status,
        Sex;
    

    Result: Click here to view the table for the executed query.























Module 1 Assignment: Extracting Information From Raw Data

Database: Billboard


Instructions:

In this module assignment, you will apply concepts covered over the course of this module to a series of questions involving extraction. Using the Billboard dataset, respond to each question.

  1. Write a query to print the song titles from the Songs table and limit your output to 5 rows only (Using the Songs table, your output table should have one column: Song).

  2. First, let’s query the entire table to get a better visual sense of what is in it:

    Click here to view the entire songs table.

    1
    2
    3
    4
    5
    6
    SELECT
        Song
    FROM
        Songs
    LIMIT
        5;
    

    Result: Click here to view the table from the query.























  3. Write a query to print the names of all artists from the Songs table. Your query should not contain any other columns from this table (Using the Songs table, your output table should have one column: Artist).
  4. 1
    2
    3
    4
    SELECT
        Artist
    FROM
        Songs;
    

    Result: Click here to view the table from the query.
























  5. Modify the query above so that the names of artists do not repeat. Your query output should contain the name of each distinct artist only once (Using the Songs table, your output table should have one column: Artist).
  6. 1
    2
    3
    4
    SELECT
        DISTINCT Artist
    FROM
        Songs;
    

    Result: Click here to view the table from the query.























  7. Write an aggregation query to count the total number of rows in the Artists table (Using the Artists table, your output table should have one column: count()*).

  8. First, let’s query the entire table to get a better visual sense of what is in it:

    Click here to view the entire artists table.

    1
    2
    3
    4
    SELECT
        COUNT(*)
    FROM
        Artists;
    






















Code: Filter Data Using the WHERE Statement Part 1

Database: FaceX


Activity Questions:

  1. Write a query to print the names of all FaceX members in the profiles table who were born in the year 1985 (Using the profiles table, your output table should have one column: Name).

  2. First, let’s query the entire table to get a better visual sense of what is in it:

    Click here to view the entire profiles table.

    1
    2
    3
    4
    5
    6
    SELECT
        Name
    FROM
        profiles
    WHERE
        Birthday = 1985;
    

    Result: Click here to view the table from the query.























  3. Write a query to print the ProfileIds of all FaceX members who list “Harry Potter” as a favorite book in the favoritebooks table (Using the favoritebooks table, your output table should have one column: ProfileID).

  4. First, let’s query the entire table to get a better visual sense of what is in it:

    Click here to view the entire favoritebooks table.

    1
    2
    3
    4
    5
    6
    SELECT
        ProfileID
    FROM
        favoritebooks
    WHERE
        FavoriteBook = "Harry Potter";
    

    Result: Click here to view the table from the query.























Code: Filter Data Using the WHERE Statement Part 2

Database: FaceX


Activity Questions:

  1. Using the favoritemovies table in the FaceX dataset, write a query to output all of the favorite movies (repeats are fine) that include “harry” anywhere in the title. Your query should not be case-sensitive, i.e., its output should include all movies regardless of whether their names include ‘Harry’, ‘harry’, ‘HARRY’, etc (Using the favoritemovies table, your output table should have one column: FavoriteMovie).

  2. First, let’s query the entire table to get a better visual sense of what is in it:

    Click here to view the entire favoritemovies table.

    1
    2
    3
    4
    5
    6
    SELECT
        FavoriteMovie
    FROM
        favoritemovies
    WHERE
        FavoriteMovie LIKE '%harry%';
    

    Result: Click here to view the table from the query.






















  3. Write a query on the profiles table in the FaceX database to output the names of all users whose name starts in ‘David’. Your query should output only the names of the users (Using the profiles table, your output table should have one column: Name).

  4. First, let’s query the entire table to get a better visual sense of what is in it:

    Click here to view the entire profiles table.

    1
    2
    3
    4
    5
    6
    SELECT
        Name
    FROM
        profiles
    WHERE
        Name LIKE 'David %';
    

    Result: Click here to view the table from the query.






















  5. Write a query on the favoritemovies table to output all the records where ‘star war’ is included anywhere in the FavoriteMovie column. Your query should not be case-sensitive (Using the favoritemovies table, your output table should include all columns from the original table).

  6. First, let’s query the entire table to get a better visual sense of what is in it:

    Click here to view the entire favoritemovies table

    1
    2
    3
    4
    5
    6
    SELECT
      *
    FROM
      favoritemovies
    WHERE
      FavoriteMovie LIKE '%star war%';
    

    Result: Click here to view the table from the query.






















Code: Combine Filtration with Aggregation Part 1

Database: FaceX


Activity Questions:

  1. Write a query on the concentration table in the FaceX dataset to count the number of users whose concentration is ‘Computer Science’ (Using the concentration table, your output table should have one column: count()*).

  2. First, let’s query the entire table to get a better visual sense of what is in it:

    Click here to view the entire concentration table.

    1
    2
    3
    4
    5
    6
    SELECT
        COUNT(*)
    FROM
        concentration
    WHERE
        Concentration = 'Computer Science';
    






















  3. Write a query on the profiles table to count the number of FaceX users who are ‘Female’ and whose birthday is 1985 and whose political views are ‘Conservative’ (Using the profiles table, your output table should have one column: count()*).

  4. First, let’s query the entire table to get a better visual sense of what is in it:

    Click here to view the entire profiles table.

    1
    2
    3
    4
    5
    6
    7
    8
    SELECT
        COUNT(*)
    FROM
        profiles
    WHERE
        Sex = 'Female'
        AND Birthday = '1985'
        AND PoliticalViews = 'Conservative';
    






















  5. Write a query on the profiles table to count the number of FaceX users whose Sex is ‘Male’ and who were not born in 1985 or 1986. Use ‘<>’ in the conditions for birthday (Using the profiles table, your output table should have one column: count()*).
  6. 1
    2
    3
    4
    5
    6
    7
    8
    SELECT
        COUNT(*)
    FROM
        profiles
    WHERE
        Sex = 'Male'
        AND Birthday <> '1985'
        AND Birthday <> '1986';
    






















Module 2 Assignment: Filtering Relevant Data

Database: Billboard


Instructions:

In this module assignment, you will apply concepts covered over the course of this module to a series of questions involving filtration. Using the Billboard dataset, respond to each question.

  1. Write a query on the Songs table to print Taylor Swift’s songs along with the name of the Artist (Using the Songs table, your output table should have two columns: Song and Artist).

  2. First, let’s query the entire table to get a better visual sense of what is in it:

    Click here to view the entire songs table.

    1
    2
    3
    4
    5
    6
    7
    SELECT
        Song,
        Artist
    FROM
        Songs
    WHERE
        Artist = 'Taylor Swift';
    






















  3. Write a query on the Rankings table to print all songs that made it into the top 3 (Using the Rankings table, your output table should have one column: Song).

  4. First, let’s query the entire table to get a better visual sense of what is in it:

    Click here to view the entire rankings table.

    1
    2
    3
    4
    5
    6
    SELECT
        Song
    FROM
        Rankings
    WHERE
        Rank <= 3;
    

    Result: Click here to view the table from the query.






















  5. Note that the output of the query in Question 2 contains repetitions when a song occurs more than once in the top 3. Modify the query so that each such song is listed only once (Using the Rankings table, your output table should have one column: Song).
  6. 1
    2
    3
    4
    5
    6
    SELECT
        DISTINCT Song
    FROM
        Rankings
    WHERE
        Rank <= 3;
    

    Result: Click here to view the table from the query.






















  7. Let’s switch to the tweets table. Write a query to output all the tweets by Bruno Mars (Using the Tweets table, your output table should include all columns from the original table).

  8. First, let’s query the entire table to get a better visual sense of what is in it:

    Click here to view the entire tweets table.

    1
    2
    3
    4
    5
    6
    SELECT
        *
    FROM
        Tweets
    WHERE
        user_screen_name = 'BrunoMars';
    

    Result: Click here to view the table from the query.






















  9. Modify the above query to compute the average number of retweet counts for Bruno Mars (Using the Tweets table, your output table should have one column: avg(retweet_count)).
  10. 1
    2
    3
    4
    5
    6
    SELECT
        AVG(retweet_count)
    FROM
        Tweets
    WHERE
        user_screen_name = 'BrunoMars';
    






















  11. It is thought that tweets that contain a hashtag get retweeted more than those that don’t. To test this hypothesis, modify the query in Question 5 to compute the average of retweet_count for only those tweets of Bruno Mars that contain a ‘#’ anywhere in the text of the tweet (Using the Tweets table, your output table should have one column: avg(retweet_count)).
  12. 1
    2
    3
    4
    5
    6
    7
    SELECT
        AVG(retweet_count)
    FROM
        Tweets
    WHERE
        user_screen_name = 'BrunoMars'
        AND text LIKE '%#%';
    






















Code: Combine Filtration with Aggregation Part 2

Database: Movies


Activity Questions:

  1. List all distinct persons in the OscarNoms table in the MOVIES database who received a nomination for the movie Boyhood (Using the OscarNoms table, your output table should have one column: Person).

  2. First, let’s query the entire table to get a better visual sense of what is in it:

    Click here to view the entire OscarNoms table.

    1
    2
    3
    4
    5
    6
    SELECT
      DISTINCT Person
    FROM
      OscarNoms
    WHERE
      Movie = 'Boyhood';
    






















  3. Modify the above query to find the number of nominations for each person in the OscarNoms table. Each such person’s name should appear only once in your list (Using the OscarNoms table, your output table should have two columns: Person and freq).
  4. 1
    2
    3
    4
    5
    6
    7
    Select
      Person,
      COUNT(*) AS TotalNominations
    FROM
      OscarNoms
    GROUP BY
      Person;
    

    Result: Click here to view the table from the query.