Subsetting data: Exercises

# install.packages("tidyverse")
# install.packages("here")

library(tidyverse)
library(here)

## Load the data
characters <- readRDS(file = here::here("raw_data", "characters.rds"))
psych_stats <- read.csv(
  file = here::here("raw_data", "psych_stats.csv"),
  sep = ";"
)
str(characters)
'data.frame':   889 obs. of  7 variables:
 $ id        : chr  "F2" "F1" "F5" "F4" ...
 $ name      : chr  "Monica Geller" "Rachel Green" "Chandler Bing" "Joey Tribbiani" ...
 $ uni_id    : chr  "F" "F" "F" "F" ...
 $ uni_name  : chr  "Friends" "Friends" "Friends" "Friends" ...
 $ notability: num  79.7 76.7 74.4 74.3 72.6 51.6 86.5 84.2 82.6 65.6 ...
 $ link      : chr  "https://openpsychometrics.org/tests/characters/stats/F/2" "https://openpsychometrics.org/tests/characters/stats/F/1" "https://openpsychometrics.org/tests/characters/stats/F/5" "https://openpsychometrics.org/tests/characters/stats/F/4" ...
 $ image_link: chr  "https://openpsychometrics.org/tests/characters/test-resources/pics/F/2.jpg" "https://openpsychometrics.org/tests/characters/test-resources/pics/F/1.jpg" "https://openpsychometrics.org/tests/characters/test-resources/pics/F/5.jpg" "https://openpsychometrics.org/tests/characters/test-resources/pics/F/4.jpg" ...

Because subsetting data is such a basic skill, it will come up multiple times during this workshop. Here are a few exercises to get you started.

Exercise 1

Correct the following code so that only the first 10 rows and columns 4 to 6 are extracted:

characters[4:6, 10]

We have to target the rows we want to extract before the comma and the columns after it. Since we want a range of rows, we also need 1:10 instead of the single row number 10.

characters[1:10, 4:6]
   uni_name notability
1   Friends       79.7
2   Friends       76.7
3   Friends       74.4
4   Friends       74.3
5   Friends       72.6
6   Friends       51.6
7  Euphoria       86.5
8  Euphoria       84.2
9  Euphoria       82.6
10 Euphoria       65.6
                                                        link
1   https://openpsychometrics.org/tests/characters/stats/F/2
2   https://openpsychometrics.org/tests/characters/stats/F/1
3   https://openpsychometrics.org/tests/characters/stats/F/5
4   https://openpsychometrics.org/tests/characters/stats/F/4
5   https://openpsychometrics.org/tests/characters/stats/F/3
6   https://openpsychometrics.org/tests/characters/stats/F/6
7  https://openpsychometrics.org/tests/characters/stats/EU/1
8  https://openpsychometrics.org/tests/characters/stats/EU/2
9  https://openpsychometrics.org/tests/characters/stats/EU/6
10 https://openpsychometrics.org/tests/characters/stats/EU/3
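To select the last three columns without counting them by hand, you can compute their positions from ncol(). Below is a minimal sketch on a small made-up data frame: toy is hypothetical and only mimics the seven column names of characters, its values are invented.

```r
# Hypothetical toy data frame with the same seven column names as characters
# (values are made up; only the structure matters here)
toy <- data.frame(
  id         = c("A1", "A2", "A3"),
  name       = c("Char One", "Char Two", "Char Three"),
  uni_id     = c("A", "A", "A"),
  uni_name   = c("Show A", "Show A", "Show A"),
  notability = c(80.1, 70.2, 60.3),
  link       = c("link1", "link2", "link3"),
  image_link = c("img1", "img2", "img3")
)

# Positions of the last three columns, computed from the number of columns
last_three <- (ncol(toy) - 2):ncol(toy)
last_three
#> [1] 5 6 7

# Rows before the comma, columns after it
subset_df <- toy[1:2, last_three]

# Should be notability, link and image_link
names(subset_df)
```

Applied to the workshop data, the same idea would be characters[1:10, (ncol(characters) - 2):ncol(characters)], which picks notability, link and image_link. Computing the positions keeps the code correct even if columns are added later.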

Exercise 2

  1. Why does the following code not work? Correct it in your own script.
characters[uni_name == "Friends", ]

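On its own, uni_name is not an object in your environment; it only exists as a column inside characters. You need to extract the column from the data frame with $ before you can compare it to the string.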

characters[characters$uni_name == "Friends", ]
  id           name uni_id uni_name notability
1 F2  Monica Geller      F  Friends       79.7
2 F1   Rachel Green      F  Friends       76.7
3 F5  Chandler Bing      F  Friends       74.4
4 F4 Joey Tribbiani      F  Friends       74.3
5 F3  Phoebe Buffay      F  Friends       72.6
6 F6    Ross Geller      F  Friends       51.6
                                                      link
1 https://openpsychometrics.org/tests/characters/stats/F/2
2 https://openpsychometrics.org/tests/characters/stats/F/1
3 https://openpsychometrics.org/tests/characters/stats/F/5
4 https://openpsychometrics.org/tests/characters/stats/F/4
5 https://openpsychometrics.org/tests/characters/stats/F/3
6 https://openpsychometrics.org/tests/characters/stats/F/6
                                                                  image_link
1 https://openpsychometrics.org/tests/characters/test-resources/pics/F/2.jpg
2 https://openpsychometrics.org/tests/characters/test-resources/pics/F/1.jpg
3 https://openpsychometrics.org/tests/characters/test-resources/pics/F/5.jpg
4 https://openpsychometrics.org/tests/characters/test-resources/pics/F/4.jpg
5 https://openpsychometrics.org/tests/characters/test-resources/pics/F/3.jpg
6 https://openpsychometrics.org/tests/characters/test-resources/pics/F/6.jpg
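To see the problem in isolation, here is a minimal sketch on a hypothetical toy data frame (names and values are made up). It reproduces the "object not found" error, applies the $ fix, and shows base R's subset() as an alternative that evaluates the condition inside the data frame.

```r
# Hypothetical toy data frame with a uni_name column (made-up values)
toy <- data.frame(
  name     = c("Char One", "Char Two", "Char Three"),
  uni_name = c("Show A", "Show B", "Show A")
)

# A bare column name is looked up as a standalone object, which does not
# exist, so R raises an error; tryCatch() captures its message instead
res <- tryCatch(
  toy[uni_name == "Show A", ],
  error = function(e) conditionMessage(e)
)
print(res)

# Base R fix: extract the column with $ before comparing
toy[toy$uni_name == "Show A", ]

# Alternative: subset() looks up uni_name inside toy
subset(toy, uni_name == "Show A")
```

subset() is handy for interactive work; note that it also drops rows where the condition evaluates to NA, while bracket subsetting keeps them as rows full of NA.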
  2. Which characters will this code extract?
     characters[(characters$uni_name == "Harry Potter" | characters$uni_name != "Harry Potter") & !(characters$notability > 90), ]
    • All Harry Potter characters with a notability over 90.
    • All characters that are not from the Harry Potter universe and have a notability under 90.
    • All characters with a notability over 90.
    • All characters with a notability under 90.

Kind of a trick question: the first condition selects all characters that are from the Harry Potter universe OR are not from there, which is every character, independent of their universe. The second condition then keeps only the characters whose notability is not over 90 (beware of the ! in front of that comparison). So the correct option is "All characters with a notability under 90" (strictly speaking, 90 or below).
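You can check this reasoning with a few made-up values. The vectors uni and notability below are hypothetical, not the workshop data: the first condition comes out TRUE everywhere, so the combined condition is the same as !(notability > 90), i.e. notability <= 90.

```r
# Made-up example values (not the workshop data)
uni <- c("Harry Potter", "Friends", "Harry Potter", "Breaking Bad")
notability <- c(95.0, 79.7, 85.2, 91.3)

# "is Harry Potter" OR "is not Harry Potter": always TRUE for non-missing values
first_cond <- uni == "Harry Potter" | uni != "Harry Potter"
first_cond
#> [1] TRUE TRUE TRUE TRUE

# NOT above 90, i.e. 90 or below
second_cond <- !(notability > 90)
second_cond
#> [1] FALSE  TRUE  TRUE FALSE

# Combined, only the notability condition matters
first_cond & second_cond
#> [1] FALSE  TRUE  TRUE FALSE
```

One caveat: if a value of uni were NA, both comparisons would return NA and so would their combination, so the first condition is only guaranteed to be TRUE for non-missing values.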

Exercise 3

  1. Which character(s) from “Game of Thrones” have a notability rating over 90? Use base R.

You need to define a logical vector which contains TRUE values for all “Game of Thrones” characters that have a notability over 90.

characters[characters$uni_name == "Game of Thrones" & characters$notability > 90, ]
     id             name uni_id        uni_name notability
18 GOT2 Tyrion Lannister    GOT Game of Thrones       90.8
                                                         link
18 https://openpsychometrics.org/tests/characters/stats/GOT/2
                                                                     image_link
18 https://openpsychometrics.org/tests/characters/test-resources/pics/GOT/2.jpg

That’s only Tyrion Lannister.
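The logical vector can also be built in separate, named steps, which makes each condition easy to inspect before subsetting. Here is a minimal sketch on a hypothetical toy data frame (names and values are made up):

```r
# Hypothetical toy data (made-up names and values)
toy <- data.frame(
  name       = c("Char One", "Char Two", "Char Three", "Char Four"),
  uni_name   = c("Game of Thrones", "Game of Thrones", "Friends", "Friends"),
  notability = c(92.5, 85.0, 95.0, 70.0)
)

# One logical vector per condition
is_got     <- toy$uni_name == "Game of Thrones"
is_notable <- toy$notability > 90

# & is TRUE only where both conditions are TRUE, element by element
keep <- is_got & is_notable
keep
#> [1]  TRUE FALSE FALSE FALSE

# Use the combined logical vector in the row position
toy[keep, ]
```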

  2. Which characters from “How I Met Your Mother” or “Breaking Bad” are included in the data? Use the tidyverse.

Use the filter() function.

library(tidyverse)
filter(characters, uni_name %in% c("How I Met Your Mother", "Breaking Bad"))
       id              name uni_id              uni_name notability
1  HIMYM4    Barney Stinson  HIMYM How I Met Your Mother       76.0
2  HIMYM3 Robin Scherbatsky  HIMYM How I Met Your Mother       74.2
3  HIMYM5       Lily Aldrin  HIMYM How I Met Your Mother       74.1
4  HIMYM2  Marshall Eriksen  HIMYM How I Met Your Mother       71.0
5  HIMYM1         Ted Mosby  HIMYM How I Met Your Mother       63.7
6     BB1      Walter White     BB          Breaking Bad       91.3
7     BB3     Jesse Pinkman     BB          Breaking Bad       88.9
8     BB9  Mike Ehrmantraut     BB          Breaking Bad       82.5
9     BB8         Gus Fring     BB          Breaking Bad       79.6
10    BB4     Hank Schrader     BB          Breaking Bad       74.8
11    BB7      Saul Goodman     BB          Breaking Bad       73.8
12   BB10     Jane Margolis     BB          Breaking Bad       61.3
13    BB2      Skyler White     BB          Breaking Bad       55.4
14    BB6       Flynn White     BB          Breaking Bad       46.8
15    BB5    Marie Schrader     BB          Breaking Bad       27.9
                                                           link
1  https://openpsychometrics.org/tests/characters/stats/HIMYM/4
2  https://openpsychometrics.org/tests/characters/stats/HIMYM/3
3  https://openpsychometrics.org/tests/characters/stats/HIMYM/5
4  https://openpsychometrics.org/tests/characters/stats/HIMYM/2
5  https://openpsychometrics.org/tests/characters/stats/HIMYM/1
6     https://openpsychometrics.org/tests/characters/stats/BB/1
7     https://openpsychometrics.org/tests/characters/stats/BB/3
8     https://openpsychometrics.org/tests/characters/stats/BB/9
9     https://openpsychometrics.org/tests/characters/stats/BB/8
10    https://openpsychometrics.org/tests/characters/stats/BB/4
11    https://openpsychometrics.org/tests/characters/stats/BB/7
12   https://openpsychometrics.org/tests/characters/stats/BB/10
13    https://openpsychometrics.org/tests/characters/stats/BB/2
14    https://openpsychometrics.org/tests/characters/stats/BB/6
15    https://openpsychometrics.org/tests/characters/stats/BB/5
                                                                       image_link
1  https://openpsychometrics.org/tests/characters/test-resources/pics/HIMYM/4.jpg
2  https://openpsychometrics.org/tests/characters/test-resources/pics/HIMYM/3.jpg
3  https://openpsychometrics.org/tests/characters/test-resources/pics/HIMYM/5.jpg
4  https://openpsychometrics.org/tests/characters/test-resources/pics/HIMYM/2.jpg
5  https://openpsychometrics.org/tests/characters/test-resources/pics/HIMYM/1.jpg
6     https://openpsychometrics.org/tests/characters/test-resources/pics/BB/1.jpg
7     https://openpsychometrics.org/tests/characters/test-resources/pics/BB/3.jpg
8     https://openpsychometrics.org/tests/characters/test-resources/pics/BB/9.jpg
9     https://openpsychometrics.org/tests/characters/test-resources/pics/BB/8.jpg
10    https://openpsychometrics.org/tests/characters/test-resources/pics/BB/4.jpg
11    https://openpsychometrics.org/tests/characters/test-resources/pics/BB/7.jpg
12   https://openpsychometrics.org/tests/characters/test-resources/pics/BB/10.jpg
13    https://openpsychometrics.org/tests/characters/test-resources/pics/BB/2.jpg
14    https://openpsychometrics.org/tests/characters/test-resources/pics/BB/6.jpg
15    https://openpsychometrics.org/tests/characters/test-resources/pics/BB/5.jpg
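%in% is the right tool here because it checks each value against the whole set of show names. A tempting alternative, uni_name == c("How I Met Your Mother", "Breaking Bad"), compares position by position and recycles the shorter vector, so it can silently miss rows. A minimal sketch with a made-up vector uni (hypothetical, not the workshop data):

```r
# Made-up example values (not the workshop data)
uni <- c("Breaking Bad", "How I Met Your Mother", "Friends", "Breaking Bad")
shows <- c("How I Met Your Mother", "Breaking Bad")

# %in%: for each element, "is it one of these shows?"
uni %in% shows
#> [1]  TRUE  TRUE FALSE  TRUE

# ==: element-wise comparison with recycling, i.e. position 1 vs shows[1],
# position 2 vs shows[2], position 3 vs shows[1], position 4 vs shows[2]
uni == shows
#> [1] FALSE FALSE FALSE  TRUE
```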