Reshaping: Exercise

Previous code

# install.packages("tidyverse")
# install.packages("here")

library(tidyverse)
library(here)

## Load the data
characters <- readRDS(file = here::here("raw_data", "characters.rds"))
psych_stats <- read.csv(
  file = here::here("raw_data", "psych_stats.csv"),
  sep = ";"
)

Exercise 1

Take a look at the data frame psych_stats. Which format does it have?

Wide format
Long format
None of the above

Solution

Wide format (correct)
Long format
None of the above

Each unit of observation, in this case each character, has exactly one row, and each question has its own column.
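One quick way to check for this property is to test whether the identifier column contains duplicates. The following is a minimal sketch on a small made-up data frame (the `toy_wide` object and its values are invented for illustration, not taken from psych_stats):

```r
library(tidyverse)

# Hypothetical toy data with the same shape as psych_stats (values invented)
toy_wide <- tibble(
  char_id = c("A1", "A2", "A3"),
  messy_neat = c(95.7, 30.2, 45.3),
  diligent_lazy = c(6.1, 51.8, 52.2)
)

# anyDuplicated() returns 0 when no value occurs more than once,
# i.e. when every character has exactly one row
one_row_per_char <- anyDuplicated(toy_wide$char_id) == 0
one_row_per_char
```

The same check should work on the real data by replacing `toy_wide` with `psych_stats`, assuming it has been loaded as in the previous code.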

Exercise 2

Reshape psych_stats so that the data set has only three columns: char_id, question and rating.

Hint

You can select multiple columns like this: column_1:column_10.
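As a small illustration of this range syntax, here is a sketch on a made-up data frame (the `toy` object and all of its column names are hypothetical):

```r
library(tidyverse)

# Hypothetical toy data; column names are invented for illustration
toy <- tibble(
  id = 1:2,
  column_1 = c(10, 20),
  column_2 = c(30, 40),
  column_3 = c(50, 60),
  other = c("a", "b")
)

# column_1:column_3 selects column_1, column_3 and everything between them,
# based on the order of the columns in the data frame
selected <- toy %>% select(column_1:column_3)
names(selected)
```

The range depends on column position, not on the names themselves, so it only picks up the intended columns if they sit next to each other.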

Solution

psych_stats <- psych_stats %>%
  pivot_longer(cols = messy_neat:innocent_jaded, 
               names_to = "question", 
               values_to = "rating")

head(psych_stats)

# A tibble: 6 × 3
  char_id question                      rating
  <chr>   <chr>                          <dbl>
1 F2      messy_neat                     95.7 
2 F2      disorganized_self.disciplined  95.2 
3 F2      diligent_lazy                   6.10
4 F2      on.time_tardy                   6.2 
5 F2      competitive_cooperative         6.40
6 F2      scheduled_spontaneous           6.60

Now we have multiple rows for every character, but all question ratings are nicely aligned in one column.
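To see what this means for the dimensions, here is a minimal sketch on a small made-up data frame (the `toy_wide` object and its values are invented). After pivoting, the number of rows should equal the number of characters times the number of question columns:

```r
library(tidyverse)

# Hypothetical toy data: 3 characters, 2 question columns (values invented)
toy_wide <- tibble(
  char_id = c("A1", "A2", "A3"),
  messy_neat = c(95.7, 30.2, 45.3),
  diligent_lazy = c(6.1, 51.8, 52.2)
)

toy_long <- toy_wide %>%
  pivot_longer(cols = messy_neat:diligent_lazy,
               names_to = "question",
               values_to = "rating")

# 3 characters x 2 questions gives 6 rows, in 3 columns
dim(toy_long)

# Each character now has one row per question
rows_per_char <- count(toy_long, char_id)
rows_per_char
```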

Exercise 3

Now try to reshape the data back into wide format.

Solution

psych_stats %>%
  pivot_wider(id_cols = char_id,
              names_from = "question",
              values_from = "rating")

# A tibble: 889 × 365
   char_id messy_neat disorganized_self.disciplined diligent_lazy on.time_tardy
   <chr>        <dbl>                         <dbl>         <dbl>         <dbl>
 1 F2           95.7                           95.2          6.10           6.2
 2 F1           30.2                           25.9         51.8           77.9
 3 F5           45.3                           42.4         52.2           57.1
 4 F4           13                             11           78.1           84.1
 5 F3           20.9                           20.9         45.2           74  
 6 F6           81                             75.6         20             20.6
 7 EU1           9.60                          10.4         62.3           85.7
 8 EU2          27.7                           31.9         23.7           68.3
 9 EU6          40                             39.6         54.1           73.6
10 EU3          43.9                           31.1         32.2           58.2
# ℹ 879 more rows
# ℹ 360 more variables: competitive_cooperative <dbl>,
#   scheduled_spontaneous <dbl>, ADHD_OCD <dbl>, chaotic_orderly <dbl>,
#   motivated_unmotivated <dbl>, bossy_meek <dbl>, persistent_quitter <dbl>,
#   overachiever_underachiever <dbl>, muddy_washed <dbl>, beautiful_ugly <dbl>,
#   slacker_workaholic <dbl>, driven_unambitious <dbl>, outlaw_sheriff <dbl>,
#   precise_vague <dbl>, bad.cook_good.cook <dbl>, manicured_scruffy <dbl>, …

This is the wide format we started with. It was only for the sake of the exercise, though: the result is not assigned to anything, so psych_stats stays in the long format, which is what we want to use from now on.
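To check that pivoting longer and then wider again really brings back the original table, the two can be compared directly. This is a minimal sketch on a small made-up data frame (all `toy_*` objects and their values are invented for illustration):

```r
library(tidyverse)

# Hypothetical toy data in wide format (values invented)
toy_wide <- tibble(
  char_id = c("A1", "A2", "A3"),
  messy_neat = c(95.7, 30.2, 45.3),
  diligent_lazy = c(6.1, 51.8, 52.2)
)

# Wide -> long
toy_long <- toy_wide %>%
  pivot_longer(cols = messy_neat:diligent_lazy,
               names_to = "question",
               values_to = "rating")

# Long -> wide again; the result is stored under a new name,
# so toy_long itself is left unchanged
toy_back <- toy_long %>%
  pivot_wider(id_cols = char_id,
              names_from = "question",
              values_from = "rating")

# Compare column names, order and values of the two wide tables
round_trip_ok <- isTRUE(all.equal(as.data.frame(toy_wide),
                                  as.data.frame(toy_back)))
round_trip_ok
```

Storing the wide version under a separate name mirrors the solution above, where the long psych_stats is deliberately not overwritten.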