num_vec <- c(2023, 8, 8)
num_vec
[1] 2023 8 8
char_vec <- c("This", "is", "a", "vec", ".")
char_vec
[1] "This" "is" "a" "vec" "."
log_vec <- c(TRUE, FALSE)
log_vec
[1] TRUE FALSE
There are five main data structures in R. They differ in their dimensions (one dimension, two dimensions, n dimensions) and in whether all of their elements must share one type (homogeneous) or may have different types (heterogeneous):¹
| | Homogeneous | Heterogeneous |
|---|---|---|
| 1d | atomic vector | list |
| 2d | matrix | data.frame |
| nd | array | |
Let’s take a closer look at the two we will use most often throughout this workshop:
Atomic vectors (from here on simply called vectors) contain elements of a single type:
[1] 2023 8 8
[1] "This" "is" "a" "vec" "."
[1] TRUE FALSE
If we take a look at the structure of the vectors we have just created with `str()`, we see a short description of the data type in front of each vector:
str(num_vec)
num [1:3] 2023 8 8
str(char_vec)
chr [1:5] "This" "is" "a" "vec" "."
str(log_vec)
logi [1:2] TRUE FALSE
The first one is `num` (numeric), so it stores only numeric values. The second one is `chr` (character), so it can only contain strings. And last but not least, we have `logi` (logical) for boolean values. Why is that important? Well, some functions only make sense for specific data types. For example:
mean(char_vec)
Warning in mean.default(char_vec): argument is not numeric or logical:
returning NA
[1] NA
This gives us a warning and returns NA, because the input has the wrong data type.
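One way to avoid such surprises is to check a vector's type before passing it to a function. The following is a minimal sketch using base R's `typeof()`, `class()` and `is.numeric()`. It also shows that mixing types inside `c()` does not fail: R silently coerces all elements to one common type. The names `mixed_vec` and `vec_mean` are made up for this illustration.

```r
num_vec  <- c(2023, 8, 8)
char_vec <- c("This", "is", "a", "vec", ".")

# Inspect the type of a vector
typeof(num_vec)       # "double"
class(num_vec)        # "numeric"
is.numeric(char_vec)  # FALSE

# Mixing types in c() coerces every element to a common type (here: character)
mixed_vec <- c(1, "a", TRUE)
mixed_vec             # "1" "a" "TRUE"

# Only compute the mean if the input is numeric, otherwise fall back to NA
vec_mean <- if (is.numeric(num_vec)) mean(num_vec) else NA
vec_mean
```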
By the way, strings are just ‘words’ made up of multiple characters. We can combine multiple strings by using `paste()` or `paste0()` (the first one puts a space between the pieces, the second one does not):
[1] "My value is: 10"
[1] "My valueis:10"
This will come in handy later when we write our own functions, because it helps us to print variable messages, depending on the input given by the user.
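The two outputs above are consistent with calls like the ones sketched below. The exact arguments used originally are not shown, so the variable `value` and the helper `describe_value()` are assumptions made for illustration.

```r
value <- 10  # hypothetical input value

# paste() separates its arguments with a single space by default
paste("My value", "is:", value)       # "My value is: 10"

# paste0() concatenates its arguments without any separator
paste0("My value", "is:", value)      # "My valueis:10"

# The sep argument of paste() sets a custom separator
paste("2023", "08", "08", sep = "-")  # "2023-08-08"

# Hypothetical helper: build a message that depends on the input
describe_value <- function(x) {
  paste0("My value is: ", x)
}
describe_value(42)                    # "My value is: 42"
```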
A data frame is two-dimensional and can store elements of different types.
Note that we do nothing more here than combine vectors into a data frame. Each vector becomes one column, with an assigned column name.
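A minimal sketch of such a construction, reusing the values that appear in the printed tables of this section. The object name `persons` and the `*_vec` variable names are assumptions, since the original call is not shown.

```r
# One vector per future column; all vectors have the same length
name_vec  <- c("Anna", "Alex", "John", "Jessi")
age_vec   <- c(19, 17, 18, 18)
month_vec <- c("Jan", "Sep", "Oct", "Mar")
extro_vec <- c(3.5, 2, 4.5, 4.2)

# data.frame() combines the vectors; the argument names become the column names
persons <- data.frame(
  name        = name_vec,
  age         = age_vec,
  birth_month = month_vec,
  big5_extro  = extro_vec
)

dim(persons)    # 4 rows, 4 columns
names(persons)  # the assigned column names
str(persons)    # one line per column, including its type
```

Each column is still a vector of a single type, which is why a data frame can mix types across columns but not within one.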
Adding new columns to a data frame is pretty straightforward. We just define the column name and then assign it some input. For example, we could add a column with the neuroticism ratings for each person:
name age birth_month big5_extro big5_neuro
1 Anna 19 Jan 3.5 1
2 Alex 17 Sep 2.0 3
3 John 18 Oct 4.5 2
4 Jessi 18 Mar 4.2 4
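In base R, a column like this can be added with the `$` operator. The sketch below assumes the data frame is called `persons` (the original name is not shown) and takes the ratings from the output above.

```r
persons <- data.frame(
  name        = c("Anna", "Alex", "John", "Jessi"),
  age         = c(19, 17, 18, 18),
  birth_month = c("Jan", "Sep", "Oct", "Mar"),
  big5_extro  = c(3.5, 2, 4.5, 4.2)
)

# Assigning to a column name that does not exist yet creates the column.
# The new vector needs one value per row.
persons$big5_neuro <- c(1, 3, 2, 4)

persons
```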
Or, using the `tidyverse`, with the help of `mutate()`:
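A sketch of the `mutate()` variant, assuming the `dplyr` package is installed and, as before, that the data frame is called `persons` (a hypothetical name).

```r
library(dplyr)  # provides mutate(); also attached by library(tidyverse)

persons <- data.frame(
  name        = c("Anna", "Alex", "John", "Jessi"),
  age         = c(19, 17, 18, 18),
  birth_month = c("Jan", "Sep", "Oct", "Mar"),
  big5_extro  = c(3.5, 2, 4.5, 4.2)
)

# mutate() returns a new data frame, so the result has to be assigned
persons <- mutate(persons, big5_neuro = c(1, 3, 2, 4))

persons
```

Unlike the `$` assignment, `mutate()` does not modify the data frame in place. Forgetting the assignment leaves `persons` unchanged.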
A special type of data frame is the so-called tibble. Tibbles are a modern version of data frames and the standard data frame type of the `tidyverse`, as they have some advantageous characteristics (e.g., note the more informative printing of the data frame). So don’t be confused if you run into them; in general, they behave like data frames.
library(tibble)  # provides tibble(); also attached by library(tidyverse)
persons_tibble <- tibble(
  name = c("Anna", "Alex", "John", "Jessi"),
  age = c(19, 17, 18, 18),
  birth_month = c("Jan", "Sep", "Oct", "Mar"),
  big5_extro = c(3.5, 2, 4.5, 4.2)
)
persons_tibble
# A tibble: 4 × 4
name age birth_month big5_extro
<chr> <dbl> <chr> <dbl>
1 Anna 19 Jan 3.5
2 Alex 17 Sep 2
3 John 18 Oct 4.5
4 Jessi 18 Mar 4.2
A list is a one-dimensional object which, unlike a vector, can contain elements of different types and also of different lengths. For example, we can store vectors of different lengths and data frames in the same list, which makes it the most versatile data structure:
personality_rating <- list(
  big5 = data.frame(name = c("Jessi", "John"),
                    extraversion = c(4.3, 2),
                    openness = c(3.8, NA)),
  rating_type = "self_rating"
)
personality_rating
$big5
name extraversion openness
1 Jessi 4.3 3.8
2 John 2.0 NA
$rating_type
[1] "self_rating"
Here, we define the list `personality_rating`, which includes a data frame with the personality ratings and some meta information in the form of a character vector describing the rating type. We won’t use lists much in this workshop, but keep in mind that they exist, as they quickly become necessary for managing more complex tasks.
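To get elements back out of a list, base R offers `$`, `[[ ]]` and `[ ]`. The following sketch rebuilds the list from above and shows the difference between them.

```r
personality_rating <- list(
  big5 = data.frame(name = c("Jessi", "John"),
                    extraversion = c(4.3, 2),
                    openness = c(3.8, NA)),
  rating_type = "self_rating"
)

names(personality_rating)             # "big5" "rating_type"

# $ and [[ ]] return the element itself
personality_rating$big5               # the data frame
personality_rating[["rating_type"]]   # "self_rating"

# [ ] returns a (shorter) list that still wraps the element
class(personality_rating["rating_type"])  # "list"

# Accessors can be chained to reach into nested elements
personality_rating$big5$extraversion  # 4.3 2.0
```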
Finally, just for the sake of completeness, here is a matrix (we won’t use matrices in this workshop, but that doesn’t mean they are irrelevant):
[,1] [,2]
[1,] "1" "3"
[2,] "2" "4"
Notice how everything was converted to character (shown by the quotes around each value) because we used "3" instead of 3? That’s because a matrix can only hold values of the same type.
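A call that produces the same output as shown above could look like the sketch below. The original call is not shown, so the names `char_mat` and `num_mat` are assumptions.

```r
# One quoted "3" is enough to turn every element into a character string
char_mat <- matrix(c(1, 2, "3", 4), nrow = 2)
char_mat
typeof(char_mat)  # "character"

# With numbers only, the matrix stays numeric
num_mat <- matrix(c(1, 2, 3, 4), nrow = 2)
typeof(num_mat)   # "double"

# matrix() fills column by column; index elements with [row, column]
num_mat[2, 1]     # 2
num_mat[1, 2]     # 3
```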
Last but not least, just so you have seen one, here is an array:
, , 1
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
, , 2
[,1] [,2] [,3]
[1,] 7 9 11
[2,] 8 10 12
, , 3
[,1] [,2] [,3]
[1,] 13 15 17
[2,] 14 16 18
, , 4
[,1] [,2] [,3]
[1,] 19 21 23
[2,] 20 22 24
By using the `dim` argument, I specify that each matrix in this array has 2 rows and 3 columns, and that I want 4 matrices.
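A sketch of a call that yields the array printed above. The name `example_array` is an assumption.

```r
# 24 values, arranged as 4 matrices with 2 rows and 3 columns each
example_array <- array(1:24, dim = c(2, 3, 4))

dim(example_array)      # 2 3 4

# Index with [row, column, matrix]
example_array[1, 2, 3]  # 15
example_array[2, 3, 4]  # 24

# Leaving an index empty selects everything along that dimension
example_array[, , 1]    # the first 2 x 3 matrix
```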
¹ Table from Advanced R.