library(tidyverse)
world_coordinates <- readRDS(file = here::here("raw_data", "world_coordinates.rds"))
athletes <- readRDS(file = here::here("raw_data", "athletes.rds"))
Functions
This chapter is optional.
Motivation
Suppose we want to know the number of gold medals a specific athlete has won, along with some additional data, all printed into the console. Well, we could do something like this:
medal_counts_athlete <- athletes %>%
  # Extract all rows containing gold medal winners:
  filter(Medal %in% c("Gold")) %>%
  # Group them by name:
  group_by(Name) %>%
  # Count the number of medals for each name:
  count(Medal)
head(medal_counts_athlete)
# A tibble: 6 × 3
# Groups: Name [6]
Name Medal n
<chr> <chr> <int>
1 "A. Albert" Gold 1
2 "Aage Jrgen Christian Andersen" Gold 1
3 "Aage Valdemar Harald Frandsen" Gold 1
4 "Aagje \"Ada\" Kok (-van der Linden)" Gold 1
5 "Aale Maria Tynni (-Pirinen, -Haavio)" Gold 1
6 "Aaron Nguimbat" Gold 1
# Extract all rows of Usain Bolt
medals_bolt <- medal_counts_athlete %>%
  filter(Name == "Usain St. Leo Bolt")
head(medals_bolt)
# A tibble: 1 × 3
# Groups: Name [1]
Name Medal n
<chr> <chr> <int>
1 Usain St. Leo Bolt Gold 8
# Extract all rows of Usain Bolt from the athletes data set
stats_bolt <- athletes %>%
  filter(Name == "Usain St. Leo Bolt") %>%
  ## Sort the data frame by year:
  arrange(Year)
head(stats_bolt)
NOC ID Name Sex Age Height Weight Team Games Year
1 JAM 13029 Usain St. Leo Bolt M 17 196 95 Jamaica 2004 Summer 2004
2 JAM 13029 Usain St. Leo Bolt M 21 196 95 Jamaica 2008 Summer 2008
3 JAM 13029 Usain St. Leo Bolt M 21 196 95 Jamaica 2008 Summer 2008
4 JAM 13029 Usain St. Leo Bolt M 21 196 95 Jamaica 2008 Summer 2008
5 JAM 13029 Usain St. Leo Bolt M 25 196 95 Jamaica 2012 Summer 2012
6 JAM 13029 Usain St. Leo Bolt M 25 196 95 Jamaica 2012 Summer 2012
Season City Sport Event Medal Region
1 Summer Athina Athletics Athletics Men's 200 metres <NA> Jamaica
2 Summer Beijing Athletics Athletics Men's 4 x 100 metres Relay <NA> Jamaica
3 Summer Beijing Athletics Athletics Men's 200 metres Gold Jamaica
4 Summer Beijing Athletics Athletics Men's 100 metres Gold Jamaica
5 Summer London Athletics Athletics Men's 4 x 100 metres Relay Gold Jamaica
6 Summer London Athletics Athletics Men's 100 metres Gold Jamaica
# Print a statement using the data we just have extracted:
print(
  paste("Usain St. Leo Bolt participated in Olympic games in the year(s)",
        paste0(unique(stats_bolt$Year), collapse = ", "),
        "and won",
        medals_bolt$n,
        "Goldmedal/s in total. The athletes sport was:",
        unique(stats_bolt$Sport),
        ".")
)
[1] "Usain St. Leo Bolt participated in Olympic games in the year(s) 2004, 2008, 2012, 2016 and won 8 Goldmedal/s in total. The athletes sport was: Athletics ."
Phew, that already took a while, especially if this is meant as an easy way for users to extract the gold medal count for multiple athletes. They would have to specify the name for both data frames and build their print statement from scratch. Luckily, we can write a function, which is a way to bundle multiple operations so that they can easily be repeated. Let’s do that quickly, and then take a step back and look at the components of a function:
count_goldmedals <- function(athlete_name) {
  medal_counts_athlete <- athletes %>%
    ## Extract all rows with gold medal winners:
    filter(Medal == "Gold") %>%
    ## Group them by name:
    group_by(Name) %>%
    ## Count the number of medals for each name:
    count(Medal)
  ## Extract the medal count row for the athlete name provided by the user using the athlete_name argument:
  medals_name <- medal_counts_athlete %>%
    filter(Name == athlete_name)
  ## Extract the rows in the athletes data frame for the athlete name provided by the user using the athlete_name argument:
  stats_name <- athletes %>%
    filter(Name == athlete_name) %>%
    ## Sort by year:
    arrange(Year)
  ## Build the statement:
  statement <- paste(
    athlete_name, "participated in Olympic games in the year(s)",
    paste0(unique(stats_name$Year), collapse = ", "),
    "and won",
    medals_name$n,
    "Goldmedal/s in total. The athletes sport was:",
    unique(stats_name$Sport),
    "."
  )
  print(statement)
  return(medals_name)
}
count_goldmedals(athlete_name = "Usain St. Leo Bolt")
[1] "Usain St. Leo Bolt participated in Olympic games in the year(s) 2004, 2008, 2012, 2016 and won 8 Goldmedal/s in total. The athletes sport was: Athletics ."
# A tibble: 1 × 3
# Groups: Name [1]
Name Medal n
<chr> <chr> <int>
1 Usain St. Leo Bolt Gold 8
count_goldmedals(athlete_name = "Simone Arianne Biles")
[1] "Simone Arianne Biles participated in Olympic games in the year(s) 2016 and won 4 Goldmedal/s in total. The athletes sport was: Gymnastics ."
# A tibble: 1 × 3
# Groups: Name [1]
Name Medal n
<chr> <chr> <int>
1 Simone Arianne Biles Gold 4
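Because the logic now lives in one place, running it for several athletes takes only a short loop. The sketch below illustrates that pattern under some assumptions: toy_athletes is a made-up stand-in for the athletes data set (the rows are invented, not real results), and count_goldmedals_in() is a hypothetical, simplified variant of the function above that takes the data frame as an argument and skips the printed statement.

```r
library(dplyr)

# Made-up toy data, only for illustration (NOT real Olympic results):
toy_athletes <- data.frame(
  Name  = c("Athlete A", "Athlete A", "Athlete B", "Athlete B", "Athlete B"),
  Year  = c(2008, 2012, 2012, 2016, 2016),
  Medal = c("Gold", "Gold", NA, "Gold", "Silver")
)

# Hypothetical simplified helper: count the gold medals of one athlete.
count_goldmedals_in <- function(data, athlete_name) {
  result <- data %>%
    # Keep only the gold medal rows of the requested athlete:
    filter(Name == athlete_name, Medal == "Gold") %>%
    # Count the remaining rows per name and medal:
    count(Name, Medal)
  return(result)
}

# Call the function once per name and stack the results into one table:
athlete_names <- c("Athlete A", "Athlete B")
gold_counts <- bind_rows(
  lapply(athlete_names, function(nm) count_goldmedals_in(toy_athletes, nm))
)
gold_counts
```

Passing the data frame in as an argument is a design choice of this sketch, not something the function above does; it keeps the example self-contained.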
Pretty cool, right? We just write our code once, and can reuse it as often as we want to. So, let’s take a closer look at how to actually do that.
How to write a function?
Everything that does something in R is a function. We have already used a lot of them, like print(), filter(), merge(). The great thing is: we can define our own functions pretty easily:
function_name <- function(argument_1, argument_2, ...){
do some operations
return(result)
}
- We always have to give the function a concise name (often not that easy).
- Then we specify some arguments (which should also have concise names). In our introductory example that was just the athlete name. We can also provide a default option for the arguments, which the function will fall back on if the user doesn’t specify anything.
- Inside the { } we define the operations, which can use the function arguments, so the user can control some aspects of the function's behavior.
- In the end, it is good practice to return the result by using return(), so it is always clear what the function is giving back to the user.
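To illustrate the last point: an R function without return() hands back the value of its last evaluated expression, so both variants in the sketch below give the same result, but the second one states explicitly what is returned. The function names double_implicit() and double_explicit() are made up for this illustration.

```r
# Hypothetical example functions, used only for illustration.

# Without return(): R gives back the value of the last evaluated expression.
double_implicit <- function(x) {
  x * 2
}

# With return(): the returned object is stated explicitly.
double_explicit <- function(x) {
  result <- x * 2
  return(result)
}

double_implicit(4)
# [1] 8
double_explicit(4)
# [1] 8
```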
One minimal example with three arguments would be to sum three numbers:
sum_num <- function(x, y, z = 0){
  result <- x + y + z
  return(result)
}
sum_num(x = 1, y = 1, z = 2)
[1] 4
## We don't have to use the arguments in order, IF we name them:
sum_num(y = 2, z = 4, x = 1)
[1] 7
## We don't have to specify z, because the function can use a default:
sum_num(x = 3, y = 1)
[1] 4
It often makes sense to write the argument names explicitly in your function call. This makes your code clearer and avoids mix-ups between arguments.
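A mix-up is easiest to see with a function where the order of the arguments changes the result. The following is a small sketch using a hypothetical function divide_num(), which is not part of the examples above:

```r
# Hypothetical function, used only for illustration:
divide_num <- function(numerator, denominator) {
  result <- numerator / denominator
  return(result)
}

## Unnamed arguments are matched by position, so swapping them changes the result:
divide_num(10, 2)
# [1] 5
divide_num(2, 10)
# [1] 0.2

## Named arguments give the same result in any order:
divide_num(denominator = 2, numerator = 10)
# [1] 5
```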
Footnotes
Image by Laura Ockel on Unsplash.