Loop in R

R Workshop 2024 in UTokyo

SUN Yufei (Adrian)

UTokyo & Tsinghua University

Course Homepage

https://adriansun.drhuyue.site/course/r-workshop-tokyo-2024.html

Email: sunyf20@mails.tsinghua.edu.cn

Why do we need loops?

“Please collect all the news from the ISS website.”

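library(rvest)  # read_html(), html_nodes(), html_attr(), html_text(); also re-exports %>%

# Collect the link to every news article listed on the index page
ISS_news <- read_html("https://issnews.iss.u-tokyo.ac.jp/cat2/cat14/index.html") %>% 
  html_nodes("h3 a") %>% 
  html_attr("href")

# for each page, we need to:
# 1. read the html
# 2. extract the news
# 3. combine them together

# Steps 1 and 2, done by hand for the first page only
first_page <- read_html(ISS_news[1]) %>% 
  html_nodes("p:nth-child(2)") %>% 
  html_text()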

Without loops, we would have to collect the news from each page by hand: three lines of code per page, i.e., 3 × (number of pages) lines in total.
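A minimal sketch of the loop version, assuming the links in ISS_news are full URLs (as the first_page code already assumes) and that the p:nth-child(2) selector matches every news page. scrape_page() and collect_news() are hypothetical helpers, not part of rvest; the native pipe |> needs R 4.1 or later.

```r
# Hypothetical helper: steps 1 and 2 (read one page, extract its news text).
# Assumes the "p:nth-child(2)" selector matches the news text on every page.
scrape_page <- function(url) {
  rvest::read_html(url) |>
    rvest::html_nodes("p:nth-child(2)") |>
    rvest::html_text()
}

# Hypothetical helper: the same steps written once and repeated for every URL.
collect_news <- function(urls, scrape = scrape_page) {
  news <- vector(mode = "list", length = length(urls))  # pre-allocated output
  for (i in seq_along(urls)) {
    news[[i]] <- scrape(urls[i])  # steps 1-2 for page i
  }
  unlist(news)                    # step 3: combine everything into one vector
}

# Usage (needs rvest and a network connection):
# all_news <- collect_news(ISS_news)
```

The scrape argument only exists so the loop can be tried offline with a stand-in function; the code stays three steps long no matter how many pages there are.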

Why do we need loops?

  • Reduces code repetition

  • Scales easily: the same few lines handle 3 pages or 300

  • Saves time and avoids copy-paste errors

Key Concepts

  • Research Question: Is social trust, an important component of social capital, affected by social inequality?
    • How to create a “neighborhood distrust” variable based on neighborhood trust variables?
    • How to categorize income levels into low, medium, and high based on family income?
  • Loops
    • Conditional Statements
    • Iteration
    • While Loops
    • Repeat Loops

Loops

A loop repeats a block of code, and a condition decides when the repetition stops.

Conditional Statements (IF)

Conditional statements are not loops themselves, but they are the foundation of loops: a condition decides whether the loop runs another iteration. Any loop that terminates can be broken down into a finite sequence of conditional steps.
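A small sketch with made-up numbers of what "broken down into conditional steps" means: a while loop and the same logic unrolled into one if() check per iteration.

```r
# The loop: keep adding 1 while x is below 3
x <- 0
while (x < 3) {
  x <- x + 1
}

# The same logic unrolled: each iteration is just one conditional check
y <- 0
if (y < 3) y <- y + 1   # iteration 1
if (y < 3) y <- y + 1   # iteration 2
if (y < 3) y <- y + 1   # iteration 3
if (y < 3) y <- y + 1   # condition is now FALSE: nothing happens, the loop would stop
c(x, y)                 # 3 3
```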

Logical Judgement

1 + 1 != 2                  # FALSE

is.na(c(1, 2, NA, 3))       # FALSE FALSE TRUE FALSE

is.numeric(wvs7$female)     # TRUE if the female column is stored as numbers
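Comparisons like these are vectorized, but if() needs a single TRUE or FALSE. A small sketch on a made-up vector of how any(), all(), and isTRUE() collapse a logical vector to one value:

```r
v <- c(1, 2, NA, 3)

v > 1                      # FALSE TRUE NA TRUE (one result per element)
any(v > 2, na.rm = TRUE)   # TRUE: at least one element exceeds 2
all(v > 0)                 # NA: the missing value makes the answer unknown
all(v > 0, na.rm = TRUE)   # TRUE once NA is ignored
isTRUE(NA > 0)             # FALSE: a safe guard before using a value inside if()
```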

Conditional Handling

  • if … else … is a form of conditional statement used in almost all high-level programming languages.
    • Basic syntax: if(<logical condition>){<execute command>}
    • Extended syntax: if(<logical condition1>){<execute command1>} else if(<logical condition2>){<execute command2>} ... else {<execute commandn>}
      • Vectorized version: ifelse(<logical condition>, <value if TRUE>, <value if FALSE>)
      • Multi-condition shorthand (dplyr): case_when(<logical condition1> ~ <value1>, <logical condition2> ~ <value2>, ...)
  • e.g., Convert the gender variable in wvs7 to a numeric variable if it is logical
if(is.logical(wvs7$female)) wvs7$female <- as.numeric(wvs7$female)
  • e.g., Create a “neighborhood distrust” variable based on the neighborhood trust variable in wvs7, where 4 represents the least trust and 1 represents the most trust
# Check for NA first: if (NA == 1) stops with "missing value where TRUE/FALSE needed"
if (is.na(wvs7$trust_neighbor[1])) {
  wvs7$distrust_neighbor[1] <- NA
} else if (wvs7$trust_neighbor[1] == 1) {
  wvs7$distrust_neighbor[1] <- 4
} else if (wvs7$trust_neighbor[1] == 2) {
  wvs7$distrust_neighbor[1] <- 3
} else if (wvs7$trust_neighbor[1] == 3) {
  wvs7$distrust_neighbor[1] <- 2
} else {
  wvs7$distrust_neighbor[1] <- 1
}
  • What is wrong with the following approach?
  • Does this approach solve my problem?
  • Is there a better way?
wvs7$distrust_neighbor <- 5 - wvs7$trust_neighbor
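To probe the questions above, a sketch on a made-up vector (not wvs7): trust codes 1 to 4, a missing value, and a hypothetical out-of-range code 9. The arithmetic recode is vectorized and keeps NA as NA, but it silently turns unexpected codes into impossible values, whereas the if chain would send 9 to its else branch and return 1.

```r
trust <- c(1, 2, 3, 4, NA, 9)   # made-up codes; 9 stands in for an unexpected value

# Vectorized arithmetic: one line for the whole column, NA stays NA
distrust_arith <- 5 - trust
distrust_arith                  # 4 3 2 1 NA -4  (the code 9 becomes -4)

# Stricter vectorized recode: only the valid codes 1-4 are reversed
distrust_safe <- ifelse(trust %in% 1:4, 5 - trust, NA)
distrust_safe                   # 4 3 2 1 NA NA
```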
  • e.g., Split household income level in wvs7 into low (1), medium (2), and high (3), using the 25% and 75% quantiles as cut points.
library(dplyr)  # for case_when(), group_by(), mutate()

vec_cut <- quantile(wvs7$incomeLevel, probs = c(0.25, 0.75), na.rm = TRUE)

# Option 1: nested ifelse()
wvs7$incomeCat3 <- ifelse(wvs7$incomeLevel < vec_cut[1], 1,
                   ifelse(wvs7$incomeLevel > vec_cut[2], 3,
                   ifelse(is.na(wvs7$incomeLevel), NA, 2)))

# Option 2: case_when()
wvs7$incomeCat3 <- case_when(
  wvs7$incomeLevel < vec_cut[1] ~ 1,
  wvs7$incomeLevel >= vec_cut[1] & wvs7$incomeLevel <= vec_cut[2] ~ 2,
  wvs7$incomeLevel > vec_cut[2] ~ 3
)
# If no condition is met (e.g., incomeLevel is NA), case_when() returns NA
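A quick check of the nested ifelse() idea on made-up values (not wvs7). It suggests the explicit is.na() branch is optional: a comparison with NA yields NA, and ifelse() returns NA for that element without evaluating either branch.

```r
income  <- c(1, 3, 5, 7, 9, NA)                                   # made-up values
vec_cut <- quantile(income, probs = c(0.25, 0.75), na.rm = TRUE)  # 3 and 7

cat3 <- ifelse(income < vec_cut[1], 1,
        ifelse(income > vec_cut[2], 3, 2))   # no is.na() branch
cat3                                         # 1 2 2 2 3 NA
```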
  • What if I want to do it by country?
# Compute the cut points inside mutate() so that each country gets its own quantiles
wvs7 <- wvs7 %>%
  group_by(country) %>%
  mutate(
    q25 = quantile(incomeLevel, 0.25, na.rm = TRUE),
    q75 = quantile(incomeLevel, 0.75, na.rm = TRUE),
    incomeCat3 = case_when(
      incomeLevel < q25 ~ 1,
      incomeLevel >= q25 & incomeLevel <= q75 ~ 2,
      incomeLevel > q75 ~ 3
    )
  ) %>%
  ungroup() %>%
  select(-q25, -q75)
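To see what grouping by country amounts to, here is a base-R sketch of the same per-country split written as an explicit loop over countries. The data frame is a made-up stand-in for wvs7 with only the two columns used here.

```r
# Made-up stand-in for wvs7: two countries, one missing income value
df <- data.frame(
  country     = rep(c("JPN", "CHN"), each = 5),
  incomeLevel = c(1, 2, 3, 4, 5, 2, 4, 6, 8, NA)
)
df$incomeCat3 <- NA_real_                        # pre-allocate the output column

for (ctry in unique(df$country)) {               # one iteration per country
  rows <- df$country == ctry                     # rows belonging to this country
  x    <- df$incomeLevel[rows]
  cuts <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE)  # country-specific cut points
  df$incomeCat3[rows] <- ifelse(x < cuts[1], 1,
                         ifelse(x > cuts[2], 3, 2))         # NA stays NA
}
df$incomeCat3                                    # 1 2 2 2 3 1 2 2 3 NA
```

Each country is compared only with its own 25% and 75% quantiles (2 and 4 for JPN, 3.5 and 6.5 for CHN here), which is exactly what a single global vec_cut cannot do.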

Traversal (FOR)

Command syntax: for(<index> in <input sequence>){<execution command>}

The input sequence can be:
  • a given value range, e.g., 1:10
  • determined by variables or data frame dimensions, e.g., seq_along(x) or names(df)
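A minimal sketch of both kinds of input sequence, using made-up numbers and a small made-up data frame rather than wvs7:

```r
# Given value range: square the numbers 1 to 5
squares <- numeric(5)                  # pre-allocate the output vector
for (i in 1:5) {
  squares[i] <- i^2
}
squares                                # 1 4 9 16 25

# Range determined by the data frame's dimensions
df <- data.frame(a = 1:3, b = c(2.5, NA, 1))
col_means <- numeric(ncol(df))
for (j in seq_len(ncol(df))) {         # one iteration per column
  col_means[j] <- mean(df[[j]], na.rm = TRUE)
}
col_means                              # 2.00 1.75
```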

Traversal Loop

  • e.g., Create the "neighborhood distrust" variable from the neighborhood trust variable in wvs7, where 4 represents the least trust and 1 the most trust
length(wvs7$trust_neighbor)  # the hard-coded range below (1:1264) should match this length
for (i in 1:1264) {
  if (is.na(wvs7$trust_neighbor[i])) {  # check NA first, before any == comparison
    wvs7$distrust_neighbor[i] <- NA
  } else if (wvs7$trust_neighbor[i] == 1) {
    wvs7$distrust_neighbor[i] <- 4
  } else if (wvs7$trust_neighbor[i] == 2) {
    wvs7$distrust_neighbor[i] <- 3
  } else if (wvs7$trust_neighbor[i] == 3) {
    wvs7$distrust_neighbor[i] <- 2
  } else {
    wvs7$distrust_neighbor[i] <- 1
  }
}

The output object can be whatever the task requires (below, a list of frequency tables), and the input range can be set dynamically from the data instead of being hard-coded.

ls_tb <- vector(mode = "list")                   # empty list to hold the output
for (i in seq_along(wvs7_trust)) {               # one iteration per column
  ls_tb[[i]] <- pull(wvs7_trust, i) %>% table()  # frequency table of column i
}
names(ls_tb) <- names(wvs7_trust)
ls_tb
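When the range is set dynamically, seq_along() is safer than 1:length(). A small sketch with an empty vector shows why:

```r
empty <- character(0)      # e.g., a vector or list with no elements

1:length(empty)            # 1 0: a loop over this would run twice
seq_along(empty)           # integer(0): a loop over this runs zero times

n_iter <- 0
for (i in seq_along(empty)) {
  n_iter <- n_iter + 1
}
n_iter                     # 0
```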

Accumulating results across iterations

  • e.g., calculate how many numeric variables there are in wvs7
count_num <- 0                            # running total
for (variable in names(wvs7)) {
  if (is.numeric(wvs7[[variable]])) {
    count_num <- count_num + 1            # add 1 for every numeric column
  }
}
count_num
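The same counting loop on a small made-up data frame, next to a loop-free alternative (a swapped-in technique, not from the original slides) that tests every column at once with vapply() and adds up the TRUE values:

```r
df <- data.frame(id = 1:3, name = c("a", "b", "c"), score = c(1.5, 2, NA))  # made-up

# The accumulating loop
count_num <- 0
for (variable in names(df)) {
  if (is.numeric(df[[variable]])) count_num <- count_num + 1
}
count_num                                   # 2 (id and score)

# Vectorized alternative: one TRUE/FALSE per column, then sum them
sum(vapply(df, is.numeric, logical(1)))     # 2
```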

Condition Loop (WHILE)

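A conditional loop repeats its commands for as long as a condition stays TRUE; the condition is checked before each iteration. Syntax: while(<logical condition>){<execute command>}

[Figure: logic diagram of the conditional loop statement]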
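A minimal sketch of the syntax with made-up numbers: halve a value until it drops below 1. Because the condition is checked before each pass, a starting value already below 1 would skip the body entirely.

```r
x <- 100
steps <- 0
while (x >= 1) {           # checked before every iteration
  x <- x / 2
  steps <- steps + 1
}
steps                      # 7
x                          # 0.78125 (= 100 / 2^7)
```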

  • e.g., Rewrite the previous contingency-table example (the for loop that built ls_tb) as a while loop:

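ls_tb2 <- vector(mode = "list")
j <- 1                                       # start at the first column
while (j <= length(wvs7_trust)) {            # stop after the last column
  ls_tb2[[j]] <- pull(wvs7_trust, j) %>% table()
  j <- j + 1                                 # advance the index by hand
}
names(ls_tb2) <- names(wvs7_trust)
ls_tb2
identical(ls_tb, ls_tb2)                     # TRUE: same result as the for loop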


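Every traversal (for) loop can be rewritten as a conditional (while) loop, but not the other way round: when the number of iterations is not known in advance, there is no sequence for a for loop to traverse.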

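i <- 0
while (TRUE) {           # keep going until break is called
  if (runif(1) < 0.01)   # draw a uniform random number; stop when it is below 0.01
    break
  i <- i + 1             # count the draws that did not stop the loop
}
i                        # differs from run to run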

Because the number of iterations is random, this loop cannot be rewritten as a traversal loop. For this while (TRUE) { ... break } pattern, R also provides a shorthand: the repeat statement.

Repeating Loop (REPEAT)

A repeat loop is a conditional loop whose condition is always "keep running", i.e., while (TRUE); the only way out is an explicit break inside the body.
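One practical consequence, sketched with made-up numbers: since the exit check sits wherever break is placed, a repeat loop with the check at the end always runs its body at least once, while a while loop whose condition starts FALSE never runs.

```r
x <- 10
while (x < 5) {            # FALSE at the start, so the body never runs
  x <- x + 1
}
x                          # 10

y <- 10
repeat {
  y <- y + 1               # the body runs before the exit check
  if (y >= 5) break
}
y                          # 11
```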


Let's rewrite the while (TRUE) example above using repeat:

i <- 0                   # reset the counter before the loop
repeat {
  if (runif(1) < 0.01)
    break
  i <- i + 1
}
i

The contingency-table while loop (ls_tb2) can be rewritten as a repeat loop in the same way:

ls_tb3 <- vector(mode = "list")             # initialize the output list first
k <- 1
repeat {
  ls_tb3[[k]] <- pull(wvs7_trust, k) %>% table()
  k <- k + 1
  if (k > length(wvs7_trust)) break         # exit after the last column
}
names(ls_tb3) <- names(wvs7_trust)
identical(ls_tb2, ls_tb3)
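Because a repeat loop only stops at break, it is common to add a fallback exit. A sketch of the random-draw example with a safety cap; count_until() and its max_iter argument are hypothetical names chosen for illustration.

```r
# Hypothetical helper: count draws until runif(1) < p, but never more than max_iter
count_until <- function(p = 0.01, max_iter = 10000) {
  i <- 0
  repeat {
    if (runif(1) < p) break              # the intended exit
    i <- i + 1
    if (i >= max_iter) {                 # fallback exit
      warning("max_iter reached before the condition was met")
      break
    }
  }
  i
}

set.seed(2024)                           # makes the random draws reproducible
count_until()
```

With p = 0 the intended exit can never fire, so the cap is what ends the loop.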