Waving Hello to R

R Workshop 2024 in UTokyo

SUN Yufei (Adrian)

UTokyo & Tsinghua University

About Me

  • PhD Candidate in Political Science, Tsinghua University

  • Visiting Researcher at ISS

  • Research interests: Political Psychology, Natural Language Processing

  • Dissertation: How and why Hong Kong uses CCP discourse, and its empirical impact

  • Founder of Tsinghua R Workshop, 5-year R Workshop instructor
  • GitHub Campus Expert

About the Workshop

  • Week 1: Basic grammar
  • Week 2: Data cleaning & Basic visualization
  • Week 3: Loops and batch processing
  • Week 4: R & Quarto (for HTML, PDF, Word, etc.)

Outline

  • R Basics
  • R Packages
  • Data Import
  • Data Export
  • Data Structure
  • Variable Extraction
  • Variable Characteristics
  • Variable Attributes

What is R? → Just like your 📱

R → Your Phone

RStudio/VS Code → iOS/Android

R packages → Apps

Base R → iMessage

Packages from other sources → LINE

function → Message Sending

object → Contact

RStudio/VS Code → iOS/Android

IDEs (Integrated Development Environments)

and … VS Code/PyCharm/Positron

  • Latest plugins
  • Multiple languages
  • Not customized for R
  • Unstable

R packages → Apps

install

install.packages("adrianlp")
remotes::install_github("syfyufei/adrianlp")

call

library(adrianlp)
require(adrianlp)
adrianlp::tokenlize(text_data)
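
The three ways of calling a package can be tried with packages that ship with R, so nothing needs to be installed. This is a minimal sketch rather than part of the original slides; the name aPackageThatDoesNotExist is a hypothetical placeholder used only to show what happens when a package is missing.

```r
# requireNamespace() reports whether a package is installed without attaching it
has_stats <- requireNamespace("stats", quietly = TRUE)   # stats ships with R
has_fake  <- requireNamespace("aPackageThatDoesNotExist", quietly = TRUE)  # hypothetical name

# require() returns TRUE/FALSE instead of stopping with an error like library() does
loaded <- suppressWarnings(
  require("aPackageThatDoesNotExist", character.only = TRUE, quietly = TRUE)
)

# pkg::fun() calls a single function without attaching the whole package
mid_value <- stats::median(c(1, 3, 10))

print(c(has_stats = has_stats, has_fake = has_fake, loaded = loaded))
print(mid_value)
```

Because require() only returns FALSE when a package is missing, library() is usually the safer choice at the top of a script: it fails immediately and visibly.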

Base R & Packages

Base R → iMessage

  • Data Manipulation
    • Create, subset, and manipulate data frames
    • Use functions like subset(), merge(), apply()
  • Statistical Analysis
    • Perform basic statistical tests
    • Use functions like t.test(), lm(), summary()
  • Data Visualization
    • Create basic plots and charts
    • Use functions like plot(), hist(), boxplot()
  • Programming
    • Write functions and loops
    • Use control structures like if, for, while
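
The base R capabilities listed above can be sketched with the built-in mtcars data set. The object names (heavy_cars, col_means, mpg_test, mpg_label) and the cut-off values are assumptions chosen for illustration only.

```r
# Data manipulation: keep heavier cars and three columns (mtcars ships with R)
heavy_cars <- subset(mtcars, wt > 3, select = c(mpg, wt, am))

# apply() runs a function over each column (MARGIN = 2)
col_means <- apply(heavy_cars, 2, mean)

# Statistical analysis: compare mpg between automatic (am = 0) and manual (am = 1) cars
mpg_test <- t.test(mpg ~ am, data = mtcars)

# Programming: a for loop combined with an if/else control structure
mpg_label <- character(nrow(heavy_cars))
for (i in seq_len(nrow(heavy_cars))) {
  if (heavy_cars$mpg[i] > 15) {
    mpg_label[i] <- "above 15 mpg"
  } else {
    mpg_label[i] <- "15 mpg or below"
  }
}

print(col_means)
print(mpg_test$p.value)
print(table(mpg_label))
```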

Classic R Packages → LINE

  • dplyr
    • Data manipulation and transformation
    • Functions: filter(), select(), mutate(), summarize()
  • ggplot2
    • Data visualization
    • Functions: ggplot(), geom_point(), geom_line()
  • tidyr
    • Data tidying
    • Functions: pivot_longer(), pivot_wider(), separate(), unite() (gather() and spread() are the older, superseded equivalents)
  • shiny
    • Interactive web applications
    • Functions: shinyApp(), fluidPage(), plus a user-written server function

function → Message Sending

Functions in R

  • Built-in Functions
    • Predefined functions available in R
    • Examples: mean(), sum(), length()
  • User-defined Functions
    • Custom functions created by users
    • Syntax: function_name <- function(arg1, arg2) { ... }

Components of a Function

  • Function Name
    • The identifier used to call the function
    • Example: my_function
  • Arguments
    • Inputs to the function, specified within parentheses
    • Example: function(arg1, arg2)
  • Body
    • The code block that defines the function’s operations
    • Enclosed within curly braces { ... }
  • Return Value
    • The output of the function, specified using return(); without it, R returns the value of the last evaluated expression
    • Example: return(result)
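
The four components can be seen together in one small user-defined function. This is a minimal sketch; rescale01 and add_one are hypothetical names invented for this example.

```r
# Function name: rescale01; arguments: x and na.rm (with a default value)
rescale01 <- function(x, na.rm = TRUE) {
  # Body: shift and scale x so that it runs from 0 to 1
  rng <- range(x, na.rm = na.rm)
  result <- (x - rng[1]) / (rng[2] - rng[1])
  # Return value: handed back explicitly with return()
  return(result)
}

scaled <- rescale01(c(2, 4, 6))
print(scaled)

# Without return(), a function hands back its last evaluated expression
add_one <- function(x) x + 1
print(add_one(41))
```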

Code to Access lm() Help Documentation

  • Use help() function
    • Code: help(lm)
  • Use ? symbol
    • Code: ?lm

Usage

lm(formula, data, subset, weights, na.action,
   method = "qr", model = TRUE, x = FALSE, y = FALSE, qr = TRUE,
   singular.ok = TRUE, contrasts = NULL, offset, ...)
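
Reading a Usage block becomes easier once it is mapped onto an actual call. The sketch below fills in three of the arguments with the built-in mtcars data; the choice of variables and the subset condition are assumptions made for illustration.

```r
# Arguments are matched by name, following the Usage block
fit <- lm(formula = mpg ~ wt + hp,   # outcome ~ predictors
          data    = mtcars,          # data frame that holds the variables
          subset  = cyl > 4)         # keep only the rows where this is TRUE

# Arguments that have defaults (method = "qr", model = TRUE, ...) can be left out
print(coef(fit))
print(nobs(fit))

# args() prints a function's argument list without opening the help page
args(lm)
```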

Let’s Practice (Hot Potato)

Hot Potato Game

Hot Potato (“ばくだんゲーム” in Japanese, “击鼓传花” in Chinese), played one person at a time

Keyboard

Hot Keyboard Time

Question

How to display the help documentation for the sum() function? Provide two methods.

Answer

help(sum)
?sum

<-

Assignment operator, the shorthand for the assign() command

Syntax: <variable name> <- <object>

aValidObject <- 1:5
aValidObject

Why <-

  • Intuitive
  • Will not be confused with “=”
  • Keyboard shortcut (in RStudio)
    • PC: Alt + -
    • Mac: Option + -

a <- 12
25 -> b
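
The relation between <-, ->, assign(), and = can be checked directly in the console. This is a small sketch; the object names d and y are arbitrary.

```r
a <- 12            # leftward assignment (the usual form)
25 -> b            # rightward assignment, same effect
assign("d", 12)    # the long form that <- abbreviates

print(identical(a, d))

# Inside a function call, = names an argument while <- creates an object
mean(x = c(1, 2, 3))    # passes the values as argument x; no object is created
mean(y <- c(1, 2, 3))   # creates y as a side effect, then passes it on
print(y)
```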

Naming Rules

  1. Don’t start with a number (Error: 1stday).
  2. No special symbols except . and _ (Error: M&M).
  3. Names are case sensitive (X != x). Here ! means “not” and != means “not equal to”.
  4. Don’t override built-in commands unless necessary (avoid: list <- c(1:5)).
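
One way to test these rules without triggering an error is base R's make.names(), which rewrites any string into a syntactically valid name. The sketch below assumes that a name is valid when make.names() returns it unchanged; the candidate names are taken from the rules above plus two hypothetical valid ones.

```r
candidates <- c("1stday", "M&M", "my_var", "my.var")

# A name that comes back unchanged was already valid
fixed    <- make.names(candidates)
is_valid <- candidates == fixed
print(data.frame(candidates, fixed, is_valid))

# Case sensitivity: X and x are two different objects
X <- 1
x <- 2
print(X != x)
```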

Hot Keyboard Time!

Data Input

Built-in Data

data()
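
Called without arguments, data() lists the available data sets; called with a name, it loads one. A short sketch using mtcars from the datasets package that ships with R:

```r
data("mtcars")        # load the built-in mtcars data set into the workspace
head(mtcars, n = 3)   # show the first three rows
dim(mtcars)           # number of rows and columns

# data(package = ...) returns the list of data sets in one package
available <- data(package = "datasets")$results[, "Item"]
print(head(available))
```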

Hot Keyboard Time!

Data Types That Can Be Read Directly

  • .RDS (single object)
  • .RData (multiple objects)
  • .txt
  • .csv

Syntax: <name> <- <read command>(<data path>)

df_rds <- readRDS("aDataset.rds")
df_txt <- read.table("D:/aDataset.txt")
df_csv <- read.csv("./aDataset.csv")
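
The file names above are placeholders, so the commands only run if those files exist. The sketch below is self-contained: it first writes a small, hypothetical data frame (df_demo) to temporary files and then reads each one back with the corresponding command.

```r
# Hypothetical demo data, written to temporary files so the example can run anywhere
df_demo  <- data.frame(id = 1:3, city = c("Tokyo", "Beijing", "Hong Kong"))
rds_path <- tempfile(fileext = ".rds")
txt_path <- tempfile(fileext = ".txt")
csv_path <- tempfile(fileext = ".csv")
saveRDS(df_demo, rds_path)
write.table(df_demo, txt_path, sep = "\t", row.names = FALSE)
write.csv(df_demo, csv_path, row.names = FALSE)

# Each read command takes the file path as its first argument
df_rds <- readRDS(rds_path)
df_txt <- read.table(txt_path, header = TRUE, sep = "\t")  # header and separator stated explicitly
df_csv <- read.csv(csv_path)                                # header = TRUE and sep = "," by default

str(df_csv)
```

read.table() assumes no header row and whitespace-separated columns unless told otherwise, which is a frequent source of surprises with .txt files.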

Data Types That Require a Package to Read

Load the package with library() or require(), then use its commands.

# SPSS, Stata, SAS
library(haven)
df_spss <- read_spss("<FileName>.sav")
df_stata <- read_dta("<FileName>.dta")
df_sas <- read_sas("<FileName>.sas7bdat")  

# Fast import of tabular data
library(readr)
df_csv <- read_csv("<FileName>.csv")
df_table <- read_table("<FileName>.txt")

# Excel
library(readxl)
df_excel <- read_excel("<FileName>.xls")
df_excel2 <- read_excel("<FileName>.xlsx")

# JSON (JavaScript Object Notation)
library(rjson)
df_json <- fromJSON(file = "<FileName>.json" )

# XML/HTML
library(XML)
df_xml <- xmlTreeParse("<url>")
df_html <- readHTMLTable("<url>", which = 3)

Hot Keyboard Time!

The Swiss Army Knife of data reading: rio

library(rio)
df_anything <- import("<AnyTypeOfData>")

Data Output

Syntax: <save command>(<object>, file = <file path>)

Saving as R Data

saveRDS(df_toy, file = "df_toy.rds")
save(df_toy, ls_monks, file = "test.rdata")
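
The practical difference between the two formats shows up when reading the file back: readRDS() returns one object that you name yourself, while load() restores every saved object under its original name. In the sketch below the contents of df_toy and ls_monks are hypothetical, and temporary files stand in for the file names above.

```r
# Hypothetical contents for the two objects
df_toy   <- data.frame(x = 1:3)
ls_monks <- list(a = 1, b = "two")

rds_path   <- tempfile(fileext = ".rds")
rdata_path <- tempfile(fileext = ".RData")
saveRDS(df_toy, file = rds_path)               # one object
save(df_toy, ls_monks, file = rdata_path)      # several objects at once

rm(df_toy, ls_monks)                           # clear them from the workspace

df_renamed <- readRDS(rds_path)                # assign to any name you like
restored   <- load(rdata_path)                 # objects return under their old names
print(restored)
```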

Saving as CSV File

write.csv(df_toy, file = "toy.csv")

Note: If your data contains CJK characters, you may encounter encoding issues when saving as a CSV file.
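
One common remedy, sketched here under the assumption that the file should be UTF-8, is to state the encoding explicitly when writing and reading. The data frame df_cjk is hypothetical; the city names are written as Unicode escapes so the script itself does not depend on the editor's encoding. Whether a given spreadsheet program then displays the file correctly depends on that program.

```r
# "\u6771\u4eac" is Tokyo and "\u5317\u4eac" is Beijing, written as Unicode escapes
df_cjk <- data.frame(city = c("\u6771\u4eac", "\u5317\u4eac"), n = c(10, 20))

csv_path <- tempfile(fileext = ".csv")
write.csv(df_cjk, file = csv_path, row.names = FALSE, fileEncoding = "UTF-8")

# Use the same encoding when reading the file back
df_back <- read.csv(csv_path, fileEncoding = "UTF-8")
print(df_back)
```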

Data Munging

load("/Users/adrian/Documents/Yufei_Sun/THU/projects/slides/course/rworkshop_in_UTokyo/wvs7.rda")

Data Structure

  • Observations
  • Variables
  • Data Structure

wvs7

nrow(wvs7) # Get the number of rows in the data
ncol(wvs7) # Get the number of columns in the data
names(wvs7) # Get the variable/column names
str(wvs7) # Get variable names, types, number of rows and columns
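
For readers without the wvs7 file, the same commands can be tried on a small stand-in. The data frame df_demo below is hypothetical; its column names only loosely imitate survey data and are not taken from wvs7.

```r
# Hypothetical stand-in for a survey data set
df_demo <- data.frame(
  country = c("Japan", "China", "Japan", "Germany"),
  age     = c(34, 52, 27, 45),
  female  = c(TRUE, FALSE, TRUE, FALSE)
)

nrow(df_demo)      # number of rows (observations)
ncol(df_demo)      # number of columns (variables)
dim(df_demo)       # both at once: rows, then columns
names(df_demo)     # variable/column names
str(df_demo)       # names, types, and the first few values
head(df_demo, 2)   # the first two rows
```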

Variable Extraction

To extract a column from a data frame, there are at least two common methods:

<data frame>[<rows>, <columns>]

wvs7[, "country"]

<data frame>$<variable name>

wvs7$country
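
The extraction methods can be compared on a small, hypothetical data frame (df_demo is not part of the workshop data). The sketch assumes a base data.frame; tibbles behave differently for the [ , ] form and keep the result as a data frame.

```r
df_demo <- data.frame(
  country = c("Japan", "China", "Germany"),
  age     = c(34, 52, 45)
)

by_bracket <- df_demo[, "country"]    # rows left empty = all rows; returns a vector
by_dollar  <- df_demo$country         # the same vector
by_double  <- df_demo[["country"]]    # the same vector, useful when the name is stored in a variable
as_df      <- df_demo["country"]      # single brackets without a comma keep a one-column data frame

# The row position accepts a condition, so rows and columns can be picked together
older <- df_demo[df_demo$age > 40, "country"]
print(older)
```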

Variable Characteristics

Numeric Variables

table

table(wvs7$age)

summary

summary(wvs7$age)

Non-numeric Variables: Summary Tables

table(wvs7$female)
table(wvs7$marital)

For factor variables, we can also extract their level information

levels(wvs7$religious)
levels(wvs7$marital)
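
What these commands return can be previewed with a few made-up values. The vectors marital and age below are hypothetical and are not drawn from wvs7.

```r
# Hypothetical factor with an explicit level order
marital <- factor(
  c("married", "single", "married", "divorced", "single", "married"),
  levels = c("single", "married", "divorced")
)
age <- c(34, 52, 27, 45, 61, 38)

counts <- table(marital)      # frequency of each category, in level order
print(counts)
print(prop.table(counts))     # the same table as proportions
print(levels(marital))        # the categories the factor can take
print(summary(age))           # minimum, quartiles, mean, maximum
```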

Variable Attributes

Attributes are properties that every variable has, regardless of its type.

length(wvs7$age) # Get the length of the variable (here, the number of rows)
unique(wvs7$age) # Get the unique values of the variable

summary(wvs7$age) # Get summary statistics for age (min, quartiles, mean, max)
class(wvs7$age) # Check the class of age (e.g., numeric, factor, character)
typeof(wvs7$age) # Check the storage type of the elements of age (e.g., double, integer)
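
The difference between class() and typeof() is easiest to see on a factor. A minimal sketch with hypothetical vectors:

```r
age     <- c(34L, 52L, 34L, 45L)                              # the L suffix makes integers
marital <- factor(c("single", "married", "single", "married"))

print(length(age))      # number of elements
print(unique(age))      # distinct values, in order of first appearance

print(class(age))       # how R treats the object
print(typeof(age))      # how the values are stored internally

# A factor is stored as integer codes but behaves as its own class
print(class(marital))
print(typeof(marital))
```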

Summary

summary(wvs7$age)
summary(wvs7)

Take-Home Points

  • R Basics
    • RStudio/VS Code are like operating systems (IDEs)
    • R packages are similar to apps
    • Functions in R are comparable to “Message Sending”
  • Loading Packages
    • install.packages(): Installing packages
    • library() or require(): Loading packages
    • help() or ?: Accessing function documentation
  • Data Import:
    • readRDS(), read.table(), read.csv(): Reading various file formats
    • rio::import(): The “Swiss Army Knife” for data reading
  • Data Export:
    • saveRDS(), save(), write.csv(): Saving in different formats
  • Data Structure:
    • nrow(), ncol(): Get number of rows and columns
    • names(): Get variable/column names
    • str(): Get variable names, types, number of rows and columns
  • Data Extraction:
    • dataframe[, "variable"] or dataframe$variable: Extracting variables
  • Variable Characteristics:
    • table(), summary(): Examining variable characteristics
    • levels(): Understanding factor variables
  • Variable Attributes:
    • length(), unique(), class(), typeof()

Thank you

GitHub Page: https://github.com/syfyufei

Email: sunyf20@mails.tsinghua.edu.cn

Personal Website: https://syfyufei.github.io/