46 R Command Reference

This appendix provides a comprehensive reference for R commands, covering both base R and tidyverse functions.

46.1 Base R Fundamentals

Assignment and Basic Operations

Operation	Syntax	Example
Assignment	`<-` or `=`	`x <- 5`
Print value	Variable name or `print()`	`x` or `print(x)`
Comment	`#`	`# This is a comment`
Help	`?` or `help()`	`?mean`
Examples	`example()`	`example(mean)`
Search help	`??` or `help.search()`	`??regression`

Arithmetic Operators

Operator	Description	Example
`+`	Addition	`5 + 3`
`-`	Subtraction	`5 - 3`
`*`	Multiplication	`5 * 3`
`/`	Division	`5 / 3`
`^` or `**`	Exponentiation	`5^3`
`%%`	Modulo (remainder)	`5 %% 3`
`%/%`	Integer division	`5 %/% 3`
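As a small illustration of how `%/%` and `%%` fit together, the sketch below splits a number of minutes into hours and leftover minutes. The variable names and values are made up for this example.

```r
# Illustrative values only; `total_minutes` is a made-up name
total_minutes <- 135

hours   <- total_minutes %/% 60  # integer division: whole hours
minutes <- total_minutes %% 60   # modulo: leftover minutes

hours    # 2
minutes  # 15

# The two operators satisfy x == (x %/% y) * y + x %% y
hours * 60 + minutes == total_minutes  # TRUE

# With a negative left operand, the result of %% takes the sign of the divisor
(-5) %% 3   # 1
(-5) %/% 3  # -2
```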

Comparison Operators

Operator	Description	Example
`==`	Equal to	`x == 5`
`!=`	Not equal to	`x != 5`
`<`	Less than	`x < 5`
`>`	Greater than	`x > 5`
`<=`	Less than or equal	`x <= 5`
`>=`	Greater than or equal	`x >= 5`

Logical Operators

Operator	Description	Example
`&`	AND (element-wise)	`x > 0 & x < 10`
`|`	OR (element-wise)	`x < 0 | x > 10`
`!`	NOT	`!is.na(x)`
`&&`	AND (single value)	`cond1 && cond2`
`||`	OR (single value)	`cond1 || cond2`
`%in%`	Value in set	`x %in% c(1, 2, 3)`
`xor()`	Exclusive OR	`xor(TRUE, FALSE)`
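The difference between the element-wise and single-value operators is easiest to see side by side. The following is a minimal sketch with an arbitrary example vector `x`.

```r
x <- c(-2, 4, 12)

# Element-wise operators return one logical value per element
x > 0 & x < 10    # FALSE  TRUE FALSE
x < 0 | x > 10    #  TRUE FALSE  TRUE

# && and || expect single values and short-circuit:
# the right-hand side is only evaluated when needed
is.numeric(x) && length(x) > 0   # TRUE

# any() / all() collapse a logical vector to a single value,
# which makes the result safe to use inside if ()
in_range <- x > 0 & x < 10
if (any(in_range)) {
  message("at least one value lies in (0, 10)")
}

# %in% tests set membership element by element
x %in% c(4, 12)   # FALSE  TRUE  TRUE
```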

46.2 Vector Operations

Creating Vectors

Function	Description	Example
`c()`	Combine values	`c(1, 2, 3, 4, 5)`
`:`	Sequence	`1:10`
`seq()`	Sequence with step	`seq(0, 1, by = 0.1)`
`seq_len()`	Sequence of length n	`seq_len(10)`
`seq_along()`	Sequence along object	`seq_along(x)`
`rep()`	Repeat values	`rep(1, times = 5)`
`rep()`	Repeat each	`rep(1:3, each = 2)`
`vector()`	Preallocate a vector (filled with zeros)	`vector("numeric", 10)`
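One reason to reach for `seq_along()` or `seq_len()` instead of `1:length(x)` is the zero-length case. The sketch below uses an empty vector chosen purely for illustration, followed by the two `rep()` variants from the table.

```r
x <- numeric(0)        # a zero-length vector

# 1:length(x) counts DOWN from 1 to 0 when x is empty
1:length(x)            # 1 0

# seq_along() and seq_len() return an empty sequence instead
seq_along(x)           # integer(0)
seq_len(0)             # integer(0)

# rep(): `times` repeats the whole vector, `each` repeats each element
rep(1:3, times = 2)    # 1 2 3 1 2 3
rep(1:3, each = 2)     # 1 1 2 2 3 3
```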

Vector Indexing

Syntax	Description	Example
`x[i]`	Element at position i	`x[3]`
`x[c(i,j)]`	Multiple elements	`x[c(1, 3, 5)]`
`x[-i]`	Exclude element	`x[-1]`
`x[condition]`	Logical subsetting	`x[x > 5]`
`x[1:n]`	Range of elements	`x[1:5]`
`x["name"]`	By name	`x["first"]`
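The indexing forms above can be tried on a small named vector. The vector `scores` and its names are invented for this sketch; the `[[ ]]` form is not in the table and is shown only for contrast.

```r
scores <- c(first = 10, second = 20, third = 30, fourth = 40)

scores[3]             # third: 30
scores[c(1, 3)]       # first: 10, third: 30
scores[-1]            # everything except the first element
scores[scores > 15]   # second, third, fourth
scores[1:2]           # first, second
scores["first"]       # a named vector of length 1

# [[ ]] extracts a single element and drops the name
scores[["first"]]     # 10
```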

Vector Functions

Function	Description	Example
`length()`	Number of elements	`length(x)`
`sum()`	Sum of elements	`sum(x)`
`mean()`	Arithmetic mean	`mean(x)`
`median()`	Median value	`median(x)`
`sd()`	Standard deviation	`sd(x)`
`var()`	Variance	`var(x)`
`min()`	Minimum	`min(x)`
`max()`	Maximum	`max(x)`
`range()`	Range (min and max)	`range(x)`
`quantile()`	Quantiles	`quantile(x, 0.5)`
`sort()`	Sort values	`sort(x)`
`order()`	Indices for sorting	`order(x)`
`rev()`	Reverse order	`rev(x)`
`unique()`	Unique values	`unique(x)`
`table()`	Frequency table	`table(x)`
`cumsum()`	Cumulative sum	`cumsum(x)`
`diff()`	Differences	`diff(x)`
`which()`	Indices where TRUE	`which(x > 5)`
`which.max()`	Index of max	`which.max(x)`
`which.min()`	Index of min	`which.min(x)`
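`sort()` and `order()` are easy to confuse: the first returns sorted values, the second returns positions that can reorder another vector. A brief sketch with made-up data, which also shows the `na.rm` argument that many of the summary functions accept:

```r
x    <- c(30, 10, 20)
tags <- c("c", "a", "b")   # a parallel vector, invented for this example

sort(x)          # 10 20 30
order(x)         # 2 3 1  (positions of the sorted values in x)
tags[order(x)]   # "a" "b" "c"

which(x > 15)    # 1 3
which.max(x)     # 1

# Many summary functions return NA if any element is missing
y <- c(1, NA, 3)
mean(y)                 # NA
mean(y, na.rm = TRUE)   # 2
```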

46.3 Data Frames

Creating Data Frames

Function	Description	Example
`data.frame()`	Create data frame	`data.frame(x = 1:3, y = c("a", "b", "c"))`
`tibble()`	Create tibble	`tibble(x = 1:3, y = c("a", "b", "c"))`
`as.data.frame()`	Convert to data frame	`as.data.frame(mat)`
`as_tibble()`	Convert to tibble	`as_tibble(df)`

Data Frame Indexing

Syntax	Description	Example
`df$col`	Column by name	`df$age`
`df[, "col"]`	Column as vector (base data frame; a tibble returns a tibble)	`df[, "age"]`
`df["col"]`	Column as data frame	`df["age"]`
`df[i, ]`	Row by position	`df[1, ]`
`df[i, j]`	Element	`df[1, 2]`
`df[condition, ]`	Filter rows	`df[df$age > 25, ]`
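The return type of each indexing form matters in practice. The sketch below uses a small invented data frame; `[[ ]]` and `drop = FALSE` are not listed in the table and are included only to show how to control whether a vector or a data frame comes back.

```r
df <- data.frame(name = c("Ana", "Ben", "Cy"), age = c(31, 24, 40))

df$age               # numeric vector: 31 24 40
df[, "age"]          # also a vector (for a base data.frame)
df["age"]            # one-column data frame
df[["age"]]          # vector; behaves the same for tibbles
df[1, 2]             # single element: 31
df[df$age > 25, ]    # rows for Ana and Cy

# drop = FALSE keeps the data frame structure for a single column
df[, "age", drop = FALSE]
```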

Data Frame Functions

Function	Description	Example
`nrow()`	Number of rows	`nrow(df)`
`ncol()`	Number of columns	`ncol(df)`
`dim()`	Dimensions	`dim(df)`
`names()`	Column names	`names(df)`
`colnames()`	Column names	`colnames(df)`
`rownames()`	Row names	`rownames(df)`
`head()`	First rows	`head(df, 10)`
`tail()`	Last rows	`tail(df, 10)`
`str()`	Structure	`str(df)`
`summary()`	Summary statistics	`summary(df)`
`glimpse()`	Tidyverse structure	`glimpse(df)`
`View()`	Open viewer	`View(df)`

46.4 Reading and Writing Data

Base R I/O

Function	Description	Example
`read.csv()`	Read CSV	`read.csv("file.csv")`
`read.table()`	Read table	`read.table("file.txt", header = TRUE)`
`read.delim()`	Read tab-delimited	`read.delim("file.tsv")`
`write.csv()`	Write CSV	`write.csv(df, "file.csv", row.names = FALSE)`
`write.table()`	Write table	`write.table(df, "file.txt")`
`saveRDS()`	Save R object	`saveRDS(obj, "file.rds")`
`readRDS()`	Read R object	`readRDS("file.rds")`
`save()`	Save multiple objects	`save(x, y, file = "data.RData")`
`load()`	Load RData file	`load("data.RData")`
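A round trip through both formats shows the practical difference between CSV and RDS. This is a minimal sketch that writes to temporary files (via `tempfile()`) so it does not touch any real project files; the data frame is invented.

```r
df <- data.frame(id = 1:3, group = c("a", "b", "a"))

# tempfile() gives throwaway paths so nothing in the project is overwritten
csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")

write.csv(df, csv_path, row.names = FALSE)
from_csv <- read.csv(csv_path)

saveRDS(df, rds_path)
from_rds <- readRDS(rds_path)

identical(from_rds, df)   # TRUE: RDS stores the R object itself
str(from_csv)             # column types are re-inferred from the text

unlink(c(csv_path, rds_path))  # clean up the temporary files
```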

Tidyverse I/O (readr)

Function	Description	Example
`read_csv()`	Read CSV (fast)	`read_csv("file.csv")`
`read_tsv()`	Read TSV	`read_tsv("file.tsv")`
`read_delim()`	Read with delimiter	`read_delim("file.txt", delim = "|")`
`read_fwf()`	Read fixed-width	`read_fwf("file.txt", col_positions)`
`write_csv()`	Write CSV	`write_csv(df, "file.csv")`
`write_tsv()`	Write TSV	`write_tsv(df, "file.tsv")`

46.5 Tidyverse: dplyr

Core Verbs

Function	Description	Example
`filter()`	Filter rows	`filter(df, age > 25)`
`select()`	Select columns	`select(df, name, age)`
`mutate()`	Create/modify columns	`mutate(df, age_sq = age^2)`
`arrange()`	Sort rows	`arrange(df, age)`
`summarise()`	Summarize data	`summarise(df, mean_age = mean(age))`
`group_by()`	Group data	`group_by(df, category)`

Selection Helpers

Function	Description	Example
`starts_with()`	Columns starting with	`select(df, starts_with("temp"))`
`ends_with()`	Columns ending with	`select(df, ends_with("_id"))`
`contains()`	Columns containing	`select(df, contains("score"))`
`matches()`	Columns matching regex	`select(df, matches("^x[0-9]"))`
`everything()`	All columns	`select(df, name, everything())`
`where()`	Columns where condition	`select(df, where(is.numeric))`
`all_of()`	All specified columns	`select(df, all_of(col_names))`
`any_of()`	Any of specified columns	`select(df, any_of(col_names))`

Additional dplyr Functions

Function	Description	Example
`distinct()`	Unique rows	`distinct(df, category)`
`count()`	Count occurrences	`count(df, category)`
`slice()`	Select rows by position	`slice(df, 1:10)`
`slice_head()`	First n rows	`slice_head(df, n = 5)`
`slice_tail()`	Last n rows	`slice_tail(df, n = 5)`
`slice_sample()`	Random rows	`slice_sample(df, n = 10)`
`slice_max()`	Rows with max values	`slice_max(df, age, n = 3)`
`slice_min()`	Rows with min values	`slice_min(df, age, n = 3)`
`pull()`	Extract column as vector	`pull(df, name)`
`rename()`	Rename columns	`rename(df, new_name = old_name)`
`relocate()`	Reorder columns	`relocate(df, name, .before = age)`
`across()`	Apply to multiple columns	`mutate(df, across(where(is.numeric), scale))`
`rowwise()`	Row-wise operations	`rowwise(df)`
`ungroup()`	Remove grouping	`ungroup(df)`
`n()`	Count in group	`summarise(df, count = n())`
`n_distinct()`	Count unique values	`summarise(df, unique = n_distinct(x))`
`first()`	First value	`summarise(df, first = first(x))`
`last()`	Last value	`summarise(df, last = last(x))`
`nth()`	Nth value	`summarise(df, third = nth(x, 3))`

Joins

Function	Description	Example
`left_join()`	Keep all left rows	`left_join(df1, df2, by = "id")`
`right_join()`	Keep all right rows	`right_join(df1, df2, by = "id")`
`inner_join()`	Keep matching rows	`inner_join(df1, df2, by = "id")`
`full_join()`	Keep all rows	`full_join(df1, df2, by = "id")`
`semi_join()`	Filter left by right	`semi_join(df1, df2, by = "id")`
`anti_join()`	Filter left, no match in right	`anti_join(df1, df2, by = "id")`
`bind_rows()`	Stack data frames	`bind_rows(df1, df2)`
`bind_cols()`	Combine columns	`bind_cols(df1, df2)`
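The join types differ mainly in which unmatched rows they keep. A small sketch, assuming dplyr is installed; the two tables `people` and `orders` are invented so that some ids match and some do not.

```r
library(dplyr)

people <- tibble(id = c(1, 2, 3), name = c("Ana", "Ben", "Cy"))
orders <- tibble(id = c(1, 1, 4), item = c("pen", "ink", "cap"))

left_join(people, orders, by = "id")    # 4 rows; Ben and Cy get NA for item
inner_join(people, orders, by = "id")   # 2 rows; only id 1 matches
full_join(people, orders, by = "id")    # 5 rows; id 4 has NA for name
semi_join(people, orders, by = "id")    # 1 row (Ana); no columns from orders
anti_join(people, orders, by = "id")    # 2 rows (Ben and Cy)
```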

46.6 Tidyverse: tidyr

Function	Description	Example
`pivot_longer()`	Wide to long	`pivot_longer(df, cols = -id, names_to = "var", values_to = "val")`
`pivot_wider()`	Long to wide	`pivot_wider(df, names_from = var, values_from = val)`
`separate()`	Split column (superseded by the `separate_wider_*()` functions in tidyr 1.3.0)	`separate(df, col, into = c("a", "b"), sep = "_")`
`separate_wider_delim()`	Split by delimiter	`separate_wider_delim(df, col, delim = "_", names = c("a", "b"))`
`unite()`	Combine columns	`unite(df, new_col, a, b, sep = "_")`
`drop_na()`	Remove NA rows	`drop_na(df)`
`fill()`	Fill NA with previous	`fill(df, column, .direction = "down")`
`replace_na()`	Replace NA values	`replace_na(df, list(x = 0))`
`complete()`	Complete missing combinations	`complete(df, x, y)`
`expand()`	Create all combinations	`expand(df, x, y)`
`nest()`	Nest data	`nest(df, data = -group)`
`unnest()`	Unnest data	`unnest(df, data)`
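`pivot_longer()` and `pivot_wider()` are inverses of each other, which a round trip makes concrete. The sketch assumes tidyr is installed; the data frame `wide` is made up.

```r
library(tidyr)

wide <- data.frame(id = 1:2, height = c(170, 182), weight = c(65, 80))

# Wide to long: every column except id becomes a name/value pair
long <- pivot_longer(wide, cols = -id, names_to = "var", values_to = "val")
long   # 4 rows: one per id/variable pair

# Long to wide: spread the pairs back into columns
back <- pivot_wider(long, names_from = var, values_from = val)
back   # same columns as `wide`, returned as a tibble
```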

46.7 Tidyverse: ggplot2

Basic Structure

ggplot(data, aes(x = x_var, y = y_var)) +
  geom_*() +
  labs() +
  theme_*()
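A filled-in version of the template above, as one possible sketch. It assumes ggplot2 is installed and uses the built-in `mtcars` dataset; the choice of variables and label text is illustrative only.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +                         # one point per car
  labs(
    title = "Fuel economy by weight",
    x = "Weight (1000 lbs)",
    y = "Miles per gallon",
    color = "Cylinders"
  ) +
  theme_minimal()

p   # printing the object draws the plot
```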

Geometries

Function	Plot Type	Usage
`geom_point()`	Scatter plot	Continuous x and y
`geom_line()`	Line plot	Continuous x and y
`geom_smooth()`	Smoothed line	Add trend line
`geom_bar()`	Bar chart	Counts
`geom_col()`	Bar chart	Values
`geom_histogram()`	Histogram	Distribution
`geom_density()`	Density plot	Smooth distribution
`geom_boxplot()`	Box plot	Distribution by group
`geom_violin()`	Violin plot	Distribution shape
`geom_jitter()`	Jittered points	Avoid overplotting
`geom_area()`	Area plot	Filled area
`geom_tile()`	Heatmap tiles	Grid data
`geom_text()`	Text labels	Add text
`geom_label()`	Text with background	Labeled points
`geom_errorbar()`	Error bars	Uncertainty
`geom_hline()`	Horizontal line	Reference line
`geom_vline()`	Vertical line	Reference line
`geom_abline()`	Diagonal line	y = a + bx

Aesthetics

Aesthetic	Description	Example
`x`	X-axis variable	`aes(x = var)`
`y`	Y-axis variable	`aes(y = var)`
`color`	Point/line color	`aes(color = group)`
`fill`	Fill color	`aes(fill = group)`
`size`	Point size (lines use `linewidth` since ggplot2 3.4.0)	`aes(size = value)`
`shape`	Point shape	`aes(shape = group)`
`linetype`	Line type	`aes(linetype = group)`
`alpha`	Transparency	`aes(alpha = value)`
`group`	Grouping	`aes(group = id)`

Scales and Labels

Function	Description	Example
`labs()`	Add labels	`labs(title = "Title", x = "X", y = "Y")`
`scale_x_continuous()`	Continuous x scale	`scale_x_continuous(limits = c(0, 100))`
`scale_y_continuous()`	Continuous y scale	`scale_y_continuous(breaks = seq(0, 10, 2))`
`scale_x_log10()`	Log10 x scale	`scale_x_log10()`
`scale_color_manual()`	Manual colors	`scale_color_manual(values = c("red", "blue"))`
`scale_fill_brewer()`	ColorBrewer palette	`scale_fill_brewer(palette = "Set1")`
`scale_fill_viridis_d()`	Viridis discrete	`scale_fill_viridis_d()`
`coord_flip()`	Flip coordinates	`coord_flip()`
`coord_polar()`	Polar coordinates	`coord_polar()`

Faceting

Function	Description	Example
`facet_wrap()`	Wrap into panels	`facet_wrap(~variable)`
`facet_grid()`	Grid of panels	`facet_grid(rows ~ cols)`

46.8 Statistical Functions

Function	Description	Example
`t.test()`	T-test	`t.test(x, y)`
`cor()`	Correlation	`cor(x, y)`
`cor.test()`	Correlation test	`cor.test(x, y)`
`lm()`	Linear model	`lm(y ~ x, data = df)`
`glm()`	Generalized linear model	`glm(y ~ x, family = binomial, data = df)`
`aov()`	ANOVA	`aov(y ~ group, data = df)`
`chisq.test()`	Chi-squared test	`chisq.test(table(x, y))`
`wilcox.test()`	Wilcoxon test	`wilcox.test(x, y)`
`ks.test()`	Kolmogorov-Smirnov test	`ks.test(x, y)`
`summary()`	Model summary	`summary(model)`
`coef()`	Model coefficients	`coef(model)`
`residuals()`	Model residuals	`residuals(model)`
`predict()`	Predictions	`predict(model, newdata)`
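The model helper functions are typically used together on one fitted object. A minimal sketch using the built-in `mtcars` dataset; the choice of `mpg ~ wt` and the `new_cars` values are arbitrary.

```r
model <- lm(mpg ~ wt, data = mtcars)

summary(model)           # coefficient table, R-squared, residual summary
coef(model)              # named vector: (Intercept) and wt
head(residuals(model))   # observed minus fitted values

# predict() expects newdata with the same predictor names as the formula
new_cars <- data.frame(wt = c(2.5, 3.5))
predicted <- predict(model, newdata = new_cars)
predicted
```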

46.9 String Functions (stringr)

Function	Description	Example
`str_length()`	String length	`str_length("hello")`
`str_sub()`	Substring	`str_sub("hello", 1, 3)`
`str_c()`	Concatenate	`str_c("a", "b", sep = "-")`
`str_detect()`	Detect pattern	`str_detect(x, "pattern")`
`str_replace()`	Replace first match	`str_replace(x, "old", "new")`
`str_replace_all()`	Replace all matches	`str_replace_all(x, "old", "new")`
`str_split()`	Split string	`str_split(x, ",")`
`str_trim()`	Remove whitespace	`str_trim(x)`
`str_to_lower()`	Lowercase	`str_to_lower(x)`
`str_to_upper()`	Uppercase	`str_to_upper(x)`
`str_to_title()`	Title case	`str_to_title(x)`
`str_extract()`	Extract match	`str_extract(x, "[0-9]+")`
`str_count()`	Count matches	`str_count(x, "a")`
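Several of these functions are often combined when cleaning text. The sketch below assumes stringr is installed; the file-name strings are invented for the example.

```r
library(stringr)

x <- c("  sample_01.csv", "sample_12.csv ", "notes.txt")

clean <- str_trim(x)                  # strip leading/trailing whitespace
str_detect(clean, "\\.csv$")          # TRUE TRUE FALSE
str_extract(clean, "[0-9]+")          # "01" "12" NA
str_replace(clean, "_", "-")          # replaces only the first "_" in each string
str_to_upper(str_sub(clean, 1, 6))    # "SAMPLE" "SAMPLE" "NOTES."
str_count(clean, "s")                 # 2 2 1
```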

46.10 Control Flow

Conditionals

# if-else
if (condition) {
  # code if TRUE
} else if (other_condition) {
  # code if other TRUE
} else {
  # code if all FALSE
}

# Vectorized if-else
ifelse(condition, value_if_true, value_if_false)

# dplyr case_when
case_when(
  condition1 ~ value1,
  condition2 ~ value2,
  TRUE ~ default_value
)
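The templates above use placeholders; a concrete base R sketch may help show when to use `if` and when to use `ifelse()`. The function name `sign_label` and the input vector are hypothetical.

```r
x <- c(-3, 0, 7)

# ifelse() is vectorized: one result per element
ifelse(x > 0, "positive", "non-positive")
# "non-positive" "non-positive" "positive"

# if () needs a single TRUE/FALSE; a longer condition is an error in R >= 4.2
sign_label <- function(value) {
  if (value > 0) {
    "positive"
  } else if (value < 0) {
    "negative"
  } else {
    "zero"
  }
}

sign_label(7)                         # "positive"
vapply(x, sign_label, character(1))   # "negative" "zero" "positive"
```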

Loops

# for loop
for (i in 1:10) {
  print(i)
}

# while loop
while (condition) {
  # code
}

# Apply functions (preferred)
# x is a list or vector, f is a function
lapply(x, f)   # Returns a list
sapply(x, f)   # Simplifies to a vector or matrix when possible
map(x, f)      # purrr, returns a list
map_dbl(x, f)  # purrr, returns a double vector
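For comparison, here is the same computation written with the apply family and with an explicit loop. This sketch uses base R only; `vapply()` is not listed above and is included as a type-checked alternative to `sapply()`. The list `values` is invented.

```r
values <- list(a = 1:3, b = c(2.5, 4), c = 10)

lapply(values, mean)               # list of three means
sapply(values, mean)               # named numeric vector
vapply(values, mean, numeric(1))   # like sapply, but the result type is checked

# Equivalent for loop with a preallocated result vector
out <- numeric(length(values))
for (i in seq_along(values)) {
  out[i] <- mean(values[[i]])
}
out   # 2.00 3.25 10.00
```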

46.11 Package Management

Function	Description	Example
`install.packages()`	Install from CRAN	`install.packages("dplyr")`
`library()`	Load package	`library(dplyr)`
`require()`	Load (returns TRUE/FALSE)	`require(dplyr)`
`installed.packages()`	List installed	`installed.packages()`
`update.packages()`	Update all	`update.packages()`
`remove.packages()`	Remove package	`remove.packages("dplyr")`
`packageVersion()`	Package version	`packageVersion("dplyr")`
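In scripts it is common to check for a package before using it. A minimal sketch follows; `requireNamespace()` and the `pkg::fun()` form are base R features not listed in the table, and `"stats"` is used only because it ships with R.

```r
pkg <- "stats"   # example only; "stats" ships with every R installation

# requireNamespace() returns TRUE/FALSE without attaching the package
if (requireNamespace(pkg, quietly = TRUE)) {
  message(pkg, " version: ", as.character(packageVersion(pkg)))
} else {
  message("Install it with install.packages(\"", pkg, "\")")
}

# pkg::fun() calls a function without attaching the whole package
stats::median(c(1, 5, 9))   # 5
```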
