Luc Clair
University of Winnipeg | GECON 7201
By now, you should have installed the following software:
Highly extensible through packages
The strength and flexibility of R largely come from its vast package ecosystem
R packages are collections of functions, data sets, and documentation bundled together to extend the functionality of base R
Base R already ships with core packages (e.g., `stats`, `graphics`, and `utils`)
The console acts as the direct interface between the user and the R interpreter
Remove objects from the environment with `rm()`

| Command | Purpose |
|---|---|
| `getwd()` | Get the current working directory |
| `setwd("path/to/folder")` | Set a new working directory |
| `list.files()` | List files in the current working directory |
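The working-directory commands in the table can be tried safely with a short sketch. The folder name `wd_demo` is a hypothetical choice, and the example works inside R's temporary directory so that no real project files are touched.

```r
# Store the current working directory (the path differs on every machine)
current_dir <- getwd()
print(current_dir)

# Create a throwaway folder inside R's temporary directory
demo_dir <- file.path(tempdir(), "wd_demo")   # hypothetical folder name
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, "clean_data.R"))

# Switch to the demo folder, list its files, then switch back
setwd(demo_dir)
files_here <- list.files()
print(files_here)
setwd(current_dir)
```

Restoring the original directory at the end keeps the rest of the session unaffected.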
Example script names: regression_analysis.R, clean_data.R
Press Ctrl+Enter (Cmd+Enter on Mac) to run a line of code
A package is a collection of functions, data sets, and documentation
Open a function's documentation with `?function_name`
Use `install.packages()` to install, `library()` to load
Call a function from a specific package with the `package_name::function()` syntax

| Operator | Description |
|---|---|
| `+` | Addition |
| `-` | Subtraction |
| `*` | Multiplication |
| `/` | Division |
| `^` or `**` | Exponentiation |
| `%%` | Modulo (remainder) |
| `%/%` | Integer division (quotient) |
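As a quick illustration of the arithmetic operators in the table, the following sketch applies each one to the same pair of numbers. The values 7 and 2 are arbitrary choices made for this example.

```r
a <- 7
b <- 2

# Apply every arithmetic operator from the table to the same two numbers
results <- c(
  addition       = a + b,    # 9
  subtraction    = a - b,    # 5
  multiplication = a * b,    # 14
  division       = a / b,    # 3.5
  exponentiation = a ^ b,    # 49
  modulo         = a %% b,   # 1 (remainder of 7 / 2)
  int_division   = a %/% b   # 3 (quotient of 7 / 2)
)
print(results)
```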
| Operator | Meaning | Example | Result |
|---|---|---|---|
| `==` | Equal to | `5 == 5` | TRUE |
| `!=` | Not equal to | `5 != 3` | TRUE |
| `<` | Less than | `3 < 5` | TRUE |
| `<=` | Less than or equal to | `5 <= 5` | TRUE |
| `>` | Greater than | `7 > 4` | TRUE |
| `>=` | Greater than or equal to | `4 >= 4` | TRUE |
Comparisons return `TRUE` or `FALSE` and are often used as conditions (e.g., in `if` statements)

| Operator | Name | Description |
|---|---|---|
| `!` | NOT | Reverses a logical value (TRUE ⇄ FALSE) |
| `&` | AND (vectorized) | TRUE only if both conditions are TRUE |
| `\|` | OR (vectorized) | TRUE if either condition is TRUE |
| `&&` | AND (single value) | Compares single values; in R 4.3.0 and later, a longer vector raises an error instead of using only its first element |
| `\|\|` | OR (single value) | Same single-value rule as `&&` |
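The comparison and logical operators above can be combined on whole vectors. The sketch below uses two small made-up vectors (`age` and `income` are hypothetical names and values) to show element-wise results and the single-value form.

```r
age    <- c(19, 25, 31, 42)   # hypothetical ages
income <- c(20, 55, 48, 90)   # hypothetical incomes, in thousands

# Comparisons are evaluated element by element
older <- age > 25                              # FALSE FALSE TRUE TRUE

# Comparisons run before & and |, so no extra parentheses are needed
older_and_rich <- age > 25 & income > 50       # FALSE FALSE FALSE TRUE
young_or_rich  <- age < 20 | income > 50       # TRUE TRUE FALSE TRUE
not_older      <- !older                       # TRUE TRUE FALSE FALSE

# && works on single values, which suits conditions in if statements
first_person_check <- age[1] < 20 && income[1] < 30   # TRUE
```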
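Use `!` as a shorthand for negation
Use `%in%` to test whether values appear in a vector
Comparison operators (`>`, `==`, etc.) are evaluated before Boolean operators (`&` and `|`)
R uses `<-` or `=` to handle assignment
`<-` is normally read aloud as “gets”
Assignment can also point to the right (`->`), though it is less common
Many other languages use `=` for assignment
In R, prefer `<-` for assignment, since `=` also has a specific role for evaluation within functions
R lets you overwrite built-in objects, so choose names with care (e.g., `pi=2`)
Create vectors with `c()`
Types of vectors: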
Numeric: `c(1.5, 2.8)`
Integer: `c(1L, 2L)` (the L denotes integers)
Character: `c("apple", "banana")`
Logical: `c(TRUE, FALSE, TRUE)`
To create a sequence, put `:` between the numbers or use `seq()`
Repeat values with `rep()`

| Function(s) | Description |
|---|---|
| `length(x)` | Number of elements |
| `sum(x)` | Total sum |
| `mean(x)`, `median(x)` | Average, middle value |
| `var(x)`, `sd(x)` | Variance and standard deviation |
| `min(x)`, `max(x)` | Extremes |
| `sort(x)`, `rank(x)` | Sorting and ranking |
| `which(x > 15)` | Indices where condition is true |
| `any(x > 10)`, `all(x > 10)` | Logical checks |
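To make the vector functions in the table concrete, here is a small sketch on one made-up vector. The values in `x` are hypothetical and chosen only so the results are easy to check by hand.

```r
x <- c(12, 18, 7, 21, 15)        # hypothetical values

# Other ways to build vectors
one_to_five <- 1:5               # 1 2 3 4 5
evens       <- seq(2, 10, by = 2)  # 2 4 6 8 10
repeated    <- rep("a", times = 3) # "a" "a" "a"

length(x)        # 5
sum(x)           # 73
mean(x)          # 14.6
median(x)        # 15
var(x)           # 29.3
min(x); max(x)   # 7 and 21
sort(x)          # 7 12 15 18 21
rank(x)          # 2 4 1 5 3
which(x > 15)    # 2 4 (positions of 18 and 21)
any(x > 10)      # TRUE
all(x > 10)      # FALSE (7 is not above 10)
18 %in% x        # TRUE
```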
Create a data frame with the `data.frame()` command
Access a variable in a data frame using the `df$varname` syntax
Add a new variable using the `df$varname` syntax and assign the variable values

| Function | Description |
|---|---|
| `str(df)` | Structure of the data frame |
| `summary(df)` | Summary statistics |
| `head(df)` | First 6 rows |
| `nrow(df)` | Number of rows |
| `ncol(df)` | Number of columns |
| `names(df)` | Column names |
| `df$varname` | Access a column |
| `subset(df, age > 25)` | Filter rows |
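The data frame functions above can be tried on a tiny example. The names and ages below are made up for illustration, and `age_sq` is a hypothetical new variable.

```r
# Build a small data frame (hypothetical data)
df <- data.frame(
  name = c("Ana", "Ben", "Cho", "Dev"),
  age  = c(23, 31, 27, 19)
)

# Access an existing column, then create a new one
df$age                   # 23 31 27 19
df$age_sq <- df$age^2    # adds a third column

# Inspect the data frame
str(df)
summary(df)
head(df)
nrow(df)     # 4
ncol(df)     # 3
names(df)    # "name" "age" "age_sq"

# Keep only the rows where age is above 25 (Ben and Cho)
older <- subset(df, age > 25)
print(older)
```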
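Create a matrix with the `matrix()` function
Matrix multiplication: `%*%`
Transpose: `t()`
Inverse: `solve()`
Determinant: `det()`
Diagonal: `diag()`
Convert a data frame with `as.matrix()` when you need to perform numerical matrix operations or use functions that require matrix inputs
Check the structure of an object with `str()`
Loading `.RData` or `.rda` files: use `load()` to open a file that contains saved R objects (e.g., data frames, vectors, models)
Reading a `.csv` file: use `read.csv()` (comma-separated) or `read.table()` (more general)
Reading Excel files (`.xlsx`, `.xls`) requires: `readxl` (does not need Excel installed)
Importing Stata (`.dta`), SPSS (`.sav`), and SAS (`.sas7bdat`) data into R requires the `haven` package
Saving to `.RData`: use `save()` to save one or more R objects
Saving to `.csv`: use `write.csv()`
Saving to Excel (`.xlsx`): use `writexl` or `openxlsx`
Saving to Stata, SPSS, or SAS formats: use `haven`

| Format | Read Function | Write Function | Package Required |
|---|---|---|---|
| `.RData` | `load("file.RData")` | `save(df, file = ...)` | Base R |
| `.csv` | `read.csv("file.csv")` | `write.csv(df, file = ...)` | Base R |
| Excel | `read_excel()` | `write_xlsx()` | `readxl`, `writexl` |
| Stata | `read_dta()` | `write_dta()` | `haven` |
| SPSS | `read_sav()` | `write_sav()` | `haven` |
| SAS | `read_sas()` | `write_sas()` (deprecated in recent `haven` releases; `write_xpt()` writes SAS transport files) | `haven` |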
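The base R rows of the table can be checked with a round trip: write a data frame to disk, read it back, and compare. This is a minimal sketch; the file names and the `scores` data are hypothetical, and files are written to R's temporary directory. The `readxl`, `writexl`, and `haven` functions follow the same read/write pattern but are left out because they require installed packages.

```r
# A small data frame to save (hypothetical data)
df <- data.frame(id = 1:3, score = c(88.5, 92.0, 79.5))

csv_path   <- file.path(tempdir(), "scores.csv")     # hypothetical file name
rdata_path <- file.path(tempdir(), "scores.RData")   # hypothetical file name

# CSV round trip: row.names = FALSE avoids writing an extra index column
write.csv(df, file = csv_path, row.names = FALSE)
df_csv <- read.csv(csv_path)

# RData round trip: save() stores the object together with its name
save(df, file = rdata_path)
rm(df)                              # remove it to show that load() restores it
loaded_names <- load(rdata_path)    # load() returns the names it restored
print(loaded_names)
```

Note that `load()` recreates objects under their original names, whereas `read.csv()` returns a value that you assign to a name of your choosing.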
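Index objects with square brackets: `object[rows,columns]`
`x[i]` gives the ith object in the vector
Data frames are indexed as `df[row, col]`
Indexing in R starts at 1 (the first element is `x[1]`)
`df[i,j]` will select the element in the ith row of the jth column
`df[i,]` selects the ith row
`df[,j]` selects the jth column
The `plot()` function is a versatile command in base R for creating simple visualizations, most commonly scatterplots and line plots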
| Argument | Description |
|---|---|
| `main` | Title of the plot |
| `xlab` | Label for \(x\)-axis |
| `ylab` | Label for \(y\)-axis |
| `xlim` | Set \(x\)-axis range |
| `ylim` | Set \(y\)-axis range |
| `col` | Color of points or lines |
| `pch` | Plotting character (symbol shape) |
| `type` | `"p"` for points (default), `"l"` for lines, `"b"` for both |
| Plot Type | Command Example | Description |
|---|---|---|
| Histogram | `hist(x)` | Distribution of a numeric variable |
| Boxplot | `boxplot(x)` | Summary of distribution (median, IQR) |
| Barplot | `barplot(table(x))` | Frequencies of categorical values |
| Time Series | `plot.ts(ts_object)` | Line plot optimized for time series |
| QQ Plot | `qqnorm(x); qqline(x)` | Compares data to a normal distribution |
| Pairs Plot | `pairs(data_frame)` | Matrix of scatterplots for multiple variables |
| Density Plot | `plot(density(x))` | Smoothed version of a histogram |
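The plotting commands and `plot()` arguments in the tables above can be exercised together in one sketch. The data are simulated, the object names are hypothetical, and the plots are sent to a temporary PDF file so that the code also runs without a graphics window. In an interactive session, the `pdf()` and `dev.off()` lines can be dropped.

```r
set.seed(42)                       # make the simulated data reproducible
x     <- rnorm(100)                # hypothetical numeric variable
y     <- 2 * x + rnorm(100)        # hypothetical second variable
group <- sample(c("a", "b", "c"), 100, replace = TRUE)

plot_file <- file.path(tempdir(), "base_plots_demo.pdf")  # hypothetical file name
pdf(plot_file)                     # each plot below becomes one page

# Scatterplot using the plot() arguments from the table
plot(x, y,
     main = "Simulated data", xlab = "x", ylab = "y",
     xlim = c(-4, 4), ylim = c(-8, 8),
     col = "blue", pch = 19, type = "p")

h <- hist(x)                            # histogram; h stores the bin counts
boxplot(x)                              # boxplot
barplot(table(group))                   # bar plot of category frequencies
plot.ts(ts(cumsum(x)))                  # time series of a running total
qqnorm(x); qqline(x)                    # QQ plot against the normal distribution
pairs(data.frame(x = x, y = y))         # matrix of scatterplots
plot(density(x))                        # density plot

invisible(dev.off())                    # close the file device
```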
Control flow constructs are programming tools that allow your R code to make decisions and repeat tasks
if/else Statements
Use `if` / `else` when you want your program to run different code depending on whether a condition is TRUE or FALSE
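A minimal sketch of an `if` / `else` chain follows. The variable `grade` and the cut-offs of 80 and 50 are hypothetical values chosen for illustration.

```r
grade <- 72   # hypothetical exam grade

# Only the first branch whose condition is TRUE is executed
if (grade >= 80) {
  result <- "distinction"
} else if (grade >= 50) {
  result <- "pass"
} else {
  result <- "fail"
}

print(result)   # "pass"
```

The condition inside `if ()` must be a single `TRUE` or `FALSE`, which is why it is applied to one value here and not to a whole vector.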
`ifelse()`
The `ifelse()` function is vectorized and has the form `ifelse(test, yes, no)`
`test`: A logical statement
`yes`: Value to return if `test` is TRUE
`no`: Value to return if `test` is FALSE
A `for` loop in R is used to repeat a block of code for each value in a sequence
Write your own functions with the `function()` command
The global environment is the main workspace in R where all your user-defined objects are stored during a session
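The three constructs just described, `ifelse()`, the `for` loop, and `function()`, can be sketched on one small vector. The `scores` values, the pass mark of 50, and the helper name `standardize` are all hypothetical choices for this example.

```r
scores <- c(45, 72, 88, 51)   # hypothetical scores

# ifelse() tests every element and returns a vector of the same length
status <- ifelse(scores >= 50, "pass", "fail")   # "fail" "pass" "pass" "pass"

# A for loop repeats the block once for each position in the vector
squares <- numeric(length(scores))   # pre-allocate the result
for (i in seq_along(scores)) {
  squares[i] <- scores[i]^2
}

# A user-defined function: subtract the mean and divide by the standard deviation
standardize <- function(v) {
  (v - mean(v)) / sd(v)
}
z <- standardize(scores)
print(z)
```

The loop is shown for illustration only; because R arithmetic is vectorized, `scores^2` gives the same result in one step.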
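Variables in a data frame (e.g., `d$x`) are not the same as variables in the global environment (e.g., `x`)
Tell R that `x` belongs to `df$`, i.e., `df$x`
Alternatively, use `with()`, e.g., `with(df, mean(x))`
`attach()` also exposes a data frame's columns by name, but it can mask other objects
Remove objects with the `rm()` command
Clear the whole environment with `rm(list=ls())`
Avoid uninformative object names such as `x1`, `tmp`, or `df1`
Put spaces around operators: `+`, `-`, `*`, `/`, `==`, `<`, `>=`, etc.
… `plot()` or `lm()`
Use `#` to describe what your code is doing
Commented output is often prefixed with `#>`
Use Cmd/Ctrl+Shift+R to insert a section header in RStudio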
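The difference between a column in a data frame and an object of the same name in the global environment is easiest to see side by side. In this sketch the data frame `df` and both versions of `x` are hypothetical.

```r
# Two different objects that happen to share the name x
df <- data.frame(x = c(1, 2, 3))   # x stored inside the data frame
x  <- c(10, 20, 30)                # x stored in the global environment

global_mean <- mean(x)             # 20: uses the global x
column_mean <- mean(df$x)          # 2: uses the column inside df
with_mean   <- with(df, mean(x))   # 2: with() looks inside df first

# Remove the global x; the column df$x is unaffected
rm(x)
x_still_defined <- "x" %in% ls()   # FALSE
column_after_rm <- df$x            # 1 2 3

# rm(list = ls()) would clear every object in the environment (not run here)
```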