Luc Clair
University of Winnipeg | GECON 7201
By now, you should have installed the following software:
Highly extensible through packages
The strength and flexibility of R largely come from its vast package ecosystem
R packages are collections of functions, data sets, and documentation bundled together to extend the functionality of base R
stats, graphics, and utils)
It acts as the direct interface between the user and the R interpreter
rm()
Command | Purpose |
---|---|
getwd() | Get the current working directory |
setwd("path/to/folder") | Set a new working directory |
list.files() | List files in the current working directory |
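As a small illustration of the three commands in the table, the sketch below switches to a temporary folder and back. The use of tempdir() is an assumption made only so the example runs anywhere; in practice you would pass the path to your own project folder.

```r
# Look up the current working directory (a single character string)
current_dir <- getwd()
print(current_dir)

# Move to a temporary folder for illustration;
# setwd() invisibly returns the directory we were in before
old_dir <- setwd(tempdir())
print(list.files())   # files in the temporary folder (possibly none)

# Go back to where we started
setwd(old_dir)
```

Saving the value returned by setwd() is a convenient way to restore the original directory afterwards.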
regression_analysis.R, clean_data.R
Ctrl+Enter (or Cmd+Enter on Mac) to run a line of code
Ctrl+Enter or press the
A package is a collection of:
?function_name
install.packages() to install, library() to load
package_name::function() syntax
Operator | Description |
---|---|
+ | Addition |
- | Subtraction |
* | Multiplication |
/ | Division |
^ or ** | Exponentiation |
%% | Modulo (remainder) |
%/% | Integer division (quotient) |
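The operators in the table can be tried directly in the console. The values of a and b below are arbitrary and chosen only for illustration.

```r
a <- 7
b <- 2

a + b     # 9
a - b     # 5
a * b     # 14
a / b     # 3.5
a ^ b     # 49
a ** b    # 49 (same as ^)

remainder <- a %% b    # 1
quotient  <- a %/% b   # 3

# The quotient and remainder together recover the original number
quotient * b + remainder   # 7
```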
Operator | Meaning | Example | Result |
---|---|---|---|
== | Equal to | 5 == 5 | TRUE |
!= | Not equal to | 5 != 3 | TRUE |
< | Less than | 3 < 5 | TRUE |
<= | Less than or equal to | 5 <= 5 | TRUE |
> | Greater than | 7 > 4 | TRUE |
>= | Greater than or equal to | 4 >= 4 | TRUE |
TRUE or FALSE
if statements)
Operator | Name | Description |
---|---|---|
! | NOT | Reverses a logical value (TRUE ⇄ FALSE) |
& | AND (vectorized) | TRUE only if both conditions are TRUE |
\| | OR (vectorized) | TRUE if either condition is TRUE |
&& | AND (single values only) | Compares one value on each side and short-circuits; longer vectors raise an error since R 4.3.0 |
\|\| | OR (single values only) | Compares one value on each side and short-circuits; longer vectors raise an error since R 4.3.0 |
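A short sketch of how the comparison and logical operators above behave on a vector. The vector x is made up for illustration.

```r
x <- c(2, 8, 15)

x > 5              # FALSE  TRUE  TRUE

# Comparisons are evaluated first, then & and | combine the results
x > 5 & x < 10     # FALSE  TRUE FALSE
x < 5 | x > 10     #  TRUE FALSE  TRUE
!(x > 5)           #  TRUE FALSE FALSE

# && and || are meant for single values, e.g. inside an if condition
n <- length(x)
if (n > 0 && x[1] < 5) {
  message("x is non-empty and its first element is below 5")
}
```

The vectorized forms (& and |) return one result per element, which is what you usually want when filtering data.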
!
! as a shorthand for negation
%in%
%in%
>, ==, etc) are evaluated before Boolean operators (& and |)
<- or = to handle assignment
<- is normally read aloud as “gets”
->), though it is less common
= for assignment
<- for assignment, since = also has a specific role for evaluation within functions
pi=2)
c()
Types of vectors:
c(1.5, 2.8)
c(1L, 2L) (the L denotes integers)
c("apple", "banana")
c(TRUE, FALSE, TRUE)
: between the numbers or use seq()
rep()
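As a rough illustration of assignment, the four vector types, and the sequence helpers listed above, consider the following sketch. The object names (rate, num_vec, and so on) are hypothetical.

```r
# Assignment with <- (read aloud as "gets")
rate <- 0.05

# The four basic vector types
num_vec <- c(1.5, 2.8)              # numeric
int_vec <- c(1L, 2L)                # integer (note the L)
chr_vec <- c("apple", "banana")     # character
lgl_vec <- c(TRUE, FALSE, TRUE)     # logical

class(num_vec)   # "numeric"
class(int_vec)   # "integer"
class(chr_vec)   # "character"
class(lgl_vec)   # "logical"

# Sequences and repetition
1:5                                 # 1 2 3 4 5
seq(from = 0, to = 1, by = 0.25)    # 0.00 0.25 0.50 0.75 1.00
rep(c("a", "b"), times = 2)         # "a" "b" "a" "b"

# Membership test with %in%
"apple" %in% chr_vec                # TRUE
"cherry" %in% chr_vec               # FALSE
```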
Function(s) | Description |
---|---|
length(x) | Number of elements |
sum(x) | Total sum |
mean(x), median(x) | Average, middle value |
var(x), sd(x) | Variance and standard deviation |
min(x), max(x) | Extremes |
sort(x), rank(x) | Sorting and ranking |
which(x > 15) | Indices where condition is true |
any(x > 10), all(x > 10) | Logical checks |
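To see the functions in the table at work, here is a small example on a made-up vector x. The values are chosen so the results are easy to check by hand.

```r
x <- c(12, 18, 9, 21, 15)

length(x)       # 5
sum(x)          # 75
mean(x)         # 15
median(x)       # 15
var(x)          # 22.5
sd(x)           # about 4.74
min(x)          # 9
max(x)          # 21
sort(x)         # 9 12 15 18 21
rank(x)         # 2 4 1 5 3
which(x > 15)   # 2 4 (positions, not values)
any(x > 10)     # TRUE
all(x > 10)     # FALSE, because 9 is not above 10
```

Note that which() returns positions in the vector, so x[which(x > 15)] gives the values themselves.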
data.frame() command
df$varname syntax, e.g.,
df$varname syntax and assign the variable values, e.g.,
Function | Description |
---|---|
str(df) | Structure of the data frame |
summary(df) | Summary statistics |
head(df) | First 6 rows |
nrow(df) | Number of rows |
ncol(df) | Number of columns |
names(df) | Column names |
df$varname | Access a column |
subset(df, age > 25) | Filter rows |
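The sketch below builds a small data frame and applies the functions from the table. The data (names, ages, incomes) are invented purely for illustration.

```r
# Create a data frame with data.frame()
df <- data.frame(
  name = c("Ana", "Ben", "Cho", "Dev"),
  age  = c(23, 31, 27, 19)
)

# Add a new variable with the df$varname syntax
df$income <- c(41000, 52000, 48000, 18000)

str(df)        # structure: 4 observations of 3 variables
summary(df)    # summary statistics by column
head(df)       # first rows (all 4 here)

nrow(df)       # 4
ncol(df)       # 3
names(df)      # "name" "age" "income"
mean(df$age)   # 25

# Keep only the rows where age exceeds 25
older <- subset(df, age > 25)
older$name     # "Ben" "Cho"
```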
matrix() function
%*%
t()
solve()
det()
diag()
as.matrix() when you need to perform numerical matrix operations or use functions that require matrix inputs
str()
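A brief sketch of the matrix functions listed above, using a hypothetical 2 x 2 matrix A and a small data frame df_num that exist only for this example.

```r
# matrix() fills column by column by default
A <- matrix(c(2, 1, 0, 3), nrow = 2)
A
#      [,1] [,2]
# [1,]    2    0
# [2,]    1    3

t(A)                  # transpose
det(A)                # 6
diag(A)               # 2 3 (the diagonal elements)

A_inv <- solve(A)     # inverse of A
A %*% A_inv           # matrix product; the 2 x 2 identity up to rounding

# Convert a numeric data frame to a matrix before matrix algebra
df_num <- data.frame(x = c(1, 2), y = c(3, 4))
M <- as.matrix(df_num)
is.matrix(M)          # TRUE
M %*% A               # now a valid matrix product
```

Keep in mind that * multiplies element by element, while %*% performs matrix multiplication.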
.RData or .rda files
load() to open a file that contains saved R objects (e.g., data frames, vectors, models)
.csv File
read.csv() (comma-separated) or read.table() (more general)
.xlsx, .xls) requires: readxl (does not need Excel installed)
.dta), SPSS (.sav), and SAS (.sas7bdat) data into R requires the haven package
.RData
save() to save one or more R objects
.csv
write.csv()
.xlsx)
writexl or openxlsx
haven
Format | Read Function | Write Function | Package Required |
---|---|---|---|
.RData | load("file.RData") | save(df, file = ...) | Base R |
.csv | read.csv("file.csv") | write.csv(df, file = ...) | Base R |
Excel | read_excel() | write_xlsx() | readxl, writexl |
Stata | read_dta() | write_dta() | haven |
SPSS | read_sav() | write_sav() | haven |
SAS | read_sas() | write_sas() | haven |
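The base R rows of the table can be tested with a simple round trip. This sketch writes to temporary files (created with tempfile(), an assumption made so the example does not touch your own folders); the data frame is made up. The Excel, Stata, SPSS, and SAS functions follow the same read/write pattern but need their packages installed first, so they are not run here.

```r
df <- data.frame(id = 1:3, score = c(88.5, 92.0, 79.5))

# Temporary file paths, used only for this illustration
csv_path   <- tempfile(fileext = ".csv")
rdata_path <- tempfile(fileext = ".RData")

# .csv: write, then read back
write.csv(df, file = csv_path, row.names = FALSE)
df_csv <- read.csv(csv_path)

# .RData: save the object, remove it, then restore it under its original name
save(df, file = rdata_path)
rm(df)
load(rdata_path)   # brings df back into the workspace

# Both copies should match
all.equal(df, df_csv)   # TRUE
```

Setting row.names = FALSE in write.csv() avoids writing an extra column of row numbers to the file.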
object[rows,columns]
x[i] gives the ith object in the vector
df[row, col]
x[1])
df[i,j] will select the element in the ith row of the jth column
df[i,]
df[,j]
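The indexing rules above can be sketched on a small vector and data frame. Both objects are invented for this example.

```r
x <- c(10, 20, 30, 40)

x[1]           # 10 (R counts positions from 1)
x[c(2, 4)]     # 20 40
x[x > 15]      # 20 30 40 (logical indexing)

df <- data.frame(a = 1:3, b = c("u", "v", "w"))

df[2, 1]       # 2: the element in row 2, column 1
df[2, ]        # the whole second row (still a data frame)
df[, 2]        # the whole second column (a vector)
df[df$a >= 2, "b"]   # column b for the rows where a is at least 2
```

Leaving the row or column position empty, as in df[2, ] or df[, 2], means "all columns" or "all rows" respectively.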
The plot() function is a versatile command in base R for creating simple visualizations, most commonly:
Argument | Description |
---|---|
main | Title of the plot |
xlab | Label for \(x\)-axis |
ylab | Label for \(y\)-axis |
xlim | Set \(x\)-axis range |
ylim | Set \(y\)-axis range |
col | Color of points or lines |
pch | Plotting character (symbol shape) |
type | "p" for points (default), "l" for lines, "b" for both |
Plot Type | Command Example | Description |
---|---|---|
Histogram | hist(x) | Distribution of a numeric variable |
Boxplot | boxplot(x) | Summary of distribution (median, IQR) |
Barplot | barplot(table(x)) | Frequencies of categorical values |
Time Series | plot.ts(ts_object) | Line plot optimized for time series |
QQ Plot | qqnorm(x); qqline(x) | Compares data to a normal distribution |
Pairs Plot | pairs(data_frame) | Matrix of scatterplots for multiple variables |
Density Plot | plot(density(x)) | Smoothed version of a histogram |
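A minimal sketch combining plot() arguments with a few of the plot types from the tables above. The data are simulated, and the seed, sample size, titles, and colour are arbitrary choices made for illustration.

```r
set.seed(7201)   # arbitrary seed so the simulated data are reproducible
x <- rnorm(200, mean = 50, sd = 10)
y <- 2 + 0.5 * x + rnorm(200, sd = 5)

# Scatterplot with a title, axis labels, colour, and symbol shape
plot(x, y,
     main = "Simulated data",
     xlab = "x",
     ylab = "y",
     col  = "steelblue",
     pch  = 19,
     type = "p")

# Histogram; hist() also returns the bin counts invisibly
h <- hist(x, main = "Histogram of x", xlab = "x")

# Boxplot and density plot of the same variable
boxplot(x, main = "Boxplot of x")
plot(density(x), main = "Density of x")

# QQ plot against the normal distribution
qqnorm(x)
qqline(x)
```

Each call draws a new plot, so in RStudio you can step back through them in the Plots pane.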
Control flow constructs are programming tools that allow your R code to:
if/else Statements
if/else statements
if/else Statements (cont.)
Use if / else when you want your program to:
ifelse
ifelse()
ifelse(test, yes, no)
test: A logical statement
yes: Value to return if test is TRUE
no: Value to return if test is FALSE
ifelse (cont.)
A for loop in R is used to repeat a block of code for each value in a sequence
function() command
The global environment is the main workspace in R where all your user-defined objects are stored during a session
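The if/else, ifelse(), for, and function() constructs introduced above can be sketched together in one short example. The scores, the cutoff of 50, and the function name pass_rate are hypothetical.

```r
# if / else: one condition, one decision
score <- 72
if (score >= 50) {
  result <- "pass"
} else {
  result <- "fail"
}
result   # "pass"

# ifelse(test, yes, no): the same decision applied to every element
scores  <- c(45, 72, 50, 38)
results <- ifelse(scores >= 50, "pass", "fail")
results  # "fail" "pass" "pass" "fail"

# for loop: repeat a block of code for each value in a sequence
total <- 0
for (s in scores) {
  total <- total + s
}
total    # 205

# function(): wrap a calculation so it can be reused
pass_rate <- function(x, cutoff = 50) {
  mean(x >= cutoff)   # share of values at or above the cutoff
}
pass_rate(scores)     # 0.5
```

Use if/else for a single TRUE or FALSE condition and ifelse() when the test is a whole vector.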
df$x) are not the same as variables in the global environment (e.g., x)
x belongs to df
$, i.e., df$x
with(), e.g., with(df, mean(x))
attach()
rm() command
rm(list=ls())
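To make the distinction between a data frame variable and a global variable concrete, the sketch below deliberately gives both the name x. The values are made up for illustration.

```r
df <- data.frame(x = c(1, 2, 3))   # x stored inside the data frame
x  <- c(100, 200)                  # a separate x in the workspace

global_mean <- mean(x)             # 150: uses the workspace x
column_mean <- mean(df$x)          # 2:   uses the column of df
with_mean   <- with(df, mean(x))   # 2:   with() looks inside df first

ls()      # lists the objects currently in the workspace

rm(x)     # removes only the workspace x; df$x is untouched
exists("x", inherits = FALSE)      # FALSE
df$x      # 1 2 3

# rm(list = ls()) would remove every object in the workspace at once
```

Using df$x or with() makes it explicit which x you mean, which is one reason these are often preferred over attach().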
x1, tmp, or df1
+, -, *, /, ==, <, >=, etc
Good Practice | Poor Practice |
---|---|
plot() or lm()
# to describe what your code is doing
#>
Cmd/Ctrl+Shift+R
?