• newscoding@gmail.com

Cheatsheet for R statistics

By Chenyan Jia

Here is an R cheatsheet for anyone interested in using R for statistical analysis.

§  Install Packages
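
Several functions in this cheatsheet come from add-on packages rather than base R. As a minimal sketch, assuming the packages needed are ppcor (partial correlations), psych (fisherz) and lme4 (lmer), and assuming the CRAN cloud mirror is reachable, they could be installed and loaded like this:

```r
# Packages assumed from the functions used in this cheatsheet
pkgs <- c("ppcor", "psych", "lme4")

# Check which of them are already installed
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
to_install <- pkgs[!is_installed]

# Install only the missing ones (the mirror URL is an assumption)
if (length(to_install) > 0) {
  install.packages(to_install, repos = "https://cloud.r-project.org")
}

# Load a package before calling its functions without the pkg:: prefix
library(ppcor)
```

Installing only what is missing avoids re-downloading packages every time the script runs.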

§  Functions

Function | What it Calculates
setwd(<file path>) | Sets the working directory
getwd() | Returns the path to the current working directory
read.table() | Imports a data set, e.g. Duncan <- read.table("Duncan.txt", header = TRUE)
x <- c( , , …) | Creates a vector, e.g. x <- c(1, 2, 3, 4, 5)
data() | Loads a built-in data set, e.g. data(mtcars); called with no arguments, it lists the available data sets
attach() | Attaches a data frame so that its variables can be used by name without repeating the data name
objects() | Lists the names of variables and functions residing in the R workspace
rm(list = ls()) | Removes everything in the environment
names() | Lists the variable names in the data
head()/tail() | Shows the first/last 6 rows of the data
str() | Structure of the data
dim() | Dimensions of the data (rows, columns)
sort(x) | The numbers in vector x in increasing order
rank(x) | Ranks of the numbers (in increasing order) in vector x
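
The data-handling functions above can be tried on a built-in data set. This is only a sketch: mtcars and the temporary file stand in for your own data, and the vector x is made up.

```r
# Load a built-in data set and inspect it
data(mtcars)
names(mtcars)  # variable names
head(mtcars)   # first 6 rows
str(mtcars)    # structure of the data
dim(mtcars)    # 32 rows, 11 columns

# Round trip through a text file to illustrate read.table()
# (a temporary file is used here instead of a real path)
tmp <- tempfile(fileext = ".txt")
write.table(head(mtcars), tmp)
d <- read.table(tmp, header = TRUE)
dim(d)

# Sorting and ranking a small vector
x <- c(3, 1, 2, 5, 4)
sort(x)  # 1 2 3 4 5
rank(x)  # 3 1 2 5 4
```
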
Univariate analysis
sum(x) | Sum of the numbers in vector x
mean(x) | Mean of the numbers in vector x
median(x) | Median of the numbers in vector x
var(x) | Estimated variance of the population from which the numbers in vector x are sampled
sd(x) | Estimated standard deviation of the population from which the numbers in vector x are sampled
length(x) | Number of elements in x (the sample size)
hist(x) | Histogram of x
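
A short worked example may help with the univariate functions. The sample below is made up; the point it illustrates is that var() and sd() divide by n - 1, which is why they estimate the population values.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)  # made-up sample

n <- length(x)  # 8
sum(x)          # 40
mean(x)         # 5
median(x)       # 4.5

# var() uses the n - 1 denominator
manual_var <- sum((x - mean(x))^2) / (n - 1)
all.equal(var(x), manual_var)

# sd() is the square root of var()
all.equal(sd(x), sqrt(var(x)))

hist(x)  # draws a histogram of x
```
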
Bivariate analysis
cor(x, y) | Correlation coefficient between the numbers in vector x and the numbers in vector y
cov(x, y) | Covariance between x and y
cor.test(x, y) | Test for correlation between paired samples, e.g. cor.test(X, Y, alternative = "two.sided", method = "spearman")
plot(x, y) | Scatter plot of x and y
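
The bivariate functions can be sketched on two small made-up vectors. The values are chosen without ties so that the Spearman test runs without a ties warning.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2.0, 4.1, 5.3, 4.4, 6.2)  # made-up values, no ties

cor(x, y)  # Pearson correlation (the default)
cov(x, y)  # covariance

# The correlation is the covariance scaled by both standard deviations
all.equal(cor(x, y), cov(x, y) / (sd(x) * sd(y)))

# Rank-based (Spearman) test, as in the table above
res <- cor.test(x, y, alternative = "two.sided", method = "spearman")
res$estimate  # rho = 0.9
res$p.value

plot(x, y)  # scatter plot
```
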
For t-test
qt() | Critical value (two-tailed), e.g. qt(1 - alpha/2, df = n - 2)
pt() | p-value (two-tailed), e.g. 2 * (1 - pt(abs(test_stat), df = n - 2))
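
As an illustration of qt() and pt(), here is a sketch of a two-tailed t-test of a correlation, which is one setting where df = n - 2 applies. The values of r, n and alpha are hypothetical.

```r
alpha <- 0.05
n <- 30   # hypothetical sample size
r <- 0.4  # hypothetical sample correlation

# t statistic for H0: the population correlation is 0
test_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-tailed critical value and p-value
crit <- qt(1 - alpha / 2, df = n - 2)
p_value <- 2 * (1 - pt(abs(test_stat), df = n - 2))

# Reject H0 when |t| exceeds the critical value (equivalently, p < alpha)
abs(test_stat) > crit
p_value < alpha
```

Taking abs(test_stat) keeps the p-value correct when the statistic is negative.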
For z-test
qnorm() | Critical value (two-tailed), e.g. qnorm(1 - alpha/2)
pnorm() | p-value (two-tailed), e.g. 2 * (1 - pnorm(abs(test_stat)))
fisherz() | Converts correlations to Fisher's z (psych package)
fisherz2r() | Converts Fisher's z back to r (psych package)
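
The z-test functions can be combined with Fisher's transformation to test a correlation. This sketch uses base R's atanh() and tanh(), which compute the same transformation as fisherz() and fisherz2r(), so it runs without the psych package; r, n and alpha are hypothetical.

```r
alpha <- 0.05
n <- 30   # hypothetical sample size
r <- 0.4  # hypothetical sample correlation

# Fisher's z: 0.5 * log((1 + r) / (1 - r)), identical to atanh(r)
z <- atanh(r)
se <- 1 / sqrt(n - 3)  # approximate standard error of z

# z statistic for H0: the population correlation is 0
test_stat <- z / se
crit <- qnorm(1 - alpha / 2)
p_value <- 2 * (1 - pnorm(abs(test_stat)))

# Confidence interval on the z scale, converted back to r
ci_r <- tanh(z + c(-1, 1) * crit * se)
```
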
For chi-square test
qchisq() | Critical value (upper tail), e.g. qchisq(1 - alpha, df = 1)
pchisq() | p-value, e.g. 1 - pchisq(test_stat, df = 1)
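
A sketch of how qchisq() and pchisq() fit into a chi-square test of independence, using a made-up 2 x 2 table of counts and checking the manual result against chisq.test():

```r
alpha <- 0.05
tab <- matrix(c(20, 10, 10, 20), nrow = 2)  # made-up counts

# Expected counts under independence
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Pearson chi-square statistic and its degrees of freedom
test_stat <- sum((tab - expected)^2 / expected)
df <- (nrow(tab) - 1) * (ncol(tab) - 1)  # 1 for a 2 x 2 table

crit <- qchisq(1 - alpha, df = df)
p_value <- 1 - pchisq(test_stat, df = df)

# Same test without the continuity correction
check <- chisq.test(tab, correct = FALSE)
```
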
Three Variables
Multiple correlation (use linear regression)
lm(Y ~ X + Z) | e.g. mod <- lm(Y ~ X + Z); the multiple correlation is sqrt(summary(mod)$r.squared)
Partial correlation between Y and X after controlling for the effect of Z
ppcor::pcor(cbind(Y, X, Z)) | Each cell gives the partial correlation for a pair of variables given the others
ppcor::pcor.test(Y, X, Z) | Significance test
Semi-partial correlation between X and Y after controlling for Z
ppcor::spcor(cbind(Y, X, Z)) | Gives the semi-partial correlations
ppcor::spcor.test(Y, X, Z) | Significance test
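
To show what these three-variable quantities mean, the sketch below computes them in base R from regression residuals on simulated data. It is an illustration of the definitions rather than the ppcor implementation; the variable names follow the table and the simulated effects are arbitrary.

```r
set.seed(1)
n <- 100
Z <- rnorm(n)
X <- 0.5 * Z + rnorm(n)
Y <- 0.4 * X + 0.3 * Z + rnorm(n)

# Multiple correlation of Y with X and Z
mod <- lm(Y ~ X + Z)
R <- sqrt(summary(mod)$r.squared)
all.equal(R, cor(Y, fitted(mod)))

# Partial correlation: Z is removed from both Y and X
partial <- cor(resid(lm(Y ~ Z)), resid(lm(X ~ Z)))

# Semi-partial correlation: Z is removed from X only
semi <- cor(Y, resid(lm(X ~ Z)))

# With ppcor installed, these should correspond to
# ppcor::pcor.test(Y, X, Z)$estimate and ppcor::spcor.test(Y, X, Z)$estimate
```
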
Linear Regression Model
lm(Y ~ X) | Linear regression analysis with the numbers in vector Y as the dependent variable and the numbers in vector X as the independent variable
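
A minimal linear regression sketch, using the built-in mtcars data as a stand-in for your own variables (mpg plays the role of Y and wt the role of X):

```r
# Regress fuel economy on car weight
mod <- lm(mpg ~ wt, data = mtcars)

coef(mod)     # intercept and slope
summary(mod)  # standard errors, t-tests, R-squared

# Predicted mpg for a hypothetical car with wt = 3
pred <- predict(mod, newdata = data.frame(wt = 3))
```
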
Logistic Regression Model
glm(Y ~ X, data = , family = binomial) | Fits a generalized linear model, specified by a symbolic description of the linear predictor and a description of the error distribution; family = binomial gives logistic regression
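
A logistic regression sketch along the same lines, again with mtcars as a stand-in: the binary variable am (0 = automatic, 1 = manual) plays the role of Y and wt the role of X.

```r
# Model the probability of a manual transmission from car weight
fit <- glm(am ~ wt, data = mtcars, family = binomial)

summary(fit)    # coefficients are on the log-odds scale
exp(coef(fit))  # odds ratios

# Predicted probabilities for each car
prob <- predict(fit, type = "response")
```

Using type = "response" returns probabilities instead of log-odds.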
Hierarchical Linear Model
lmer() | Fits a linear mixed-effects model (LMM) to data, via REML or maximum likelihood (lme4 package)
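
A minimal sketch of lmer(), assuming the lme4 package is installed and using its sleepstudy example data: reaction time is modeled over days, with a random intercept for each subject.

```r
# Run only if lme4 is available
if (requireNamespace("lme4", quietly = TRUE)) {
  # (1 | Subject) adds a random intercept per subject
  fit <- lme4::lmer(Reaction ~ Days + (1 | Subject), data = lme4::sleepstudy)

  print(summary(fit))             # fitted by REML by default
  fixed_effects <- lme4::fixef(fit)  # fixed-effect estimates
}
```
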
Principal Components Analysis
prcomp() | Performs a principal components analysis on the given data matrix and returns the results as an object of class prcomp
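
A short PCA sketch on the built-in USArrests data. Scaling the variables first is a choice made for this example, since the columns are on very different scales.

```r
# Standardize the variables, then extract principal components
pca <- prcomp(USArrests, scale. = TRUE)

summary(pca)   # variance explained by each component
pca$rotation   # loadings
head(pca$x)    # component scores for the first 6 states

# Proportion of variance explained, computed by hand
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
```
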

This cheatsheet will be updated over time.