
Cheatsheet for R statistics

By Chenyan Jia

Here is an R cheatsheet for anyone interested in using R for statistical analysis.

§  Install Packages

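A minimal sketch of installing and loading packages (the package names here are just the ones used later in this cheatsheet):

```r
# Install packages from CRAN (needed once per machine)
install.packages(c("ppcor", "psych", "mediation"))

# Load a package into the current session (needed every session)
library(ppcor)
```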
§  Functions

Function: What it calculates

setwd(<file path>): Set the working directory.
getwd(): Return the path of the current working directory (useful if you ever forget it).
read.table(): Import a data set, e.g. Duncan <- read.table("Duncan.txt", header=TRUE).
c(): Combine values into a vector, e.g. x <- c(1, 2, 3, 4, 5).
data(): List the data sets built into R.
attach(): Attach a data set so that variables can be referenced by name without repeating the data name.
objects(): List the names of the variables and functions residing in the R workspace.
rm(list=ls()): Remove everything in the environment.
names(): List the variables in a data set.
head()/tail(): Show the first/last six rows of the data.
str(): Show the structure of the data.
dim(): Show the dimensions of the data.
sort(x): The numbers in vector x in increasing order.
rank(x): The ranks of the numbers (in increasing order) in vector x.
Univariate analysis

sum(x): Sum of the numbers in vector x.
mean(x): Mean of the numbers in vector x.
median(x): Median of the numbers in vector x.
var(x): Estimated variance of the population from which the numbers in vector x are sampled.
sd(x): Estimated standard deviation of the population from which the numbers in vector x are sampled.
length(x): Sample size of x.
hist(x): Histogram of x.
Bivariate analysis

cor(x, y): Correlation coefficient between the numbers in vector x and the numbers in vector y.
cov(x, y): Covariance of x and y.
cor.test(x, y): Test for correlation between paired samples, e.g. cor.test(X, Y, alternative="two.sided", method="spearman").
plot(x, y): Scatterplot of x against y.
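A quick sketch of the bivariate functions on made-up vectors:

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 6)

cor(x, y)   # Pearson correlation coefficient (about 0.85 here)
cov(x, y)   # covariance (2 here)
cor.test(x, y, alternative = "two.sided", method = "spearman")
```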
For a t-test

qt(): Critical value, e.g. two-tailed: qt(1 - alpha/2, df = n - 2).
pt(): p-value, e.g. two-tailed: 2 * (1 - pt(abs(test_stat), df = n - 2)).

For a z-test

qnorm(): Critical value, e.g. two-tailed: qnorm(1 - alpha/2).
pnorm(): p-value, e.g. two-tailed: 2 * (1 - pnorm(abs(test_stat))).
fisherz(): Convert correlations to Fisher's z (psych package).
fisherz2r(): Convert Fisher's z back to r (psych package).

For a chi-square test

qchisq(): Critical value, e.g. qchisq(1 - alpha, df = 1).
pchisq(): p-value, e.g. 1 - pchisq(test_stat, df = 1).
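To make the critical-value and p-value formulas concrete, a numeric sketch (alpha, n, and the test statistics are invented for illustration):

```r
alpha <- 0.05
n <- 12

qt(1 - alpha/2, df = n - 2)         # two-tailed t critical value, df = 10 (about 2.23)
2 * (1 - pt(abs(2.5), df = n - 2))  # two-tailed p-value for t = 2.5
qnorm(1 - alpha/2)                  # two-tailed z critical value (about 1.96)
2 * (1 - pnorm(abs(1.96)))          # two-tailed p-value for z = 1.96 (about 0.05)
qchisq(1 - alpha, df = 1)           # chi-square critical value (about 3.84)
```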
Three Variables

Multiple correlation (via linear regression):
lm(Y ~ X + Z): e.g. mod <- lm(Y ~ X + Z).

Partial correlation between Y and X after controlling for the effect of Z:
ppcor::pcor(cbind(Y, X, Z)): Each cell gives the pairwise partial correlation for that pair of variables, given the others.
ppcor::pcor.test(Y, X, Z): Significance test.

Semi-partial correlation between X and Y after controlling for Z:
ppcor::spcor(cbind(Y, X, Z)): Gives the semi-partial correlations.
ppcor::spcor.test(Y, X, Z): Significance test.
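The partial correlation can also be sketched in base R via residuals, which is handy when ppcor is not installed (the data below are simulated purely for illustration; ppcor::pcor.test(Y, X, Z) gives the same estimate plus a significance test):

```r
set.seed(1)
Z <- rnorm(50)
X <- Z + rnorm(50)
Y <- Z + rnorm(50)

# Partial correlation of Y and X controlling for Z:
# correlate the residuals after regressing each variable on Z
r_partial <- cor(resid(lm(Y ~ Z)), resid(lm(X ~ Z)))
r_partial
```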
Linear Regression Model

lm(Y ~ X): Linear regression analysis with the numbers in vector Y as the dependent variable and the numbers in vector X as the independent variable.

Logistic Regression Model

glm(Y ~ X, data = , family = binomial): glm() fits generalized linear models, specified by giving a symbolic description of the linear predictor and a description of the error distribution.

Hierarchical Linear Model

lmer(): Fit a linear mixed-effects model (LMM) to data, via REML or maximum likelihood (lme4 package).

Principal Components Analysis

prcomp(): Perform a principal components analysis on the given data matrix and return the results as an object of class prcomp.
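A minimal sketch of the modeling functions on the built-in mtcars data set (chosen only because it ships with R):

```r
mod <- lm(mpg ~ wt, data = mtcars)              # linear regression
summary(mod)$coefficients

glm(am ~ wt, data = mtcars, family = binomial)  # logistic regression

pca <- prcomp(mtcars[, c("mpg", "wt", "hp")], scale. = TRUE)
summary(pca)                                    # variance explained by each component
```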

Keep updating.

Fake news or misinformation detection algorithms and datasets

By Chenyan Jia

In this post, newscoding recommends several fake news and misinformation detection algorithms and datasets (especially for misinformation related to COVID-19) that are used by researchers or Internet companies. The following list is in no particular order of importance.

No. 1

Twitter: Updating our Approach to Misleading Information

In this article, Twitter introduces new labels and warning messages that will provide additional context and information on some Tweets containing disputed or misleading information related to COVID-19.


No. 2

Triple Branch BERT Siamese Network for fake news classification on the LIAR-PLUS dataset

A research paper published in the Proceedings of the First Workshop on Fact Extraction and VERification (FEVER), "Where is your Evidence: Improving Fact-checking by Justification Modeling," extended the LIAR dataset to the LIAR-PLUS dataset. The LIAR dataset was introduced by Wang (2017) and consists of 12,836 short statements taken from POLITIFACT and labeled by humans (Alhindi, Petridis, & Muresan, 2018).



No. 3

Metafact

Metafact is a health fact-checking platform that uses a community of verified experts. The website has an intuitive interface and contains a wealth of COVID-19-related content.


No. 4

Neural Covidex

Neural Covidex applies state-of-the-art neural network models and AI techniques to answer questions using the COVID-19 Open Research Dataset (CORD-19) provided by the Allen Institute for AI (data release of May 26, 2020), which currently contains over 47,000 scholarly articles. In addition, Neural Covidex supports search over randomized controlled trials related to COVID-19 provided by Trialstreamer.


No. 5

Facebook: Using AI to detect COVID-19 misinformation and exploitative content

Facebook works with over 60 fact-checking organizations that review content in more than 50 languages in order to prevent the spread of misinformation during the COVID-19 pandemic.


No. 6

COVID-19 related misinformation test sets

Researchers from the Center for Artificial Intelligence Research (CAiRE) posted COVID-19 related misinformation test sets newly proposed in their “Misinformation has High Perplexity” paper.


No. 7

USC Melady Lab: Coronavirus on Social Media Misinformation Analysis

USC Melady Lab identifies unreliable, misleading, and clickbait information shared on Twitter regarding COVID-19 from 2020-03-01 to 2020-05-03.

(keep updating)


Alhindi, T., Petridis, S., & Muresan, S. (2018). Where is your Evidence: Improving Fact-checking by Justification Modeling. In Proceedings of the First Workshop on Fact Extraction and VERification (FEVER), Brussels, Belgium.

Wang, W. Y. (2017). "Liar, liar pants on fire": A new benchmark dataset for fake news detection. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017), Vancouver, BC, Canada.

Mediation Package in R

By Chenyan Jia


library(mediation)  # load the mediation package (install it first if needed)

# mod3 and mod2 are model objects fitted earlier
results <- mediate(model.m = mod3, model.y = mod2, treat = 'exercise', mediator = 'food', boot = TRUE, sims = 500)
# "model.m": a fitted model object for the mediator
# "model.y": a fitted model object for the outcome (using both the focal and mediator variables)
# "treat": a character string indicating the name of the treatment variable
# "mediator": a character string indicating the name of the mediator variable

# Bootstrap sample sizes typically range from 1000 to 5000; a smaller sims
# value (such as 500 here) keeps runtime manageable when the data set is small.

How to decipher the results?
## ACME: Average Causal Mediation Effect (the indirect effect)
## ADE: Average Direct Effect
## Total Effect: the sum of the mediation (indirect) effect and the direct effect
## Prop. Mediated: the size of the average causal mediation effect relative to the total effect
## When the ACME is significant and the ADE is not, complete mediation has occurred (the direct effect is no longer significant once the mediator is included)
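A self-contained sketch with simulated data; the variable names exercise and food mirror the snippet above, but the data and coefficients are fabricated purely for illustration (assumes the mediation package is installed):

```r
library(mediation)

set.seed(123)
n <- 100
exercise <- rbinom(n, 1, 0.5)                        # treatment
food <- 0.5 * exercise + rnorm(n)                    # mediator
wellbeing <- 0.4 * exercise + 0.6 * food + rnorm(n)  # outcome
dat <- data.frame(exercise, food, wellbeing)

mod3 <- lm(food ~ exercise, data = dat)              # mediator model
mod2 <- lm(wellbeing ~ exercise + food, data = dat)  # outcome model
results <- mediate(mod3, mod2, treat = "exercise", mediator = "food",
                   boot = TRUE, sims = 100)
summary(results)  # reports ACME, ADE, Total Effect, Prop. Mediated
```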

An Example of Results

EgoWeb 2.0: a tool for social network analysis

By Chenyan Jia

If you are interested in using social network analysis to conduct research, you might want to explore this tool called EgoWeb 2.0 developed by David P. Kennedy.

Website Link: https://www.qualintitative.com/egoweb/
GitHub Link: https://github.com/qualintitative/egoweb
Install Instructions: https://www.qualintitative.com/wiki/doku.php/install

In order to use EgoWeb 2.0, the first step is to install AMPPS. EgoWeb 2.0 has now upgraded its Mac version to 64-bit and works well on the latest Mac operating system. It functions well on Windows, too.


Pros:

  1. Allows researchers to use R to process data, and provides baseline R code
  2. Detailed instructions and updates


Cons:

  1. Many installation steps (8-9), including creating a database and importing the database structure from an SQL file
  2. Less intuitive than some other tools