Author Archive

Cheatsheet for R statistics

By Chenyan Jia

Here is an R cheatsheet for anyone interested in using R for statistical analysis.

§  Install Packages

install.packages("psych")

library(psych)

install.packages("ltm")

library(ltm)

install.packages("ppcor")

library(ppcor)

install.packages("lme4")

library(lme4)

§  Functions

Function                                What it calculates
Dataset
setwd(<file path>)                      Set the working directory
getwd()                                 Print the path of the current working directory
read.table()                            Import a data set, e.g. Duncan <- read.table("Duncan.txt", header=TRUE)
x <- c( , , …)                          Create a vector, e.g. x <- c(1, 2, 3, 4, 5)
summary()                               Summary statistics
data()                                  Load a built-in data set (with no arguments, list the available ones)
attach()                                Attach a data set so its variables can be used without repeating the data name
objects()                               Lists the names of variables and functions residing in the R workspace
rm(list=ls())                           Remove everything in the environment
names()                                 Lists the variable names in the data
head()/tail()                           Shows the first/last 6 rows of the data
str()                                   Structure of the data
dim()                                   Dimensions of the data
sort(x)                                 The numbers in vector x in increasing order
rank(x)                                 Ranks of the numbers (in increasing order) in vector x
Univariate analysis
sum(x)                                  Sum of the numbers in vector x
mean(x)                                 Mean of the numbers in vector x
median(x)                               Median of the numbers in vector x
var(x)                                  Estimated variance of the population from which the numbers in vector x are sampled
sd(x)                                   Estimated standard deviation of the population from which the numbers in vector x are sampled
length(x)                               Number of elements in x (the sample size)
hist(x)                                 Histogram of x
Bivariate analysis
cor(x,y)                                Correlation coefficient between the numbers in vector x and the numbers in vector y
cov(x,y)                                Covariance between x and y
cor.test(x,y)                           Test for correlation between paired samples, e.g. cor.test(X, Y, alternative="two.sided", method="spearman")
plot(x,y)                               Scatterplot of y against x
For t-test
qt()                                    Critical value (two-tailed), e.g. qt(1 - alpha/2, df=n-2)
pt()                                    p-value (two-tailed), e.g. 2 * (1 - pt(abs(test_stat), df=n-2))
For z-test
qnorm()                                 Critical value (two-tailed), e.g. qnorm(1 - alpha/2)
pnorm()                                 p-value (two-tailed), e.g. 2 * (1 - pnorm(abs(test_stat)))
fisherz()                               Convert correlations to Fisher's z (psych package)
fisherz2r()                             Convert Fisher's z back to r (psych package)
For chi-square
qchisq()                                Critical value (upper tail), e.g. qchisq(1 - alpha, df=1)
pchisq()                                p-value, e.g. 1 - pchisq(test_stat, df=1)
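As a rough illustration of the critical-value and p-value entries above, the sketch below runs them on made-up numbers. The values of alpha, n, and test_stat are assumptions chosen only for the example. Taking abs() of the statistic keeps the two-tailed formulas valid whether the statistic is positive or negative.

```r
# Hypothetical inputs, invented for illustration only
alpha     <- 0.05
n         <- 30       # sample size
test_stat <- -2.10    # observed statistic (negative on purpose)

# t-test with df = n - 2 (for example, when testing a correlation)
t_crit <- qt(1 - alpha / 2, df = n - 2)
t_p    <- 2 * (1 - pt(abs(test_stat), df = n - 2))

# z-test
z_crit <- qnorm(1 - alpha / 2)
z_p    <- 2 * (1 - pnorm(abs(test_stat)))

# chi-square test with df = 1: only large values count against the null,
# so the critical value and the p-value both use the upper tail
chi_stat <- test_stat^2
chi_crit <- qchisq(1 - alpha, df = 1)
chi_p    <- 1 - pchisq(chi_stat, df = 1)

print(c(t_crit = t_crit, t_p = t_p))
print(c(z_crit = z_crit, z_p = z_p))
print(c(chi_crit = chi_crit, chi_p = chi_p))
```

With df = 1, the chi-square critical value is the square of the two-tailed z critical value, so the z-test and the chi-square test on the squared statistic give the same p-value.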
Three Variables
Multiple correlation (use linear regression)
lm(Y ~ X + Z)                           e.g. mod <- lm(Y ~ X + Z); the multiple correlation is sqrt(summary(mod)$r.squared)
Partial correlation between Y and X after controlling for the effect of Z
pcor(cbind(Y, X, Z))                    Each cell gives the partial correlation for a pair of variables, given the others (ppcor package)
pcor.test(Y, X, Z)                      Significance testing
Semi-partial correlation between X and Y after controlling for Z
ppcor::spcor(cbind(Y, X, Z))            Gives the semi-partial correlations
ppcor::spcor.test(Y, X, Z)              Significance testing
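To make the idea behind partial and semi-partial correlation concrete, here is a minimal base-R sketch that does not call ppcor at all. It uses the residual approach instead: remove Z from the variables with lm() and correlate what is left. The simulated data and all variable names are assumptions made for the example.

```r
# Simulated data, invented for illustration only
set.seed(1)
n <- 200
Z <- rnorm(n)
X <- 0.6 * Z + rnorm(n)
Y <- 0.5 * X + 0.4 * Z + rnorm(n)

# Partial correlation of Y and X given Z:
# correlate the parts of Y and X that Z does not explain
res_y <- resid(lm(Y ~ Z))
res_x <- resid(lm(X ~ Z))
partial_r <- cor(res_y, res_x)

# The same quantity from the pairwise correlations
r_yx <- cor(Y, X)
r_yz <- cor(Y, Z)
r_xz <- cor(X, Z)
partial_formula <- (r_yx - r_yz * r_xz) / sqrt((1 - r_yz^2) * (1 - r_xz^2))

# Semi-partial correlation: Z is removed from X only, Y is left as it is
semipartial_r <- cor(Y, res_x)

print(c(partial = partial_r, formula = partial_formula, semipartial = semipartial_r))
```

The semi-partial value is never larger in absolute size than the partial one, because only one of the two variables has had Z removed.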
Modeling
Linear Regression Model
lm(Y ~ X)                               Linear regression with Y as the dependent variable and X as the independent variable
anova()                                 Analysis-of-variance table for one or more fitted models
Logistic Regression Model
glm(Y ~ X, data = , family = binomial)  glm is used to fit generalized linear models, specified by giving a symbolic description of the linear predictor and a description of the error distribution.
Hierarchical Linear Model
lmer()                                  Fit a linear mixed-effects model (LMM) to data, via REML or maximum likelihood (lme4 package)
Principal Components Analysis
prcomp()                                Performs a principal components analysis on the given data matrix and returns the results as an object of class prcomp.
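The modeling functions above can be tried on the built-in mtcars data set. This is only a sketch: the choice of data set and of the variables mpg, wt, am, hp, and disp is an assumption made for illustration, and lmer() is left out because it needs the lme4 package.

```r
data(mtcars)

# Linear regression: miles per gallon as a function of weight
fit_lm <- lm(mpg ~ wt, data = mtcars)
print(coef(fit_lm))
print(summary(fit_lm)$r.squared)
print(anova(fit_lm))

# Logistic regression: transmission type (am is coded 0/1) as a function of weight
fit_glm <- glm(am ~ wt, data = mtcars, family = binomial)
print(coef(fit_glm))

# Principal components analysis on four standardized columns
pca <- prcomp(mtcars[, c("mpg", "wt", "hp", "disp")], scale. = TRUE)
print(summary(pca))
```

Setting scale. = TRUE standardizes each column first, which matters here because the four variables are measured in very different units.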

This cheatsheet will be updated over time.

newscoding

Fake news or misinformation detection algorithms and datasets

By Chenyan Jia

In this post, newscoding recommends several fake news and misinformation detection algorithms and datasets (especially those covering COVID-19 misinformation) that are used by researchers or Internet companies. The list is in no particular order of importance.

No.1

Twitter: Updating our Approach to Misleading Information

In this article, Twitter introduces new labels and warning messages that will provide additional context and information on some Tweets containing disputed or misleading information related to COVID-19.

No.2

Triple Branch BERT Siamese Network for fake news classification on LIAR-PLUS dataset

A research paper published in the Proceedings of the First Workshop on Fact Extraction and VERification (FEVER), “Where is your Evidence: Improving Fact-checking by Justification Modeling,” extended the LIAR dataset to the LIAR-PLUS dataset. The LIAR dataset was introduced by Wang (2017) and consists of 12,836 short statements taken from PolitiFact and labeled by humans (Alhindi, Petridis, & Muresan, 2018).

No.3

https://metafact.io/

Metafact is a health fact-checking platform that relies on a community of verified experts. The website has an intuitive interface, and much of its content is closely related to COVID-19.

No.4

Neural Covidex applies state-of-the-art neural network models and AI techniques to answer questions using the COVID-19 Open Research Dataset (CORD-19) provided by the Allen Institute for AI (data release of May 26, 2020), which currently contains over 47,000 scholarly articles. In addition, Neural Covidex also supports search on randomized controlled trials related to COVID-19 provided by Trialstreamer.

No.5

Facebook: Using AI to detect COVID-19 misinformation and exploitative content

Facebook works with over 60 fact-checking organizations that review content in more than 50 languages in order to prevent the spread of misinformation during the COVID-19 pandemic.

No.6

COVID-19 related misinformation test sets

Researchers from the Center for Artificial Intelligence Research (CAiRE) posted COVID-19 related misinformation test sets newly proposed in their “Misinformation has High Perplexity” paper.

No.7

USC Melady Lab: Coronavirus on Social Media Misinformation Analysis

USC Melady Lab identifies unreliable, misleading and clickbait information about COVID-19 shared on Twitter between 2020-03-01 and 2020-05-03.

(This list will be updated.)

References:

Alhindi, T., Petridis, S., & Muresan, S. (2018). Where is your Evidence: Improving Fact-checking by Justification Modeling. In Proceedings of the First Workshop on Fact Extraction and VERification (FEVER), Brussels, Belgium.

Wang, W. Y. (2017). Liar, liar pants on fire: A new benchmark dataset for fake news detection. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017), Vancouver, BC, Canada.

Mediation Package in R

By Chenyan Jia

install.packages("mediation")
library(mediation)

results <- mediate(model.mediator = mod3, model.y = mod2, treat='exercise', mediator='food', boot=TRUE, sims=500)
# "model.mediator": a fitted model object for the mediator
# "model.y": a fitted model object for the outcome (with both the treatment and the mediator as predictors)
# "treat": a character string indicating the name of the treatment variable
# "mediator": a character string indicating the name of the mediator variable
# "boot": if TRUE, confidence intervals are estimated with a nonparametric bootstrap
# "sims": the number of simulations (bootstrap resamples when boot=TRUE)

# Bootstrap runs typically use between 1000 and 5000 simulations. A smaller number (sims=500) is used here only because the example data set is small.

How to decipher the results?
## ACME: Average Causal Mediation Effect (the indirect effect that runs through the mediator)
## ADE: Average Direct Effect
## Total Effect: Sum of the mediation (indirect) effect and the direct effect
## Prop. Mediated: Size of the average causal mediation effect relative to the total effect
## When the ACME is significant and the ADE is not, full mediation has occurred (the direct effect is no longer significant once the mediator is accounted for).
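To show where these quantities come from, here is a minimal base-R sketch that does not call the mediation package. It swaps in the product-of-coefficients method, which for linear models with no treatment-mediator interaction is the usual way to compute the indirect effect, together with a simple percentile bootstrap. The simulated data, the outcome name health, and the extra model mod1 are assumptions made for the example. The names mod2 and mod3 are reused only to mirror the call above.

```r
# Simulated data, invented for illustration only
set.seed(42)
n <- 300
exercise <- rnorm(n)                                  # treatment
food     <- 0.5 * exercise + rnorm(n)                 # mediator
health   <- 0.3 * exercise + 0.6 * food + rnorm(n)    # hypothetical outcome
dat <- data.frame(exercise, food, health)

mod1 <- lm(health ~ exercise, data = dat)          # total effect of the treatment
mod3 <- lm(food ~ exercise, data = dat)            # mediator model
mod2 <- lm(health ~ exercise + food, data = dat)   # outcome model

indirect <- unname(coef(mod3)["exercise"] * coef(mod2)["food"])  # plays the role of the ACME
direct   <- unname(coef(mod2)["exercise"])                       # plays the role of the ADE
total    <- unname(coef(mod1)["exercise"])
prop_mediated <- indirect / total

# Percentile bootstrap interval for the indirect effect
boot_indirect <- replicate(1000, {
  d <- dat[sample(n, replace = TRUE), ]
  unname(coef(lm(food ~ exercise, data = d))["exercise"] *
         coef(lm(health ~ exercise + food, data = d))["food"])
})
ci <- quantile(boot_indirect, c(0.025, 0.975))

print(c(indirect = indirect, direct = direct, total = total, prop = prop_mediated))
print(ci)
```

For ordinary least squares the total effect equals the direct effect plus the indirect effect exactly, which is why the proportion mediated can be read as a share of the total.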

An Example of Results

EgoWeb 2.0: a tool for social network analysis

By Chenyan Jia

If you are interested in using social network analysis to conduct research, you might want to explore this tool called EgoWeb 2.0 developed by David P. Kennedy.

Website Link: https://www.qualintitative.com/egoweb/
GitHub Link: https://github.com/qualintitative/egoweb
Install Instructions: https://www.qualintitative.com/wiki/doku.php/install

To use EgoWeb 2.0, the first step is to install AMPPS. The Mac version of EgoWeb 2.0 has been upgraded to 64-bit and works well on the latest macOS. EgoWeb 2.0 also works well on Windows.

Strengths

  1. Lets researchers process data in R and provides baseline R code
  2. Detailed instructions and updates

Downsides

  1. Many installation steps (8-9 steps), including creating a database and importing the database structure from an SQL file
  2. Less intuitive than some other tools

Knight Center: Hands-on machine learning solutions for journalists

Newscoding recommendation: Machine learning is a buzzword nowadays. John Keefe, the technical architect for bots and machine learning at Quartz, will guide you step by step through the concepts and code of machine learning in journalism.

Registration link: https://journalismcourses.org/MACH0919.html

JournalismCourses.org is an online training platform of the Knight Center for Journalism in the Americas at the University of Texas at Austin. This program of free and low-cost online courses is possible in part thanks to a generous grant from the Knight Foundation.

Top websites for journalists interested in coding

By Chenyan Jia

GitHub: This code comes from IRE’s multi-day Python Bootcamp for journalists.

https://github.com/ireapps/coding-for-journalists

https://coding-for-journalists.readthedocs.io/en/latest

Jonathan Soma: A blog written by Professor Jonathan Soma from Columbia University’s Journalism School.

http://jonathansoma.com/tutorials

Professor Soma also has a website named investigate.ai.

https://investigate.ai

Stack Overflow: The largest, most trusted online community for developers to learn, share their knowledge, and build their careers.

https://stackoverflow.com

IPTC: Open standards for the news media

http://iptc.org/standards/

IRE (Investigative Reporters and Editors)

http://www.ire.org/

NICAR: The National Institute for Computer-Assisted Reporting maintains a library of federal databases, employs journalism students, and trains journalists in the practical skills of getting and analyzing electronic information.

https://www.ire.org/nicar

DataJournalism: DataJournalism.com is created by the European Journalism Centre and supported by Google News Initiative. This website provides data journalists with free resources, materials, online video courses, and community forums. 

https://datajournalism.com

CodeActually: CodeActually is a blog-style website developed by Cindy Royal as part of the Knight Journalism Fellowship at Stanford University.

http://codeactually.com

MOOC: News Algorithms: The Impact of Automation and AI on Journalism

Newscoding recommends: Nicholas Diakopoulos, assistant professor at Northwestern University and director of its Computational Journalism Lab, offers a massive open online course (MOOC) on how news media are using algorithms, automation and AI to do journalism, and how participants can apply these tools in their own work.

This four-week course ran from Feb. 11 to March 10, 2019, with support from the Knight Center for Journalism in the Americas at the University of Texas at Austin.

See details below:

https://journalismcourses.org/ALG0119.html

https://knightcenter.utexas.edu/blog/00-20484-news-algorithms-sign-now-free-online-course-learn-about-impact-ai-and-automation-journ

Conference: Hands-on Machine Learning for Journalists in ONA19

Newscoding recommends: ONA19 is going to hold a 90-minute training session providing practical, hands-on experience using machine learning to manage documents, images and data records.

Speakers are listed below.

John Keefe – Technical Architect, Bots & Machine Learning, Quartz
@jkeefe | https://johnkeefe.net

Jeremy B. Merrill – Machine Learning Journalist, Quartz – AI Studio
@jeremybmerrill | http://jeremybmerrill.com

Victoria Cabales – AI Studio Fellow, Quartz

  • Friday – 11:00 AM – 12:30 PM
  • Treme – 2nd Floor
  • #ONA19

Article: How A.I. was used in Hong Kong Protests

Newscoding recommends: The New York Times recently published an article, “How A.I. Helped Improve Crowd Counting in Hong Kong Protests.”

This is an example created by The New York Times showing how artificial intelligence can be used to detect moving people and objects.

Read more in The New York Times:

How A.I. Helped Improve Crowd Counting in Hong Kong Protests.

Bootcamp: Practical Machine Learning for Journalists (Oct 26 and 27)

Newscoding recommends: John Keefe announced the dates of his machine learning workshop:

https://www.nytimes.com/interactive/2019/07/03/world/asia/hong-kong-protest-crowd-ai.html

Description

BOOTCAMP: PRACTICAL MACHINE LEARNING FOR JOURNALISTS with John Keefe, the technical architect for bots and machine learning at Quartz

This intensive two-day bootcamp meets Saturday, October 26 from 10 am to 4 pm and Sunday, October 27 from 11:30 am to 5:30 pm.

The cost for this workshop is $750; $600 early bird rate before September 9

Level: Advanced

Welcome to the next generation of data journalism: Recognize cases when machine learning can help in investigations, use existing and custom-made tools to tackle real-world reporting issues, and avoid bias and error in your work!

Sifting through terabytes of documents or images might take years — unless you teach a computer to do it for you. Like a bloodhound, a machine-learning algorithm can take a “sniff,” or sample, of what you’re looking for and find “more like this.” In this class, students will learn to recognize cases when machine learning might help solve such reporting problems, to use existing and custom-made tools to tackle real-world issues, and to identify and avoid bias and error in their work. Through hands-on experience, students will get an introduction to using these methods on any beat.

WHO IS IT FOR?

Take this class if you are a data journalist or anyone looking to learn more about the practical journalistic applications of artificial intelligence.

Some familiarity with coding will make this class much more useful to you. The class will use coding “notebooks” that allow you to run and tinker with code on powerful machines. You will need a laptop, but it doesn’t have to be fancy. Also you’ll be able to keep everything you do in class.

We’ll focus on using the free, open-source “fast.ai” machine learning library. We’ll be working in Python, but if that’s not your main coding language, that’s okay. Your notebook will be preloaded with the code you need.

CLASS PLAN

Friday

  • Evening: Optional meetup. For those in town, drinks and snacks gathering near the school. Meet each other and talk about possibilities.

Saturday

  • Morning: We’ll get your laptops ready to go, and dive right in — using machine learning to classify images.
  • Lunch: Real-world examples of how machine learning has helped journalists, including some unexpected examples of how image-detection can be helpful.
  • Early Afternoon: More work with custom image sorting.
  • Break
  • Late Afternoon: A basic, accessible tutorial of how machine learning works behind the scenes, followed by a hands-on introduction to using machine learning for text documents.

Sunday

  • Morning: Practical machine learning to help sort, explore, and get insights from gigabytes of text documents.
  • Lunch: Demos of third-party tools useful for simple analysis.
  • Early Afternoon: Follow-up discussions and help with anything learned over the weekend and a discussion about spotting and managing issues of data bias.

About John Keefe

John Keefe is the technical architect for bots and machine learning at Quartz. There he has designed and created the AI Studio, a “teach-by-example” effort to help journalists at Quartz and other news organizations use machine learning in their reporting. He also teaches classes on bots and product prototyping at the Craig Newmark Graduate School of Journalism at CUNY.

Before joining Quartz, Keefe was Senior Editor for Data News at public radio station WNYC, leading a team of journalists who specialize in data reporting, coding, and design for visualizations and investigations. He was previously WNYC’s news director for nearly a decade.

A self-described “professional beginner,” Keefe is the author of Family Projects for Smart Objects: Tabletop Projects That Respond to Your World from Maker Media, which grew from his effort to make something new every week for a year. Keefe has led classes and workshops at Columbia University, Stanford University, the New School University, and New York University. He also has served as an Innovator in Residence at West Virginia University’s Reed College of Media. Keefe blogs at johnkeefe.net and tweets as @jkeefe.

Date And Time

Sat, Oct 26, 2019, 10:00 AM –

Sun, Oct 27, 2019, 5:30 PM EDT

Location

Newmark Graduate School of Journalism at CUNY

230 West 41st Street

New York, NY 10036

Refund Policy

Refunds up to 1 day before the event
