
Category Archive: Recommendation

Fake news or misinformation detection algorithms and datasets

By Chenyan Jia

In this post, newscoding recommends several algorithms and datasets for detecting fake news and misinformation (especially misinformation related to COVID-19) that are used by researchers and Internet companies. The list is in no particular order.

No. 1

Twitter: Updating our Approach to Misleading Information

In this article, Twitter introduces new labels and warning messages that will provide additional context and information on some Tweets containing disputed or misleading information related to COVID-19.


Triple Branch BERT Siamese Network for fake news classification on LIAR-PLUS dataset

A research paper published in the Proceedings of the First Workshop on Fact Extraction and VERification (FEVER), “Where is your Evidence: Improving Fact-checking by Justification Modeling,” extended the LIAR dataset to the LIAR-PLUS dataset. The LIAR dataset was introduced by Wang (2017) and consists of 12,836 short statements taken from POLITIFACT and labeled by humans (Alhindi, Petridis, & Muresan, 2018).



Metafact is a health fact-checking platform that relies on a community of verified experts. The website has an intuitive interface and contains content highly relevant to COVID-19.


Neural Covidex applies state-of-the-art neural network models and AI techniques to answer questions using the COVID-19 Open Research Dataset (CORD-19) provided by the Allen Institute for AI (data release of May 26, 2020), which currently contains over 47,000 scholarly articles. Neural Covidex also supports searching randomized controlled trials related to COVID-19 provided by Trialstreamer.


Facebook: Using AI to detect COVID-19 misinformation and exploitative content

Facebook works with over 60 fact-checking organizations that review content in more than 50 languages in order to prevent the spread of misinformation during the COVID-19 pandemic.


COVID-19 related misinformation test sets

Researchers from the Center for Artificial Intelligence Research (CAiRE) released the COVID-19 related misinformation test sets introduced in their “Misinformation has High Perplexity” paper.


USC Melady Lab: Coronavirus on Social Media Misinformation Analysis

USC Melady Lab identifies unreliable, misleading, and clickbait information about COVID-19 shared on Twitter between 2020-03-01 and 2020-05-03.

(This list will be updated.)


Alhindi, T., Petridis, S., & Muresan, S. (2018). Where is your Evidence: Improving Fact-checking by Justification Modeling. In Proceedings of the First Workshop on Fact Extraction and VERification (FEVER), Brussels, Belgium.

Wang, W. Y. (2017). “Liar, Liar Pants on Fire”: A New Benchmark Dataset for Fake News Detection. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017), Vancouver, BC, Canada.

Mediation Package in R

By Chenyan Jia


results <- mediate(model.m = mod3, model.y = mod2, treat = 'exercise', mediator = 'food', boot = TRUE, sims = 500)
# "model.m": a fitted model object for the mediator
# "model.y": a fitted model object for the outcome (including both the treatment and the mediator as predictors)
# "treat": a character string giving the name of the treatment variable
# "mediator": a character string giving the name of the mediator variable
# "boot": if TRUE, confidence intervals are estimated with a nonparametric bootstrap
# "sims": the number of bootstrap resamples (simulations)

# Bootstrap sample sizes typically range from 1,000 to 5,000. This example uses only 500 simulations because the data set is small.

How to decipher the results?
## ACME: Average Causal Mediation Effects
## ADE: Average Direct Effects
## Total Effect: Sum of a mediation (indirect) effect and a direct effect
## Prop. Mediated: Size of the average causal mediation effects relative to the total effect.
## When the ACME is significant and the ADE is not, the result indicates complete mediation (the direct effect is no longer significant once the mediator is included).
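
To make these quantities concrete, here is a minimal sketch in base R. It assumes two linear models with no treatment-mediator interaction, in which case the point estimates correspond to simple combinations of regression coefficients. The data are simulated, and the outcome name `weight` and all effect sizes are assumptions for illustration only; the post does not show how mod2 and mod3 were fitted.

```r
# Simulated data: `weight` and every effect size below are hypothetical.
set.seed(42)
n <- 200
exercise <- rbinom(n, 1, 0.5)                           # treatment (0/1)
food     <- 2 + 0.8 * exercise + rnorm(n)               # mediator
weight   <- 1 + 0.3 * exercise + 0.5 * food + rnorm(n)  # outcome
dat <- data.frame(exercise, food, weight)

# Mediator model: treatment -> mediator
mod3 <- lm(food ~ exercise, data = dat)
# Outcome model: treatment + mediator -> outcome
mod2 <- lm(weight ~ exercise + food, data = dat)

a <- unname(coef(mod3)["exercise"])  # effect of treatment on mediator
b <- unname(coef(mod2)["food"])      # effect of mediator on outcome

acme  <- a * b                                # indirect (mediated) effect
ade   <- unname(coef(mod2)["exercise"])       # direct effect
total <- acme + ade                           # total effect
prop  <- acme / total                         # proportion mediated

round(c(ACME = acme, ADE = ade, Total = total, Prop.Mediated = prop), 3)
```

With the mediation package installed, the same mod3 and mod2 objects could be passed to mediate(); its simulation-based estimates should land close to these values, and it additionally reports confidence intervals and p-values.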

An Example of Results

EgoWeb 2.0: a tool for social network analysis

By Chenyan Jia

If you are interested in using social network analysis to conduct research, you might want to explore this tool called EgoWeb 2.0 developed by David P. Kennedy.

Website Link: https://www.qualintitative.com/egoweb/
GitHub Link: https://github.com/qualintitative/egoweb
Install Instructions: https://www.qualintitative.com/wiki/doku.php/install

To use EgoWeb 2.0, the first step is to install AMPPS. The Mac version of EgoWeb 2.0 has been upgraded to 64-bit and works well on the latest Mac operating system. EgoWeb 2.0 also works well for Windows users.


Pros:

  1. Allows researchers to use R to process data and provides baseline R code
  2. Detailed instructions and updates


Cons:

  1. Many installation steps (8-9), including creating a database and importing the database structure from an SQL file
  2. Less intuitive than some other tools

Knight Center: Hands-on machine learning solutions for journalists

Newscoding recommendation: Machine learning is a buzzword nowadays. John Keefe, the technical architect for bots and machine learning at Quartz, will guide you step by step through the concepts and code of machine learning in journalism.

Registration link: https://journalismcourses.org/MACH0919.html

JournalismCourses.org is an online training platform of the Knight Center for Journalism in the Americas at the University of Texas at Austin. This program of free and low-cost online courses is possible in part thanks to a generous grant from the Knight Foundation.

Top websites for journalists interested in coding

By Chenyan Jia

GitHub: Code from IRE’s multi-day Python Bootcamp for journalists.



Jonathan Soma: A blog written by Professor Jonathan Soma from Columbia University’s Journalism School.


Professor Soma also has a website named investigate.ai.


Stack Overflow: The largest, most trusted online community for developers to learn, share their knowledge, and build their careers.


IPTC: Open standards for the news media


IRE (Investigative Reporters and Editors)


NICAR: The National Institute for Computer-Assisted Reporting maintains a library of federal databases, employs journalism students, and trains journalists in the practical skills of getting and analyzing electronic information.


DataJournalism: DataJournalism.com is created by the European Journalism Centre and supported by Google News Initiative. This website provides data journalists with free resources, materials, online video courses, and community forums. 


CodeActually: CodeActually is a blog-style website developed by Cindy Royal as part of the Knight Journalism Fellowship at Stanford University.


MOOC: News Algorithms: The Impact of Automation and AI on Journalism

Newscoding recommends: Nicholas Diakopoulos, assistant professor at Northwestern University and director of its Computational Journalism Lab, offers a massive open online course (MOOC) on how news media are using algorithms, automation, and AI to do journalism and how journalists can apply these tools in their own work.

This four-week course ran from Feb. 11 to March 10, 2019, and was supported by the Knight Center for Journalism in the Americas at the University of Texas at Austin.




Conference: Hands-on Machine Learning for Journalists in ONA19

Newscoding recommends: ONA19 will hold a 90-minute training session that provides practical, hands-on experience using machine learning to manage documents, images, and data records.

Speakers are listed below.

John Keefe – Technical Architect, Bots & Machine Learning, Quartz
@jkeefe | https://johnkeefe.net

Jeremy B. Merrill – Machine Learning Journalist, Quartz – AI Studio
@jeremybmerrill | http://jeremybmerrill.com

Victoria Cabales – AI Studio Fellow, Quartz

  • Friday – 11:00 AM – 12:30 PM
  • Treme – 2nd Floor
  • #ONA19

Article: How A.I. was used in Hong Kong Protests

Newscoding recommends: The New York Times recently published an article titled “How A.I. Helped Improve Crowd Counting in Hong Kong Protests.”

This is an example created by The New York Times showing how artificial intelligence can be used to detect moving people and objects.

Read more in The New York Times:

How A.I. Helped Improve Crowd Counting in Hong Kong Protests.

Bootcamp: Practical Machine Learning for Journalists (Oct 26 and 27)

Newscoding recommends: John Keefe announced the dates of his machine learning workshop:



BOOTCAMP: PRACTICAL MACHINE LEARNING FOR JOURNALISTS with John Keefe, the technical architect for bots and machine learning at Quartz

This intensive two-day bootcamp meets Saturday, October 26 from 10 am to 4 pm and Sunday, October 27 from 11:30 am to 5:30 pm.

The cost for this workshop is $750 ($600 early-bird rate before September 9).

Level: Advanced

Welcome to the next generation of data journalism: Recognize cases when machine learning can help in investigations, use existing and custom-made tools to tackle real-world reporting issues, and avoid bias and error in your work!

Sifting through terabytes of documents or images might take years — unless you teach a computer to do it for you. Like a bloodhound, a machine-learning algorithm can take a “sniff,” or sample, of what you’re looking for and find “more like this.” In this class, students will learn to recognize cases when machine learning might help solve such reporting problems, to use existing and custom-made tools to tackle real-world issues, and to identify and avoid bias and error in their work. Through hands-on experience, students will get an introduction to using these methods on any beat.


Take this class if you are a data journalist or anyone looking to learn more about the practical journalistic applications of artificial intelligence.

Some familiarity with coding will make this class much more useful to you. The class will use coding “notebooks” that allow you to run and tinker with code on powerful machines. You will need a laptop, but it doesn’t have to be fancy. Also you’ll be able to keep everything you do in class.

We’ll focus on using the free, open-source “fast.ai” machine learning library. We’ll be working in Python, but if that’s not your main coding language, that’s okay. Your notebook will be preloaded with the code you need.



  • Evening: Optional meetup. For those in town, drinks and snacks gathering near the school. Meet each other and talk about possibilities.


  • Morning: We’ll get your laptops ready to go, and dive right in — using machine learning to classify images.
  • Lunch: Real-world examples of how machine learning has helped journalists, including some unexpected examples of how image-detection can be helpful.
  • Early Afternoon: More work with custom image sorting.
  • Break
  • Late Afternoon: A basic, accessible tutorial on how machine learning works behind the scenes, followed by a hands-on introduction to using machine learning for text documents.


  • Morning: Practical machine learning to help sort, explore, and get insights from gigabytes of text documents.
  • Lunch: Demos of third-party tools useful for simple analysis.
  • Early Afternoon: Follow-up discussions and help with anything learned over the weekend and a discussion about spotting and managing issues of data bias.

About John Keefe

John Keefe is the technical architect for bots and machine learning at Quartz. There he has designed and created the AI Studio, a “teach-by-example” effort to help journalists at Quartz and other news organizations use machine learning in their reporting. He also teaches classes on bots and product prototyping at the Craig Newmark Graduate School of Journalism at CUNY.

Before joining Quartz, Keefe was Senior Editor for Data News at public radio station WNYC, leading a team of journalists who specialize in data reporting, coding, and design for visualizations and investigations. He was previously WNYC’s news director for nearly a decade.

A self-described “professional beginner,” Keefe is the author of Family Projects for Smart Objects: Tabletop Projects That Respond to Your World from Maker Media, which grew from his effort to make something new every week for a year. Keefe has led classes and workshops at Columbia University, Stanford University, the New School University, and New York University. He has also served as an Innovator in Residence at West Virginia University’s Reed College of Media. Keefe blogs at johnkeefe.net and tweets as @jkeefe.

Date And Time

Sat, Oct 26, 2019, 10:00 AM –

Sun, Oct 27, 2019, 5:30 PM EDT


Newmark Graduate School of Journalism at CUNY

230 West 41st Street

New York, NY 10036

Refund Policy

Refunds up to 1 day before the event

Book: Reporting with Data in R

By Christian McDonald

Professor Christian McDonald

Newscoding recommends: Christian McDonald is Assistant Professor of Practice in the School of Journalism at the University of Texas at Austin. Before joining the University of Texas at Austin faculty full-time in Fall 2018, Professor McDonald was a career journalist who most recently served as Data and Projects Editor at the Austin American-Statesman.

Reporting with Data in R is a series of lessons and instructions used in the course REPORTING WITH DATA taught by Professor McDonald. This book is an excellent resource for beginning students and journalists who want to use R for data visualization. It starts with the basics, such as installing RStudio and cleaning data, and then introduces a variety of code examples for data visualization. The whole book is available at the following link.