You gain wisdom only when you look back, so think regressively more!

Recent Posts

Recently, the Data Science Team at AccuWeather has been promoting modular coding to improve our coding skills and make our work more efficient. So what is modular coding? Per Wikipedia, modular programming is a software design technique that emphasizes separating the functionality of a program into independent, interchangeable modules, such that each contains everything necessary to execute only one aspect of the desired functionality. Simply put, anything you need to use repeatedly should be written as a function, or even a module, that you can easily call or import.
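As a minimal sketch of the idea (not code from the team), the file below plays the role of a small reusable module. The module name `weather_utils` and the helpers `celsius_to_fahrenheit` and `daily_summary` are hypothetical: the conversion formula is written once and reused, rather than copied into every script that needs it.

```python
"""weather_utils.py: a hypothetical helper module (names are illustrative only)."""

from statistics import mean


def celsius_to_fahrenheit(temp_c: float) -> float:
    """Convert a single temperature from Celsius to Fahrenheit."""
    return temp_c * 9 / 5 + 32


def daily_summary(temps_c: list[float]) -> dict[str, float]:
    """Summarize a day's Celsius readings in Fahrenheit.

    Reuses celsius_to_fahrenheit instead of repeating the formula inline.
    """
    temps_f = [celsius_to_fahrenheit(t) for t in temps_c]
    return {
        "min_f": min(temps_f),
        "max_f": max(temps_f),
        "mean_f": mean(temps_f),
    }


if __name__ == "__main__":
    # Runs only when this file is executed directly, not when it is imported.
    print(daily_summary([0.0, 10.0, 20.0]))
```

Saved as `weather_utils.py`, any other script in the same project could then reuse it with `from weather_utils import daily_summary`, so a fix to the formula happens in one place.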


Google BigQuery and Microsoft Azure are both powerful platforms for storing data. Google BigQuery can process a couple of TB of data within a couple of minutes, and you pay for what you query, store and process. The detailed pricing is here. Azure SQL Database provides fast and convenient storage for the first 32 GB/month at ~$5/month. The detailed pricing is here. Normally, business intelligence analysts, database managers or data scientists access the two platforms separately, each from its own console.


Recently, I’ve been playing with the deep learning Python package ‘tensorflow’. I ran a simple linear regression model and had some success. TensorFlow is great for unstructured data and image recognition problems, so it usually runs better on a GPU-enabled computer. However, since my model is rather simple and doesn’t need the processing power of a GPU, I ran it on my Windows 7 Professional/10 machine, and it predicted some values for me.
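For readers who want to see what a simple linear regression computes, here is a hedged, dependency-free sketch. It is not the TensorFlow model from the post: it fits y ≈ slope · x + intercept by ordinary least squares in closed form, whereas a TensorFlow model would typically be trained iteratively with gradient descent. The function names and toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_simple_linear_regression(
    xs: Sequence[float], ys: Sequence[float]
) -> Tuple[float, float]:
    """Return (slope, intercept) minimizing squared error for y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally sized sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope: float, intercept: float, xs: Sequence[float]) -> List[float]:
    """Apply the fitted line to new inputs."""
    return [slope * x + intercept for x in xs]


if __name__ == "__main__":
    # Hypothetical toy data generated from y = 2x + 1.
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [1.0, 3.0, 5.0, 7.0, 9.0]
    w, b = fit_simple_linear_regression(xs, ys)
    print(w, b, predict(w, b, [5.0, 6.0]))
```

The closed form is exact for a single feature and needs no GPU at all, which is in line with the point above that a simple model runs fine on an ordinary desktop.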


If you have viewed my bio, you probably noticed that I don’t have a science or engineering background, so how did I end up as a data scientist? Well, it turns out that you don’t need a science or engineering background to become one! Here’s my personal trajectory; I think it’s highly reproducible :) In Sep 2014, I was hired by AccuWeather as a marketing research analyst to work on AW’s new venture: IoT, or what we call emerging platform projects.



Memory Illusion

Memory illusions lead us to believe in fake news


User Susceptibility to Fake News

Selected Publications

(1) susceptible users are correlated with a combination of user, network, and content features; (2) one can build a reasonably accurate prediction model, with an AUC-ROC of 0.82, for the multinomial classification task; and (3) there is a correlation between the dominant susceptibility level of center nodes and that of the entire network.
In WebSci, 2019

Recent Publications

. How Gullible Are You? Predicting Susceptibility to Fake News. In WebSci, 2019.


Recent & Upcoming Talks

Open Data Science Conference 2019 (West)
Oct 30, 2019 2:00 PM
2019 Grace Hopper Celebration (GHC 19)
Oct 1, 2019 2:00 PM
11th ACM Conference on Web Science
Jun 30, 2019 2:00 PM
2019 WiCyS (Women in CyberSecurity) Conference
Mar 28, 2019 2:00 PM
2018 CSSA Grad School Application Seminar
Oct 25, 2018 6:00 PM
2018 SIA Career Panel
Oct 19, 2018 1:00 PM
2016 SIA Career Panel
Aug 26, 2016 2:00 PM


I was a teaching assistant for the following courses at Penn State University:

SRA468: Spatial Analysis of Risk, Fall 2018:

  • Time: Tue & Thu 3:05-4:20pm EST
  • Office Hours: Wed 1:30-2:30pm EST

DS220: Data Management for Data Science, Spring 2019:

  • Time: Mon & Wed 6-7:30pm EST
  • Office Hours: Tue 3-5pm EST