Data Science: Projects and Publications

Dissertation Research: Feature Extraction for Network Visualization, Focus Groups with Human Computation, and Knowledge Graph Creation

Created a customized image and metadata collector script in Python using the Instaloader package. Collected 2,000+ images and their metadata from Instagram.
Narrative Frame Network Visualization (forthcoming)
- Image feature extraction and information diffusion analysis
Visual interpretation and focus group (forthcoming)
Knowledge Graph Implementation (forthcoming)
- Knowledge base and scene graph building

Bibliometric Data: Publishing Behavior of University of Rochester Researchers

This repository holds the data and final analysis for the project Bibliometric Data: Publishing Behavior of University of Rochester Researchers, hosted by the University of Rochester and backed by the LEADING Fellowship from Drexel University. The research question asks: where are University of Rochester researchers publishing and depositing data? The goal is to better understand how and where University of Rochester researchers save and publish their data. We set out to analyze researcher publishing behavior and understand who is depositing data, where they are depositing it, how large the datasets are, and what formats are submitted and supported. The overall objectives were to find out where researchers make their data publicly available; identify common topics, relations, and overall trends; and, using APIs, refine data collection techniques and analyze University of Rochester researcher data deposits in disciplinary data repositories.

Keywords per Repository Topic

Authors Per Repository Topics

Keywords Betweenness Centrality

Go to Project Repository

Street Art Network Analysis: Applications of Bi-Partite and Bi-Dynamic Line Graphs

In this analysis, Street Art images are treated as a type of visual information that can represent a specific perception of a community as a member of a community space. Dynamic bipartite network analysis was used to understand how different neighborhoods are connected through artist attributes and how they might differ. The results show that specific neighborhood traits, such as urban setting, population, and culture, contribute to stronger ties within the Street Art community network.

Street Art as Visual Information: Mixed Methods Approach to Analyzing Community Spaces
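As a rough illustration of the bipartite idea, the sketch below projects an artist-to-neighborhood graph onto neighborhoods, so that two neighborhoods are tied once for every artist active in both. The edge list, the artist and neighborhood names, and the `project_neighborhoods` helper are all hypothetical and are not taken from the project data or code. The sketch is also static, so it leaves out the dynamic (time-varying) part of the analysis.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical artist -> neighborhood edges (illustrative only, not project data).
edges = [
    ("artist_a", "North"),
    ("artist_a", "Central"),
    ("artist_b", "North"),
    ("artist_b", "Central"),
    ("artist_c", "Central"),
    ("artist_c", "East"),
]


def project_neighborhoods(edges):
    """Project the bipartite graph onto neighborhoods.

    The weight of a neighborhood pair is the number of artists
    who have work in both neighborhoods.
    """
    by_artist = defaultdict(set)
    for artist, neighborhood in edges:
        by_artist[artist].add(neighborhood)

    weights = defaultdict(int)
    for neighborhoods in by_artist.values():
        # Sort so each unordered pair always gets the same key.
        for pair in combinations(sorted(neighborhoods), 2):
            weights[pair] += 1
    return dict(weights)


ties = project_neighborhoods(edges)
for (left, right), weight in sorted(ties.items()):
    print(f"{left} - {right}: {weight} shared artist(s)")
```

Heavier weights in the projection correspond to stronger ties between neighborhoods, which is the kind of quantity a network analysis would then relate to neighborhood traits.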

Github code: Tucson_Street-Art

Poster PDF

Natural Language Processing: Model Comparison between BERT and Hierarchical Attention Networks

This project compares two natural language processing models on a dataset of labeled propaganda data. We reviewed off-the-shelf BERT and Hierarchical Attention Network (HAN) models and found that they reach different accuracy levels, with BERT performing better on the binary classification problem.

Identifying Propaganda: Comparing NLP Machine Learning Models on Propagandistic News Articles
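A comparison like this comes down to scoring each model's predictions against the same labeled test set. The sketch below shows one way to do that for a binary task. The labels, the predictions, and the `model_a` and `model_b` names are hypothetical placeholders rather than outputs or results from the actual BERT and HAN runs.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = propaganda)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical gold labels and predictions (illustrative only).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = {
    "model_a": [1, 0, 1, 0, 0, 0, 1, 0],
    "model_b": [1, 1, 0, 1, 0, 0, 1, 1],
}

results = {name: binary_metrics(y_true, y_pred) for name, y_pred in predictions.items()}
for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

Reporting precision, recall, and F1 alongside accuracy helps when the two classes are imbalanced, since accuracy alone can hide a model that mostly predicts the majority class.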

HAN Architecture

BERT pre-training and fine-tuning procedures

HAN image taken from Yang et al. (2016); BERT image taken from Devlin et al. (2019)

Github code: Identifying Propaganda

This framework combines computational applications with visual methodologies to discover frames of meaning-making in a large image collection. Frame analysis and Critical Visual Methodology (Rose, 2016) are reviewed and used in the framework to work in tandem with quantitative research methods. The methods framework is presented as a matrix that enables researchers to identify applications for studying social movements online through theoretical and computational approaches.

“Mixed Methods Framework for Understanding Visual Frames in Social Movements” - Long Paper Accepted to ASIS&T 2023
Theoretical Matrix Presented at Society for the Social Studies of Science Conference in 2022
Program can be found here

Data Science: Private Datasets and Code

Here I worked with university data to tackle questions that support decision making. I used predictive modeling, statistics, machine learning, data mining, and other data analysis techniques to collect, explore, and extract insights from structured and unstructured data. Topics include revenue, retention/attrition, and student sentiment and experience.

Sentiment Analysis: Student Course Survey Comments

I analyzed 30K+ student course surveys using sentiment analysis packages in R to identify the sentiment of every comment submitted in course surveys across campus. I also used SQL to pull the data from the Oracle database for use in a personalized instructor dashboard.

Survival Analysis: University Retention and Attrition

I used survival and churn analysis to analyze the expected time to attrition for female students at the University of Arizona. Kaplan-Meier estimation and supporting Cox regression analysis were used to study retention. Churn analysis, including logistic regression, decision trees, and random forests, was used to study attrition. Read the report here
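For readers unfamiliar with the Kaplan-Meier estimator, the sketch below computes a retention curve from scratch: at each observed attrition time, the running survival probability is multiplied by the share of at-risk students who stayed. The `kaplan_meier` helper and the sample durations are hypothetical and unrelated to the university data. The original analysis was not necessarily done this way and most likely relied on an established statistics package.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: time until the student left or was last observed.
    observed:  True if attrition was observed, False if the record is censored.
    """
    events = Counter(t for t, seen in zip(durations, observed) if seen)
    exits = Counter(durations)  # everyone who leaves the risk set at time t
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Students censored at time t still count as at risk at t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical semesters enrolled and whether attrition was observed (illustrative only).
durations = [1, 2, 2, 3, 4, 4]
observed = [True, True, False, True, False, False]

curve = kaplan_meier(durations, observed)
for t, s in curve:
    print(f"semester {t}: estimated retention {s:.3f}")
```

Censored records (students still enrolled when observation ended) shrink the risk set without lowering the curve, which is what separates this estimator from a simple dropout percentage.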

Internal Dashboard: Net Tuition Revenue

I developed an end-to-end production process for tuition and headcount dashboard visualizations and analysis using R and SQL. This consisted of aggregated data tables from an internal Oracle data warehouse that include descriptive statistics and inflation information for campus-wide Net Tuition Revenue (NTR).

Campus Climate Survey: Statistical Analysis and Inference

Applied statistical analysis and inference to survey and collected data to review and test the performance of the University of Arizona's medical school for future accreditation. Data types included survey data, raw archived data, and government and academic data collected from large data systems. Used both R and Python to run analysis of variance, chi-square tests, post hoc and assumption checks, regression analysis, correlation analysis, and visualizations.
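As one example of the tests listed above, the sketch below computes a Pearson chi-square statistic for a contingency table of survey responses. The `chi_square_statistic` helper and the counts are hypothetical and not drawn from the survey data. The sketch applies no continuity correction and stops at the statistic and degrees of freedom, since a p-value would normally come from a statistics library.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    table: list of rows of observed counts, all rows the same length.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed_count - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts: rows are two respondent groups, columns are agree/disagree.
survey_table = [
    [30, 10],
    [20, 40],
]

statistic, dof = chi_square_statistic(survey_table)
print(f"chi-square = {statistic:.3f}, degrees of freedom = {dof}")
```

The assumption checks mentioned above matter here: the test is usually considered reliable only when the expected counts in each cell are not too small.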