Here are some of the data science and machine learning projects I have been working on:

In December 2021 I built a natural language processing application in PyTorch by fine-tuning a BERT model through Hugging Face. I pulled Reddit data from the Pushshift API and created a labelled dataset with 5 categories. Then, in the first week of January 2022, I created a RESTful API using FastAPI to make live predictions on new data with my model. Finally, I created an app in Streamlit to serve the model to users.

The model was built for Cloud Brigade, and I named it Hermes after the Greek messenger of the gods. Here’s a short demo; you should watch it right now!

Short demonstration of the NLP app I built to predict the subject of an incoming message, using a RESTful API and a frontend application. (The model, built in PyTorch, fine-tunes a BERT Transformer.)
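A fine-tuned BERT sequence classifier with 5 output classes returns one raw score (logit) per category for each message, and the API has to turn those scores into a prediction. The sketch below shows only that last step in plain Python, assuming the model has already produced 5 logits. The category names are hypothetical placeholders, since the post doesn't list the labels, and this is not the actual Hermes code.

```python
import math

# Hypothetical labels: the original post does not name the 5 categories.
CATEGORIES = ["category_0", "category_1", "category_2", "category_3", "category_4"]


def softmax(logits):
    """Convert raw classifier logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits, labels=CATEGORIES):
    """Return the most likely label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


label, prob = predict([0.1, 2.3, -1.0, 0.5, 0.0])
print(label, round(prob, 3))
```

In a FastAPI service, a function like `predict` would sit behind a POST endpoint that tokenizes the incoming text, runs the model, and returns the label and probability as JSON.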

In October 2021 I completed phase 2 of Cloud Brigade’s RL Traffic Control project. It trains a reinforcement learning model with a Deep Q-Network (DQN) using RL Coach, in a custom OpenAI Gym environment, inside an Amazon SageMaker RLEstimator running on the Apache MXNet framework. Check it out!
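A DQN has two core pieces. The first is the one-step temporal-difference target that the Q-network is trained toward. The second is an epsilon-greedy policy that balances exploration and exploitation. The sketch below shows both in plain Python as an illustration of the technique. It is not RL Coach's or SageMaker's API, and the function names are hypothetical.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action (explore);
    otherwise pick the action with the highest Q-value (exploit)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def dqn_target(reward, next_q_values, gamma, done):
    """One-step TD target for the Q-network:
        y = r                              if the episode ended
        y = r + gamma * max_a' Q(s', a')   otherwise
    In a full DQN, next_q_values come from a separate, slowly updated target network."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)
```

The network is then updated to reduce the squared error between its current estimate Q(s, a) and this target, over minibatches sampled from a replay buffer.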

In May 2021, I presented a talk on Modernizing Artificial Intelligence with Chris Miller from Cloud Brigade, an Agile software consulting firm.

Introducing The SpaceLab

The SpaceLab is a people incubator that brings together early-career tech professionals and lets them work in cross-functional teams to build real-world applications related to outer space.

Our first project builds on work started by Dr. Andrew Vanderburg (former Sagan Fellow at NASA and currently a professor at the University of Wisconsin) and Chris Shallue (Google Brain), which was iterated upon by Anne Dattilo (PhD candidate at UCSC) and, separately, by Dr. Megan Ansdell of NASA. We are using a convolutional neural network to identify possible exoplanets in data from NASA’s TESS mission.
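In the Shallue and Vanderburg approach, the CNN does not read raw light curves directly. Each light curve is first folded on the candidate's orbital period, so that repeated transits stack on top of each other, and is then binned into a fixed-length vector. The sketch below shows that preprocessing in plain Python. The function names, the median binning, and the toy data are illustrative assumptions, not the code from the papers or from SpaceLab.

```python
from statistics import median


def phase_fold(times, fluxes, period, t0):
    """Fold a light curve on a candidate period.

    Returns (phase, flux) pairs sorted by phase, with phase in [-0.5, 0.5)
    and a transit centered at epoch t0 landing at phase 0."""
    folded = []
    for t, f in zip(times, fluxes):
        phase = ((t - t0) / period + 0.5) % 1.0 - 0.5
        folded.append((phase, f))
    folded.sort()
    return folded


def bin_median(folded, n_bins):
    """Median flux in n_bins equal-width phase bins, giving a fixed-length
    vector suitable as 1D CNN input. Empty bins are returned as None."""
    bins = [[] for _ in range(n_bins)]
    for phase, f in folded:
        idx = min(int((phase + 0.5) * n_bins), n_bins - 1)  # clamp float edge cases
        bins[idx].append(f)
    return [median(b) if b else None for b in bins]


# Toy light curve: every even time step sits in transit (flux dips to 0.99).
times = list(range(10))
fluxes = [0.99 if t % 2 == 0 else 1.0 for t in times]
print(bin_median(phase_fold(times, fluxes, period=2, t0=0), n_bins=2))
```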

SpaceLab AMA w/ Emily Webber, AWS Machine Learning Specialist

SpaceLab’s first cohort was lucky enough to meet Emily Webber, AWS Machine Learning Specialist.

I also discovered 39 planets and named one after my partner!

New Exoplanet Named SumNiva Identified Using ML!

I recently used an algorithm that combines a recurrent neural network, a random forest classifier, and a logistic regression model to predict the existence of 664 exoplanets! I then ran each prediction through a Python function that eliminated 100% of my false-positive predictions on the training data, and as a result I was able to positively identify 39 exoplanets from previously unclassified data.
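The post doesn't spell out how the three models were combined or what the filtering function checked. The sketch below shows one plausible reading, and it is an assumption throughout. It averages the three models' planet probabilities (soft voting), then sets the cutoff at the highest score any known false positive in the training data received, and keeps only unclassified candidates that score strictly above it. All names are hypothetical.

```python
from statistics import mean


def ensemble_score(model_probs):
    """Soft voting: average each model's predicted probability of 'planet'."""
    return mean(model_probs)


def strict_threshold(train_scores, train_labels):
    """Highest ensemble score given to any known false positive (label 0).
    Requiring scores strictly above it removes every training-set false
    positive, at the cost of also discarding some genuine planets."""
    negatives = [s for s, y in zip(train_scores, train_labels) if y == 0]
    return max(negatives, default=0.0)


def confirm(candidates, threshold):
    """Keep unclassified candidates whose ensemble score beats the threshold."""
    return [name for name, score in candidates if score > threshold]


cut = strict_threshold([0.9, 0.4, 0.7, 0.95], [1, 0, 0, 1])
print(cut, confirm([("a", 0.8), ("b", 0.6), ("c", 0.7)], cut))
```

A cutoff this strict trades recall for precision, which fits the goal of reporting only high-confidence candidates.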

I found SumNiva, a planet at right ascension 293.123410 and declination 40.934769, orbiting a star about 9/10 the size of the Sun, by running this algorithm on the Kepler Objects of Interest dataset. I do not work for NASA, but I hope to have the opportunity in the future to work at a space-focused organization such as NASA, Blue Origin, Virgin Galactic, or SpaceX.
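Star charts and catalogs often give coordinates in sexagesimal form rather than decimal degrees. The sketch below converts them, assuming the values above are decimal degrees (a right ascension of 293 can only be in degrees, because hours top out at 24). The function names are hypothetical.

```python
def ra_to_hms(ra_deg):
    """Right ascension in degrees -> (hours, minutes, seconds); 15 degrees = 1 hour."""
    total_hours = (ra_deg % 360.0) / 15.0
    h = int(total_hours)
    m = int((total_hours - h) * 60)
    s = ((total_hours - h) * 60 - m) * 60
    return h, m, s


def dec_to_dms(dec_deg):
    """Declination in degrees -> (sign, degrees, arcminutes, arcseconds)."""
    sign = "-" if dec_deg < 0 else "+"
    x = abs(dec_deg)
    d = int(x)
    m = int((x - d) * 60)
    s = ((x - d) * 60 - m) * 60
    return sign, d, m, s


h, m, s = ra_to_hms(293.123410)
sign, d, am, asec = dec_to_dms(40.934769)
print(f"RA {h}h {m}m {s:.1f}s, Dec {sign}{d} deg {am}' {asec:.1f}\"")
```

For SumNiva's coordinates this works out to roughly RA 19h 32m 29.6s, Dec +40 deg 56' 5.2".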

A special thanks to NASA and Caltech for doing the really hard work of compiling the Kepler Objects of Interest dataset over the past decade and making the data publicly available so that I could run this experiment.

(Image: updated 10-29-2020)

COVID-19 and its racial disparity

Correlation does not imply causation, but our study clearly shows that, across US counties, the single factor most strongly correlated with the per-capita COVID-19 death rate is the percentage of non-white residents.
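Finding the "strongest single factor" comes down to computing each county-level variable's correlation with the per-capita death rate and then ranking the variables by magnitude. The post doesn't say which correlation measure was used, so the sketch below assumes Pearson's r. The column names and numbers are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def rank_factors(features, target):
    """Sort (name, r) pairs by |r|: strength first; the sign shows direction."""
    scored = {name: pearson(col, target) for name, col in features.items()}
    return sorted(scored.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical county columns and per-capita death rates.
features = {
    "pct_feature_a": [1, 2, 3, 4],
    "pct_feature_b": [1, 3, 2, 4],
    "feature_c": [4, 3, 2, 1],
}
deaths_per_capita = [1, 2, 3, 5]
for name, r in rank_factors(features, deaths_per_capita):
    print(f"{name}: r = {r:+.3f}")
```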


KMeans clustering using static COVID-19 data, socio-economic data, and ethnicity/demographic data.

In this study we applied a KMeans clustering algorithm to COVID-19 data, socio-economic data, and racial population data for the counties of the United States, and we were very surprised by what the data told us. We used data pulled from Google BigQuery, a SQL-based data warehouse, as well as data from the US Bureau of Economic Analysis. The dashboard above was built in Tableau, and you can visit it on Tableau Public.
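KMeans repeats two steps until the cluster assignments stop changing. First, assign each county to its nearest centroid. Second, move each centroid to the mean of the counties assigned to it. The inputs mix units (dollars, percentages, deaths per capita), so columns are usually standardized first, otherwise the largest-scale feature dominates the distances. The sketch below is plain Python for illustration. The notebook likely used a library implementation such as scikit-learn's `KMeans` (an assumption), and the toy points are hypothetical.

```python
import math
import random


def standardize(rows):
    """Z-score each column so features on different scales weigh equally."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]  # guard against constant columns
    return [tuple((x - m) / s for x, m, s in zip(row, means, stds)) for row in rows]


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm. Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # init from k random points
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(standardize(points), k=2)
print(labels)
```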

Below you can see the resulting clusters for a refined dataset, taken from the Jupyter Notebook, which is viewable on GitHub.