Comp550FinalProject

The src directory contains all the code used in our analysis.

The final_data_with_ratios folder contains the datasets we collected and used. Note: a few subreddit CSV files were too large to upload to GitHub, so they were removed. This folder is intended as a sample of the data collected.

The weights folder contains the logistic regression weights for each term in each subreddit.

The similarity_data folder contains the calculated subreddit similarity results.