In this live coding session, we leverage the Python Reddit API Wrapper (PRAW)
to retrieve data from subreddits on Reddit, and perform sentiment analysis using pipelines
from HuggingFace (🤗 the GitHub of Machine Learning), powered by transformers.
At the end of this session, you will be able to:
- Know how to work with APIs
- Feel more comfortable navigating through documentation, even inspecting the source code
- Understand what a pipeline object is in HuggingFace
- Perform sentiment analysis using pipeline
- Run a python script in command line and get the results
- Examine the quality of data
- Understand data lineage
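To make the first few goals concrete, the sketch below shows one way the two libraries might fit together: PRAW fetches post titles and a HuggingFace pipeline labels them. It is a minimal sketch, not the session's actual notebook code. The credential placeholders, the user agent string, and the helper names fetch_titles, load_classifier, and score_texts are all assumptions made for illustration. It also assumes praw and transformers (plus a backend such as PyTorch) are installed.

```python
from typing import Callable, Dict, Iterable, List


def fetch_titles(subreddit_name: str, limit: int = 5) -> List[str]:
    """Return the titles of the current hot posts in a subreddit (read-only)."""
    import praw  # imported lazily so the rest of the sketch loads without it

    # Placeholder credentials (assumptions): use the values from your own Reddit app.
    reddit = praw.Reddit(
        client_id="YOUR_CLIENT_ID",
        client_secret="YOUR_CLIENT_SECRET",
        user_agent="week-3-sentiment-demo",
    )
    return [post.title for post in reddit.subreddit(subreddit_name).hot(limit=limit)]


def load_classifier() -> Callable:
    """Build a sentiment-analysis pipeline using the library's default model."""
    from transformers import pipeline  # lazy import: downloads a model on first use

    return pipeline("sentiment-analysis")


def score_texts(texts: Iterable[str], classifier: Callable) -> List[Dict]:
    """Pair each text with the label and score the classifier returns for it."""
    texts = list(texts)
    results = classifier(texts)  # one {"label": ..., "score": ...} dict per text
    return [
        {"text": text, "label": result["label"], "score": result["score"]}
        for text, result in zip(texts, results)
    ]
```

With credentials filled in, score_texts(fetch_titles("learnpython"), load_classifier()) would return one dictionary per title; the subreddit name here is only an example. Passing the classifier in as an argument keeps score_texts easy to try with a stand-in function before downloading a model.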
Let's get started by creating a Week 3 Repository!
Creating your Week 3 Repository
- The first thing to do is create a new empty public repository!
- Be sure to fill out your repository name, description, and ensure it's public! NOTE: DO NOT ADD A README OR LICENSE
- Now that you've done the required set-up on GitHub.com, let's move to our terminal and clone the MLE-8 repository!
git clone git@github.com:FourthBrain/MLE-8.git
- OPTIONAL: If you've already cloned the MLE-8 repository, feel free to pull the recent changes by cd-ing into the directory that contains the MLE-8 repo and running the command below. Be sure to return to the parent directory (cd ..) before moving on!
git pull origin main
- Now, we're going to copy the contents of the assignment to a new folder using the following command:
cp -r MLE-8/assignments/week-3-analyze-sentiment-reddit .
- Once that is complete, we'll cd into the newly created folder with:
cd week-3-analyze-sentiment-reddit
- Now, let's initialize our repository in this folder using:
git init
- We'll add the contents of the folder using:
git add .
- Let's create an initial commit!
git commit -m "Initial Commit"
- Now we can add our created repository as a remote (named main in this walkthrough) using the following command. Don't forget, you can get the SSH address from your repository by clicking the green Code button on GitHub.com!
git remote add main git@github.com:<YOUR GITHUB USERNAME>/<YOUR REPOSITORY NAME>
- Now we'll set our branch to main:
git branch -M main
- Last, but not least, let's push the contents of our commit to our repo! In this command, the first main is the remote name and the second is the branch.
git push -u main main
- That's it, that's all!
Create a new Conda environment for sentiment analysis (sa). If you already have this environment, continue to the next step.
conda create -n sa python=3.8 jupyter -y
Activate your new environment
conda activate sa
Launch Jupyter Notebook
jupyter-notebook
Navigate through the repo in the notebook to find imports.ipynb for this week and open it.
Run all of the cells in the notebook.
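The contents of imports.ipynb are not shown here. If a cell fails, a quick check along the following lines can tell you which packages are missing from the sa environment. This is only a sketch: the package list (praw and transformers) is an assumption based on the libraries named at the top of this page, and check_packages is a hypothetical helper, not part of the notebook.

```python
import importlib.util
from typing import Dict, Iterable


def check_packages(names: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level package name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Assumed package list for this session; adjust it to match the notebook.
SESSION_PACKAGES = ["praw", "transformers"]

for name, available in check_packages(SESSION_PACKAGES).items():
    print(f"{name}: {'ok' if available else 'missing'}")
```

Run it from a notebook cell or from a Python prompt inside the activated environment, so that it inspects the same interpreter the notebook uses.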
Please review the weekly narrative here