An attempt to understand YouTube collaborations.
- YouTube API Scraper (./youtube_api_scrapper)
- Entity Recognition Model (./collab_labelling/entity_recognition)
- Machine Learning Modelling (./collab_labelling/machine_learning)
- Get a free API Key for YouTube Data API v3
- Clone the repo
git clone https://github.com/Yoshi275/ism-youtube-scraper.git
- Install Python packages in a virtualenv
pip install virtualenv
virtualenv venv
source venv/bin/activate
pip install -r requirements.txt
- Enter your API key in a new file at
.\youtube_api_scrapper\.env
const DEVELOPER_KEY = 'ENTER YOUR API';
- If using the spaCy entity recognition model, download the model
python -m spacy download en_core_web_lg
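As a rough sketch of how the key saved in the .env file might be read back, the snippet below parses the file contents using only the standard library. The helper names parse_developer_key and read_developer_key are hypothetical and not part of this repo, and the scraper scripts may load the key differently (for example through a dotenv package). The parser accepts both the plain DEVELOPER_KEY=... form and the const DEVELOPER_KEY = '...'; form shown above.

```python
from pathlib import Path


def parse_developer_key(env_text):
    """Return the DEVELOPER_KEY value found in .env-style text, or None.

    Hypothetical helper for illustration; not taken from this repo.
    """
    for line in env_text.splitlines():
        line = line.strip()
        # Tolerate the JavaScript-style `const NAME = 'value';` form.
        if line.startswith("const "):
            line = line[len("const "):]
        name, sep, value = line.partition("=")
        if sep and name.strip() == "DEVELOPER_KEY":
            # Drop a trailing semicolon, then surrounding quotes.
            return value.strip().rstrip(";").strip().strip("'\"")
    return None


def read_developer_key(env_path):
    """Read a .env-style file from disk and extract DEVELOPER_KEY."""
    return parse_developer_key(Path(env_path).read_text(encoding="utf-8"))
```

Calling read_developer_key with the path to the .env file returns the key as a string, or None when no DEVELOPER_KEY line is present.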
- Enter the youtube_api_scrapper folder
- Run python <FILE_NAME.py> with the following files:
  - channel_info_table_populator.py - gives channel information based on channel ID or username
  - video_info_table_populator.py - gives video information based on video ID
  - channels_to_ids.py - converts channel usernames into unique YouTube channel IDs
  - id_to_uploads_playlist.py - gets all videos from specified channel ID
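To illustrate the kind of lookup that channels_to_ids.py and id_to_uploads_playlist.py describe, here is a minimal sketch against the channels.list endpoint of the YouTube Data API v3. It is not the code in this repo: the helper names are hypothetical, the IDs in any example response are placeholders, and the scripts themselves may use a client library rather than raw URLs. The sketch only builds the request URL and reads fields from an already decoded JSON response, so it runs without a network call.

```python
from urllib.parse import urlencode

CHANNELS_ENDPOINT = "https://www.googleapis.com/youtube/v3/channels"


def build_channels_url(api_key, channel_id=None, username=None):
    """Build a channels.list URL filtered by channel ID or by username."""
    if (channel_id is None) == (username is None):
        raise ValueError("pass exactly one of channel_id or username")
    params = {"part": "id,contentDetails", "key": api_key}
    if channel_id is not None:
        params["id"] = channel_id
    else:
        params["forUsername"] = username
    return f"{CHANNELS_ENDPOINT}?{urlencode(params)}"


def extract_uploads_playlists(response):
    """Map each channel ID in a channels.list response to its uploads playlist ID."""
    return {
        item["id"]: item["contentDetails"]["relatedPlaylists"]["uploads"]
        for item in response.get("items", [])
    }
```

The videos in an uploads playlist can then be listed page by page through the playlistItems.list endpoint.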
- Enter the .\collab_labelling\entity_recognition folder
- Ensure that the folder contains a gservice_account.json file and an input file whose name matches INPUT_CSV_FILE_NAME in entity_recognition_model.py. The default input file name is test_videos.csv
- In entity_recognition_model.py, change the following variables as relevant:
  - ENTITY_RECOGNITION_MODEL - options: Model.GOOGLE_MODEL, Model.SPACY_MODEL. This determines the ER model being used
  - INPUT_CSV_FILE_NAME and OUTPUT_CSV_FILE_NAME - These determine the input and output files. The input file is expected to have the same structure as the video data in OneDrive. The output file returns
  - SELECTED_COLS - This determines the names of the relevant columns, which contain the text that you want to run through the ER model. In the background, the text from all selected columns is simply combined into one string and sent through the ER model.
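The column-combining step described for SELECTED_COLS can be sketched as follows. This is an illustration under assumptions, not the contents of entity_recognition_model.py: the column names title and description, the helper names, and the single-space separator are all hypothetical. Only a spaCy path is shown, using the en_core_web_lg model from the installation steps; the Google model path is left out.

```python
import csv

# Hypothetical column names; set these to the text columns of your input CSV.
SELECTED_COLS = ["title", "description"]


def combine_text(row, selected_cols=SELECTED_COLS):
    """Join the non-empty text of the selected columns into one string."""
    parts = [(row.get(col) or "").strip() for col in selected_cols]
    return " ".join(part for part in parts if part)


def read_combined_texts(csv_path, selected_cols=SELECTED_COLS):
    """Read a CSV file and return one combined text per row."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return [combine_text(row, selected_cols) for row in csv.DictReader(handle)]


def spacy_entities(texts):
    """Return a list of (entity text, entity label) pairs for each input text."""
    import spacy  # imported lazily so the helpers above work without spaCy

    nlp = spacy.load("en_core_web_lg")  # load the model once for all texts
    return [[(ent.text, ent.label_) for ent in doc.ents] for doc in nlp.pipe(texts)]
```

For example, spacy_entities(read_combined_texts("test_videos.csv")) would yield one list of entities per video row, assuming the CSV contains the selected columns.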
- Enter the .\collab_labelling\machine_learning folder
- Run jupyter notebook within the folder
- Run the code in the Jupyter notebook environment
@icherylcode - cheryl.nqj@gmail.com
Project Link: https://github.com/Yoshi275/ism-youtube-scraper
- Project Partner - Shauna Tan
- Supervisor - Dr Kokil Jaidka