ism-youtube-scraper: A Jupyter Notebook repository from Yoshi275

An attempt to understand YouTube collaborations.


About The Project

Clone the repo

```
git clone https://github.com/Yoshi275/ism-youtube-scraper.git
```

Install Python packages in a virtualenv

```
pip install virtualenv
virtualenv venv
source venv/bin/activate
pip install -r requirements.txt
```

Enter your API key in a new file at .\youtube_api_scrapper\.env:
```
const DEVELOPER_KEY = 'ENTER YOUR API';
```
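The README does not say how the scripts load this file. As a rough illustration only, the hypothetical helper read_developer_key below (not part of the repository) extracts the key from a file written in the format above. It also accepts a plain DEVELOPER_KEY=value line, which is the usual .env convention.

```python
import re
from pathlib import Path

# Hypothetical pattern: matches lines such as
#   const DEVELOPER_KEY = 'abc123';   (format shown in this README)
#   DEVELOPER_KEY=abc123              (conventional .env format)
_KEY_LINE = re.compile(
    r"""^\s*(?:const\s+)?DEVELOPER_KEY\s*=\s*(?:'([^']*)'|"([^"]*)"|([^\s;]+))\s*;?\s*$"""
)


def read_developer_key(env_path: str) -> str:
    """Return the DEVELOPER_KEY value from a .env-style file (hypothetical helper)."""
    for line in Path(env_path).read_text(encoding="utf-8").splitlines():
        match = _KEY_LINE.match(line)
        if match:
            # Exactly one alternative (single-quoted, double-quoted, bare) matched
            return next(g for g in match.groups() if g is not None)
    raise KeyError(f"DEVELOPER_KEY not found in {env_path}")
```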
If you are using the spaCy entity recognition model, download the model first:
```
python -m spacy download en_core_web_lg
```

Navigate into the youtube_api_scrapper folder.
Run python <FILE_NAME.py> for any of the following scripts:
- channel_info_table_populator.py - retrieves channel information for a given channel ID or username
- video_info_table_populator.py - retrieves video information for a given video ID
- channels_to_ids.py - converts channel usernames into unique YouTube channel IDs
- id_to_uploads_playlist.py - retrieves all videos uploaded by the specified channel ID
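id_to_uploads_playlist.py is only described briefly above. A common way to list every upload of a channel with the YouTube Data API v3 is to page through the channel's uploads playlist with playlistItems.list, following nextPageToken until it is absent. The sketch below assumes that approach and is not taken from the repository. The uploads playlist ID is officially returned by channels.list as contentDetails.relatedPlaylists.uploads; the "UC" to "UU" prefix swap used here is a widely used shortcut, not a documented guarantee. fetch_page is a hypothetical injection point so the pagination logic can be tested without network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterator, Optional

# YouTube Data API v3 endpoint for listing the items of a playlist
PLAYLIST_ITEMS_URL = "https://www.googleapis.com/youtube/v3/playlistItems"


def uploads_playlist_id(channel_id: str) -> str:
    """Shortcut (assumption): channel ID "UC..." maps to uploads playlist "UU..."."""
    if not channel_id.startswith("UC"):
        raise ValueError(f"unexpected channel ID format: {channel_id!r}")
    return "UU" + channel_id[2:]


def fetch_page_http(url: str, params: Dict[str, str]) -> dict:
    """Perform one GET request and decode the JSON response."""
    with urllib.request.urlopen(f"{url}?{urllib.parse.urlencode(params)}") as resp:
        return json.load(resp)


def iter_upload_video_ids(
    channel_id: str,
    api_key: str,
    fetch_page: Callable[[str, Dict[str, str]], dict] = fetch_page_http,
) -> Iterator[str]:
    """Yield every video ID in the channel's uploads playlist, page by page."""
    params = {
        "part": "contentDetails",
        "playlistId": uploads_playlist_id(channel_id),
        "maxResults": "50",  # largest page size the API accepts
        "key": api_key,
    }
    page_token: Optional[str] = None
    while True:
        if page_token:
            params["pageToken"] = page_token
        response = fetch_page(PLAYLIST_ITEMS_URL, dict(params))  # pass a copy
        for item in response.get("items", []):
            yield item["contentDetails"]["videoId"]
        page_token = response.get("nextPageToken")
        if not page_token:
            break
```

With a real key, iter_upload_video_ids(channel_id, key) uses fetch_page_http by default; passing a fake fetch_page keeps the paging logic testable offline.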

Navigate into the .\collab_labelling\entity_recognition folder.
Make sure this folder contains a gservice_account.json file and an input CSV file whose name matches INPUT_CSV_FILE_NAME in entity_recognition_model.py. By default, the input file is test_videos.csv.
In entity_recognition_model.py, set the following variables as needed:
- ENTITY_RECOGNITION_MODEL - selects the entity recognition (ER) model to use. Options: Model.GOOGLE_MODEL, Model.SPACY_MODEL.
- INPUT_CSV_FILE_NAME and OUTPUT_CSV_FILE_NAME - set the input and output file names. The input file must have the same structure as the video data in OneDrive. The output file returns
- SELECTED_COLS - lists the columns containing the text you want to run through the ER model. Behind the scenes, the text from all of these columns is combined into a single string before it is sent to the ER model.
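The column-combining step described for SELECTED_COLS can be sketched with the standard csv module. This is a hypothetical illustration, not the code in entity_recognition_model.py: the example SELECTED_COLS values and the newline separator are assumptions.

```python
import csv
from typing import Iterator, Sequence

# Hypothetical example value; the real SELECTED_COLS is set in entity_recognition_model.py
SELECTED_COLS = ["title", "description"]


def combined_texts(csv_path: str, selected_cols: Sequence[str]) -> Iterator[str]:
    """Yield one string per CSV row, joining the non-empty selected columns."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        # Fail early if a configured column is absent from the header
        missing = [c for c in selected_cols if c not in (reader.fieldnames or [])]
        if missing:
            raise KeyError(f"columns not found in {csv_path}: {missing}")
        for row in reader:
            # Short rows give None for missing cells, so treat None as empty
            parts = [(row[c] or "").strip() for c in selected_cols]
            yield "\n".join(p for p in parts if p)
```

Each yielded string is what would be handed to the selected ER model for that video.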