Predicting-Repo-popularity

This is a machine learning model that predicts the popularity of a GitHub repository from your username and repo_name (the repository must be public).

Primary language: Jupyter Notebook. License: MIT.

A machine learning model that predicts the popularity of a GitHub repository from its repo URL. Here, popularity means the number of stars ✨ the repository can get in the future. For data, we use scripts to scrape repository data from GitHub.
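The description mentions both a repo URL and a username plus repo_name as input. A minimal sketch of how one could be turned into the other, using only the standard library. The function name `parse_repo_url` is hypothetical and is not taken from the project's code.

```python
from typing import Tuple
from urllib.parse import urlparse


def parse_repo_url(url: str) -> Tuple[str, str]:
    """Split a GitHub repository URL into (username, repo_name)."""
    parsed = urlparse(url.strip())
    if parsed.netloc.lower() not in {"github.com", "www.github.com"}:
        raise ValueError(f"not a github.com URL: {url!r}")
    # The path looks like "/<username>/<repo_name>[/...]"; drop empty segments.
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"URL does not name a repository: {url!r}")
    username, repo_name = parts[0], parts[1]
    # Clone URLs end in ".git"; strip it so only the repository name remains.
    if repo_name.endswith(".git"):
        repo_name = repo_name[: -len(".git")]
    return username, repo_name
```

The URL must include the scheme (for example `https://`); without it `urlparse` leaves the host empty and the function raises `ValueError`.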

The Notebooks folder contains the data, the data-extraction script, the data analysis, and the model-creation code. We collected the data from the GitHub API and from Kaggle, stored in github_api.csv and kaggle_data.csv respectively. The data has the columns repo_name, star, fork, watch, issue, tags, most_used_lang, discription, contributors, license, and repo_url.

data_extraction.ipynb contains the script that extracts information from repositories, analysis.ipynb contains the cleaning and visualization steps for the dataset, and model.ipynb builds a machine learning model that predicts how many stars a repository will gain in the future. 😃

Run on Local System

  • Create a virtual environment:
python -m venv "environment_name"

For more details, see the Python documentation for the venv module.

  • Activate the Environment:

    • For Windows:

      .\"environment_name"\Scripts\activate

    • For Mac or Linux:

      source "environment_name"/bin/activate

  • Clone the repository:
git clone https://github.com/pcsingh/Predicting-Repo-popularity.git
  • Enter the directory:
cd Predicting-Repo-popularity
  • Install the required dependencies (the requirement file ships with the repository, so do this after cloning):

pip install -r requirement.txt
  1. To extract GitHub repository data through the GitHub API, run the data_extraction.ipynb notebook.

GitHub limits the number of requests you can make to its API, so you need your own GitHub token to extract data. To generate a token, go to https://github.com/settings/tokens.

The GitHub API requires headers for authorization:
header = {
    'Accept': 'application/vnd.github.mercy-preview+json',
    'visibility': 'PUBLIC',
    'Authorization': 'token PASTE_YOUR_GITHUB_TOKEN_HERE',
}

Replace PASTE_YOUR_GITHUB_TOKEN_HERE with your GitHub token.
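As a rough illustration of how such headers can be used, the sketch below prepares an authenticated request for one repository and maps the JSON response to the dataset's column names. It is not the notebook's code: the function names are hypothetical, the field-to-column mapping is an assumption, and only the Accept and Authorization entries are sent. The contributors column is left out because it needs a separate API call.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def build_repo_request(username: str, repo_name: str, token: str) -> urllib.request.Request:
    """Prepare (but do not send) an authenticated GET for one repository."""
    headers = {
        "Accept": "application/vnd.github.mercy-preview+json",
        "Authorization": f"token {token}",
    }
    url = f"{API_ROOT}/repos/{username}/{repo_name}"
    return urllib.request.Request(url, headers=headers)


def flatten_repo(payload: dict) -> dict:
    """Map a repository JSON object to the dataset columns (assumed mapping)."""
    license_info = payload.get("license") or {}  # "license" is null when unset
    return {
        "repo_name": payload.get("name"),
        "star": payload.get("stargazers_count", 0),
        "fork": payload.get("forks_count", 0),
        "watch": payload.get("subscribers_count", 0),
        "issue": payload.get("open_issues_count", 0),
        "tags": payload.get("topics", []),
        "most_used_lang": payload.get("language"),
        "discription": payload.get("description"),  # column name as spelled in the CSV
        "license": license_info.get("name"),
        "repo_url": payload.get("html_url"),
    }


def fetch_repo(username: str, repo_name: str, token: str) -> dict:
    """Send the request and return one flattened row (performs network I/O)."""
    request = build_repo_request(username, repo_name, token)
    with urllib.request.urlopen(request, timeout=10) as response:
        return flatten_repo(json.load(response))
```

Calling `fetch_repo("pcsingh", "Predicting-Repo-popularity", token)` with a valid token would return a single dictionary shaped like one row of github_api.csv, under the mapping assumed above.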

  2. To visualize some insights from the dataset, run the analysis.ipynb notebook.

  3. To train the model, run the model.ipynb notebook. We trained multiple regression models; the one with the best R2 score is used for making predictions.
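The selection rule in the training step, keeping the candidate with the best R2 score, can be sketched without any ML library. This is only an illustration: the notebook most likely relies on a library implementation, and `r2_score` and `pick_best` here are hypothetical stand-ins that treat each model as a plain function from one feature value to a predicted star count.

```python
from statistics import mean
from typing import Callable, Dict, Sequence, Tuple


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all targets are equal")
    return 1.0 - ss_res / ss_tot


def pick_best(
    predictors: Dict[str, Callable[[float], float]],
    x_val: Sequence[float],
    y_val: Sequence[float],
) -> Tuple[str, float]:
    """Score every candidate on held-out data and return (name, best R2)."""
    scores = {
        name: r2_score(y_val, [predict(x) for x in x_val])
        for name, predict in predictors.items()
    }
    best_name = max(scores, key=scores.get)
    return best_name, scores[best_name]
```

Scoring on held-out data rather than the training data matters here, since a model that merely memorizes the training set would otherwise win the comparison.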

  • Run Streamlit to make predictions using the trained model:
streamlit run app.py

Note: Remember to paste your GitHub token into both the model.ipynb notebook and the app.py file.


Click here to try now..... 🤗