This repository contains the source code for a research project exploring user-item fairness tradeoffs in recommendation systems. The project combines theoretical frameworks with empirical evaluations to assess and improve fairness in recommendations.
- explore_data.py: Extracts main- and sub-category information and splits the data into train and test sets.
- get_authors_papers.py: Fetches all authors and their published papers from Semantic Scholar (see the API sketch after this list).
- get_paper_citations.py: Retrieves citation data for the recommended papers from Semantic Scholar.
- get_paper_details.py: Fetches paper details (e.g., the Semantic Scholar ID) for the citations/references.
- get_references.py: Collects references for the recommended papers from Semantic Scholar.
- import_metadata.py: Script for importing and processing metadata from Kaggle.
- model_evaluation.py: Contains functions to evaluate the recommendation model.
- requirements.txt: Lists all the dependencies required to run the scripts.
- sentence_transformer_authors.py: Generates recommendations using Sentence Transformer embeddings and cosine similarity.
- stopwords.txt: Text file containing stopwords used in text processing.
- tfidf_authors.py: Generates recommendations using TF-IDF embeddings and cosine similarity.
- utils.py: Utility functions used across the project.
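The Semantic Scholar scripts above query the Semantic Scholar Graph API. The sketch below shows the general shape of such a request; it is illustrative only, and the environment variable name `S2_API_KEY` and the `fetch_citations` helper are assumptions, not part of this repository.

```python
import os

import requests

# Illustrative sketch; the repository's scripts may structure these calls differently.
# The Semantic Scholar Graph API expects the key in the "x-api-key" request header.
API_KEY = os.environ.get("S2_API_KEY")  # hypothetical variable name, not defined by this repo
BASE_URL = "https://api.semanticscholar.org/graph/v1"


def fetch_citations(paper_id: str, limit: int = 100) -> list:
    """Fetch papers that cite `paper_id` via the Graph API (hypothetical helper)."""
    response = requests.get(
        f"{BASE_URL}/paper/{paper_id}/citations",
        headers={"x-api-key": API_KEY} if API_KEY else None,
        params={"fields": "title,year", "limit": limit},
    )
    response.raise_for_status()
    return response.json().get("data", [])


if __name__ == "__main__":
    # Example: the first few papers citing an arXiv paper, addressed by its arXiv ID.
    for item in fetch_citations("arXiv:1706.03762", limit=5):
        print(item["citingPaper"]["title"])
```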
- Python 3.8 or newer
- pip
- Semantic Scholar API Key
Clone the repository and install the required dependencies:
pip install -r requirements.txt
The original dataset was sourced from the public arXiv Dataset available on Kaggle.
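The Kaggle snapshot is distributed as a JSON Lines file (one paper record per line). A minimal sketch of iterating over it is shown below, assuming the standard Kaggle file name `arxiv-metadata-oai-snapshot.json`; the actual processing in import_metadata.py may differ. The pipeline scripts are then run as listed after the sketch.

```python
import json


# Illustrative reader for the Kaggle arXiv snapshot (JSON Lines: one record per line).
# The default path assumes the standard Kaggle export name; adjust it if yours differs.
def iter_arxiv_metadata(path: str = "arxiv-metadata-oai-snapshot.json"):
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)


if __name__ == "__main__":
    # Preview the first five records: arXiv ID, categories, and a truncated title.
    for i, paper in enumerate(iter_arxiv_metadata()):
        print(paper["id"], paper["categories"], paper["title"][:60])
        if i == 4:
            break
```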
python3 import_metadata.py
python3 explore_data.py
python3 get_authors_papers.py
python3 get_paper_citations.py
python3 get_paper_details.py
python3 get_references.py
python3 tfidf_authors.py
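tfidf_authors.py builds TF-IDF vectors from paper text and ranks candidates by cosine similarity. A minimal sketch of that pattern with scikit-learn follows; the toy corpus and query are illustrative, the script itself operates on the arXiv/Semantic Scholar data prepared by the earlier steps, and the repository ships its own stopwords.txt while the sketch uses scikit-learn's built-in English list for brevity.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus standing in for paper titles/abstracts.
papers = [
    "Fairness in recommendation systems for two sided markets",
    "Deep learning for citation recommendation",
    "User and item fairness tradeoffs in ranking",
]
query = "fair ranking tradeoffs between users and items"

vectorizer = TfidfVectorizer(stop_words="english")
paper_vectors = vectorizer.fit_transform(papers)  # one TF-IDF vector per paper
query_vector = vectorizer.transform([query])      # embed the query in the same space

# Rank papers by cosine similarity to the query, highest first.
scores = cosine_similarity(query_vector, paper_vectors).ravel()
for idx in scores.argsort()[::-1]:
    print(f"{scores[idx]:.3f}  {papers[idx]}")
```

sentence_transformer_authors.py follows the same ranking pattern but replaces the TF-IDF vectors with Sentence Transformer embeddings.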