A pipeline that collects, preprocesses, and labels raw user-item interaction data to build a hybrid recommender system, combining collaborative filtering (SVD, ALS) with learning-to-rank (an XGBoost ranking objective) and evaluating results with NDCG and MAP.
End-to-end features:
- Data collection & labeling: simulated web scraping and synthetic data generation to mimic user-item interactions.
- Data preprocessing: techniques for handling missing values, duplicates and outliers.
- Collaborative filtering: implementation of matrix factorization approaches (SVD, ALS) using the Surprise library.
- Learning-to-rank: XGBoost with ranking objectives to refine recommendations.
- Evaluation metrics: calculation of ranking metrics including Normalized Discounted Cumulative Gain (NDCG) and Mean Average Precision (MAP).
- Visualization & analysis: tools for in-depth performance evaluation and visualization.
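The preprocessing step above can be sketched with pandas. This is a minimal illustration, not the project's actual code: the column names (`user_id`, `item_id`, `rating`) and the 1.5 × IQR outlier rule are assumptions.

```python
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Handle duplicates, missing values, and outliers in interaction data."""
    # Drop exact duplicate user-item interactions, keeping the first.
    df = df.drop_duplicates(subset=["user_id", "item_id"])
    # Fill missing ratings with the per-user mean, falling back to the global mean.
    df["rating"] = df["rating"].fillna(df.groupby("user_id")["rating"].transform("mean"))
    df["rating"] = df["rating"].fillna(df["rating"].mean())
    # Clip outliers beyond 1.5 * IQR of the rating distribution.
    q1, q3 = df["rating"].quantile([0.25, 0.75])
    iqr = q3 - q1
    df["rating"] = df["rating"].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return df
```

Clipping (rather than dropping) outliers keeps the interaction count stable, which matters when the matrix is already sparse.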
Structure:
- data_pipeline.py contains modules for synthetic data generation, web scraping simulation and preprocessing.
- recommender.py implements collaborative filtering models (SVD, ALS) and the learning-to-rank model using XGBoost.
- metrics.py contains functions for calculating NDCG, MAP and other ranking metrics.
- evaluation.py provides evaluation functions and visualization routines to assess model performance.
- utils.py contains general utility functions, including logging, data splitting and configuration management.
- main.py is the main driver script that integrates the entire pipeline.
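The matrix-factorization idea behind recommender.py can be sketched with a plain-NumPy ALS loop (the actual code uses the Surprise library; the function name, shapes and hyperparameters here are illustrative assumptions).

```python
import numpy as np

def als(R, mask, k=8, n_iters=20, reg=0.1, seed=0):
    """Alternating least squares on an explicit ratings matrix.

    R: (n_users, n_items) ratings; mask: 1.0 where a rating is observed.
    Returns user factors U (n_users, k) and item factors V (n_items, k).
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, k))
    V = rng.normal(scale=0.1, size=(n_items, k))
    I = np.eye(k)
    for _ in range(n_iters):
        # Fix V, solve a ridge regression per user over observed items.
        for u in range(n_users):
            idx = mask[u] > 0
            Vo = V[idx]
            U[u] = np.linalg.solve(Vo.T @ Vo + reg * I, Vo.T @ R[u, idx])
        # Fix U, solve per item over observed users.
        for i in range(n_items):
            idx = mask[:, i] > 0
            Uo = U[idx]
            V[i] = np.linalg.solve(Uo.T @ Uo + reg * I, Uo.T @ R[idx, i])
    return U, V
```

Predicted scores are then `U @ V.T`; the regularization term keeps each least-squares solve well-posed even for users or items with few observations.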
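The ranking metrics computed in metrics.py follow their standard definitions; a self-contained sketch (function names are illustrative, the formulas are the textbook ones with a log2 position discount):

```python
import numpy as np

def dcg_at_k(relevances, k):
    """Discounted cumulative gain of a ranked relevance list, truncated at k."""
    rel = np.asarray(relevances, dtype=float)[:k]
    if rel.size == 0:
        return 0.0
    discounts = np.log2(np.arange(2, rel.size + 2))  # positions 1..k -> log2(2..k+1)
    return float(np.sum(rel / discounts))

def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

def average_precision(relevances):
    """Mean of precision@i over the positions i holding relevant items."""
    rel = np.asarray(relevances) > 0
    if rel.sum() == 0:
        return 0.0
    precisions = np.cumsum(rel) / np.arange(1, len(rel) + 1)
    return float(precisions[rel].mean())

def mean_average_precision(relevance_lists):
    """MAP: average precision averaged over queries (here, users)."""
    return float(np.mean([average_precision(r) for r in relevance_lists]))
```

For example, `ndcg_at_k([3, 2, 3, 0, 1, 2], 6)` evaluates a ranking against its ideal reordering `[3, 3, 2, 2, 1, 0]` and comes out just above 0.96.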
Ensure you have Python 3.8+ installed. Install the required packages using:
pip install -r requirements.txt
To run the full pipeline:
python main.py
License: Apache 2.0