A recommender system implementation using LightFM, designed for experimenting with and evaluating different recommendation strategies. It provides a pipeline for generating synthetic data, training models, and evaluating performance across various configurations.
Hybrid Recommendation Model
- Matrix factorization with LightFM
- Support for user and item features
- Multiple loss functions (WARP, BPR, etc.), as sketched below
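As a minimal sketch of what this looks like with LightFM directly (the toy matrices are illustrative; the hyperparameter values mirror the config.yaml defaults shown later):

```python
import numpy as np
from scipy.sparse import coo_matrix
from lightfm import LightFM

# Toy implicit-feedback matrix: 3 users x 4 items (illustrative only)
interactions = coo_matrix(np.array([
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 1],
]))

# Item features: an identity block (one indicator per item) plus one
# shared metadata column; supplying these is what makes the model hybrid
item_features = coo_matrix(np.hstack([np.eye(4), np.ones((4, 1))]))

# WARP optimizes the top of the ranking; "bpr" or "logistic" also work
model = LightFM(loss="warp", no_components=64, learning_rate=0.05)
model.fit(interactions, item_features=item_features, epochs=10, num_threads=4)

# Rank all four items for user 0
scores = model.predict(0, np.arange(4), item_features=item_features)
```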
Synthetic Data Generation
- Realistic user profiles and demographics
- Item features and characteristics
- Configurable interaction patterns (see the sketch after this list)
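The actual generator lives under data/; as an illustrative stand-in, a long-tailed interaction pattern could be synthesized like this (all column names and distributions here are hypothetical):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_users, n_items, n_interactions = 1_000, 200, 10_000

users = pd.DataFrame({
    "user_id": np.arange(n_users),
    "age": rng.integers(18, 70, n_users),
    "segment": rng.choice(["casual", "regular", "power"], n_users),
})
items = pd.DataFrame({
    "item_id": np.arange(n_items),
    "category": rng.choice(["books", "music", "film"], n_items),
})

# A Zipf-like popularity distribution gives the long tail typical
# of real interaction logs
item_probs = 1.0 / np.arange(1, n_items + 1)
item_probs /= item_probs.sum()
interactions = pd.DataFrame({
    "user_id": rng.integers(0, n_users, n_interactions),
    "item_id": rng.choice(n_items, n_interactions, p=item_probs),
})
```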
Comprehensive Evaluation
- Cross-validation support
- Multiple evaluation metrics, illustrated below
- Statistical analysis of results
- MLflow experiment tracking
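A minimal evaluation sketch using LightFM's built-in split and metric helpers (the random interaction matrix is a stand-in for the generated data):

```python
import numpy as np
from scipy.sparse import coo_matrix
from lightfm import LightFM
from lightfm.cross_validation import random_train_test_split
from lightfm.evaluation import auc_score, precision_at_k

rng = np.random.default_rng(0)
interactions = coo_matrix(rng.integers(0, 2, size=(100, 50)))

# Hold out 20% of interactions for testing
train, test = random_train_test_split(
    interactions, test_percentage=0.2, random_state=np.random.RandomState(0)
)

model = LightFM(loss="warp", no_components=64, learning_rate=0.05)
model.fit(train, epochs=10, num_threads=4)

# train_interactions masks already-seen pairs out of the rankings
print("precision@10:", precision_at_k(model, test, train_interactions=train, k=10).mean())
print("AUC:", auc_score(model, test, train_interactions=train).mean())
```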
Experiment Management
- Configuration-based experiments (sketched after this list)
- Result persistence and analysis
- Automated reporting
- Experiment comparison and visualization
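One plausible shape for configuration-driven runs, matching the experiment_results/ layout shown later; run_experiment and the override scheme are assumptions, not the repository's actual API:

```python
import json
from pathlib import Path

import yaml

def run_experiment(name: str, overrides: dict) -> dict:
    """Hypothetical runner: merge overrides into the base config,
    then train, evaluate, and return a results dict."""
    cfg = yaml.safe_load(Path("config.yaml").read_text())
    cfg["model"].update(overrides)
    # ... training and evaluation would happen here ...
    return {"name": name, "config": cfg}

# A default run next to a higher-learning-rate variant
for name, overrides in {"default": {}, "high_lr": {"learning_rate": 0.1}}.items():
    out_dir = Path("experiment_results") / name
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "results.json").write_text(json.dumps(run_experiment(name, overrides)))
```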
.
├── data/ # Data generation and processing
├── evaluation/ # Evaluation metrics and tools
├── examples/ # Example scripts and notebooks
├── models/ # Model implementations
├── schemas/ # Data schemas and validation
├── utils/ # Utility functions and logging
├── mlruns/ # MLflow tracking directory
└── config.yaml # Default configuration
- Python 3.11+
- Conda (recommended for environment management)
- Make (for using Makefile commands)
- Clone the repository:
  git clone <repository-url>
  cd factorization-recs
- Create and set up the conda environment:
  make setup-conda
- Activate the environment:
  conda activate recs
- Install pre-commit hooks (for development):
  make setup-pre-commit

For a complete setup including the conda environment and pre-commit hooks:
  make setup
- Review and modify config.yaml for experiment parameters:

  model:
    learning_rate: 0.05
    loss: "warp"
    no_components: 64
    ...
  training:
    num_epochs: 10
    num_threads: 4
    ...
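These keys map directly onto LightFM's constructor; a minimal loading sketch, assuming the keys shown above:

```python
import yaml
from lightfm import LightFM

with open("config.yaml") as f:
    cfg = yaml.safe_load(f)

model = LightFM(
    loss=cfg["model"]["loss"],
    no_components=cfg["model"]["no_components"],
    learning_rate=cfg["model"]["learning_rate"],
)
# Training settings come from the training block:
# model.fit(interactions,
#           epochs=cfg["training"]["num_epochs"],
#           num_threads=cfg["training"]["num_threads"])
```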
- Run experiments:
  python examples/run_experiments.py

Results are saved in experiment_results/ with the following structure:
experiment_results/
├── default/
│ ├── config.yaml # Experiment configuration
│ ├── results.json # Detailed results
│ └── synthetic_data/ # Generated datasets
│ ├── users.csv
│ ├── items.csv
│ └── interactions.csv
├── high_lr/
│ └── ...
└── summary_results.csv # Overall experiment summary
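For ad-hoc analysis, summary_results.csv loads cleanly with pandas; the metric column name below is an assumption:

```python
import pandas as pd

summary = pd.read_csv("experiment_results/summary_results.csv")
# "precision_at_10" is a placeholder; use whatever metric columns
# the runner actually writes
print(summary.sort_values("precision_at_10", ascending=False).head())
```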
The project uses MLflow for experiment tracking and visualization:
- Start the MLflow UI:
  make mlflow-ui
- View experiments at http://localhost:5001
MLflow tracks (see the logging sketch after this list):
- Model parameters
- Training metrics
- Dataset statistics
- Performance metrics
- Artifacts (configs, results)
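A run might be logged roughly like this (standard MLflow tracking calls, not necessarily the repository's exact logging code; the experiment name is an assumption):

```python
import mlflow

mlflow.set_experiment("factorization-recs")  # experiment name is an assumption
with mlflow.start_run(run_name="default"):
    mlflow.log_params({"loss": "warp", "no_components": 64, "learning_rate": 0.05})
    mlflow.log_metric("precision_at_10", 0.12)  # illustrative value
    mlflow.log_artifact("config.yaml")          # keep the exact config with the run
```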
Compare experiments by the following (a query sketch follows this list):
- Parameter values
- Metric performance
- Cross-validation results
- Dataset characteristics
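Comparison can also be done programmatically via mlflow.search_runs, which returns the runs as a pandas DataFrame (experiment and column names below are assumptions):

```python
import mlflow

# One row per run; parameters and metrics appear as
# "params.*" and "metrics.*" columns
runs = mlflow.search_runs(experiment_names=["factorization-recs"])
print(runs[["params.loss", "params.learning_rate", "metrics.precision_at_10"]])
```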
Format code using isort and black:
make format
Check formatting without making changes:
make check-format
Run all linting checks:
make check-lint
Individual linting tools:
make lint-flake8 # Run flake8
make lint-pylint # Run pylint
Run mypy type checks:
make test-mypy
Check for missing type hints:
make check-missing-type
Run pre-commit checks:
make test-pre-commit # Test pre-commit hooks
make test-pre-push # Test pre-push hooks
Remove generated files and caches:
make clean # Clean all temporary files and results
make clean-conda # Remove conda environment