A recommender system implementation using LightFM, designed for experimenting with and evaluating different recommendation strategies. It provides a pipeline for generating synthetic data, training models, and evaluating performance across various configurations.
Hybrid Recommendation Model
- Matrix factorization with LightFM
- Support for user and item features
- Multiple loss functions (WARP, BPR, etc.), as sketched below
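As a minimal sketch of what this looks like with LightFM directly (the toy matrices are illustrative; the hyperparameter values mirror the config.yaml defaults shown later):

```python
import numpy as np
from scipy.sparse import coo_matrix
from lightfm import LightFM

# Toy implicit-feedback matrix: 3 users x 4 items (illustrative only)
interactions = coo_matrix(np.array([
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 1],
]))

# Item features: an identity block (one indicator per item) plus one
# shared metadata column; supplying these is what makes the model hybrid
item_features = coo_matrix(np.hstack([np.eye(4), np.ones((4, 1))]))

# WARP optimizes the top of the ranking; "bpr" or "logistic" also work
model = LightFM(loss="warp", no_components=64, learning_rate=0.05)
model.fit(interactions, item_features=item_features, epochs=10, num_threads=4)

# Rank all four items for user 0
scores = model.predict(0, np.arange(4), item_features=item_features)
```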
Synthetic Data Generation
- Realistic user profiles and demographics
- Item features and characteristics
- Configurable interaction patterns (see the sketch after this list)
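The actual generator lives under data/; as an illustrative stand-in, a long-tailed interaction pattern could be synthesized like this (all column names and distributions here are hypothetical):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_users, n_items, n_interactions = 1_000, 200, 10_000

users = pd.DataFrame({
    "user_id": np.arange(n_users),
    "age": rng.integers(18, 70, n_users),
    "segment": rng.choice(["casual", "regular", "power"], n_users),
})
items = pd.DataFrame({
    "item_id": np.arange(n_items),
    "category": rng.choice(["books", "music", "film"], n_items),
})

# A Zipf-like popularity distribution gives the long tail typical
# of real interaction logs
item_probs = 1.0 / np.arange(1, n_items + 1)
item_probs /= item_probs.sum()
interactions = pd.DataFrame({
    "user_id": rng.integers(0, n_users, n_interactions),
    "item_id": rng.choice(n_items, n_interactions, p=item_probs),
})
```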
Comprehensive Evaluation
- Cross-validation support
- Multiple evaluation metrics, illustrated below
- Statistical analysis of results
- MLflow experiment tracking
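A minimal evaluation sketch using LightFM's built-in split and metric helpers (the random interaction matrix is a stand-in for the generated data):

```python
import numpy as np
from scipy.sparse import coo_matrix
from lightfm import LightFM
from lightfm.cross_validation import random_train_test_split
from lightfm.evaluation import auc_score, precision_at_k

rng = np.random.default_rng(0)
interactions = coo_matrix(rng.integers(0, 2, size=(100, 50)))

# Hold out 20% of interactions for testing
train, test = random_train_test_split(
    interactions, test_percentage=0.2, random_state=np.random.RandomState(0)
)

model = LightFM(loss="warp", no_components=64, learning_rate=0.05)
model.fit(train, epochs=10, num_threads=4)

# train_interactions masks already-seen pairs out of the rankings
print("precision@10:", precision_at_k(model, test, train_interactions=train, k=10).mean())
print("AUC:", auc_score(model, test, train_interactions=train).mean())
```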
Experiment Management
- Configuration-based experiments (sketched after this list)
- Result persistence and analysis
- Automated reporting
- Experiment comparison and visualization
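One plausible shape for configuration-driven runs, matching the experiment_results/ layout shown later; run_experiment and the override scheme are assumptions, not the repository's actual API:

```python
import json
from pathlib import Path

import yaml

def run_experiment(name: str, overrides: dict) -> dict:
    """Hypothetical runner: merge overrides into the base config,
    then train, evaluate, and return a results dict."""
    cfg = yaml.safe_load(Path("config.yaml").read_text())
    cfg["model"].update(overrides)
    # ... training and evaluation would happen here ...
    return {"name": name, "config": cfg}

# A default run next to a higher-learning-rate variant
for name, overrides in {"default": {}, "high_lr": {"learning_rate": 0.1}}.items():
    out_dir = Path("experiment_results") / name
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "results.json").write_text(json.dumps(run_experiment(name, overrides)))
```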
.
├── data/ # Data generation and processing
├── evaluation/ # Evaluation metrics and tools
├── examples/ # Example scripts and notebooks
├── models/ # Model implementations
├── schemas/ # Data schemas and validation
├── utils/ # Utility functions and logging
├── mlruns/ # MLflow tracking directory
└── config.yaml # Default configuration
- Python 3.11+
- Conda (recommended for environment management)
- Make (for using Makefile commands)
- Clone the repository:
  git clone <repository-url>
  cd factorization-recs
- Create and set up the conda environment:
  make setup-conda
- Activate the environment:
  conda activate recs
- Install pre-commit hooks (for development):
  make setup-pre-commit

For a complete setup including the conda environment and pre-commit hooks:
  make setup
- Review and modify config.yaml for experiment parameters:

  model:
    learning_rate: 0.05
    loss: "warp"
    no_components: 64
    ...
  training:
    num_epochs: 10
    num_threads: 4
    ...
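These keys map directly onto LightFM's constructor; a minimal loading sketch, assuming the keys shown above:

```python
import yaml
from lightfm import LightFM

with open("config.yaml") as f:
    cfg = yaml.safe_load(f)

model = LightFM(
    loss=cfg["model"]["loss"],
    no_components=cfg["model"]["no_components"],
    learning_rate=cfg["model"]["learning_rate"],
)
# Training settings come from the training block:
# model.fit(interactions,
#           epochs=cfg["training"]["num_epochs"],
#           num_threads=cfg["training"]["num_threads"])
```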
- Run experiments:
  python examples/run_experiments.py

Results are saved in experiment_results/ with the following structure:
experiment_results/
├── default/
│ ├── config.yaml # Experiment configuration
│ ├── results.json # Detailed results
│ └── synthetic_data/ # Generated datasets
│ ├── users.csv
│ ├── items.csv
│ └── interactions.csv
├── high_lr/
│ └── ...
└── summary_results.csv # Overall experiment summary
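For ad-hoc analysis, summary_results.csv loads cleanly with pandas; the metric column name below is an assumption:

```python
import pandas as pd

summary = pd.read_csv("experiment_results/summary_results.csv")
# "precision_at_10" is a placeholder; use whatever metric columns
# the runner actually writes
print(summary.sort_values("precision_at_10", ascending=False).head())
```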
The project uses MLflow for experiment tracking and visualization:
- Start the MLflow UI:
  make mlflow-ui
- View experiments at http://localhost:5001
MLflow tracks (see the logging sketch after this list):
- Model parameters
- Training metrics
- Dataset statistics
- Performance metrics
- Artifacts (configs, results)
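A run might be logged roughly like this (standard MLflow tracking calls, not necessarily the repository's exact logging code; the experiment name is an assumption):

```python
import mlflow

mlflow.set_experiment("factorization-recs")  # experiment name is an assumption
with mlflow.start_run(run_name="default"):
    mlflow.log_params({"loss": "warp", "no_components": 64, "learning_rate": 0.05})
    mlflow.log_metric("precision_at_10", 0.12)  # illustrative value
    mlflow.log_artifact("config.yaml")          # keep the exact config with the run
```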
Compare experiments by the following (a query sketch follows this list):
- Parameter values
- Metric performance
- Cross-validation results
- Dataset characteristics
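Comparison can also be done programmatically via mlflow.search_runs, which returns the runs as a pandas DataFrame (experiment and column names below are assumptions):

```python
import mlflow

# One row per run; parameters and metrics appear as
# "params.*" and "metrics.*" columns
runs = mlflow.search_runs(experiment_names=["factorization-recs"])
print(runs[["params.loss", "params.learning_rate", "metrics.precision_at_10"]])
```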
Format code using isort and black:
make format
Check formatting without making changes:
make check-format
Run all linting checks:
make check-lint
Individual linting tools:
make lint-flake8 # Run flake8
make lint-pylint # Run pylint
Run mypy type checks:
make test-mypy
Check for missing type hints:
make check-missing-type
Run pre-commit checks:
make test-pre-commit # Test pre-commit hooks
make test-pre-push # Test pre-push hooks
Remove generated files and caches:
make clean # Clean all temporary files and results
make clean-conda # Remove conda environment