

EnvisEdge
Bringing Recommendations to the Edge


A one-stop solution to build your recommendation models, train them, and deploy them in a privacy-preserving manner, right on users' devices.

EnvisEdge allows you to easily explore new federated learning algorithms and deploy them into production.

The steps to building an awesome recommendation system are:

  1. 🔩 Standard ML training: Pick any ML model and benchmark it using standard settings.
  2. 🎮 Federated Learning Simulation: Once you are satisfied with your model, explore a host of FL algorithms with the simulator.
  3. 🏭 Industrial Deployment: After all the testing and simulation, deploy easily using the NimbleEdge suite.
  4. 🚀 Edge Computing: Leverage all the benefits of edge computing.

Repo Structure 🏢

NimbleEdge/EnvisEdge
├── CONTRIBUTING.md           <-- Please go through the contributing guidelines before starting 🤓
├── README.md                 <-- You are here 📌
├── docs                      <-- Tutorials and walkthroughs 🧐
├── experiments               <-- Recommendation models used by our services
├── fedrec                    <-- Whole magic takes place here 😜
│   ├── communications        <-- Modules for communication interfaces, e.g. Kafka
│   ├── multiprocessing       <-- Modules to run parallel worker jobs
│   ├── python_executors      <-- Contains worker modules, e.g. trainer and aggregator
│   ├── serialization         <-- Message serializers
│   └── utilities             <-- Helper modules
├── fl_strategies             <-- Federated learning algorithms for our services
└── notebooks                 <-- Jupyter notebook examples

QuickStart

Let's train Facebook AI's DLRM on the edge. DLRM is a standard baseline for neural-network-based recommendation models.
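
For intuition, here is a minimal, hypothetical PyTorch sketch of the DLRM idea: dense features go through a bottom MLP, sparse features through embedding tables, and all resulting vectors interact via pairwise dot products before a top MLP produces a click probability. The class name, feature counts, and sizes below are made up for illustration; this is not the model defined in experiments/.

import torch
import torch.nn as nn

class TinyDLRM(nn.Module):
    def __init__(self, num_dense=13, cardinalities=(1000, 1000, 1000), dim=16):
        super().__init__()
        self.bottom_mlp = nn.Sequential(nn.Linear(num_dense, dim), nn.ReLU())
        self.embeddings = nn.ModuleList([nn.Embedding(c, dim) for c in cardinalities])
        n = 1 + len(cardinalities)                # vectors entering the interaction
        num_pairs = n * (n - 1) // 2              # number of pairwise dot products
        self.top_mlp = nn.Linear(dim + num_pairs, 1)

    def forward(self, dense, sparse):
        x = self.bottom_mlp(dense)                                    # (B, dim)
        vecs = [x] + [emb(sparse[:, k]) for k, emb in enumerate(self.embeddings)]
        T = torch.stack(vecs, dim=1)                                  # (B, n, dim)
        Z = torch.bmm(T, T.transpose(1, 2))                           # (B, n, n) dot products
        i, j = torch.triu_indices(len(vecs), len(vecs), offset=1)
        pairs = Z[:, i, j]                                            # (B, num_pairs)
        return torch.sigmoid(self.top_mlp(torch.cat([x, pairs], dim=1)))

model = TinyDLRM()
dense = torch.rand(4, 13)                         # 13 dense features per example
sparse = torch.randint(0, 1000, (4, 3))           # 3 categorical feature ids
print(model(dense, sparse).shape)                 # torch.Size([4, 1])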

Clone this repo and point the datafile argument in configs/dlrm_fl.yml to your local copy of the Criteo dataset:

git clone https://github.com/NimbleEdge/EnvisEdge

model :
  name : 'dlrm'
  ...
  preproc :
    datafile : "<Path to Criteo>/criteo/train.txt"
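
If you want to sanity-check that edit before launching anything, here is a minimal sketch with PyYAML, assuming preproc sits under model exactly as in the snippet above:

import os
import yaml

# Load the training config and confirm the Criteo path actually exists.
with open("configs/dlrm_fl.yml") as f:
    config = yaml.safe_load(f)

datafile = config["model"]["preproc"]["datafile"]
print(datafile, "found" if os.path.exists(datafile) else "NOT found")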
 

Install the dependencies with conda or pip (the example below uses virtualenv and pip3):

mkdir env
cd env
virtualenv envisedge 
source envisedge/bin/activate 
pip3 install -r requirements.txt

Download Kafka from https://kafka.apache.org/downloads and start the Kafka server using the following commands:

bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties

Create the Kafka topics for the job executor:

bin/kafka-topics.sh --create --topic job-request-aggregator --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
bin/kafka-topics.sh --create --topic job-request-trainer --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
bin/kafka-topics.sh --create --topic job-response-aggregator --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
bin/kafka-topics.sh --create --topic job-response-trainer --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
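
Each worker role gets a request topic and a response topic (job-request-* / job-response-*). Purely to illustrate that convention, and not the repo's fedrec/communications code, a round trip with the third-party kafka-python package could look like the sketch below; the payload fields are made up, since fedrec's real message format comes from its serializers.

import json
from kafka import KafkaProducer, KafkaConsumer

# Publish a stand-in job request on the trainer's request topic ...
producer = KafkaProducer(bootstrap_servers="localhost:9092",
                         value_serializer=lambda m: json.dumps(m).encode())
producer.send("job-request-trainer", {"job_type": "train", "sender_id": 0})
producer.flush()

# ... and read it back from the same topic.
consumer = KafkaConsumer("job-request-trainer",
                         bootstrap_servers="localhost:9092",
                         auto_offset_reset="earliest",
                         consumer_timeout_ms=5000)
for message in consumer:
    print(message.value.decode())
    break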

To start the multiprocessing executor, run the following command:

python executor.py --config configs/dlrm_fl.yml

Change the datafile path in configs/dlrm_fl.yml to your data path.

preproc :
    datafile : "<Your path to data>/criteo_dataset/train.txt"

Run data preprocessing with preprocess_data.py and supply the config file. This generates per-day splits from the entire dataset as well as a processed data file:

python preprocess_data.py --config configs/dlrm_fl.yml --logdir $HOME/logs/kaggle_criteo/exp_1
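
For reference, the Kaggle Criteo release is a single tab-separated train.txt covering seven days of traffic, so a per-day split amounts to cutting that file into seven roughly equal chunks. A rough, hypothetical sketch of the idea is shown below; preprocess_data.py does this for you, together with the actual feature processing.

# Hypothetical sketch only: chunk a Criteo-style train.txt into equal "day" files.
def split_into_days(path, num_days=7):
    with open(path) as f:
        lines = f.readlines()
    per_day = len(lines) // num_days
    for day in range(num_days):
        end = (day + 1) * per_day if day < num_days - 1 else len(lines)
        with open(f"{path}_day_{day}", "w") as out:
            out.writelines(lines[day * per_day:end])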

Begin Training

python train.py --config configs/dlrm_fl.yml --logdir $HOME/logs/kaggle_criteo/exp_3 --num_eval_batches 1000 --devices 0

Run TensorBoard to view training loss and validation metrics at localhost:8888:

tensorboard --logdir $HOME/logs/kaggle_criteo --port 8888

Contribute

  1. Please go through our CONTRIBUTING guidelines before starting.
  2. Star, fork, and clone the repo.
  3. Do your work.
  4. Push to your fork.
  5. Submit a PR to NimbleEdge/EnvisEdge.

We welcome you to our Discord for queries related to the library and contributions in general.