Grid currently has two modes.
--tree
-- experimental
Tree is the federated learning mode. The long term goal is to have node workers store data locally, and train public models to make them better.
--compute
Compute mode lets users offer compute to train models. Data scientists can easily publish models from a jupyter notebook and train them remotely. Grid also offers easy utilities to try n number of configurations concurrently.
Grid worker daemon requires an IPFS daemon to be running with pubsub support turned on.
./bin/start_ipfs
Make sure you have installed all dependencies
pip3 install -r requirements.txt
python3 setup.py install
You can then run the worker daemon. If you want to run in compute
mode run
./bin/start_worker --compute
and if you want to run in tree
mode, run
./bin/start_worker --tree
Finally, to run the notebooks - in a different tab start Jupyter by running:
jupyter notebook
navigate to notebooks/pubsub/
and open Keras Grid Client and Worker.ipynb
and
follow along in that notebook.
If you see this connect error ERROR: could not connect to IPFS. Is your daemon running with pubsub support at 127.0.0.1 on port 5001
make sure you run the following command:
ipfs daemon --enable-pubsub-experiment
If you have any troubles running an experiment such as the other peers not learning about your jobs make sure you're connected to the peer. You can check if you're connected to the peer by running:
ipfs pubsub peers | grep <ipfs address>
And then to connect to the peer if you're not connected:
ipfs swarm connect <ipfs_address>
The swarm connect IPFS address should look something like this /p2p-circuit/ipfs/QmXbV8HZwKYkkWAAZr1aYT3nRMCUnpp68KaxP3vccirUc8
. And can be found in the output of the daemon when you start it.