Training across devices
The Collaborative Learning talk by @wmaass concludes with lessons learned, an extract:
Then, on the machine learning side, it would be really great if also on the JavaScript side we would have some support for federated learning, distributed learning and reinforcement learning; this is currently missing.
The Enabling Distributed DNNs for the Mobile Web Over Cloud, Edge and End Devices talk by Yakun Huang (@Nov1102) makes a point:
To accelerate distributed DNNs for the web, it is natural to consider a partition-offloading approach to leverage the computing resources of end devices and the edge server.
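For illustration, here is a minimal sketch of what such a partition-offloading approach could look like from a web app. The split point, the `headModel` object and the `/continue-inference` endpoint are all hypothetical placeholders, not anything the talk specifies:

```js
// Sketch: run the first layers of a DNN on the device, then offload
// the rest to an edge server. `headModel` and the endpoint are
// hypothetical placeholders.
async function partitionedInference(input, headModel, edgeUrl) {
  // On-device: compute intermediate activations with the "head" of
  // the network (e.g. the first N layers).
  const intermediate = await headModel.run(input); // assume Float32Array

  // Offload: ship the (typically much smaller) intermediate tensor
  // to the edge server, which runs the remaining layers ("tail").
  const response = await fetch(edgeUrl + '/continue-inference', {
    method: 'POST',
    headers: { 'Content-Type': 'application/octet-stream' },
    body: intermediate.buffer,
  });

  // The edge server returns the final output tensor.
  return new Float32Array(await response.arrayBuffer());
}
```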
The talk also raises the following questions:
- What role should the edge server play in providing processing support for intelligent web applications requiring heavy computation?
- How do web developers use the edge server more easily for accelerating DNNs and collaborating with web apps?
- How can the edge server deploy and offload DNN computations more easily?
Maybe browser-instantiated workers running on edge devices could help here? There has been some exploration around this space in the Web & Networks IG's edge computing workstream. Cc @zolkis
What role should the edge server play
Edge devices / edge cloud need more consistent definitions, but in general here is how I currently see it.
Privacy was named as a reason - IMHO for that use case it would make sense if the user were able to define the privacy zone(s), so that compute offload to one or more edge devices/servers falls under the user's privacy control.
The compute offload mechanisms should include the possibility that users could select the compute offload target (e.g. an edge server within the privacy zone).
The authors made a DNN-specific proposal for splitting the load.
So IMHO we'd need these things (a rough sketch follows the list):
- a definition of privacy zone vs. edge server,
- user-selectable compute offload targets with
  a) generic compute offload mechanism(s),
  b) DNN- (or other compute-) specific offload distribution (either on top of a) or instead of a)).
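To make b) on top of a) concrete, here is a rough sketch of what a user-selectable offload target API could look like. To be clear, none of these names (`navigator.compute.getOffloadTargets`, `privacyZone`, `submit`) exist today; this is purely illustrative:

```js
// Hypothetical API sketch; nothing below exists in any browser today.
// The browser exposes user-approved offload targets, filtered by a
// user-defined privacy zone, per a) above.
const targets = await navigator.compute.getOffloadTargets({
  privacyZone: 'home', // only devices/servers the user placed in this zone
  capability: 'wasm',  // generic compute offload
});

// The app (or the user, via a picker) selects a target and submits a
// generic job; a DNN-specific splitter per b) could sit on top of this.
const wasmModule = await (await fetch('/training-shard.wasm')).arrayBuffer();
const job = await targets[0].submit({
  module: wasmModule,
  input: localTrainingData, // the app's own data, staying in the zone
});
const result = await job.result; // e.g. updated model weights
```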
The ONNX.js - A Javascript library to run ONNX models in browsers and Node.js talk by @EmmaNingMS explains how ONNX.js benefits from parallelization using web workers:
Furthermore, ONNX.js utilizes web workers to provide a multi-threaded environment for operator parallelization.
Originally, web workers were introduced to unblock UI rendering. They allow you to create additional threads to run other long-running computations separately.
ONNX.js leverages web workers to enable parallelization within heavy operators, which significantly improves performance on multicore machines.
By taking full advantage of WebAssembly and web workers, the final result shows an over 19x speedup on a CPU with four cores.
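For readers unfamiliar with the pattern, here is a minimal sketch of operator-level parallelization with plain web workers, splitting the rows of a matrix multiply across threads. This illustrates the general idea only, not ONNX.js's actual implementation:

```js
// ---- matmul-worker.js: computes one horizontal slice of the output ----
onmessage = (e) => {
  const { a, b, k, cols } = e.data; // `a` holds only this worker's rows
  const rows = a.length / k;
  const out = new Float32Array(rows * cols);
  for (let i = 0; i < rows; i++)
    for (let j = 0; j < cols; j++) {
      let acc = 0;
      for (let x = 0; x < k; x++) acc += a[i * k + x] * b[x * cols + j];
      out[i * cols + j] = acc;
    }
  postMessage(out);
};

// ---- main thread: split rows of `a` across workers, stitch results ----
const NUM_WORKERS = navigator.hardwareConcurrency || 4;

function parallelMatmul(a, b, rows, k, cols) {
  const rowsPer = Math.ceil(rows / NUM_WORKERS);
  const slices = [];
  for (let w = 0; w < NUM_WORKERS; w++) {
    const start = w * rowsPer;
    const end = Math.min(start + rowsPer, rows);
    if (start >= end) break;
    slices.push(new Promise((resolve) => {
      const worker = new Worker('matmul-worker.js');
      worker.onmessage = (e) => { worker.terminate(); resolve({ start, part: e.data }); };
      worker.postMessage({ a: a.slice(start * k, end * k), b, k, cols });
    }));
  }
  return Promise.all(slices).then((parts) => {
    const out = new Float32Array(rows * cols);
    for (const { start, part } of parts) out.set(part, start * cols);
    return out;
  });
}
```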
What are the obstacles to scaling this architecture to run JS or WebAssembly modules on the edge to enable training across devices? Some CDN providers (e.g. Cloudflare, Fastly) seem to have products for running JS and Wasm modules on their edge networks.
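As a thought experiment, an edge worker aggregating model updates from nearby clients could look roughly like the sketch below, written against a Cloudflare Workers-style fetch event handler. The aggregation logic and endpoints are made up, and a real deployment would need durable storage since worker instances are ephemeral:

```js
// Sketch of an edge worker (Cloudflare Workers-style fetch handler)
// that aggregates model updates from nearby clients. The in-memory
// running sum is illustrative only: worker instances are ephemeral,
// so a real deployment would need durable storage.
let sum = null;
let count = 0;

addEventListener('fetch', (event) => {
  event.respondWith(handle(event.request));
});

async function handle(request) {
  if (request.method === 'POST') {
    // A client posts its local weight delta as raw Float32 data.
    const update = new Float32Array(await request.arrayBuffer());
    if (sum === null) sum = new Float32Array(update.length);
    for (let i = 0; i < update.length; i++) sum[i] += update[i];
    count++;
    return new Response('accepted', { status: 202 });
  }
  // GET returns the average of the updates seen by this edge node.
  const avg = sum ? sum.map((v) => v / count) : new Float32Array(0);
  return new Response(avg.buffer, {
    headers: { 'Content-Type': 'application/octet-stream' },
  });
}
```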
The Machine Learning on the Web for content filtering applications talk by @shoniko brings up a point on how federated learning could help in the context of content filtering applications, quoting:
And lastly, I wanted to say that, as I started with that, everything starts with the community of people who write the filter lists.
And we certainly want to ensure that this community of people, as always, is also able to maintain the models.
So for that, we are very interested in federated learning problems and how they relate to Web Neural Networks.
And we have experimented with TFJS a little bit, but we wanted to understand how Web Neural Networks API would interact with federated learning problems.
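For context, the client side of one federated learning round could be sketched roughly as below. The `getWeights()`/`setWeights()`/`fit()` model interface is a hypothetical stand-in here (TF.js exposes similarly named methods); the point is that the raw filter-list training data never leaves the device, only a weight delta does:

```js
// Sketch of one federated learning round, client side. The model
// interface (getWeights/setWeights/fit) is a hypothetical stand-in.
async function federatedRound(model, localData, serverUrl) {
  // 1. Pull the current global weights from the coordinating server.
  const res = await fetch(serverUrl + '/weights');
  const globalWeights = new Float32Array(await res.arrayBuffer());
  model.setWeights(globalWeights);

  // 2. Train locally; the filter-list training data stays on-device.
  await model.fit(localData.inputs, localData.labels, { epochs: 1 });

  // 3. Upload only the weight delta, never the data itself.
  const delta = model.getWeights().map((w, i) => w - globalWeights[i]);
  await fetch(serverUrl + '/update', {
    method: 'POST',
    body: delta.buffer, // assumes getWeights() returns a Float32Array
  });
}
```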
Training is out of scope for the initial version of the Web Neural Network API, but could be considered in a future version. To make a stronger case for the inclusion of training capabilities, real-world usages such as those discussed in this issue help raise the priority. An additional consideration is the availability of the respective platform APIs.
There are a couple of relevant papers and products, for instance,
- mobile web workers, presented in the Web & Networks IG (see slides) and published in ACM (with a good overview and tests of the workload migration process),
- Liquid Web Workers,
- Akamai Edge Workers,
- Cloudflare workers,
and many others.
The APIs, challenges and findings are (not surprisingly) somewhat similar, so they provide a consistent background for compute offload, be it for training or inference. These are generic compute offload mechanisms, though.
From the training use case point of view, app-specific job splitting, orchestration, and the privacy aspects of compute offload still need to be explored.
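As a starting point for that exploration, app-specific job splitting on top of a generic offload mechanism could be sketched like this, with a hypothetical `target.run()` standing in for any of the worker mechanisms listed above:

```js
// Sketch of app-specific job splitting on top of a generic offload
// mechanism: shard the training data, dispatch each shard to an
// offload target, then merge the per-shard weight updates.
// `target.run()` is a hypothetical generic-compute API.
async function trainDataParallel(shards, targets, weights) {
  const updates = await Promise.all(
    shards.map((shard, i) =>
      targets[i % targets.length].run({ task: 'train', weights, shard })));

  // Naive merge: average the per-shard updates (FedAvg-style).
  const merged = new Float32Array(weights.length);
  for (const update of updates)
    for (let i = 0; i < merged.length; i++)
      merged[i] += update[i] / updates.length;
  return merged;
}
```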
I think this is a great idea. Especially DNNs, whose accuracy generally depends on the amount of training data, would see a big improvement. Blockchain technology could help a lot with maintaining the quality of that data.