Multi-GPU automated partial offloading
dumblob opened this issue · 0 comments
Describe the feature
Scientific computations often operate on matrices or on algorithms that are not heavy on conditionals/branches.
Running purely on CPUs (which seems to be what VSL currently does and, unfortunately, aims for) will "penalize" VSL compared to other hybrid frameworks whose APIs accommodate running the designated computations also on GPUs without user intervention.
That is, at runtime it automatically detects available GPUs with the needed capabilities, quickly benchmarks them including the upload & download bus performance (think "bogomips" from Linux), then chooses the parts of the code that make sense to run there; if there is no appropriate compile-time binary snippet to run on the GPU, it compiles the snippets with something like a JIT, and finally plumbs everything together to offload the chosen work to the GPUs.
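The benchmark-then-dispatch idea above could boil down to a simple cost model: run a kernel on the GPU only when the estimated compute speedup outweighs the bus transfer cost. A minimal illustrative sketch (all names and parameters here are hypothetical, not VSL API; a real implementation would use measured throughput numbers from the runtime benchmark):

```python
def choose_device(flops: float, bytes_io: float,
                  cpu_gflops: float, gpu_gflops: float,
                  bus_gbytes_per_s: float) -> str:
    """Pick the device with the lower estimated wall time.

    flops          -- floating-point operations the kernel performs
    bytes_io       -- bytes that must cross the CPU<->GPU bus (both directions)
    cpu_gflops     -- benchmarked CPU throughput (GFLOP/s)
    gpu_gflops     -- benchmarked GPU throughput (GFLOP/s)
    bus_gbytes_per_s -- benchmarked upload/download bandwidth (GB/s)
    """
    t_cpu = flops / (cpu_gflops * 1e9)
    # GPU pays for the computation plus moving data over the bus
    t_gpu = flops / (gpu_gflops * 1e9) + bytes_io / (bus_gbytes_per_s * 1e9)
    return "gpu" if t_gpu < t_cpu else "cpu"


# A big matrix multiply (compute-bound) should go to the GPU,
# while a tiny kernel dominated by data transfer should stay on the CPU.
print(choose_device(2e12, 1e8, 100, 5000, 10))  # → gpu
print(choose_device(1e6, 1e9, 100, 5000, 10))   # → cpu
```

The same model extends naturally to multiple GPUs: evaluate the estimate per device and partition the work proportionally to each device's effective throughput.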
Use Case
- To save researchers time (yes, I often hear complaints like "I can't use more features/variables in my model because one run with 2500 rows of input data already takes 2 hours on my modern PC, and there is not enough money in the grant to pay for an AWS GPU cluster. So I can't do a full, proper analysis and instead have to assume too many things rather than actually analysing them.").
- To use ubiquitous HW to its maximum.
Proposed Solution
No response
Other Information
No response
Acknowledgements
- I may be able to implement this feature request
- This feature might incur a breaking change
Version used
latest git master
Environment details (OS name and version, etc.)
Any (at least macOS, Windows, Linux).