Multi-GPU automated partial offloading
dumblob opened this issue · 0 comments
Describe the feature
Scientific computations often operate on matrices or on algorithms that are not heavy on conditionals/branches.
Running purely on CPUs (which seems to be what VSL currently does and, unfortunately, aims for) will "penalize" VSL compared to other hybrid frameworks whose APIs accommodate running the designated computations also on GPUs without user intervention.
That is, at runtime it automatically detects available GPUs with the needed capabilities, quickly benchmarks them including the upload & download bus performance (think "bogomips" from Linux), then chooses the parts of the code that make sense to run there; if there is no appropriate compile-time binary snippet to run on the GPU, it compiles the snippets with something like a JIT, and finally plumbs everything together to offload the chosen work to the GPUs.
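The benchmark-then-dispatch idea above could boil down to a simple cost model: run a kernel on the GPU only when the estimated compute speedup outweighs the bus transfer cost. A minimal illustrative sketch (all names and parameters here are hypothetical, not VSL API; a real implementation would use measured throughput numbers from the runtime benchmark):

```python
def choose_device(flops: float, bytes_io: float,
                  cpu_gflops: float, gpu_gflops: float,
                  bus_gbytes_per_s: float) -> str:
    """Pick the device with the lower estimated wall time.

    flops          -- floating-point operations the kernel performs
    bytes_io       -- bytes that must cross the CPU<->GPU bus (both directions)
    cpu_gflops     -- benchmarked CPU throughput (GFLOP/s)
    gpu_gflops     -- benchmarked GPU throughput (GFLOP/s)
    bus_gbytes_per_s -- benchmarked upload/download bandwidth (GB/s)
    """
    t_cpu = flops / (cpu_gflops * 1e9)
    # GPU pays for the computation plus moving data over the bus
    t_gpu = flops / (gpu_gflops * 1e9) + bytes_io / (bus_gbytes_per_s * 1e9)
    return "gpu" if t_gpu < t_cpu else "cpu"


# A big matrix multiply (compute-bound) should go to the GPU,
# while a tiny kernel dominated by data transfer should stay on the CPU.
print(choose_device(2e12, 1e8, 100, 5000, 10))  # → gpu
print(choose_device(1e6, 1e9, 100, 5000, 10))   # → cpu
```

The same model extends naturally to multiple GPUs: evaluate the estimate per device and partition the work proportionally to each device's effective throughput.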
Use Case
- To save researchers time (yes, I often hear complaints like "I can't use more features/variables in my model because one run with 2500 rows of input data already takes 2 hours on my modern PC, and there is not enough money in the grant to pay for an AWS GPU cluster. So I can't do a full, proper analysis and instead have to assume too many things rather than actually analysing them.").
- To use ubiquitous HW to its maximum.
Proposed Solution
No response
Other Information
No response
Acknowledgements
- I may be able to implement this feature request
- This feature might incur a breaking change
Version used
latest git master
Environment details (OS name and version, etc.)
Any (at least macOS, Windows, Linux).