nebuly-ai/optimate

[Chatllama] Support Inference for trained models.

PierpaoloSorbellini opened this issue · 1 comment

Description

Currently, to run inference with a trained model, the user has to write a small Python script by hand that loads the resulting checkpoint or saved model according to how the library saves it after training.
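
For reference, a minimal sketch of such a script, assuming the actor was saved as a Hugging Face causal LM directory (the checkpoint path and generation settings below are placeholders, not chatllama defaults):

```python
# Minimal sketch: load a trained actor checkpoint and answer one prompt.
# Assumes the checkpoint is a Hugging Face causal LM directory.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint_dir = "./checkpoints/actor"  # hypothetical path
tokenizer = AutoTokenizer.from_pretrained(checkpoint_dir)
model = AutoModelForCausalLM.from_pretrained(checkpoint_dir)
model.eval()

prompt = "Explain what RLHF is in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```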

Moreover, several optimizations could be integrated to speed up inference (see the sketch after this list), such as:

  • CPU offloading.
  • llama.cpp implementation.
  • accelerate / deepspeed distributed inference.
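
As a rough illustration of the CPU offloading and accelerate points, here is a hedged sketch using accelerate-backed loading via `device_map="auto"`; the checkpoint path, dtype, and offload folder are assumptions, not chatllama settings:

```python
# Sketch: accelerate places layers on the available GPUs and offloads
# whatever does not fit to CPU/disk.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint_dir = "./checkpoints/actor"  # hypothetical path
tokenizer = AutoTokenizer.from_pretrained(checkpoint_dir)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint_dir,
    torch_dtype=torch.float16,
    device_map="auto",           # shard across available GPUs
    offload_folder="./offload",  # spill remaining weights to CPU/disk
)
```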

TODO

  • Implement an Inference class that makes inference easy, including from the CLI.
  • Implement inference with the optimizations available from deepspeed.
  • Implement inference with the optimizations available from accelerate.
  • Implement fast LLaMA inference with the well-known llama.cpp implementation.
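
A hypothetical sketch of what the proposed Inference class with a CLI entry point could look like; the class name, arguments, and checkpoint layout are assumptions, not the library's actual API:

```python
# Hypothetical Inference class: load a trained actor checkpoint and
# answer prompts, callable from the command line.
import argparse

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


class Inference:
    """Load a trained actor checkpoint and generate answers to prompts."""

    def __init__(self, checkpoint_dir: str, device: str = "cuda"):
        self.tokenizer = AutoTokenizer.from_pretrained(checkpoint_dir)
        self.model = AutoModelForCausalLM.from_pretrained(checkpoint_dir).to(device)
        self.model.eval()
        self.device = device

    @torch.no_grad()
    def generate(self, prompt: str, max_new_tokens: int = 128) -> str:
        inputs = self.tokenizer(prompt, return_tensors="pt").to(self.device)
        output_ids = self.model.generate(**inputs, max_new_tokens=max_new_tokens)
        return self.tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Run inference on a trained model")
    parser.add_argument("--checkpoint", required=True, help="path to the saved actor")
    parser.add_argument("--prompt", required=True, help="input prompt")
    args = parser.parse_args()

    print(Inference(args.checkpoint).generate(args.prompt))
```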

@PierpaoloSorbellini The inference section is tagged as WIP. Do we have any basic inference code available in chatllama to load the actor_rl model and run a few queries?