[Chatllama] Evaluation Function and Loop with metrics
PierpaoloSorbellini opened this issue · 0 comments
PierpaoloSorbellini commented
Description
Currently each training loop has an evaluation loop, but it has not been debugged or used so far.
It needs to be generalised so that it can also be launched outside the training activities, and to support specific language-modelling metrics.
It would also be nice if a report could be generated highlighting the performance achieved, including a comparison with other models.
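As a minimal sketch, a standalone evaluation function could iterate over a held-out set and report loss and perplexity. The snippet below assumes a Hugging Face-style causal LM that returns a `.loss` when `labels` are passed; the actual ChatLLaMA model and dataloader interfaces may differ.

```python
import math
import torch


@torch.no_grad()
def evaluate(model, dataloader, device="cuda"):
    """Run the model over an evaluation set and return language-modelling metrics."""
    model.eval()
    total_loss, total_tokens = 0.0, 0
    for batch in dataloader:
        input_ids = batch["input_ids"].to(device)
        attention_mask = batch["attention_mask"].to(device)
        # Causal-LM loss: labels are the inputs, shifted internally by the model.
        outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=input_ids)
        n_tokens = attention_mask.sum().item()
        total_loss += outputs.loss.item() * n_tokens
        total_tokens += n_tokens
    avg_loss = total_loss / max(total_tokens, 1)
    return {"loss": avg_loss, "perplexity": math.exp(avg_loss)}
```

Because it only needs a model and a dataloader, the same function could be called from inside the training loop or from a separate evaluation script.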
TODO
- Investigate whether libraries such as openai/evals or FastChat can be adapted for use as an evaluation tool.
- Debug the evaluation of the model.
- Collect and compute relevant metrics.
- Allow the evaluation loop to be launched outside of training.
- Produce a meaningful report that can compare the performance of one or more models (a possible starting point is sketched below).
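For the report, one simple starting point could be to collect the metric dictionaries produced by an evaluation function like the one above and render them as a Markdown table, so several models can be compared side by side. The model names and metrics below are purely illustrative.

```python
def metrics_report(results: dict[str, dict[str, float]]) -> str:
    """Render a Markdown table comparing metrics across models.

    `results` maps a model name to its metrics, e.g.
    {"llama-7b": {"loss": 2.1, "perplexity": 8.2}, "llama-13b": {...}}.
    """
    metric_names = sorted({m for metrics in results.values() for m in metrics})
    header = "| model | " + " | ".join(metric_names) + " |"
    separator = "|" + "---|" * (len(metric_names) + 1)
    rows = [
        "| " + name + " | "
        + " | ".join(f"{metrics.get(m, float('nan')):.4f}" for m in metric_names)
        + " |"
        for name, metrics in results.items()
    ]
    return "\n".join([header, separator, *rows])
```

The same dictionaries could also be fed to whichever external tool (openai/evals, FastChat, etc.) ends up being adopted, if any.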