Tanuki/tanuki.py

Optionally delegate classifiers to XGBoost for finetuning and inference

JackHopkins opened this issue · 0 comments

Is your feature request related to a problem? Please describe.
LLMs are extremely inefficient at classification. When enough training data is available, a gradient-boosted model such as XGBoost is usually a better fit. We could use the aligned data from the LLM to train an XGBoost model, which would be much faster to run.

Describe the solution you'd like
When the output types denote a classification task (i.e. where the goal is to pick one value from a union of literal types, or from an enum), we optionally distil the teacher model into a decision forest using the XGBoost library.
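
As a rough illustration of the detection step, here is a minimal sketch of how a return annotation could be checked for being a classification task. The helper name `classification_labels` is hypothetical and not part of Tanuki; it only uses the standard `typing` and `enum` modules, and it treats anything other than an enum, a `Literal`, or a union made up entirely of those as "not classification".

```python
import enum
from typing import Literal, Union, get_args, get_origin


def classification_labels(output_type):
    """Return the candidate labels if `output_type` looks like a
    classification target, otherwise None.

    Hypothetical helper for illustration; not an existing Tanuki function.
    """
    origin = get_origin(output_type)

    # Literal["a", "b"]: the labels are the literal values themselves.
    if origin is Literal:
        return list(get_args(output_type))

    # Union[...]: every member must itself be a classification type.
    if origin is Union:
        labels = []
        for member in get_args(output_type):
            member_labels = classification_labels(member)
            if member_labels is None:
                return None
            labels.extend(member_labels)
        return labels

    # Any other parameterised generic (List[int], Dict[str, int], ...).
    if origin is not None:
        return None

    # Plain Enum subclass: the labels are its members.
    if isinstance(output_type, type) and issubclass(output_type, enum.Enum):
        return list(output_type)

    return None
```

A function whose annotation yields a non-`None` label list would be a candidate for distillation; the position of each label in the list could serve as the integer class id that a classifier is trained on.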

Additional context
We could represent student models as optional packages, sort of like drivers, that the user could install through pip.

E.g. pip3 install tanuki.py[xgboost]
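
One way the "driver" idea could look on the library side is a lazy import that only succeeds when the optional package is present, and otherwise tells the user which extra to install. This is a sketch under assumptions: `load_student_backend` is a hypothetical name, and the `xgboost` extra does not exist yet (with setuptools it would presumably be declared as an `extras_require` entry).

```python
import importlib
import importlib.util


def load_student_backend(module_name: str, extra: str):
    """Import an optional student-model backend, or fail with an
    actionable message naming the pip extra to install.

    Hypothetical helper for illustration; the extra name is an assumption.
    """
    # find_spec returns None when a top-level module is not installed.
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"The '{module_name}' backend is not installed. "
            f"Install it with: pip3 install tanuki.py[{extra}]"
        )
    return importlib.import_module(module_name)
```

Calling `load_student_backend("xgboost", "xgboost")` only at the point where a classifier is actually distilled would keep the core package free of the dependency for users who never need it.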