Integrations of Models from BEND, data loaders, and fine-tuning methods
Zehui127 opened this issue · 1 comments
Zehui127 commented
In order to benchmark the existing genomic models with our dataset, the following tasks need to be completed:
- GenomicVariants Dataloader: this data loader should take the name of the dataset as input and return a batch in a standard format, e.g. `(x, y) = GenomicVariantsDataloader('clivar')`, where `x` is a triple `(ref, alt, annote)`: `ref` is the reference genome sequence, `alt` is the mutated sequence, and `annote` holds annotations about the sequence, such as the species name, organism name, ...
- BEND Models Integration: this should be a wrapper class around the embedder class in BEND. Assuming we name the wrapper class `BaseModel`, we should be able to perform evaluation and fine-tuning with the following: `model = BaseModel('DNABERT')`, `y = model(x)`.
- Fine-tuning: this should include a training script and a training class that takes the `BaseModel` as input and triggers the fine-tuning procedure. The fine-tuning algorithms include linear-head fine-tuning, soft prompting, and full fine-tuning. As an extension, it could include low-rank fine-tuning methods such as LoRA.
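
To make the three pieces above concrete, here is a rough, dependency-free sketch of how they could fit together. Everything beyond the names `GenomicVariantsDataloader` and `BaseModel` is an assumption for illustration: the `'toy'` dataset and its labels are made up, the `'toy-counts'` embedder is a stand-in (not a BEND model), `LinearHeadTrainer` is a hypothetical name, and returning `alt` minus `ref` embeddings from `model(x)` is just one possible choice. Only the linear-head variant is shown; the backbone stays frozen.

```python
import math
from typing import Callable, Dict, List, Tuple

Triple = Tuple[List[str], List[str], List[dict]]  # (ref, alt, annote)

# Hypothetical in-memory dataset: (ref, alt, annotation, label). Not real data.
_TOY_DATASETS = {
    "toy": [
        ("ACGTACGT", "ACGTACGA", {"species": "human"}, 1),
        ("ACGTACGT", "ACGAACGT", {"species": "human"}, 1),
        ("ACGTACGT", "ACGTACGC", {"species": "human"}, 0),
        ("ACGTACGT", "ACCTACGT", {"species": "human"}, 0),
    ]
}


def GenomicVariantsDataloader(name: str) -> Tuple[Triple, List[int]]:
    """Return (x, y) for a dataset name, with x = (ref, alt, annote)."""
    refs, alts, annotes, labels = zip(*_TOY_DATASETS[name])
    return (list(refs), list(alts), list(annotes)), list(labels)


def _count_embed(seq: str) -> List[float]:
    """Stand-in embedder (assumption): per-base counts in A, C, G, T order."""
    return [float(seq.count(base)) for base in "ACGT"]


# A real wrapper would map names such as 'DNABERT' to BEND embedders here.
_EMBEDDERS: Dict[str, Callable[[str], List[float]]] = {"toy-counts": _count_embed}


class BaseModel:
    """Wraps an embedder so that y = model(x) works on an (ref, alt, annote) triple."""

    def __init__(self, name: str):
        self.embed = _EMBEDDERS[name]

    def __call__(self, x: Triple) -> List[List[float]]:
        ref, alt, _annote = x
        # One feature vector per variant: alt embedding minus ref embedding.
        return [
            [a - r for r, a in zip(self.embed(r_seq), self.embed(a_seq))]
            for r_seq, a_seq in zip(ref, alt)
        ]


class LinearHeadTrainer:
    """Linear-head fine-tuning: logistic regression on frozen embeddings."""

    def __init__(self, model: BaseModel, lr: float = 0.1, epochs: int = 500):
        self.model, self.lr, self.epochs = model, lr, epochs
        self.w: List[float] = []
        self.b = 0.0

    def _prob(self, feat: List[float]) -> float:
        z = sum(wi * fi for wi, fi in zip(self.w, feat)) + self.b
        return 1.0 / (1.0 + math.exp(-z))

    def fit(self, x: Triple, y: List[int]) -> "LinearHeadTrainer":
        feats = self.model(x)  # backbone is frozen, so embed only once
        self.w = [0.0] * len(feats[0])
        for _ in range(self.epochs):
            # Full-batch gradient of the summed logistic loss.
            errs = [self._prob(f) - t for f, t in zip(feats, y)]
            grad_w = [
                sum(e * f[j] for e, f in zip(errs, feats))
                for j in range(len(self.w))
            ]
            self.w = [wi - self.lr * g for wi, g in zip(self.w, grad_w)]
            self.b -= self.lr * sum(errs)
        return self

    def predict(self, x: Triple) -> List[int]:
        return [int(self._prob(f) >= 0.5) for f in self.model(x)]


x, y = GenomicVariantsDataloader("toy")
model = BaseModel("toy-counts")
trainer = LinearHeadTrainer(model).fit(x, y)
predictions = trainer.predict(x)
```

Soft prompting, full fine-tuning and LoRA would need access to the backbone parameters, so they would presumably plug in as alternative trainer classes that take the same `BaseModel` rather than as changes to the data loader.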
Zehui127 commented
The initial fine-tuning framework is done. Open a new issue for adding additional fine-tuning features.