How to divide our own dataset into train, dev, and test sets and assign labels for fine-tuning

Question


smruti241 opened this issue 2 years ago · 2 comments

Hi @jerryji1993 , @Zhihan1996 , @project-delphi , @hjgwak , @timlautk ,

I read your paper and it's very interesting. I have a dataset that consists only of 6-mers. I want to divide it into train, dev, and test sets and assign labels so I can go directly to fine-tuning (no pre-training required, since I will use the pre-trained models). Could you please tell me the procedure, or whether a script for this is available in the repository's folders? Please let me know. Thanks!
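A minimal sketch of the split step, assuming the DNABERT-style fine-tuning format: a tab-separated file with a `sequence` and `label` header, one example per line, and the k-mers of each sequence separated by spaces. Check this format against the sample fine-tuning data shipped with the repository before relying on it. The function names `split_dataset` and `write_tsv`, the 80/10/10 ratios, and the file names `train.tsv`, `dev.tsv`, `test.tsv` are illustrative assumptions, not part of the tool.

```python
import os
import random
import tempfile
from typing import List, Tuple

Example = Tuple[str, int]  # (space-separated k-mers, integer label)


def split_dataset(
    examples: List[Example],
    train_frac: float = 0.8,
    dev_frac: float = 0.1,
    seed: int = 42,
) -> Tuple[List[Example], List[Example], List[Example]]:
    """Shuffle (sequence, label) pairs with a fixed seed and split them into train/dev/test.

    Hypothetical helper: the test set receives whatever is left after train and dev.
    """
    if train_frac <= 0 or dev_frac < 0 or train_frac + dev_frac >= 1:
        raise ValueError("need train_frac > 0, dev_frac >= 0 and train_frac + dev_frac < 1")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle, original list untouched
    n_train = int(len(shuffled) * train_frac)
    n_dev = int(len(shuffled) * dev_frac)
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]
    return train, dev, test


def write_tsv(path: str, examples: List[Example]) -> None:
    """Write examples as 'sequence<TAB>label' lines under a header row (assumed format)."""
    with open(path, "w") as f:
        f.write("sequence\tlabel\n")
        for seq, label in examples:
            f.write(f"{seq}\t{label}\n")


if __name__ == "__main__":
    # Toy data: the same 6-mer string with alternating hypothetical binary labels.
    data = [("ACGTAC CGTACG GTACGT", i % 2) for i in range(10)]
    out_dir = tempfile.mkdtemp()  # replace with your own data directory
    for name, part in zip(("train", "dev", "test"), split_dataset(data)):
        write_tsv(os.path.join(out_dir, f"{name}.tsv"), part)
        print(name, len(part))
```

A plain random split is used here for simplicity; with imbalanced classes, a stratified split (for example scikit-learn's `train_test_split(..., stratify=labels)`) keeps the class proportions similar across the three files.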

Answer 1 · 2023-03-20T17:56:59.000Z

Hi, yes, there is a way to load the models with Hugging Face. I have done it in this repository: https://github.com/Moeinh77/Virus-DNA-Classification

Answer 2 · 2023-03-20T18:57:32.000Z

@Moeinh77, can you please explain how to use it? I didn't fully understand it. I already have k-mer data (6-mers) and want to fine-tune the pre-trained models on it, but my k-mer data has no labels.
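Labels cannot be derived from the k-mers themselves; they have to come from what you already know about each sequence, for example which class or source file it belongs to. A minimal sketch, assuming your sequences are grouped per class (the class names `promoter` and `non_promoter` and the helpers `seq_to_kmers` and `assign_labels` are hypothetical): each class name is mapped to an integer label, and raw sequences can optionally be converted to space-separated overlapping 6-mers first.

```python
from typing import Dict, List, Tuple


def seq_to_kmers(seq: str, k: int = 6) -> str:
    """Turn a raw DNA sequence into space-separated overlapping k-mers (stride 1)."""
    seq = seq.upper()
    return " ".join(seq[i:i + k] for i in range(len(seq) - k + 1))


def assign_labels(
    sequences_by_class: Dict[str, List[str]],
) -> Tuple[List[Tuple[str, int]], Dict[str, int]]:
    """Map each class name to an integer label and attach it to every sequence of that class.

    Class names are sorted so the name-to-label mapping is deterministic.
    """
    label_map = {name: idx for idx, name in enumerate(sorted(sequences_by_class))}
    examples = [
        (seq, label_map[name])
        for name, seqs in sequences_by_class.items()
        for seq in seqs
    ]
    return examples, label_map


if __name__ == "__main__":
    # Hypothetical classes; in practice these come from your own annotations.
    groups = {
        "promoter": [seq_to_kmers("ACGTACGTAC")],
        "non_promoter": [seq_to_kmers("TTTTGGGGCC")],
    }
    examples, label_map = assign_labels(groups)
    print(label_map)
    print(examples)
```

The resulting list of (sequence, label) pairs can then be shuffled and split into train, dev, and test sets. Keep the printed `label_map` so predictions can be translated back to class names.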