Help with the input files
bitcometz opened this issue · 3 comments
bitcometz commented
Hello, thanks for your hard work redoing the Geneformer examples. Great job!
I want to redo the analysis too, but it is difficult for me to download the whole dataset.
As used in this notebook,
could you also provide the input dataset "/content/drive/MyDrive/Genecorpus-30M/genecorpus_100K_2048.dataset"?
Thanks!
kzkedzierska commented
I think they might have used this file from the Geneformer dataset: https://huggingface.co/datasets/ctheodoris/Genecorpus-30M/tree/main/genecorpus_30M_2048.dataset
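If the goal is only to rebuild a roughly 100K-cell subsample without keeping the full corpus on disk, one option is to stream the records and keep a uniform random sample with reservoir sampling (Algorithm R). This is a minimal sketch under that assumption. The thread does not say how `genecorpus_100K_2048.dataset` was actually produced, and `reservoir_sample` is a hypothetical helper, not part of Geneformer or Hugging Face `datasets`.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Hypothetical helper: draw k items uniformly at random from a stream.

    Only k items are held in memory, so the full stream never needs to be
    stored, although every item is still read once.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)  # fixed seed for a reproducible subsample
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep item i with probability k / (i + 1) by replacing a random slot.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Toy stand-in for a stream of tokenized cells.
    sample = reservoir_sample(range(1_000_000), k=100_000, seed=42)
    print(len(sample))
```

The stream could in principle come from the Hugging Face `datasets` library with `load_dataset(..., streaming=True)`, but whether the `genecorpus_30M_2048.dataset` folder streams cleanly that way is an untested assumption. Reservoir sampling still reads every record once, so it saves disk space rather than download bandwidth; taking only the first 100,000 records (for example with `itertools.islice`) avoids the full pass but gives a non-random subsample.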
bitcometz commented
Thanks!
But the full Genecorpus-30M dataset is too big for me to download.
AnjaliS1 commented
Hi, could I also get the 100K subsampled version of the dataset? The original dataset is too large for me to download as well.