Missing training dataset
VoVAllen opened this issue · 5 comments
VoVAllen commented
The synthetic training datasets listed in https://github.com/illuin-tech/colpali/blob/main/colpali_engine/dataset/hf_dataset_names.py#L9-L16 are not available on Hugging Face.
ManuelFay commented
They will be soon!
VoVAllen commented
@ManuelFay Do you have an estimated time for it? I found that the debug setting with the docvqa dataset worked, but evaluation failed with an error saying a dataset was missing.
VoVAllen commented
And thanks for the code and the idea. Very well organized and easy to play with. Appreciate it!
ManuelFay commented
Evaluations should not require training data!
You can replace the dataset loading function with your own if need be :)
Also, don't hesitate to check out the other repo for evaluation (vidore-benchmark).
The training datasets should be available by the end of the month!
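Editor's sketch, not from the maintainers: for readers who want to act on the suggestion above and swap in their own dataset loading function while the datasets are unavailable, a minimal stand-in loader might look like the following. Everything here is an assumption for illustration: the function name `load_local_pairs`, the JSONL layout, and the `query` / `image` field names are hypothetical, and the signature the colpali training code actually expects is not shown in this thread, so check it before wiring anything in.

```python
import json
from pathlib import Path
from typing import Dict, List


def load_local_pairs(jsonl_path: str) -> List[Dict[str, str]]:
    """Read query/image-path pairs from a local JSONL file.

    Hypothetical format: one JSON object per line with at least the
    keys "query" and "image". Extra keys are ignored.
    """
    required = {"query", "image"}
    records: List[Dict[str, str]] = []
    with Path(jsonl_path).open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            row = json.loads(line)
            missing = required - set(row)
            if missing:
                raise ValueError(
                    f"line {line_no}: missing fields {sorted(missing)}"
                )
            # Keep only the fields this sketch cares about.
            records.append({"query": row["query"], "image": row["image"]})
    return records
```

If the surrounding code expects a Hugging Face `datasets` object rather than a plain list, the records can be wrapped with `datasets.Dataset.from_list(records)`. Whether that matches what the trainer needs is also an assumption.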