ibm-aur-nlp/PubLayNet

Download questions

Closed this issue · 9 comments

Great job, and thank you for sharing such a large-scale document dataset. However, my download speed is very slow, and the download often disconnects. Is there any other way to get these datasets?

zhxgj commented

@phexic thanks for your interest. We will look into this issue and solve it asap.

@zhxgj Great!

zhxgj commented

@phexic We tested a few geographic regions and got decent download speeds from Box. Can you please let us know which geographic region you are downloading the data from?

@zhxgj Maybe the reason is that I'm in China.

zhxgj commented

@phexic Hmm, maybe Box does not work well in China. Let me try to work out a solution for you.

@zhxgj Oh, wow! Thanks a million.
Will you publish the pre-trained models for document layout?

zhxgj commented

@phexic This is a great suggestion. I will follow up with our legal team regarding releasing the pre-trained model and maybe the training config file.

@zhxgj Hi, any news from your legal team on whether you can release the pre-trained models?

zhxgj commented

Hi @phexic, the data has been migrated to the IBM DAX platform. I think the download should be more stable now. Please see the instructions in the README.