ibm-aur-nlp/PubLayNet

Checksum for PubLayNet_PDF.tar.gz

conjuncts opened this issue · 1 comment

Hello,
I tried downloading the PDF dataset, but extraction failed with a data corruption error after only about 10% of the archive had been unpacked. Are checksums available for PubLayNet_PDF.tar.gz, or is the archive offered in smaller splits?

It sounds like you're running into problems downloading the PubLayNet dataset. Without knowing where you're downloading it from, it's hard to give a precise fix, but here is some general advice.

  1. Check for official sources: Make sure you're downloading the dataset from the official source rather than a mirror or re-upload.
  2. Checksums: Check whether the dataset provider publishes checksums for the files, and compare them against the checksum of your downloaded copy.
  3. Data splits: Some datasets are split into multiple parts for easier downloading. If that is the case here, make sure you've downloaded every part.
  4. Redownload: If you suspect the downloaded file is corrupted, try downloading it again. An interrupted transfer is a common cause of a truncated archive.
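
Steps 2 and 4 can be sketched in Python. This is a minimal sketch using only the standard library, not an official PubLayNet tool: `sha256_of_file` and `archive_is_readable` are hypothetical helper names, and it assumes a SHA-256 digest to compare against is published somewhere, which this thread does not confirm. Even without a reference checksum, reading the whole archive end to end shows whether the local copy is truncated or damaged before you spend time extracting it.

```python
import hashlib
import tarfile
import zlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_readable(path):
    """Read every regular file in a tar archive; return False if reading fails partway."""
    try:
        with tarfile.open(path, "r:*") as tar:
            for member in tar:
                if member.isfile():
                    f = tar.extractfile(member)
                    # Drain the member so the decompressor has to process all of its data.
                    while f.read(1024 * 1024):
                        pass
        return True
    except (tarfile.TarError, EOFError, OSError, zlib.error):
        return False


# Example usage (the expected digest is a placeholder, not a real published value):
# expected = "<digest published by the dataset provider>"
# print(sha256_of_file("PubLayNet_PDF.tar.gz") == expected)
# print(archive_is_readable("PubLayNet_PDF.tar.gz"))
```

If `archive_is_readable` returns False, the local copy is most likely incomplete or corrupted and redownloading is the next thing to try. Comparing `sha256_of_file` output between two separate downloads can also reveal whether a transfer was damaged, even when no official checksum is available.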
    Akif, the outlier