UB-Mannheim/tesseract

Link for file download not found (404 error) (https://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-w64-setup-5.3.1.20230401.exe)

Opened this issue · 12 comments

Current Behavior

No response

Expected Behavior

No response

Suggested Fix

No response

tesseract -v

No response

Operating System

No response

Other Operating System

No response

uname -a

No response

Compiler

No response

CPU

No response

Virtualization / Containers

No response

Other Information

No response

stweil commented

It works for me. When did you try the download? Do you still have a problem?

Link works for me too.

Had the same issue. Switching my DNS settings from manual to automatic solved the problem. I was using Cloudflare DNS.

kaixxx commented

I had the same problem. It might have to do with the cookie policy. After I visited https://digi.bib.uni-mannheim.de/ and answered the cookie question (I denied cookies), the download links worked fine.

stweil commented

The download link does not use any cookies. I think there is a DNS problem if downloads fail. Usually a retry (maybe later) should help. If you report the exact time (including time zone) of failing downloads I can also check the web server protocol for possible failures.
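A retry like the one suggested above could be scripted roughly as follows. This is only a minimal sketch using the Python standard library: the function names and the `attempts`, `base_delay`, `fetcher` and `sleep` parameters are made up for illustration, and the URL is the one from the issue title.

```python
import time
import urllib.request

# URL taken from the issue title.
URL = "https://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-w64-setup-5.3.1.20230401.exe"


def fetch(url, timeout=30):
    """Download url and return the response body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_with_retry(url, attempts=5, base_delay=2.0, fetcher=fetch, sleep=time.sleep):
    """Call fetcher(url), retrying failed attempts with exponential backoff.

    attempts, base_delay, fetcher and sleep are hypothetical parameters
    chosen for this sketch; they are not part of any Tesseract tooling.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetcher(url)
        except OSError as exc:
            # URLError, HTTPError, socket.gaierror and socket timeouts are
            # all OSError subclasses, so DNS failures end up here too.
            if attempt == attempts:
                raise  # out of attempts: let the caller see the last error
            delay = base_delay * 2 ** (attempt - 1)
            print(f"attempt {attempt} failed ({exc}); retrying in {delay:.0f}s")
            sleep(delay)


if __name__ == "__main__":
    data = download_with_retry(URL)
    with open("tesseract-ocr-w64-setup-5.3.1.20230401.exe", "wb") as out:
        out.write(data)
```

The sketch retries on any `OSError`, which also covers HTTP errors such as a 404. If only name resolution failures should be retried, the `except` clause would need to be narrowed.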

I encountered this about 30 min ago

Line |
   2 |  Invoke-WebRequest -Uri "https://digi.bib.uni-mannheim.de/tesseract/te …
     |  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     | No such host is known.

Would it be better if there were mirrors that hosted these files? I would happily download from a mirror that's a little closer to Australia rather than pulling data from the other side of the planet (I'm grateful I actually can even download files from so far away and it works so well most of the time)

anphex commented

It's on-and-off behaviour, guys. I tried some Python experiments on multiple machines and it was hit or miss.
If a university in one of the most developed countries in the world isn't capable of running a basic website, that's all you need to know about IT progress in Germany.

stweil commented

@anphex, a more detailed bug report would be helpful. The web server is up more than 99.9% of the time, only restarted when necessary due to a new Linux kernel. What exactly is failing? Are you getting timeouts? Is name resolution failing? From which part of the world are downloads failing?
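To answer these questions from a script, the failure can be classified and logged with a time stamp that includes the time zone. The following is just a sketch under assumptions: `classify_failure` and `report_line` are hypothetical helper names, and the labels are arbitrary.

```python
import socket
import urllib.error
from datetime import datetime, timezone


def classify_failure(exc):
    """Return a coarse label for an exception raised by a download attempt."""
    # HTTPError must be checked first because it is a subclass of URLError.
    if isinstance(exc, urllib.error.HTTPError):
        return f"http status {exc.code}"
    # urlopen wraps lower-level errors in URLError; the cause is in .reason.
    reason = exc.reason if isinstance(exc, urllib.error.URLError) else exc
    if isinstance(reason, socket.gaierror):
        return "name resolution failed"
    if isinstance(reason, (TimeoutError, socket.timeout)):
        return "timeout"
    return f"other: {reason!r}"


def report_line(url, exc, now=None):
    """Format one log line: ISO 8601 time with UTC offset, URL, failure type."""
    if now is None:
        now = datetime.now(timezone.utc).astimezone()  # local time, offset included
    return f"{now.isoformat(timespec='seconds')} {url} {classify_failure(exc)}"
```

A build script could call `report_line(url, exc)` in its `except` branch and attach the resulting lines to a bug report.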

anphex commented

Good morning! I was really annoyed yesterday because installing the Tesseract exe was one of the last steps in finishing a script and it was already late. Sorry for my mean comment. The only thing I can "confirm" from my Chrome history is that no connection was possible at 22:25 German time.

@stweil

If it helps, I can give the dates and times when the download failed in my build process and when it succeeded.

Times when file download failed:

Times when file download succeeded:

As you can see, there are often only a few seconds between a "No such host is known." error and a successful download.

I hope this is helpful in finding the issue,

Dave.

stweil commented

@damies13, that's a special case: access was not possible most of the time because of a heavy denial-of-service attack which lasted more than 24 hours.