karel-brinda/Phylign

Downloading is not working and `make test` fails

leoisl opened this issue · 6 comments

leoisl commented

Downloading the compressed assemblies and COBS indexes is not working, so `make test` fails, as would any execution of the pipeline. You might not have experienced this issue if the files were already downloaded, because the download rules are skipped in that case. The failure is caused by a redirect: every requested file redirects to another URL, and what we actually download is an HTML page describing the redirect, e.g.:

$ cat asms/bacillus_anthracis__01.tar.xz 
<!doctype html>
<html lang=en>
<title>Redirecting...</title>
<h1>Redirecting...</h1>
<p>You should be redirected automatically to the target URL: <a href="/records/4602622/files/bacillus_anthracis__01.tar.xz">/records/4602622/files/bacillus_anthracis__01.tar.xz</a>. If not, click the link.

Adding the -L option to curl, which makes it follow redirects, fixes this. I am pushing a PR to fix it right after opening this issue.
I am also re-enabling the CI tests to ensure that make test works on each PR.

Hi Leandro, thanks for reporting this issue! One question: when you experienced this, did Snakemake fail after it downloaded the HTML? We must ensure that it's the download rule that fails in such a case.

Did Zenodo change its URLs, and is it now returning 301 or 302?

leoisl commented

Hi Leandro, thanks for reporting this issue! One question: when you experienced this, did Snakemake fail after it downloaded the HTML? We must ensure that it's the download rule that fails in such a case.

It did fail! So this is good! See this excerpt:

File cobs/actinobacillus_pleuropneumoniae__01.cobs_classic.xz is too small, likely corrupted
File cobs/bacillus_anthracis__01.cobs_classic.xz is too small, likely corrupted
File cobs/bacillus_anthracis__01.cobs_classic.xz is not a valid xz archive
File cobs/actinobacillus_pleuropneumoniae__01.cobs_classic.xz is not a valid xz archive
[Tue Nov 21 16:01:31 2023]
Error in rule download_cobs_batch:
    jobid: 6
    output: cobs/bacillus_anthracis__01.cobs_classic.xz
    shell:
        
        curl "https://zenodo.org/record/6845083/files/bacillus_anthracis__01.cobs_classic.xz"  > cobs/bacillus_anthracis__01.cobs_classic.xz
        scripts/test_xz.py cobs/bacillus_anthracis__01.cobs_classic.xz
        
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job download_cobs_batch since they might be corrupted:
cobs/bacillus_anthracis__01.cobs_classic.xz
[Tue Nov 21 16:01:31 2023]
Error in rule download_cobs_batch:
    jobid: 4
    output: cobs/actinobacillus_pleuropneumoniae__01.cobs_classic.xz
    shell:
        
        curl "https://zenodo.org/record/6845083/files/actinobacillus_pleuropneumoniae__01.cobs_classic.xz"  > cobs/actinobacillus_pleuropneumoniae__01.cobs_classic.xz
        scripts/test_xz.py cobs/actinobacillus_pleuropneumoniae__01.cobs_classic.xz
        
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job download_cobs_batch since they might be corrupted:
cobs/actinobacillus_pleuropneumoniae__01.cobs_classic.xz
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-11-21T160128.278236.snakemake.log
make: *** [Makefile:33: test] Error 1
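
For illustration, the kind of check that produces the two messages in the log above could be sketched as follows. This is not the actual `scripts/test_xz.py`; the function name `xz_problems` and the `min_size` threshold are assumptions made for the example, and a real check might also try to decompress the archive. The sketch only compares the file size against a threshold and the first six bytes against the xz magic number.

```python
from pathlib import Path

# Every .xz stream starts with these six bytes (the xz format magic number).
XZ_MAGIC = b"\xfd7zXZ\x00"


def xz_problems(path, min_size=1000):
    """Return a list of reasons why `path` does not look like a real xz download.

    `min_size` is a hypothetical threshold in bytes, chosen for this sketch:
    an HTML redirect page is only a few hundred bytes long, whereas the
    archives are far larger.
    """
    p = Path(path)
    problems = []

    # A tiny file is more likely an error page than an archive.
    if p.stat().st_size < min_size:
        problems.append("too small, likely corrupted")

    # Compare the file header with the xz magic number.
    with p.open("rb") as fh:
        if fh.read(len(XZ_MAGIC)) != XZ_MAGIC:
            problems.append("not a valid xz archive")

    return problems
```

A download rule could call such a function right after curl and exit with a non-zero status whenever the returned list is non-empty, which is what makes Snakemake discard the output file.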

Did Zenodo change its URLs, and is it now returning 301 or 302?

I don't actually know... all I know is that I added the -L to curl and it is working again...

leoisl commented

It is giving 301 indeed:

$ curl -v "https://zenodo.org/record/6845083/files/actinobacillus_pleuropneumoniae__01.cobs_classic.xz"
*   Trying 188.184.98.238...
* TCP_NODELAY set
* Connected to zenodo.org (188.184.98.238) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, [no content] (0):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server did not agree to a protocol
* Server certificate:
*  subject: CN=*.zenodo.org
*  start date: May 10 00:00:00 2023 GMT
*  expire date: May 11 23:59:59 2024 GMT
*  subjectAltName: host "zenodo.org" matched cert's "zenodo.org"
*  issuer: C=GB; ST=Greater Manchester; L=Salford; O=Sectigo Limited; CN=Sectigo RSA Domain Validation Secure Server CA
*  SSL certificate verify ok.
* TLSv1.3 (OUT), TLS app data, [no content] (0):
> GET /record/6845083/files/actinobacillus_pleuropneumoniae__01.cobs_classic.xz HTTP/1.1
> Host: zenodo.org
> User-Agent: curl/7.61.1
> Accept: */*
> 
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS app data, [no content] (0):
< HTTP/1.1 301 MOVED PERMANENTLY
< server: nginx
< date: Tue, 21 Nov 2023 16:06:10 GMT
< content-type: text/html; charset=utf-8
< content-length: 335
< location: /records/6845083/files/actinobacillus_pleuropneumoniae__01.cobs_classic.xz
< x-ratelimit-limit: 133
< x-ratelimit-remaining: 131
< x-ratelimit-reset: 1700582831
< retry-after: 60
< permissions-policy: interest-cohort=()
< x-frame-options: sameorigin
< x-xss-protection: 1; mode=block
< x-content-type-options: nosniff
< content-security-policy: default-src 'self' fonts.googleapis.com *.gstatic.com data: 'unsafe-inline' 'unsafe-eval' blob: zenodo-broker.web.cern.ch zenodo-broker-qa.web.cern.ch maxcdn.bootstrapcdn.com cdnjs.cloudflare.com ajax.googleapis.com webanalytics.web.cern.ch
< strict-transport-security: max-age=31556926; includeSubDomains
< referrer-policy: strict-origin-when-cross-origin
< set-cookie: session=851df15f180ed949_655cd572.CdMxRQYpFWimq1B8_kkMGx2uLDA; Expires=Fri, 22 Dec 2023 16:06:10 GMT; Secure; HttpOnly; Path=/; SameSite=Lax
< strict-transport-security: max-age=15768000
< x-request-id: db9b9a7e065be37f02793fa3333b4eda
< set-cookie: 5569e5a730cade8ff2b54f1e815f3670=b34ccc17b5fa6cc18b13e4d0c9212d0a; path=/; HttpOnly; Secure; SameSite=None
< cache-control: private
< 
<!doctype html>
<html lang=en>
<title>Redirecting...</title>
<h1>Redirecting...</h1>
<p>You should be redirected automatically to the target URL: <a href="/records/6845083/files/actinobacillus_pleuropneumoniae__01.cobs_classic.xz">/records/6845083/files/actinobacillus_pleuropneumoniae__01.cobs_classic.xz</a>. If not, click the link.
* Connection #0 to host zenodo.org left intact
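
To make the redirect in the output above concrete: the `location` header is a relative path, so a client that follows redirects (as curl does with -L) resolves it against the URL it originally requested. A minimal sketch of that resolution step, using only the Python standard library, is shown here; the helper name `redirect_target` is made up for the example, and the two strings are copied from the curl output above.

```python
from urllib.parse import urljoin


def redirect_target(request_url, location):
    """Resolve a Location header value against the URL that was requested."""
    # urljoin handles both absolute URLs and host-relative paths like "/records/...".
    return urljoin(request_url, location)


# URL requested by the download rule (old "/record/" path).
old_url = (
    "https://zenodo.org/record/6845083/files/"
    "actinobacillus_pleuropneumoniae__01.cobs_classic.xz"
)
# Value of the `location` header in the 301 response.
location = "/records/6845083/files/actinobacillus_pleuropneumoniae__01.cobs_classic.xz"

print(redirect_target(old_url, location))
```

The only difference between the two URLs is `/record/` versus `/records/`, which is consistent with the first request returning a small HTML page instead of the archive.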
leoisl commented

Closed via #235