Cannot download dataset
jaseleephd opened this issue · 14 comments
Downloading the dataset fails. I have read the previous issues (#9 and #11), but the problem doesn't seem to have been resolved. When I run ./generate.sh, I get:
Downloading http://cs.mcgill.ca/~jpineau/datasets/ubuntu-corpus-1.0/ubuntu_dialogs.tgz to ./ubuntu_dialogs.tgz
Traceback (most recent call last):
File "create_ubuntu_dataset.py", line 404, in <module>
prepare_data_maybe_download(args.data_root)
File "create_ubuntu_dataset.py", line 260, in prepare_data_maybe_download
filepath, _ = urllib.request.urlretrieve(url, archive_path)
File "/usr/lib64/python2.7/urllib.py", line 98, in urlretrieve
return opener.retrieve(url, filename, reporthook, data)
File "/usr/lib64/python2.7/urllib.py", line 245, in retrieve
fp = self.open(url, data)
File "/usr/lib64/python2.7/urllib.py", line 213, in open
return getattr(self, name)(url)
File "/usr/lib64/python2.7/urllib.py", line 357, in open_http
'got a bad status line', None)
IOError: ('http protocol error', 0, 'got a bad status line', None)
The IOError comes from urlretrieve on http://cs.mcgill.ca/~jpineau/datasets/ubuntu-corpus-1.0/ubuntu_dialogs.tgz
Running wget http://cs.mcgill.ca/~jpineau/datasets/ubuntu-corpus-1.0/ubuntu_dialogs.tgz also fails. Can anybody tell me how else to download the dataset? Thanks a lot in advance!
Facing the same issue. Is it again related to the McGill servers?
@jasonleeinf have you solved the problem? I came across the same error.
Same here: the dataset has the wrong permissions:
./generate.sh
Downloading http://cs.mcgill.ca/~jpineau/datasets/ubuntu-corpus-1.0/ubuntu_dialogs.tgz to ./ubuntu_dialogs.tgz
Successfully downloaded ./ubuntu_dialogs.tgz
Unpacking dialogs ...
Traceback (most recent call last):
File "create_ubuntu_dataset.py", line 404, in <module>
prepare_data_maybe_download(args.data_root)
File "create_ubuntu_dataset.py", line 266, in prepare_data_maybe_download
with tarfile.open(archive_path) as tar:
File "/home/dani/anaconda3/envs/ubuntudialogue/lib/python2.7/tarfile.py", line 1680, in open
raise ReadError("file could not be opened successfully")
tarfile.ReadError: file could not be opened successfully
The problem is with permissions, as shown by:
wget http://cs.mcgill.ca/~jpineau/datasets/ubuntu-corpus-1.0/ubuntu_dialogs.tgz
--2017-01-11 11:18:17-- http://cs.mcgill.ca/~jpineau/datasets/ubuntu-corpus-1.0/ubuntu_dialogs.tgz
Resolving cs.mcgill.ca (cs.mcgill.ca)... 132.206.51.10
Connecting to cs.mcgill.ca (cs.mcgill.ca)|132.206.51.10|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2017-01-11 11:18:17 ERROR 403: Forbidden.
If I browse in Chrome to http://cs.mcgill.ca/~jpineau/datasets/ubuntu-corpus-1.0/ubuntu_dialogs.tgz I get:
Forbidden
You don't have permission to access /~jpineau/datasets/ubuntu-corpus-1.0/ubuntu_dialogs.tgz on this server.
Server unable to read htaccess file, denying access to be safe
Apache/2.4.18 (Ubuntu) Server at cs.mcgill.ca Port 80
The directory itself is probably not readable by "others"; if I browse to http://cs.mcgill.ca/~jpineau/datasets/ I get:
See http://stackoverflow.com/questions/27890751/magento-new-host-403-forbidden-server-unable-to-read-htaccess-file
or http://stackoverflow.com/questions/31365981/server-unable-to-read-htaccess-file-denying-access-to-be-safe
on how to fix; basically, run chmod -R o+r * on the datasets/ubuntu-corpus-1.0 directory.
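The log above reports a successful download while wget gets a 403, so the tarfile.ReadError is most likely caused by something other than the real archive (for example the server's error page) having been saved as ubuntu_dialogs.tgz. A minimal sketch for checking a downloaded file before unpacking it follows; the helper names is_readable_archive and describe_download are hypothetical and not part of create_ubuntu_dataset.py.

```python
import tarfile


def is_readable_archive(path):
    """Return True if `path` exists and tarfile can open it as an archive."""
    try:
        return tarfile.is_tarfile(path)
    except (IOError, OSError):
        # Missing or unreadable file: treat it as "not a usable archive".
        return False


def describe_download(path, preview_bytes=200):
    """Return a short diagnostic string for a downloaded file (hypothetical helper)."""
    if is_readable_archive(path):
        return "ok: tar archive"
    try:
        with open(path, "rb") as handle:
            head = handle.read(preview_bytes)
    except (IOError, OSError) as exc:
        return "error: cannot read file (%s)" % exc
    # An HTML error page such as "403 Forbidden" starts with markup,
    # so showing the first bytes usually makes the cause obvious.
    return "error: not a tar archive, file starts with %r" % head
```

Calling describe_download("./ubuntu_dialogs.tgz") after a failed run should show whether the file is a real archive or a saved error page; in the latter case, delete it before re-running generate.sh so that a stale file is not picked up again.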
One more adding to the choir: is there any chance this will be available again?
@ryan-lowe Do you have access to the servers? There is apparently something wrong with permissions.
Sorry for the delayed reply. I do not have permissions, but I just sent an e-mail to Joelle Pineau and to the CS technical people at McGill who will be able to sort it out. I think it will take at most a few days.
Again, apologies for the inconvenience. I think that if there is a chance that this keeps happening (it's the 2nd time at least), we will try to move it to a more permanent location. @rkadlec, would IBM be amenable to this?
Okay, so it turns out the tech admins have just fixed the issue -- apparently it was a permissions problem. If any problems persist, please let me know!
Great, thanks! I'm running generate.sh right now and it seems ok.
Closing this issue since the hosting has worked fine over the last month.
Getting an empty response on 8 out of 10 requests, and even when the download starts, it just stops at around 900 KB.
Facing the same ERR_EMPTY_RESPONSE issue that @thepsyntist reported. Is there something wrong with the server? @ryan-lowe Thanks a lot!
Thanks for the heads up, I'll ask the McGill tech support people to look into it.
Okay, it should be fixed now! @tomyoung96 @thepsyntist
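For anyone who hits the intermittent empty responses or stalled downloads reported above while the server is flaky, a client-side workaround is to retry the download a few times with a growing pause. This is only a sketch under assumptions: it targets Python 3, the function name download_with_retries and its parameters are hypothetical, and it restarts the transfer from scratch on each attempt rather than resuming it.

```python
import http.client
import time
import urllib.request


def download_with_retries(url, dest, attempts=5, base_delay=2.0,
                          fetch=urllib.request.urlretrieve, sleep=time.sleep):
    """Call fetch(url, dest), retrying when a network error is raised.

    Returns the number of the attempt that succeeded and re-raises the
    last error if every attempt fails. `fetch` and `sleep` are injectable
    so the retry logic can be exercised without touching the network.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            fetch(url, dest)
            return attempt
        except (OSError, http.client.HTTPException) as exc:
            # Covers connection resets, URLError, and truncated transfers.
            last_error = exc
            if attempt < attempts:
                # Linear backoff: wait longer after each failure.
                sleep(base_delay * attempt)
    raise last_error
```

Usage would look like download_with_retries("http://cs.mcgill.ca/~jpineau/datasets/ubuntu-corpus-1.0/ubuntu_dialogs.tgz", "./ubuntu_dialogs.tgz"). Retrying only helps with transient failures; a persistent 403 or a server that always drops the connection still needs a fix on the hosting side.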