gordicaleksa/pytorch-original-transformer

Error when running "python training_script.py --batch_size 100 --dataset_name IWSLT --language_direction G2E"

minertom opened this issue · 2 comments

I'm not sure what is going on here, but as best I can tell, a gzip file seems to be missing.

Thank You
Tom

Traceback (most recent call last):
File "/home/tom/anaconda3/envs/pytorch-transformer/lib/python3.8/tarfile.py", line 1670, in gzopen
t = cls.taropen(name, mode, fileobj, **kwargs)
File "/home/tom/anaconda3/envs/pytorch-transformer/lib/python3.8/tarfile.py", line 1647, in taropen
return cls(name, mode, fileobj, **kwargs)
File "/home/tom/anaconda3/envs/pytorch-transformer/lib/python3.8/tarfile.py", line 1510, in __init__
self.firstmember = self.next()
File "/home/tom/anaconda3/envs/pytorch-transformer/lib/python3.8/tarfile.py", line 2311, in next
tarinfo = self.tarinfo.fromtarfile(self)
File "/home/tom/anaconda3/envs/pytorch-transformer/lib/python3.8/tarfile.py", line 1102, in fromtarfile
buf = tarfile.fileobj.read(BLOCKSIZE)
File "/home/tom/anaconda3/envs/pytorch-transformer/lib/python3.8/gzip.py", line 292, in read
return self._buffer.read(size)
File "/home/tom/anaconda3/envs/pytorch-transformer/lib/python3.8/_compression.py", line 68, in readinto
data = self.read(len(byte_view))
File "/home/tom/anaconda3/envs/pytorch-transformer/lib/python3.8/gzip.py", line 479, in read
if not self._read_gzip_header():
File "/home/tom/anaconda3/envs/pytorch-transformer/lib/python3.8/gzip.py", line 427, in _read_gzip_header
raise BadGzipFile('Not a gzipped file (%r)' % magic)
gzip.BadGzipFile: Not a gzipped file (b'<!')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "training_script.py", line 192, in <module>
train_transformer(training_config)
File "training_script.py", line 103, in train_transformer
train_token_ids_loader, val_token_ids_loader, src_field_processor, trg_field_processor = get_data_loaders(
File "/home/tom/Downloads/pytorch-original-transformer/utils/data_utils.py", line 223, in get_data_loaders
train_dataset, val_dataset, src_field_processor, trg_field_processor = get_datasets_and_vocabs(dataset_path, language_direction, dataset_name == DatasetType.IWSLT.name)
File "/home/tom/Downloads/pytorch-original-transformer/utils/data_utils.py", line 151, in get_datasets_and_vocabs
train_dataset, val_dataset, test_dataset = dataset_split_fn(
File "/home/tom/anaconda3/envs/pytorch-transformer/lib/python3.8/site-packages/torchtext/datasets/translation.py", line 144, in splits
path = cls.download(root, check=check)
File "/home/tom/anaconda3/envs/pytorch-transformer/lib/python3.8/site-packages/torchtext/data/dataset.py", line 191, in download
with tarfile.open(zpath, 'r:gz') as tar:
File "/home/tom/anaconda3/envs/pytorch-transformer/lib/python3.8/tarfile.py", line 1617, in open
return func(name, filemode, fileobj, **kwargs)
File "/home/tom/anaconda3/envs/pytorch-transformer/lib/python3.8/tarfile.py", line 1674, in gzopen
raise ReadError("not a gzip file")
tarfile.ReadError: not a gzip file
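
The `b'<!'` in the first exception is the two-byte header that `gzip` read from the file. A real gzip stream starts with the bytes `1f 8b`, so the archive on disk most likely begins with markup such as `<!DOCTYPE ...>`, which would mean the download returned a web page instead of the dataset. A minimal sketch for checking this on an already-downloaded file follows; the function name `sniff_download` and the example path are hypothetical and not part of this repository or of torchtext.

```python
GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip stream


def sniff_download(path):
    """Classify a downloaded file by its first two bytes.

    Returns "gzip" for a gzip stream, "html-or-xml" when the file starts
    with "<" (as in the b'<!' reported above), and "unknown" otherwise.
    """
    with open(path, "rb") as f:
        head = f.read(2)
    if head == GZIP_MAGIC:
        return "gzip"
    if head.startswith(b"<"):
        return "html-or-xml"
    return "unknown"


# Hypothetical usage; substitute the archive path that torchtext wrote
# under your dataset directory:
# print(sniff_download("/path/to/downloaded/archive.tgz"))
```

If this reports "html-or-xml", opening the file in a text editor should show what the server actually returned.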

I'm getting the same bug now. How can I solve it?

Same problem here. Are there any solutions?