gordicaleksa/pytorch-original-transformer

Issue when running: python training_script.py --batch_size 2 --dataset_name IWSLT --language_direction G2E

adamas-v opened this issue · 2 comments

downloading de-en.tgz

File "training_script.py", line 103, in train_transformer
train_token_ids_loader, val_token_ids_loader, src_field_processor, trg_field_processor = get_data_loaders(

tarfile.ReadError: not a gzip file

Use Multi30k instead of IWSLT.

Thanks!

I met the same problem:
```
$ export CUDA_VISIBLE_DEVICES=3 && python training_script.py --batch_size 1500 --dataset_name IWSLT --language_direction G2E

downloading de-en.tgz
de-en.tgz: 96.9kB [00:00, 12.0MB/s]
Traceback (most recent call last):
  File "/home/xueshengke/anaconda3/envs/transformer_pytorch/lib/python3.6/tarfile.py", line 1643, in gzopen
    t = cls.taropen(name, mode, fileobj, **kwargs)
  File "/home/xueshengke/anaconda3/envs/transformer_pytorch/lib/python3.6/tarfile.py", line 1619, in taropen
    return cls(name, mode, fileobj, **kwargs)
  File "/home/xueshengke/anaconda3/envs/transformer_pytorch/lib/python3.6/tarfile.py", line 1482, in __init__
    self.firstmember = self.next()
  File "/home/xueshengke/anaconda3/envs/transformer_pytorch/lib/python3.6/tarfile.py", line 2297, in next
    tarinfo = self.tarinfo.fromtarfile(self)
  File "/home/xueshengke/anaconda3/envs/transformer_pytorch/lib/python3.6/tarfile.py", line 1092, in fromtarfile
    buf = tarfile.fileobj.read(BLOCKSIZE)
  File "/home/xueshengke/anaconda3/envs/transformer_pytorch/lib/python3.6/gzip.py", line 276, in read
    return self._buffer.read(size)
  File "/home/xueshengke/anaconda3/envs/transformer_pytorch/lib/python3.6/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/home/xueshengke/anaconda3/envs/transformer_pytorch/lib/python3.6/gzip.py", line 463, in read
    if not self._read_gzip_header():
  File "/home/xueshengke/anaconda3/envs/transformer_pytorch/lib/python3.6/gzip.py", line 411, in _read_gzip_header
    raise OSError('Not a gzipped file (%r)' % magic)
OSError: Not a gzipped file (b'<!')

tarfile.ReadError: not a gzip file
```

What do you mean by "Use Multi30k"? This code currently supports only 'IWSLT' and 'WMT14'.
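
For anyone else hitting this: the `b'<!'` in the `OSError` above is the first two bytes of the downloaded file, and the download was only 96.9kB, so it looks like the server returned an HTML page instead of the archive. A minimal sketch for checking that on your own machine is below. The helper name `describe_download` and the example path are hypothetical (not part of this repo or torchtext); the only fact relied on is that every gzip stream starts with the bytes `1f 8b`.

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def describe_download(path):
    """Classify a downloaded file as 'gzip', 'html', or 'unknown'.

    Hypothetical helper for illustration; it only inspects the first bytes.
    """
    with open(path, "rb") as f:
        head = f.read(16)
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    # An HTML error or redirect page typically starts with "<!DOCTYPE" or "<html".
    if head.lstrip().lower().startswith((b"<!", b"<html")):
        return "html"
    return "unknown"


# Example (the path is an assumption; use wherever de-en.tgz was saved):
# print(describe_download(".data/iwslt/de-en.tgz"))
```

If this reports `html`, the file on disk is not the dataset, and `tarfile` will keep failing until a real archive is in its place.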