Downloaded Shakespeare file size is not divisible by chunk size; exposes flaw in data_utils.shakespeare
jkahn opened this issue · 1 comment
jkahn commented
Hi @eiderman!
In data_utils, the call
    file_name = maybe_download('http://cs.stanford.edu/people/karpathy/char-rnn/',
                               'shakespear.txt')
downloads a 99993-byte file. That size is incompatible with the numpy.reshape call whenever the chunk size does not evenly divide it. Downloading a different Shakespeare file instead (shakespeare_input.txt from the same directory) yields a file whose size is divisible by the chunk size, but the right fix is probably to pad or trim the array before reshaping.
Successfully downloaded shakespear.txt 99993 bytes.
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-12-7111020859c7> in <module>()
----> 1 data_utils.shakespeare(2)
.../python2.7/site-packages/prettytensor/tutorial/data_utils.pyc in shakespeare(chunk_size)
130 arr = np.array([convert_to_int(c) for c in shakespeare_full])[
131 0:len(shakespeare_full) / chunk_size * chunk_size]
--> 132 return arr.reshape((len(arr) / chunk_size, chunk_size))
133
134
ValueError: total size of new array must be unchanged
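For illustration, the trim-before-reshape idea mentioned above could look roughly like the sketch below. The helper name `chunk_array` is hypothetical and is not part of prettytensor, and this is not necessarily the fix that ends up in the library. It uses floor division (`//`) so the chunk count stays an integer whether or not true division is in effect.

```python
import numpy as np


def chunk_array(values, chunk_size):
    """Hypothetical helper: drop the trailing remainder, then reshape into rows."""
    arr = np.asarray(values)
    # Floor division gives an integer chunk count on both Python 2 and Python 3.
    num_chunks = len(arr) // chunk_size
    # Keep only the elements that fill complete chunks.
    trimmed = arr[:num_chunks * chunk_size]
    return trimmed.reshape((num_chunks, chunk_size))


# Stand-in data with the same length as the downloaded file (99993 elements).
chunks = chunk_array(np.arange(99993), 2)
print(chunks.shape)  # (49996, 2)
```

Trimming discards at most `chunk_size - 1` trailing elements; padding would keep them at the cost of choosing a fill value.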
eiderman commented
Thanks for the bug report! Fixed in the latest push; the pip package will follow shortly.