google/prettytensor

Shakespeare file downloaded is not divisible by batch size; exposes flaw in data_utils.shakespeare

jkahn opened this issue · 1 comment

jkahn commented

Hi @eiderman!

In data_utils, the line

  file_name = maybe_download('http://cs.stanford.edu/people/karpathy/char-rnn/',
                             'shakespear.txt')

downloads a file of 99993 bytes. That length is incompatible with numpy.reshape whenever the chunk size does not evenly divide the file size (for example chunk_size=2, since 99993 is odd). Downloading a different Shakespeare file instead (shakespeare_input.txt from the same directory) yields a file divisible by the chunk size, but the right fix is probably to pad or trim the array before reshaping.
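One way to pad or trim before reshaping is sketched below, assuming NumPy and an already-encoded 1-D integer array. The helper name to_chunks and its pad_value parameter are hypothetical and not part of prettytensor's data_utils: with pad_value=None it drops the trailing remainder, otherwise it fills out the last row with pad_value.

```python
import numpy as np


def to_chunks(arr, chunk_size, pad_value=None):
    """Reshape a 1-D array into rows of length chunk_size (hypothetical helper).

    pad_value=None: trim the trailing remainder so the length is a multiple
    of chunk_size. Otherwise: append pad_value until it is.
    """
    arr = np.asarray(arr)
    remainder = len(arr) % chunk_size
    if remainder:
        if pad_value is None:
            # Trim: keep the largest prefix whose length is a multiple of chunk_size.
            arr = arr[:len(arr) - remainder]
        else:
            # Pad: add just enough pad_value entries to complete the last row.
            padding = np.full(chunk_size - remainder, pad_value, dtype=arr.dtype)
            arr = np.concatenate([arr, padding])
    # The length is now an exact multiple of chunk_size, so -1 resolves cleanly.
    return arr.reshape((-1, chunk_size))


print(to_chunks(np.arange(99993), 2).shape)               # prints: (49996, 2)
print(to_chunks(np.arange(99993), 2, pad_value=0).shape)  # prints: (49997, 2)
```

Trimming discards at most chunk_size - 1 trailing characters, which is negligible for a 99993-character corpus; padding keeps every character but needs a pad value that the model can distinguish from real text, which depends on the tutorial's encoding.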

Successfully downloaded shakespear.txt 99993 bytes.
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-12-7111020859c7> in <module>()
----> 1 data_utils.shakespeare(2)

.../python2.7/site-packages/prettytensor/tutorial/data_utils.pyc in shakespeare(chunk_size)
    130   arr = np.array([convert_to_int(c) for c in shakespeare_full])[
    131       0:len(shakespeare_full) / chunk_size * chunk_size]
--> 132   return arr.reshape((len(arr) / chunk_size, chunk_size))
    133 
    134 

ValueError: total size of new array must be unchanged
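The ValueError can be reproduced without downloading anything. The sketch below uses a zero-filled array as a hypothetical stand-in for the 99993 encoded characters and Python 3 floor division //. Under Python 3, the / in the traceback's lines 130-132 would produce a float, which NumPy rejects with a TypeError as a slice bound or shape, so a port of that code needs //.

```python
import numpy as np

chunk_size = 2
# Hypothetical stand-in for the 99993 encoded characters of shakespear.txt.
data = np.zeros(99993, dtype=np.int64)

# Reshaping without trimming fails: 99993 elements cannot fill rows of 2.
try:
    data.reshape((len(data) // chunk_size, chunk_size))
    failed = False
except ValueError:
    failed = True

# Trim to the largest multiple of chunk_size first, then reshape.
usable = len(data) // chunk_size * chunk_size  # 99992
trimmed = data[:usable]
chunks = trimmed.reshape((usable // chunk_size, chunk_size))
print(failed, chunks.shape)  # prints: True (49996, 2)
```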

Thanks for the bug! Fixed in the latest push; pip will follow shortly.