nshepperd/gpt-2

Sampling structure looks weird. Maybe because I'm structuring my data wrongly?

dimeldo opened this issue · 0 comments

My data, before encoding, looks like this:

Person 1: Something something something.
Person 2: Something something something.
Person 1: Something something something.

<|endoftext|>

Person 1: Something something something.
Person 2: Something something something.
Person 1: Something something something.

<|endoftext|>


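For context, here is a minimal sketch of how data like this ends up as one token stream before training. The file name `dialogs.txt` and the use of `tiktoken` are just placeholders for my actual setup (the repo uses its own `encoder.py`), so treat the exact pipeline as an assumption; the point is that the whole file, separators included, becomes one flat list of token ids.

```python
import tiktoken

# Stand-in for the repo's encoder.py: tiktoken's "gpt2" encoding uses the same
# BPE vocabulary, where <|endoftext|> is a single special token (id 50256).
enc = tiktoken.get_encoding("gpt2")

# The raw training file already contains the <|endoftext|> separators shown above.
with open("dialogs.txt", encoding="utf-8") as f:
    text = f.read()

# Encode the whole file into one flat stream of token ids. allowed_special is
# needed so the literal <|endoftext|> string maps to its special token id
# instead of raising an error.
tokens = enc.encode(text, allowed_special={"<|endoftext|>"})

print(f"{len(tokens)} tokens, first few: {tokens[:10]}")
```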
But during training, the generated samples look like this:

Generating samples...
======== SAMPLE 2 ========

<|endoftext|>

Person 1: Something something something.
Person 2: Something something something.
Person 1: Something something something.

<|endoftext|>

Or:

Generating samples...
======== SAMPLE 2 ========
Something Something Something Something Something Something Something Something Something Something Something Something Something Something Something Something Something Something Something Something...

<|endoftext|>

Person 1: Something something something.
Person 2: Something something something.
Person 1: Something something something.

<|endoftext|>

As you can see, the sample starts off weirdly: it doesn't follow the structure of my original data at the start, but after that it sort of corrects itself. Does anybody know what is causing this?
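For what it's worth, my current guess (just an assumption, I haven't checked it against the training code) is that training chunks are cut from random offsets in that one long token stream, so a chunk can start in the middle of a conversation instead of right after an `<|endoftext|>`. A rough sketch of the kind of sampling I mean (hypothetical, not the repo's actual code):

```python
import numpy as np

def random_chunk(tokens: np.ndarray, chunk_len: int, rng: np.random.Generator) -> np.ndarray:
    """Cut a training chunk out of the flat token stream at a random offset.

    The offset ignores <|endoftext|> boundaries, so a chunk (and anything
    sampled from a model trained on such chunks) can begin mid-conversation.
    """
    start = rng.integers(0, len(tokens) - chunk_len)
    return tokens[start:start + chunk_len]

rng = np.random.default_rng(0)
tokens = np.arange(100_000)              # placeholder for the encoded dataset
chunk = random_chunk(tokens, 1024, rng)
print(chunk[:5])
```

If that's what is happening, the odd sample openings would just reflect that training contexts rarely begin exactly at an `<|endoftext|>` boundary. Is that plausible, or is something else going on?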