Couple of queries: 1) Fine-tuned GPT-2 2) BPE encoding
sb1992 opened this issue · 2 comments
Hi,
I had a couple of queries.
- I was wondering if you could point me to the relevant part of the code, and recommend what changes to make, so that I can also calculate this score with my own fine-tuned GPT-2 model (which is saved at its own path).
- I was also thinking about the fact that GPT-2 uses BPE encoding, yet when you return a probability score it always returns the probability for the complete word (not the sub-units). As far as I understand BPE, it divides a word into sub-pieces and assigns corresponding ids to those sub-pieces. Do you know how this works internally, i.e. how it is able to assign a probability to the complete word? (A small illustration of what I mean is below.)
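For example (a minimal sketch using the Hugging Face `transformers` tokenizer, just to illustrate; the word chosen is arbitrary):

```python
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

# A rarer word is typically split into several BPE sub-pieces,
# each mapped to its own id in the vocabulary.
print(tokenizer.tokenize("unbelievably"))  # sub-pieces, not the whole word
print(tokenizer.encode("unbelievably"))    # the corresponding sub-piece ids
```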
Thanks
- If you pass the path to your model as `model_name` to the `GPT2LMScorer` class, it should work (a sketch is below).
- Right now we already return the probability of each sub-unit; the second sketch below shows how to inspect them.
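For instance, something like this should work (a minimal sketch; `./my-finetuned-gpt2` is a placeholder for wherever you saved your model, and the import assumes the package layout with `GPT2LMScorer` under `lm_scorer.models.gpt2`):

```python
import torch
from lm_scorer.models.gpt2 import GPT2LMScorer

# "./my-finetuned-gpt2" is a placeholder: a directory produced by
# model.save_pretrained(...) and tokenizer.save_pretrained(...).
device = "cuda" if torch.cuda.is_available() else "cpu"
scorer = GPT2LMScorer("./my-finetuned-gpt2", device=device, batch_size=1)

print(scorer.sentence_score("I like this package.", reduce="mean"))
```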
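And to inspect the per-sub-unit probabilities directly (again a sketch, assuming the `tokens_score` method, which returns a `(scores, ids, tokens)` tuple):

```python
# Each BPE sub-piece gets its own probability; a word split into
# several pieces therefore yields several scores, not one.
scores, ids, tokens = scorer.tokens_score("I like this package.")
for token, prob in zip(tokens, scores):
    print(f"{token!r}: {prob:.6f}")
```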
Thank you