The conversion script doesn’t work
StellaAthena opened this issue · 2 comments
Describe the bug
Converting a GPT-Neo checkpoint and loading the result into the HuggingFace transformers library produces a model whose generations lose coherence after roughly 500 tokens, apparently because the converted model does not reproduce GPT-Neo's local attention.
To Reproduce
Steps to reproduce the behavior:
- Run the conversion script
- Load the results into the HuggingFace transformers library
- Feed it a context of 450 tokens and then have it generate another 200
- Observe that around the 500th token the coherence falls off a cliff (a minimal reproduction sketch follows below)
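A minimal reproduction sketch, assuming the converted weights sit in a local directory (`converted-gpt-neo` is a hypothetical path, `long_passage.txt` an arbitrary text file) and are loaded through the standard GPT-2 classes:

```python
# Hypothetical reproduction sketch; "converted-gpt-neo" stands in for whatever
# local directory the conversion script wrote its output to.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("converted-gpt-neo")
tokenizer = GPT2Tokenizer.from_pretrained("converted-gpt-neo")

# Any passage of roughly 450 tokens will do as the context.
context = open("long_passage.txt").read()
input_ids = tokenizer(context, return_tensors="pt").input_ids[:, :450]

# Generate 200 more tokens on top of the 450-token context.
output = model.generate(
    input_ids,
    max_length=input_ids.shape[1] + 200,
    do_sample=True,
)
print(tokenizer.decode(output[0]))
# Symptom: the continuation stays coherent for a while, then degrades
# sharply around the 500th token of the combined sequence.
```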
Expected behavior
Performance should not fall off a cliff around the 500th token.
Proposed solution
It appears that the problem is the lack of compatibility between the local attention function used in GPT-Neo and the transformers
library. While the transformers
library does include models with local attention (longformer, for example) it’s not consistent with how the GPT-2 model is defined in the transformers
library.
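To make the mismatch concrete, here is a rough sketch (not code from either library) contrasting the dense causal mask that the GPT-2 implementation computes with a banded local-attention mask of the kind GPT-Neo's local layers use; the window size of 256 is an assumed value for illustration:

```python
import torch

def dense_causal_mask(seq_len: int) -> torch.Tensor:
    # GPT-2 style: every position attends to all earlier positions.
    return torch.tril(torch.ones(seq_len, seq_len)).bool()

def local_causal_mask(seq_len: int, window: int = 256) -> torch.Tensor:
    # GPT-Neo local-attention style: each position attends only to the
    # previous `window` positions (window=256 is an assumption here).
    i = torch.arange(seq_len).unsqueeze(1)
    j = torch.arange(seq_len).unsqueeze(0)
    return (j <= i) & (j > i - window)

# The two masks agree while the sequence fits inside the window ...
print(torch.equal(dense_causal_mask(128), local_causal_mask(128)))  # True
# ... and diverge once it does not, roughly where the converted model's
# generations start to fall apart.
print(torch.equal(dense_causal_mask(512), local_causal_mask(512)))  # False
```

If that is indeed the cause, copying local-attention weights into the existing GPT-2 class cannot recover the original model's behavior; transformers needs a model class that implements local attention the same way GPT-Neo does.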
Screenshots
n/a
Environment (please complete the following information):
- GPUs: v3-8s, 1080 Tis, A100s
- Configs: any config that has local attention
Additional context
n/a
The amazing @patil-suraj and @LysandreJik have a preliminary PR for a HF implementation!
It's live on HF!