EleutherAI/gpt-neo

The conversion script doesn’t work

StellaAthena opened this issue · 2 comments

Describe the bug
Checkpoints exported with the conversion script degrade when loaded into the HuggingFace transformers library: generation is coherent at first, but the text falls apart once the sequence grows past roughly 500 tokens.

To Reproduce
Steps to reproduce the behavior:

  1. Run conversion script
  2. Load results into the HuggingFace transformers library
  3. Feed it a context of 450 tokens and then have it generate another 200
  4. Observe that around the 500th token the coherence falls off a cliff (see the reproduction sketch below)
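
For concreteness, here is a minimal sketch of steps 2–4. It assumes the converted checkpoint is loaded with the stock GPT-2 classes (GPT-Neo does not yet have its own class in transformers); the checkpoint path and prompt are placeholders:

```python
# Minimal reproduction sketch; the path and prompt below are placeholders.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("./gpt-neo-converted")      # hypothetical path
tokenizer = GPT2Tokenizer.from_pretrained("./gpt-neo-converted")    # or the stock "gpt2" tokenizer

prompt = "..."  # any prompt that tokenizes to roughly 450 tokens
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Generate roughly 200 additional tokens; with local-attention configs the
# output stays coherent up to ~500 tokens and then degrades sharply.
output_ids = model.generate(
    input_ids,
    max_length=input_ids.shape[1] + 200,
    do_sample=True,
    top_p=0.9,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```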

Expected behavior
Generation quality should not fall off a cliff; the converted model should remain coherent past the 500th token, just as the original checkpoint does.

Proposed solution
It appears that the problem is a lack of compatibility between the local attention function used in GPT-Neo and the transformers library. While the transformers library does include models with local attention (Longformer, for example), that implementation is not consistent with how the GPT-2 model is defined in transformers, so the converted weights end up being run with an attention pattern they were not trained for.
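
As a rough illustration of the mismatch (not the actual GPT-Neo or transformers code), compare a standard GPT-2 causal mask with the windowed mask that GPT-Neo's local attention layers expect; the window size below is purely illustrative:

```python
# Illustrative only: a full causal mask vs. a windowed (local) causal mask.
# If converted weights were trained with the local mask but are run under the
# plain causal mask, behaviour diverges once the sequence exceeds the window.
import torch

def causal_mask(seq_len: int) -> torch.Tensor:
    # GPT-2-style mask: each position attends to itself and all earlier positions.
    return torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

def local_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    # Local attention: each position attends only to the most recent `window`
    # positions (itself included).
    pos = torch.arange(seq_len)
    distance = pos.unsqueeze(1) - pos.unsqueeze(0)   # distance[i, j] = i - j
    return (distance >= 0) & (distance < window)

# With a small window, early context is invisible to later positions:
print(causal_mask(6).int())
print(local_causal_mask(6, window=3).int())
```

If the two implementations handle this window differently, the divergence would only show up once sequences grow past the window size, which is consistent with short contexts looking fine and coherence collapsing part-way through the 450 + 200 token run above.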

Screenshots
n/a

Environment (please complete the following information):

  • Hardware: TPU v3-8s, GTX 1080 Tis, A100s
  • Configs: any config that has local attention

Additional context

The amazing @patil-suraj and @LysandreJik have a preliminary PR for an HF implementation!

huggingface/transformers#10848