songlab-cal/tape

Length of transformer's output does not match the input's

quailwwk opened this issue · 1 comment

Hi, I'm trying to apply the pre-trained "Transformer" model to some FASTA files, like so:
tape-embed transformer 1a0b_1_A.fasta 1a0b_1_A.npz bert-base --full_sequence_embed
where 1a0b_1_A.fasta contains a 117-residue sequence:

>1a0b_1_A
KSEALLDIPMLEQYLELVGPKLITDGLAVFEKMMPGYVSVLESNLTAQDKKGIVEEGHKIKGAAGSVGLRHLQQLGQQIQSPDLPAWEDNVGEWIEEMKEEWRHDVEVLKAWVAKAT

But the 'seq' array in the output npz file has shape (119, 768).

>>> np.load('1a0b_1_A.npz',allow_pickle=True)['1a0b_1_A'].item()['seq'].shape
(119, 768)

Is there a mistake in my usage, or is this the expected result?
If the latter, how can I map the result back to the input sequence? Thanks!

rmrao commented

TAPE adds a start and end token to each sequence, so you can remove the first and last position to get a per-residue embedding.
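
For the concrete mapping, here is a minimal sketch using the file names from the command above; the slicing is the point, so adjust the paths and key to your own files:

import numpy as np

# Load the embeddings written by tape-embed; the archive is keyed by FASTA header.
arrays = np.load('1a0b_1_A.npz', allow_pickle=True)
embedding = arrays['1a0b_1_A'].item()['seq']   # shape (119, 768)

# Drop the first and last rows (the added start/end tokens), so that
# row i of the result corresponds to residue i of the input sequence.
per_residue = embedding[1:-1]                  # shape (117, 768)

sequence = ('KSEALLDIPMLEQYLELVGPKLITDGLAVFEKMMPGYVSVLESNLTAQDKKGIVEEGHKI'
            'KGAAGSVGLRHLQQLGQQIQSPDLPAWEDNVGEWIEEMKEEWRHDVEVLKAWVAKAT')
assert per_residue.shape[0] == len(sequence)   # 117 rows <-> 117 residues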