ServiceNow/duorat

input token tensor has been truncated


When I tried to train DuoRAT with a slightly tweaked duorat-finetune-bert-large.jsonnet, I received a number of warnings during training (~5 warnings per 10 steps). I want to confirm whether they are expected. Thank you.

2020-12-20 00:45:45 WARNING: input token tensor has been truncated to 512 tokens, original length was 516 tokens
2020-12-20 00:45:53 WARNING: input token tensor has been truncated to 512 tokens, original length was 1367 tokens
2020-12-20 00:45:56 WARNING: source length exceeds maximum source length, 398 > 200, skipping
2020-12-20 00:45:57 WARNING: input token tensor has been truncated to 512 tokens, original length was 524 tokens
2020-12-20 00:45:58 WARNING: input token tensor has been truncated to 512 tokens, original length was 1362 tokens
2020-12-20 00:46:02 WARNING: source length exceeds maximum source length, 393 > 200, skipping

Hi @pckennethma, thanks for checking out our model code!
The warnings you are seeing are normal.
BERT can only process sequences of up to 512 tokens, a limit that is exceeded by a small number of training and evaluation examples. That is what the input token tensor warning is telling you.
The source length warning is emitted because the number of input representations fed into the RAT encoder layers is too large (200 is the hard maximum set by the base configuration). This can happen for very long questions and/or schemas with many columns. Unless it happens very often, this too is a minor concern. Very large schemas with many tables and columns (like Spider's baseball_1) are not something a model trained on typical Spider schemas can reason about anyway.
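For intuition, the two checks behind these warnings amount to roughly the following. This is a simplified sketch with made-up function and constant names, not the actual code in this repo:

```python
# Illustrative sketch only -- not DuoRAT's actual code. It mimics the two
# checks behind the warnings: BERT's fixed 512-token limit and the
# configurable max_source_length cap on the RAT encoder input.
import logging

logger = logging.getLogger(__name__)

MAX_BERT_TOKENS = 512       # fixed by the pretrained BERT model
MAX_SOURCE_LENGTH = 200     # max_source_length from the jsonnet config


def prepare_example(bert_token_ids, source_representations):
    # Check 1: the BERT input is too long -> truncate it, but keep the example.
    if len(bert_token_ids) > MAX_BERT_TOKENS:
        logger.warning(
            "input token tensor has been truncated to %d tokens, "
            "original length was %d tokens",
            MAX_BERT_TOKENS, len(bert_token_ids),
        )
        bert_token_ids = bert_token_ids[:MAX_BERT_TOKENS]

    # Check 2: the RAT encoder input is too long -> skip the example entirely.
    if len(source_representations) > MAX_SOURCE_LENGTH:
        logger.warning(
            "source length exceeds maximum source length, %d > %d, skipping",
            len(source_representations), MAX_SOURCE_LENGTH,
        )
        return None

    return bert_token_ids, source_representations
```

So a truncated input tensor still contributes to training, whereas an example whose source length exceeds the cap is dropped.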

Thank you for your prompt reply!

@tscholak Hi, I'm wondering how the "200 is the hard maximum as determined by the base configuration" is determined. Could we change it during training?

Thanks in advance!

If I'm not mistaken, it is configured on line 38 of configs/duorat/duorat-finetune-bert-large.jsonnet.

@pckennethma Thanks for your reply.
Do the ~5 warnings per 10 steps influence the final results? It looks like the parser skips many items during training.
If not, could I set max_source_length to something larger than 200?
If so, why is this number set to 200?

For your reference: I kept the default max_source_length and got ~63% accuracy when disabling cv linking (due to my own needs). I think the result is generally satisfactory.

@pckennethma Got it! Thank you.

Hi! The setting of 200 for the maximum source length was chosen so that:

  • most examples of the Spider dataset would fit (except for a few outliers where the schema contains unusually many tables and/or columns);
  • batch sizes could be larger;
  • training would not run out of memory (OOM) on the hardware available to us.

It's a compromise.