Training Details
vateye opened this issue · 1 comment
vateye commented
Hi, I have a question about the pre-training stage for OBELISC. Did you pre-train on whole documents, or only on each image and its paired text?
HugoLaurencon commented
Hi @vateye, thanks for your question. We used the whole document whenever possible. However, some documents contain more tokens than the maximum sequence length allowed by the pre-trained LM, so in those cases we had to truncate.
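For illustration, the truncation described above could be sketched roughly as follows. This is a minimal sketch under assumptions, not the actual training code: the helper name `truncate_document`, the `max_len` parameter, and the choice to keep the first tokens and drop the tail are all hypothetical, since the thread does not say which end was cut or how images inside a document were handled.

```python
from typing import List


def truncate_document(token_ids: List[int], max_len: int) -> List[int]:
    """Return the whole document if it fits, else only its first max_len tokens.

    Hypothetical helper: keeping the head of the document is an assumption.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if len(token_ids) <= max_len:
        # The document fits within the LM's limit, so it is used in full.
        return token_ids
    # The document is too long, so everything past the limit is dropped.
    return token_ids[:max_len]


# Toy token IDs standing in for a tokenized document.
short_doc = list(range(5))
long_doc = list(range(12))

print(len(truncate_document(short_doc, max_len=8)))
print(len(truncate_document(long_doc, max_len=8)))
```

With a limit of 8, the 5-token document is kept whole and the 12-token document is cut down to its first 8 tokens.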