Vectorizing MNLI inference
forrestbao opened this issue · 4 comments
The two segments of code below for MNLI are too slow; they should be replaced with a vectorized version.
The current approach scores one sentence pair per call, which is too slow. Please see whether you can find an API that computes entailment over multiple pairs at a time.
Huggingface's zero-shot classification task can do it. See my example at the end.
In [1]: from transformers import pipeline
In [2]: classifier = pipeline("zero-shot-classification",
...: model="facebook/bart-large-mnli")
In [3]: sequence_to_classify = ["one day I will see the world", "i love swing dance"]
In [5]: candidate_labels = ['This blog is about summer.', 'This is my Friday night plan.']
...: classifier(sequence_to_classify, candidate_labels)
Out[5]:
[{'sequence': 'one day I will see the world',
'labels': ['This blog is about summer.', 'This is my Friday night plan.'],
'scores': [0.7098779678344727, 0.2901219427585602]},
{'sequence': 'i love swing dance',
'labels': ['This is my Friday night plan.', 'This blog is about summer.'],
'scores': [0.6118907332420349, 0.3881092965602875]}]
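Under the hood, the zero-shot pipeline frames this as NLI: each candidate label is inserted into a hypothesis template (the pipeline's default is "This example is {}.") and every (sequence, label) combination is scored as a premise-hypothesis pair. A minimal sketch of that pairing step, with `build_nli_pairs` as a hypothetical helper, not a pipeline API:

```python
def build_nli_pairs(sequences, candidate_labels,
                    hypothesis_template="This example is {}."):
    """Expand every (sequence, label) combination into a (premise, hypothesis) pair."""
    return [(seq, hypothesis_template.format(label))
            for seq in sequences
            for label in candidate_labels]

# Our candidate labels are already full sentences, so pass them through verbatim:
pairs = build_nli_pairs(["one day I will see the world"],
                        ["This blog is about summer.",
                         "This is my Friday night plan."],
                        hypothesis_template="{}")
# pairs -> [("one day I will see the world", "This blog is about summer."),
#           ("one day I will see the world", "This is my Friday night plan.")]
```

The entailment probabilities for these pairs are then normalized over the labels, which is where the scores above come from.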
And, in the final paper, we can show results using different LMs: BART-MNLI is one, and the original RoBERTa-MNLI is another.
The zero-shot task gives quite different results from the text-classification task, even with the same labels. I will show you at tomorrow's meeting.
Maybe the reason is that the base model changed from RoBERTa to BART. To be fair, maybe we should use a RoBERTa-based model: https://huggingface.co/roberta-large-mnli
Ref: #10
So per the discussion this afternoon, we will just vectorize the code below and forget about the zero-shot approach, which seems to have issues with long sentences.
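One way to vectorize is to tokenize many (premise, hypothesis) pairs together and run them through the model in batched forward passes instead of one pair per call. A sketch, assuming roberta-large-mnli (whose label order is 0=contradiction, 1=neutral, 2=entailment); `entailment_scores` and `chunks` are hypothetical helpers:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "roberta-large-mnli"  # labels: 0=contradiction, 1=neutral, 2=entailment

def chunks(seq, n):
    """Split a list into consecutive batches of at most n items."""
    return [seq[i:i + n] for i in range(0, len(seq), n)]

def entailment_scores(pairs, batch_size=16, device="cpu"):
    """Score a list of (premise, hypothesis) pairs, batch_size pairs per forward pass."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
    model.to(device).eval()
    scores = []
    for batch in chunks(pairs, batch_size):
        premises, hypotheses = zip(*batch)
        enc = tokenizer(list(premises), list(hypotheses),
                        padding=True, truncation=True,
                        return_tensors="pt").to(device)
        with torch.no_grad():
            logits = model(**enc).logits  # shape: (len(batch), 3)
        probs = torch.softmax(logits, dim=-1)
        scores.extend(probs[:, 2].tolist())  # entailment probability per pair
    return scores
```

Padding within each batch lets the GPU process all pairs at once; `truncation=True` also sidesteps the long-sentence problem by clipping inputs to the model's maximum length.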