Wanted to perform Classification task on Sentence Pair
nkathireshan opened this issue · 4 comments
Hi ThilinaRajapakse,
Greetings!
I want to build a classification model for sentence pairs. To be precise, each pair is a question and a paragraph, and the model should predict whether the question and the paragraph are a match or not.
I am using the SQuAD dataset for training and want to use a custom dataset for evaluation.
Also, the question will be much shorter than the paragraph.
Can I use the same code with slight changes, or does it need a major change?
Please guide me.
This is more of a request than an issue, so pardon me for using this space for it.
Kathir.
Example data:
Question:
When did Namibia stop being a German colony?
Context/Paragraph:
The dry lands of Namibia were inhabited since early times by San, Damara, and Namaqua, and since about the 14th century AD by immigrating Bantu who came with the Bantu expansion. Most of the territory became a German Imperial protectorate in 1884 and remained a German colony until the end of World War I. In 1920, the League of Nations mandated the country to South Africa, which imposed its laws and, from 1948, its apartheid policy. The port of Walvis Bay and the offshore Penguin Islands had been annexed by the Cape Colony under the British crown by 1878 and had become an integral part of the new Union of South Africa at its creation in 1910.
I think this could be better framed as a sentence prediction task rather than a classification task. Something akin to this perhaps. What do you think?
Thank you very much for your response. The post you shared is about finding similarity based on semantics. As that post mentions, "If we can manually label some data, the results might be even better." I have created a labeled dataset from SQuAD 2.0 and want to build a supervised model, trained on the SQuAD data, that can make predictions for unseen sentence pairs.
Something like using the self.text_b = text_b component of tools.py from your guide, or something similar to this from Gluon. The challenge with Gluon's guide is that it does not explain how to evaluate the model.
Please advise.
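For readers following along, here is a rough sketch of what passing the paragraph as text_b could look like when the question is much shorter than the paragraph. It follows the usual BERT pair layout ([CLS] text_a [SEP] text_b [SEP], with segment id 0 for the first part and 1 for the second). The `InputExample` class, `truncate_pair`, and `encode_pair` below are illustrative stand-ins and not the actual code from tools.py, and whitespace splitting is only a placeholder for a real WordPiece tokenizer.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class InputExample:
    """One labelled pair: text_a is the question, text_b the paragraph (hypothetical class)."""
    guid: str
    text_a: str
    text_b: str
    label: int


def truncate_pair(tokens_a: List[str], tokens_b: List[str],
                  max_tokens: int) -> Tuple[List[str], List[str]]:
    """Trim the longer sequence one token at a time until the pair fits."""
    tokens_a, tokens_b = list(tokens_a), list(tokens_b)
    while len(tokens_a) + len(tokens_b) > max_tokens:
        if len(tokens_a) > len(tokens_b):
            tokens_a.pop()
        else:
            tokens_b.pop()
    return tokens_a, tokens_b


def encode_pair(example: InputExample,
                max_seq_length: int = 16) -> Tuple[List[str], List[int]]:
    """Build the token list and segment ids for one question/paragraph pair."""
    # Assumption: whitespace tokens stand in for WordPiece tokens.
    tokens_a = example.text_a.split()
    tokens_b = example.text_b.split()
    # Reserve three positions for [CLS] and the two [SEP] markers.
    tokens_a, tokens_b = truncate_pair(tokens_a, tokens_b, max_seq_length - 3)
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return tokens, segment_ids
```

Because the longer side is trimmed first, a short question is usually kept whole and most of the truncation falls on the paragraph. If the limit is tight enough that both sides reach the same length, the question starts losing tokens too.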
I'm still not seeing how this could be a classification task, unless you were to just concatenate the answers with the questions (some with the corresponding answer, others with random answers), and create a dataset containing matches vs non-matches. That doesn't really feel like a good solution to me, but it is something that could be tried I suppose.
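To make that idea concrete, a minimal sketch of building such a matches vs non-matches dataset might look like the following. It pairs each question with its own paragraph (label 1) and with randomly drawn other paragraphs (label 0). The function name `build_match_dataset`, its parameters, and the `(question, paragraph)` input format are assumptions made for illustration. They are not part of SQuAD or of any library.

```python
import random
from typing import List, Tuple


def build_match_dataset(qa_pairs: List[Tuple[str, str]],
                        negatives_per_question: int = 1,
                        seed: int = 0) -> List[Tuple[str, str, int]]:
    """Return (question, paragraph, label) rows: 1 = match, 0 = no match."""
    rng = random.Random(seed)
    # Unique paragraphs, sorted so the sampling is reproducible for a given seed.
    paragraphs = sorted({paragraph for _, paragraph in qa_pairs})
    rows = []
    for question, paragraph in qa_pairs:
        rows.append((question, paragraph, 1))
        # Negative candidates: every paragraph except the true one.
        others = [p for p in paragraphs if p != paragraph]
        k = min(negatives_per_question, len(others))
        for wrong in rng.sample(others, k):
            rows.append((question, wrong, 0))
    rng.shuffle(rows)
    return rows
```

Random negatives are often easy to tell apart from the true paragraph, so a model trained this way may look better than it is. Since SQuAD 2.0 also marks some questions as unanswerable from their paragraph, those pairs might serve as harder negatives.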
Personally, I would go with sentence prediction. You could try the BertForNextSentencePrediction model and fine-tune it for your task. See this.
I'm sorry, but I'm not familiar with Gluon. I'm not sure what you don't understand regarding evaluating the model though. Can you elaborate?
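If the question is how to score match / no-match predictions on the custom evaluation set, one possible sketch is below. It is independent of both Gluon and the guide's code, and it assumes the labels and predictions are plain lists of 0s and 1s, with 1 meaning "match". The function name `binary_metrics` is made up for this example.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for the positive ("match") class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    # Guard every division so empty or one-sided inputs return 0.0 instead of failing.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

With randomly sampled negatives the classes can be heavily imbalanced, so precision, recall, and F1 are usually more informative than accuracy alone.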