It analyses the text enterd by a user and analyses whether the text is sarcastic or not. Sarcasm, which is both positively funny and negatively nasty, plays an important part in human social interaction. Sarcasm detection is a very narrow research field in NLP, a specific case of sentiment analysis where instead of detecting a sentiment in the whole spectrum, the focus is on sarcasm. Therefore the task of this field is to detect if a given text is sarcastic or not.
Some people could disagree about its purpose, but there is a convention in that people use positive words in order to convey a negative message. Of course, it varies through person to person and is highly dependent on the culture, gender and many other aspects. Americans and Indians for example, perceive sarcasm differently. Moreover, someone being sarcastic doesn’t mean the other person perceiving it as the speaker intended. This subjectivity will have implications in the performance of DL models.
Past studies in Sarcasm Detection mostly make use of Twitter datasets collected using hashtag based supervision but such datasets are noisy in terms of labels and language. Furthermore, many tweets are replies to other tweets and detecting sarcasm in these requires the availability of contextual tweets.
To overcome the limitations related to noise in Twitter datasets, this News Headlines dataset for Sarcasm Detection is collected from two news website. TheOnion aims at producing sarcastic versions of current events and we collected all the headlines from News in Brief and News in Photos categories (which are sarcastic). We collect real (and non-sarcastic) news headlines from HuffPost.
This new dataset has following advantages over the existing Twitter datasets:
Since news headlines are written by professionals in a formal manner, there are no spelling mistakes and informal usage. This reduces the sparsity and also increases the chance of finding pre-trained embeddings.
Furthermore, since the sole purpose of TheOnion is to publish sarcastic news, we get high-quality labels with much less noise as compared to Twitter datasets.
Unlike tweets which are replies to other tweets, the news headlines we obtained are self-contained. This would help us in teasing apart the real sarcastic elements.
Each record consists of three attributes:
is_sarcastic: 1 if the record is sarcastic otherwise 0
headline: the headline of the news article
article_link: link to the original news article. Useful in collecting supplementary data
General statistics of data, instructions on how to read the data in python, and basic exploratory analysis could be found at this GitHub repo. A hybrid NN architecture trained on this dataset can be found at this GitHub repo.
Can you identify sarcastic sentences? Can you distinguish between fake news and legitimate news?
This chatbot was build using following frameworks, libraries and softwares.
To run this project you need to follow the following steps.
To run this project you need to follow the following steps.
These are the prerequisites you need to build this bot as well as run it.
cmd:\ pip install tensorflow
cmd:\ pip install keras
- Create conda environment and create project in this environment
- After installing requirements in above Modules LIST
- You need python idle such as Jupyter notebook or spyder
See the open issues for a list of proposed features (and known issues).
Contributions are what make the open source community such an amazing place to be learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch (
git checkout -b feature/AmazingFeature
) - Commit your Changes (
git commit -m 'Add some AmazingFeature'
) - Push to the Branch (
git push origin feature/AmazingFeature
) - Open a Pull Request
- MIT license
- Copyright 2020 © Aditya Mangla.
Aditya Mangla - @aadimangla - aadimangla@gmail.com - adityamangla.com
Project Link: https://github.com/aadimangla/Sarcasm-Detector