We used this dataset to test Android apps, to evaluate the efficacy of email spam filters on SMS, and to evaluate the stacked classifier we developed for our CCS-SPSM 2013 paper.
If you use this dataset, please cite our paper:
Akshay Narayan and Prateek Saxena. The Curse of 140 Characters: Evaluating the Efficacy of SMS Spam Detection on Android. In Proceedings of ACM CCS-SPSM '13, pages 1–9, November 8, 2013, Berlin, Germany.
This dataset was constructed using the following datasets:
- http://www.dt.fee.unicamp.br/~tiago/smsspamcollection/
- http://mtaufiqnzz.wordpress.com/british-english-sms-corpora/
- http://www.dit.ie/computing/research/resources/smsdata/
The three datasets above were themselves constructed from Grumbletext and the NUS SMS Corpus (http://wing.comp.nus.edu.sg:8080/SMSCorpus/), so they contain many duplicates. We have removed the duplicates to the best of our ability.
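
For readers who merge these sources themselves, the following is a minimal sketch of one way to drop duplicate messages. It is not the procedure we used for this dataset; the `(label, text)` tuple layout, the function names `normalize` and `deduplicate`, and the choice to compare messages after lowercasing and collapsing whitespace are all assumptions made for illustration.

```python
import re


def normalize(text):
    """Lowercase and collapse whitespace so trivially different copies compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()


def deduplicate(messages):
    """Keep the first occurrence of each message, comparing normalized text.

    `messages` is an iterable of (label, text) pairs (a hypothetical layout).
    """
    seen = set()
    unique = []
    for label, text in messages:
        key = normalize(text)
        if key not in seen:
            seen.add(key)
            unique.append((label, text))
    return unique


# Hypothetical sample rows, not taken from the dataset.
merged = [
    ("spam", "WINNER!! Claim your prize now"),
    ("spam", "winner!!  claim your prize now"),
    ("ham", "See you at 5"),
]
print(deduplicate(merged))
```

Exact matching on normalized text only catches copies that differ in case or spacing; near-duplicates with small wording changes would need a fuzzier comparison.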