/jacobs-data

Data gathered and published from public sources inside Jacobs.

spammers.txt and keywords.txt

Data gathered from the mercator-students email list between January 2012 and 15 May 2014.

keywords.txt contains every word that appeared at least 5 times in spam messages and is not in the following list of the 100 most common words:

common_words = ["the", "be", "and", "of", "a", "in", "to", "have", "to", "it", "I", "that", "for", "you", "he", "with", "on", "do", "say", "this", "they", "at", "but", "we", "his", "from", "that", "not", "by", "she", "or", "as", "what", "go", "their", "can", "who", "get", "if", "would", "her", "all", "my", "make", "about", "know", "will", "as", "up", "one", "time", "there", "year", "so", "think", "when", "which", "them", "some", "me", "people", "take", "out", "into", "just", "see", "him", "your", "come", "could", "now", "than", "like", "other", "how", "then", "its", "our", "two", "more", "these", "want", "way", "look", "first", "also", "new", "because", "day", "more", "use", "no", "man", "find", "here", "thing", "give", "many", "well", "only"]
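The filtering rule described above could be sketched roughly as follows. This is only an illustration, not the script that produced keywords.txt: the function name `extract_keywords`, the regex tokenization, the lowercasing, and the sample messages are all assumptions made for the example.

```python
import re
from collections import Counter


def extract_keywords(spam_messages, common_words, min_count=5):
    """Return words seen at least `min_count` times that are not common words.

    Assumption: words are compared case-insensitively and tokenized with a
    simple regex; the original data may have been processed differently.
    """
    common = {word.lower() for word in common_words}
    counts = Counter()
    for message in spam_messages:
        # Count every lowercase run of letters/apostrophes as one word.
        counts.update(re.findall(r"[a-z']+", message.lower()))
    return sorted(
        word for word, n in counts.items()
        if n >= min_count and word not in common
    )


# Hypothetical sample data, not taken from the mailing list.
sample_spam = ["Selling a bike, cheap bike for sale"] * 5
print(extract_keywords(sample_spam, ["the", "a", "for", "I"]))
# prints ['bike', 'cheap', 'sale', 'selling']
```

Passing the full `common_words` list from this README as the second argument would apply the same exclusion that keywords.txt describes.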