Course project: Statistical Computing

SpamFilter

Gannon University

Loading Data

import pandas as pd

data = pd.read_csv('spam.csv')
data.head()

Data Exploration
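
A minimal sketch of what this exploration step could look like, assuming the data has one label column and one message column. The column names `label` and `text` and the toy rows are hypothetical stand-ins for `spam.csv`, not taken from the project.

```python
import pandas as pd

# Toy stand-in for spam.csv; column names and rows are hypothetical.
data = pd.DataFrame({
    "label": ["ham", "spam", "ham", "ham", "spam"],
    "text": [
        "See you at the library at 5",
        "URGENT! You have won a free ticket, reply now",
        "Can you send me the notes?",
        "Running late, start without me",
        "Free entry in a weekly prize draw, text WIN",
    ],
})

# Class balance: how many messages fall into each class.
class_counts = data["label"].value_counts()
print(class_counts)

# Message length in characters, averaged per class.
data["num_chars"] = data["text"].str.len()
mean_length = data.groupby("label")["num_chars"].mean()
print(mean_length)
```

Checking the class balance first is useful because spam datasets are often imbalanced, which affects how the later evaluation metrics should be read.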

Data Preprocessing

  • Cleaning text
  • Tokenization
  • Removing stopwords
  • Lemmatization
  • Vectorization
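
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the notebook's code: the helper name `preprocess` and the sample messages are hypothetical, a whitespace split stands in for a full tokenizer, scikit-learn's built-in English stop word list is used, and lemmatization is left out because it needs additional language data.

```python
import re

from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer


def preprocess(text):
    """Clean, tokenize, and drop stop words (hypothetical helper)."""
    # Cleaning: replace anything that is not a letter with a space, then lowercase.
    cleaned = re.sub(r"[^a-zA-Z]", " ", text).lower()
    # Tokenization: a plain whitespace split, used here as a simple stand-in.
    tokens = cleaned.split()
    # Stop word removal with scikit-learn's built-in English list.
    kept = [t for t in tokens if t not in ENGLISH_STOP_WORDS]
    return " ".join(kept)


# Hypothetical sample messages.
messages = [
    "WINNER!! Claim your FREE prize today, reply to 09061701461",
    "Are we still meeting for lunch today?",
]
processed = [preprocess(m) for m in messages]
print(processed)

# Vectorization: turn the cleaned strings into a TF-IDF feature matrix.
vectorizer = TfidfVectorizer()
features = vectorizer.fit_transform(processed)
print(features.shape)
```

The same fitted vectorizer has to be reused at prediction time so that new messages map onto the same feature columns.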

Model Building

  • Naïve Bayes
  • RandomForestClassifier
  • KNeighborsClassifier
  • Support Vector Machines
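
A minimal sketch of fitting the four model types with scikit-learn on a toy corpus. The messages, labels, and hyperparameters are hypothetical, and `MultinomialNB` is assumed as the Naïve Bayes variant. The list is named `classifiers` with the Naïve Bayes model first, matching how `classifiers[0]` is saved later in this README.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Hypothetical toy corpus: 1 = spam, 0 = not spam.
texts = [
    "free prize claim now",
    "win cash reply today",
    "urgent free ticket offer",
    "cheap loan call now",
    "see you at lunch",
    "can you send the notes",
    "running late start without me",
    "happy birthday my friend",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Vectorize the messages into TF-IDF features.
tfidf = TfidfVectorizer()
X = tfidf.fit_transform(texts)

# One instance per model type; hyperparameters are illustrative only.
classifiers = [
    MultinomialNB(),
    RandomForestClassifier(n_estimators=50, random_state=42),
    KNeighborsClassifier(n_neighbors=3),
    SVC(kernel="linear"),
]

for clf in classifiers:
    clf.fit(X, labels)
    print(type(clf).__name__, "fitted")
```

Keeping the models in one list makes it easy to loop over them again when comparing their scores.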

Evaluating Models
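
As an illustration of how predictions could be scored, here is a small sketch using scikit-learn's metric functions. The README does not say which metrics the project reports, so the choice of accuracy, precision, recall, and a confusion matrix is an assumption, and the label arrays are made up.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Hypothetical true labels and model predictions: 1 = spam, 0 = not spam.
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0]

# Share of all messages classified correctly.
accuracy = accuracy_score(y_true, y_pred)
# Of the messages flagged as spam, the share that really are spam.
precision = precision_score(y_true, y_pred)
# Of the real spam messages, the share that were flagged.
recall = recall_score(y_true, y_pred)
# Rows are true classes, columns are predicted classes, ordered [0, 1].
cm = confusion_matrix(y_true, y_pred)

print("accuracy:", accuracy)
print("precision:", precision)
print("recall:", recall)
print(cm)
```

For a spam filter, precision is often weighted heavily, since flagging a legitimate message as spam is usually the more costly mistake.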

Deploying Naïve Bayes Model on Streamlit

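import pickle
# Save the Naïve Bayes model and the fitted vectorizer for the Streamlit app
with open('SpamDetectorModelNB.pkl', 'wb') as f:
    pickle.dump(classifiers[0], f)
with open('Vectorizer.pkl', 'wb') as f:
    pickle.dump(tfidf, f)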

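import pickle
import re
import string

import nltk
import streamlit as st
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer

# Requires the NLTK tokenizer and stopwords data to be downloaded beforehand
ps = PorterStemmer()

def Clean(text):
    # Replace non-letters with spaces, lowercase, and collapse repeated whitespace
    sms = re.sub('[^a-zA-Z]', ' ', text)
    sms = sms.lower()
    sms = sms.split()
    sms = ' '.join(sms)
    return sms

def Tokenize(text):
    return nltk.word_tokenize(text)

def transform_text(text):
    text = Clean(text)
    text = Tokenize(text)

    # Build the stop word set once instead of reloading the list for every token
    stop_words = set(stopwords.words('english'))
    text = [i for i in text if i.isalnum()]
    text = [i for i in text if i not in stop_words and i not in string.punctuation]

    # Stem each remaining token and rejoin into a single string
    return " ".join([ps.stem(i) for i in text])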

with open('Vectorizer.pkl', 'rb') as f:
    tfidf = pickle.load(f)
with open('SpamDetectorModelNB.pkl', 'rb') as f:
    model = pickle.load(f)

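# Text box for the message; the label wording is an assumption, not from the original app
input_sms = st.text_area('Enter the message')

if st.button('Predict'):
    # Apply the same preprocessing and vectorization used at training time
    transformed_sms = transform_text(input_sms)
    vector_input = tfidf.transform([transformed_sms])
    result = model.predict(vector_input)[0]

    # A predicted label of 1 means spam
    if result == 1:
        st.header("Spam")
    else:
        st.header("Not Spam")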

Visit my app on Streamlit: https://share.streamlit.io/epuujee/spamfilter