
Using a variety of techniques (supervised and unsupervised learning, text mining) to perform fraud analysis

Primary language: Jupyter Notebook

Fraud-Detection-Python

Author: Larissa Huang

This project demonstrates several fraud analysis techniques, including the following:

Working with imbalanced data

  • Highly imbalanced fraud data
  • Resampling data

Figure: minority oversampling comparison

  • Tools: SMOTE, scikit-learn, train_test_split, matplotlib
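
As a rough illustration of the resampling step above, the sketch below oversamples the minority class by interpolating between each minority point and one of its nearest minority neighbours, which is the core idea behind SMOTE. It is a simplified stand-in written with NumPy and scikit-learn, not the notebook's code and not the imbalanced-learn implementation; the synthetic dataset, the helper name `smote_like_oversample`, and all parameter values are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors


def smote_like_oversample(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows by interpolating towards neighbours.

    Hypothetical helper: a simplified version of the SMOTE idea.
    """
    rng = np.random.default_rng(seed)
    # k + 1 neighbours because each point's nearest neighbour is itself.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)
    base = rng.integers(0, len(X_min), size=n_new)            # random minority rows
    neigh = idx[base, rng.integers(1, k + 1, size=n_new)]     # one random neighbour each
    gap = rng.random((n_new, 1))                              # position along the segment
    return X_min[base] + gap * (X_min[neigh] - X_min[base])


# Synthetic, highly imbalanced data (assumed for illustration: about 2% positives).
X, y = make_classification(n_samples=2000, n_features=5,
                           weights=[0.98, 0.02], random_state=0)

# Split first, then resample only the training set so the test set stays untouched.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

X_min = X_train[y_train == 1]
n_new = int((y_train == 0).sum() - (y_train == 1).sum())

X_syn = smote_like_oversample(X_min, n_new)
X_res = np.vstack([X_train, X_syn])
y_res = np.concatenate([y_train, np.ones(n_new, dtype=y_train.dtype)])

print("before:", np.bincount(y_train), "after:", np.bincount(y_res))
```

With the imbalanced-learn package installed, `SMOTE().fit_resample(X_train, y_train)` from `imblearn.over_sampling` would be the usual way to do the same thing.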

Fraud detection with labeled data

  • Logistic Regression, Decision Tree, Random Forest
  • Performance metrics
  • Hyperparameter optimization
  • Ensemble methods (model weight adjustments)

Figure: precision-recall curve

  • Tools: confusion matrix, classification report, roc_auc_score, precision-recall curve, GridSearchCV, VotingClassifier, Seaborn
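
The supervised workflow listed above could look roughly like the following sketch: tune one model with GridSearchCV, combine it with two others in a weighted soft-voting ensemble, and score the result with the listed metrics. The synthetic data, the parameter grid, the `class_weight="balanced"` setting, and the voting weights are all assumptions chosen for illustration rather than values taken from the notebooks.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (classification_report, confusion_matrix,
                             precision_recall_curve, roc_auc_score)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced data standing in for labeled transactions (assumption).
X, y = make_classification(n_samples=3000, n_features=10,
                           weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Hyperparameter search for the random forest, optimising recall on the fraud class.
grid = GridSearchCV(
    RandomForestClassifier(class_weight="balanced", random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [4, 8]},
    scoring="recall",
    cv=3,
)
grid.fit(X_train, y_train)

# Soft-voting ensemble; the forest gets twice the weight of the other two models.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(class_weight="balanced", max_iter=1000)),
        ("dt", DecisionTreeClassifier(class_weight="balanced", max_depth=5,
                                      random_state=0)),
        ("rf", grid.best_estimator_),
    ],
    voting="soft",
    weights=[1, 1, 2],
)
ensemble.fit(X_train, y_train)

preds = ensemble.predict(X_test)
probs = ensemble.predict_proba(X_test)[:, 1]

cm = confusion_matrix(y_test, preds)
auc = roc_auc_score(y_test, probs)
precision, recall, thresholds = precision_recall_curve(y_test, probs)

print(cm)
print(classification_report(y_test, preds))
print("ROC AUC:", round(auc, 3))
```

On heavily imbalanced data, accuracy alone is misleading, which is why the sketch reports the confusion matrix, per-class precision and recall, and the precision-recall curve arrays instead.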

Fraud detection without labels

  • Customer segmentation
  • K-means clustering to detect fraud using outliers and small clusters
  • DBSCAN clustering

Figure: elbow curve

  • Tools: MiniBatchKMeans, silhouette score, homogeneity score, elbow curve
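
One way to read the unlabeled approach above: fit clusters, then treat points far from their cluster centroid (or points DBSCAN marks as noise) as candidates for review. The sketch below shows that idea on synthetic blobs. The data, the choice of three clusters, the 95th-percentile cutoff, and the DBSCAN `eps` and `min_samples` values are assumptions for the example, not settings from the notebooks.

```python
import numpy as np
from sklearn.cluster import DBSCAN, MiniBatchKMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for customer features (assumption), scaled before clustering.
X, _ = make_blobs(n_samples=1000, centers=3, random_state=0)
X = StandardScaler().fit_transform(X)

# Elbow curve data: inertia for each candidate number of clusters.
inertias = {
    k: MiniBatchKMeans(n_clusters=k, n_init=3, random_state=0).fit(X).inertia_
    for k in range(1, 8)
}

# Fit the chosen model and measure each point's distance to its own centroid.
km = MiniBatchKMeans(n_clusters=3, n_init=3, random_state=0).fit(X)
dist = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)

# Flag the 5% of points farthest from their centroid as potential outliers.
threshold = np.percentile(dist, 95)
flagged = dist > threshold

sil = silhouette_score(X, km.labels_)

# DBSCAN alternative: points labeled -1 are noise and can be reviewed as outliers.
db = DBSCAN(eps=0.3, min_samples=10).fit(X)
noise = db.labels_ == -1

print("inertia by k:", {k: round(v, 1) for k, v in inertias.items()})
print("silhouette:", round(sil, 3))
print("k-means flagged:", int(flagged.sum()), "DBSCAN noise:", int(noise.sum()))
```

Flagged points are only candidates: without labels there is no ground truth, so a threshold like the 95th percentile is a tunable judgment call.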

Text mining

  • Clean text data (tokenization, stopwords, stemming, lemmatization)
  • Flag certain words and topics
  • Topic modeling for fraud detection
  • Topic visualization

Figure: topic modeling

  • Tools: nltk, LDA, bag-of-words, doc2bow, pyLDAvis, gensim, corpora
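
To make the cleaning and word-flagging steps above concrete, here is a minimal standard-library sketch. It uses a tiny hand-written stopword list in place of nltk's and builds bag-of-words pairs by hand, in the same (token id, count) shape that gensim's `doc2bow` returns. The example messages, `STOPWORDS`, `FLAG_TERMS`, and the `clean` helper are all hypothetical, and stemming, lemmatization, and the LDA topic model are left out.

```python
import re
from collections import Counter

# Tiny hypothetical stopword list; the project lists nltk for this step.
STOPWORDS = {"the", "a", "to", "and", "of", "is", "for", "in", "on",
             "this", "please", "we", "will", "at"}

# Hypothetical terms a reviewer might want flagged.
FLAG_TERMS = {"offshore", "shred", "conceal"}


def clean(text):
    """Lowercase, tokenize on letters, drop stopwords and very short tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS and len(t) > 2]


# Made-up example messages.
emails = [
    "Please shred the documents and move the funds offshore.",
    "Lunch meeting moved to noon on Friday.",
    "We will conceal the losses in the offshore account.",
]

cleaned = [clean(e) for e in emails]

# Flag a message if it contains any of the watched terms.
flags = [bool(FLAG_TERMS & set(tokens)) for tokens in cleaned]

# Bag-of-words: map each token to an integer id, then count ids per document.
vocab = {tok: i for i, tok in enumerate(sorted({t for doc in cleaned for t in doc}))}
bows = [sorted((vocab[t], c) for t, c in Counter(doc).items()) for doc in cleaned]

print(flags)  # [True, False, True]
print(bows[0])
```

A bag-of-words corpus like `bows` is the usual input to a topic model; keyword flags alone miss paraphrases, which is one reason to add topic modeling on top.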