'Coleridge Initiative - Show US the Data' competition on Kaggle

Topic modeling of scientific publications with LDA

  • The competition's goal is to identify mentions of datasets within scientific publications.
  • Techniques: word clouds, topic modeling, Latent Dirichlet Allocation (LDA)
  • Mar. 24, 2021 ~ Apr. 5, 2021

Team Project for Data Analysis Club

  • This repo is maintained by 오서영, 최연석

| Presentation | Infographic |

Process

1. Explore the datasets and build a word cloud using tokenization, stemming, and lemmatization | Code
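
As a rough illustration of the preprocessing in step 1, the sketch below tokenizes a text, drops stop words, normalizes each token, and counts word frequencies. It is not the notebook's code: the stop-word list, the `LEMMAS` table, the `crude_stem` helper, and the sample sentence are all hypothetical stand-ins so that the example runs with the standard library only. A real pipeline would more likely rely on an NLP library for stemming and lemmatization.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list and lemma table (illustration only).
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "on", "to",
              "is", "are", "was", "were", "we", "this", "from"}
LEMMAS = {"analyses": "analysis", "studies": "study", "children": "child"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(token):
    """Strip a few common suffixes; a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def normalize(token):
    """Use the lemma table when it has an entry, otherwise fall back to stemming."""
    return LEMMAS.get(token) or crude_stem(token)


def word_frequencies(text):
    """Tokenize, remove stop words, normalize, and count the remaining tokens."""
    tokens = [t for t in tokenize(text) if t not in STOP_WORDS]
    return Counter(normalize(t) for t in tokens)


# Made-up sample text, not taken from the competition data.
sample = ("We used survey datasets from the national studies. "
          "The datasets were analyzed in this study.")
freqs = word_frequencies(sample)
print(freqs.most_common(3))
```

A frequency mapping like `freqs` is the kind of input that the `wordcloud` package's `WordCloud.generate_from_frequencies` accepts, which is one way to turn these counts into an image.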


2. Topic modeling of publications using Latent Dirichlet Allocation (LDA) | Code
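
To make the idea behind step 2 concrete, here is a minimal sketch of LDA fitted with collapsed Gibbs sampling, written with the standard library only. It is not the notebook's implementation, and in practice a library such as gensim or scikit-learn would normally be used. The function names, the hyperparameter values (`alpha`, `beta`, `iterations`), and the toy documents are assumptions chosen for illustration.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]                # tokens per (doc, topic)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                              # tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic given all other assignments.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    return vocab, doc_topic, topic_word


def top_words(vocab, topic_word, n=3):
    """Return the n most frequent words assigned to each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in topic_word
    ]


# Toy, already-tokenized documents (made up for this sketch).
docs = [
    ["survey", "education", "student", "school", "survey"],
    ["student", "school", "education", "teacher"],
    ["genome", "protein", "sequence", "gene"],
    ["gene", "protein", "genome", "sequence", "protein"],
]
vocab, doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
for k, words in enumerate(top_words(vocab, topic_word)):
    print(f"topic {k}: {words}")
```

Each row of `doc_topic` counts how many tokens of a document were assigned to each topic, so normalizing a row gives a rough per-document topic mixture. Which words land in which topic depends on the random seed and the hyperparameters.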


Dataset

[1] Coleridge Initiative - Show US the Data, https://www.kaggle.com/c/coleridgeinitiative-show-us-the-data/overview

Reference

[1] Simple EDA and preprocessing of DataFrame, https://www.kaggle.com/tanlikesmath/simple-eda-and-preprocessing-of-dataframe  
[2] [ShowUsTheData] EDA & Visualization Utils, https://www.kaggle.com/subinium/showusthedata-eda-visualization-utils  
[3] [ShowUsTheData] Topic Modeling with LDA, https://www.kaggle.com/subinium/showusthedata-topic-modeling-with-lda  
[4] 딥러닝을 이용한 자연어처리 입문 (Introduction to Natural Language Processing Using Deep Learning), https://wikidocs.net/30708