/gemastik21

Topic Modeling and Text Network Analysis for Indonesian Tweets on Cryptocurrencies

Primary language: Jupyter Notebook. License: MIT.

gemastikUnjani

This repository stores the research results of our entry in the Data Mining branch of the Gemastik 2021 competition. We studied how cryptocurrency is discussed in Indonesia on Twitter. The goal is to determine which sub-topics appear in the collected tweet data. To do so, we use LDA (Latent Dirichlet Allocation) for topic modeling and then build a text network for each of the resulting sub-topics.
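The text-network step can be illustrated with a minimal sketch. It assumes the tweets of one sub-topic are already tokenised and that the network is a word co-occurrence graph; the function name `build_cooccurrence_network`, the `min_weight` parameter, and the sample tokens are hypothetical and are not taken from the notebooks. The LDA step itself is not shown here.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_network(token_lists, min_weight=1):
    """Return {(word_a, word_b): weight} for words that share a tweet.

    Assumption: an edge links two words appearing in the same tweet,
    and its weight is the number of tweets containing both words.
    """
    edges = Counter()
    for tokens in token_lists:
        # Sort the unique tokens so each unordered pair has one key,
        # and count every pair at most once per tweet.
        for a, b in combinations(sorted(set(tokens)), 2):
            edges[(a, b)] += 1
    # Drop weak edges to keep the network readable.
    return {pair: w for pair, w in edges.items() if w >= min_weight}


# Hypothetical, already-tokenised tweets assigned to one sub-topic.
topic_tweets = [
    ["bitcoin", "harga", "naik"],
    ["bitcoin", "harga", "turun"],
    ["dogecoin", "harga", "naik"],
]

network = build_cooccurrence_network(topic_tweets, min_weight=2)
for (a, b), weight in sorted(network.items()):
    print(a, b, weight)
```

The resulting edge dictionary could then be loaded into a graph library for layout and centrality measures; which library the notebooks use is not stated here.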

Dataset

You can download the dataset here: https://www.kaggle.com/wijatama/indonesiancryptotweets
The data were collected by web scraping (thanks to Hasan, our Mining Engineer). The tweets used range from January 1, 2021 to May 31, 2021.
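As a rough sketch of restricting records to the stated date range, the helper below keeps only timestamps between January 1, 2021 and May 31, 2021 inclusive. The timestamp format `YYYY-MM-DD HH:MM:SS` and the name `in_study_window` are assumptions for illustration; the actual column names and formats in the Kaggle dataset may differ.

```python
from datetime import date, datetime

# Study window stated in this README (both ends inclusive).
START = date(2021, 1, 1)
END = date(2021, 5, 31)


def in_study_window(timestamp: str) -> bool:
    """Return True if the timestamp falls inside the study window.

    Assumption: timestamps look like '2021-03-15 08:30:00'.
    """
    day = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").date()
    return START <= day <= END


# Hypothetical timestamps, not rows from the real dataset.
samples = ["2020-12-31 23:59:59", "2021-03-15 08:30:00", "2021-06-01 00:00:00"]
kept = [ts for ts in samples if in_study_window(ts)]
print(kept)
```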

Indonesian slang-words

We use the Indonesian slang-word dictionary provided by nasalsabila. You can visit her repo here: https://github.com/nasalsabila/kamus-alay
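A slang dictionary like this is typically applied as a token-level lookup before topic modeling. The sketch below assumes the dictionary has been loaded into a plain slang-to-formal mapping; the function `normalize_slang` and the three sample entries are hypothetical and are not copied from the kamus-alay repository.

```python
def normalize_slang(text, lexicon):
    """Replace each whitespace-separated token found in the lexicon.

    Assumption: the lexicon maps a lowercase slang token to its
    formal form; tokens without an entry are kept unchanged.
    """
    tokens = text.lower().split()
    return " ".join(lexicon.get(token, token) for token in tokens)


# Hypothetical entries for illustration only.
sample_lexicon = {"gk": "tidak", "yg": "yang", "bgt": "banget"}

cleaned = normalize_slang("Harga bitcoin gk naik", sample_lexicon)
print(cleaned)
```

Splitting on whitespace keeps the sketch short; a real pipeline would likely strip punctuation, mentions, and URLs first.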

Team Member

Guided by: Bpk. Rifqi Ma'arif