im-pek
Data Scientist | with exceptional focus on NLP, Text Analytics, Topic Modelling, Recommendation Systems, Trend Analysis, Pattern Recognition, Novel Algorithms
Singapore
Pinned Repositories
add-symbol-currency-2dp-to-every-excel-cell
Add symbol, currency, and/or 2 d.p. (to every excel cell)
amazon-textract-enhancer
This workshop demonstrates how to build a Document parser and query engine with Amazon Textract and other services, such as ElasticSearch and DynamoDB.
autofit-excel-cell-widths-using-xlwings
Autofit Excel cell widths (using xlwings). xlwings is an extremely efficient, state-of-the-art Python library to manage, edit, and manipulate Excel files & data. Its documentation can be viewed at https://readthedocs.org/projects/xlwings/downloads/pdf/stable.
bold-excel-cells-using-openpyxl
Bold Excel Cells (using openpyxl)
Doc2X_Topic_Modelling
Doc2X is a novel topic modelling technique created in June 2019, by yours truly, Pek Yun Ning. It hybridises the older Doc2Vec and Corex topic modelling algorithms to form this all-new algorithm.
LDA2Char_Topic_Modelling
LDA2Char is a novel topic modelling technique created in June 2019, by yours truly, Pek Yun Ning. It hybridises the older LDA and FastText topic modelling algorithms to form this all-new algorithm.
LDA2Word_Topic_Modelling
LDA2Word is a novel topic modelling technique created in June 2019, by yours truly, Pek Yun Ning. It hybridises the older LDA and Word2Vec topic modelling algorithms to form this all-new algorithm.
Logistic_Regression_-_Confusion_Matrix
An application of Logistic Regression, with its accuracy and precision presented using a confusion matrix. This was applied to a consumer complaints dataset.
NLP-NER
Natural Language Processing and Named Entity Recognition to automatically get specific structured entities from unstructured texts / data input.
Optimisation-Model_Augmentation
An Optimisation + Model Augmentation project to derive the best plant growth recipe. Includes concepts like linear regression, genetic algorithms, Euclidean distance, uniform criterion, and cloud computing (integration with AWS DynamoDB).
im-pek's Repositories
im-pek/LDA2Word_Topic_Modelling
LDA2Word is a novel topic modelling technique created in June 2019, by yours truly, Pek Yun Ning. It hybridises the older LDA and Word2Vec topic modelling algorithms to form this all-new algorithm.
im-pek/autofit-excel-cell-widths-using-xlwings
Autofit Excel cell widths (using xlwings). xlwings is an extremely efficient, state-of-the-art Python library to manage, edit, and manipulate Excel files & data. Its documentation can be viewed at https://readthedocs.org/projects/xlwings/downloads/pdf/stable.
im-pek/Doc2X_Topic_Modelling
Doc2X is a novel topic modelling technique created in June 2019, by yours truly, Pek Yun Ning. It hybridises the older Doc2Vec and Corex topic modelling algorithms to form this all-new algorithm.
im-pek/LDA2Char_Topic_Modelling
LDA2Char is a novel topic modelling technique created in June 2019, by yours truly, Pek Yun Ning. It hybridises the older LDA and FastText topic modelling algorithms to form this all-new algorithm.
im-pek/Logistic_Regression_-_Confusion_Matrix
An application of Logistic Regression, with its accuracy and precision presented using a confusion matrix. This was applied to a consumer complaints dataset.
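As a rough illustration of how accuracy and precision fall out of a confusion matrix, here is a minimal sketch in plain Python. The binary labels and the helper names (`confusion_counts`, `accuracy`, `precision`) are hypothetical and are not taken from the repository, which works on a consumer complaints dataset.

```python
from collections import Counter

def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts["tp"], counts["fp"], counts["fn"], counts["tn"]

def accuracy(tp, fp, fn, tn):
    """Share of all predictions that were correct."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0

def precision(tp, fp):
    """Share of positive predictions that were actually positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0

# Hypothetical toy labels, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print("confusion matrix:", [[tp, fn], [fp, tn]])
print("accuracy:", accuracy(tp, fp, fn, tn))
print("precision:", precision(tp, fp))
```

A multi-class problem would need one such count per class; the binary case is used here only to keep the arithmetic visible.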
im-pek/NLP-NER
Natural Language Processing and Named Entity Recognition to automatically get specific structured entities from unstructured texts / data input.
im-pek/Optimisation-Model_Augmentation
An Optimisation + Model Augmentation project to derive the best plant growth recipe. Includes concepts like linear regression, genetic algorithms, Euclidean distance, uniform criterion, and cloud computing (integration with AWS DynamoDB).
im-pek/add-symbol-currency-2dp-to-every-excel-cell
Add symbol, currency, and/or 2 d.p. (to every excel cell)
im-pek/amazon-textract-enhancer
This workshop demonstrates how to build a Document parser and query engine with Amazon Textract and other services, such as ElasticSearch and DynamoDB.
im-pek/bold-excel-cells-using-openpyxl
Bold Excel Cells (using openpyxl)
im-pek/combine-excel-sheets-using-openpyxl
Combine excel sheets from two workbooks (using openpyxl)
im-pek/Corex_Topic_Modelling
Correlation Explanation (Corex) is a topic modelling technique that is great at identifying 'hidden' topics, i.e. representative topics described by low-frequency words. It was originally created by Greg Ver Steeg.
im-pek/Data-Anonymisation
Anonymising data in meeting notes.
im-pek/Doc2Vec_Topic_Modelling
Doc2Vec method of topic modelling. It is the document-level counterpart of Word2Vec and builds on the concept of word vector representations.
im-pek/docs
TensorFlow documentation
im-pek/excel--unique-values-in-columns-and-difference-between-two-columns
Excel - find unique values in columns, & difference between two columns
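The two operations named above can be sketched with plain Python lists standing in for worksheet columns. Reading the columns from an Excel file is left out, and the function names and sample values are hypothetical rather than taken from the repository.

```python
def unique_in_order(values):
    """Return the distinct values of a column, keeping first-seen order."""
    return list(dict.fromkeys(values))

def column_difference(left, right):
    """Return distinct values that appear in `left` but not in `right`."""
    right_set = set(right)
    return [v for v in unique_in_order(left) if v not in right_set]

# Hypothetical column contents, for illustration only.
col_a = ["apple", "pear", "apple", "kiwi"]
col_b = ["pear", "plum"]

print(unique_in_order(col_a))
print(column_difference(col_a, col_b))
```

`dict.fromkeys` is used instead of `set` for the unique values so that the result keeps the order in which values first appear in the column.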
im-pek/FastText_Topic_Modelling
FastText is a topic modelling technique originally created by the Facebook AI Research team. Its first stable version was released in December 2018. FastText is now available in Python through gensim as well.
im-pek/LDA2X_Topic_Modelling
LDA2X is a novel topic modelling technique created in June 2019, by yours truly, Pek Yun Ning. It hybridises the older LDA and Corex topic modelling algorithms to form this all-new algorithm.
im-pek/LDA2XPand_Topic_Modelling
LDA2XPand is a novel topic modelling technique created in June 2019, by yours truly, Pek Yun Ning. It hybridises the older LDA, Corex, and Word2Vec topic modelling algorithms to form this all-new algorithm.
im-pek/LDA_Topic_Modelling
Latent Dirichlet Allocation (LDA) is a topic modelling technique that involves a three-layered probabilistic approach, taking into account the word, document, and corpus levels. It is accompanied by its own powerful data visualisation tool, LDAvis (used in this code in its Python version, pyLDAvis).
im-pek/Model_Augmentation
Model Augmentation by adding more sample data points to an existing (small) dataset in a mathematical manner. Carried out using mathematical concepts such as Euclidean Distance and Uniform Criterion (max-of-min concept). This adds an exploration factor to the model, reduces prediction error, and improves global accuracy (in optimisation).
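One common reading of the max-of-min idea is a maximin selection: among candidate points, pick the one whose nearest existing data point is farthest away. The sketch below assumes that reading; the function names and sample coordinates are hypothetical, and the repository's actual procedure may differ.

```python
import math

def min_distance(candidate, existing):
    """Euclidean distance from a candidate to its nearest existing point."""
    return min(math.dist(candidate, point) for point in existing)

def pick_maximin(candidates, existing):
    """Pick the candidate that maximises the minimum distance to the data."""
    return max(candidates, key=lambda c: min_distance(c, existing))

# Hypothetical 2-D data, for illustration only.
existing = [(0.0, 0.0), (1.0, 1.0)]
candidates = [(0.5, 0.5), (0.0, 1.0), (0.9, 0.9)]

new_point = pick_maximin(candidates, existing)
print(new_point)
```

Choosing the point farthest from everything already sampled spreads new samples into unexplored regions, which is where the exploration factor described above comes from.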
im-pek/Named-Entity-Recognition-Spacy-NLTK
Identifies first names, last names, DOBs, and IDs on scanned documents and IDs.
im-pek/NLP-with-Python
Scikit-Learn, NLTK, Spacy, Gensim, Textblob and more
im-pek/NMF_Topic_Modelling
Non-negative Matrix Factorisation (NMF) is a topic modelling technique that uses matrix factorisation concepts to identify topics and top words that describe each topic.
im-pek/OCR-Entities-from-JSON-AWS-Textract
Converting the output of AWS Textract from JSON to a Python-readable format.
im-pek/Optimisation_for_Plant_Growth
Achieve optimal plant growth through derivation of the best growth recipe (i.e. input growth parameters) using Genetic Algorithm and Modelling (e.g. Linear Regression) methods. Data is sent to and retrieved from AWS DynamoDB, which serves as a centralised system for IoT integration between controls & sensors in an integrated smart plant growth system.
im-pek/popup-box-to-select-file-using-tkinter
Pop-up dialog box to select file (using tkinter)
im-pek/retrieve-current-date-time
Retrieve current date & time
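A minimal sketch of retrieving the current date and time with the standard library is shown here. The function name and the default format string are assumptions, not taken from the repository.

```python
from datetime import datetime

def current_timestamp(fmt="%Y-%m-%d %H:%M:%S"):
    """Return the local date and time as a formatted string."""
    return datetime.now().strftime(fmt)

stamp = current_timestamp()
print(stamp)
```

Passing a different `fmt`, such as `"%d/%m/%Y"`, returns only the parts of the timestamp that are needed.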
im-pek/sekritfiles
Test your national security credentials.
im-pek/Spacy-Language-Detector
Using 'spacy-langdetect' Python library. Able to identify 55 different languages.