im-pek
Data Scientist | with exceptional focus on NLP, Text Analytics, Topic Modelling, Recommendation Systems, Trend Analysis, Pattern Recognition, Novel Algorithms
Singapore
Pinned Repositories
add-symbol-currency-2dp-to-every-excel-cell
Add a symbol, currency, and/or 2 d.p. to every Excel cell
amazon-textract-enhancer
This workshop demonstrates how to build a Document parser and query engine with Amazon Textract and other services, such as ElasticSearch and DynamoDB.
autofit-excel-cell-widths-using-xlwings
Autofit Excel cell widths (using xlwings). xlwings is a Python library for managing, editing, and manipulating Excel files and data. Its documentation can be viewed at https://readthedocs.org/projects/xlwings/downloads/pdf/stable.
bold-excel-cells-using-openpyxl
Bold Excel Cells (using openpyxl)
Doc2X_Topic_Modelling
Doc2X is a novel topic modelling technique created in June 2019 by yours truly, Pek Yun Ning. It hybridises the older Doc2Vec document-embedding algorithm and the CorEx topic modelling algorithm to form a new algorithm.
LDA2Char_Topic_Modelling
LDA2Char is a novel topic modelling technique created in June 2019 by yours truly, Pek Yun Ning. It hybridises the older LDA topic modelling algorithm and the FastText embedding algorithm to form a new algorithm.
LDA2Word_Topic_Modelling
LDA2Word is a novel topic modelling technique created in June 2019 by yours truly, Pek Yun Ning. It hybridises the older LDA topic modelling algorithm and the Word2Vec embedding algorithm to form a new algorithm.
Logistic_Regression_-_Confusion_Matrix
An application of logistic regression, with its accuracy and precision presented using a confusion matrix. This was applied to a consumer complaints dataset.
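As an illustration of how accuracy and precision fall out of a confusion matrix, here is a minimal sketch for the binary case. It is not taken from the repository: the function names and the toy labels are hypothetical, and the repository itself may rely on a library instead.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # predicted positive, actually positive
        elif pred == positive:
            fp += 1  # predicted positive, actually negative
        elif truth == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Share of all predictions that were correct."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def precision(tp, fp):
    """Share of positive predictions that were correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


# Toy labels, made up for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"accuracy={accuracy(tp, fp, fn, tn):.2f} precision={precision(tp, fp):.2f}")
```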
NLP-NER
Natural Language Processing and Named Entity Recognition to automatically extract specific structured entities from unstructured text / data input.
Optimisation-Model_Augmentation
An Optimisation + Model Augmentation project to derive the best plant growth recipe. Includes concepts like linear regression, genetic algorithm, Euclidean distance, uniform criterion, and cloud computing (integration with AWS DynamoDB).
im-pek's Repositories
im-pek/classify_nouns-verbs-adjectives
Shows how nouns, verbs, and adjectives can be classified from a body of text.
im-pek/corex_topic
Hierarchical unsupervised and semi-supervised topic models for sparse count data with CorEx
im-pek/csv-blank-removal
Removes blank cells in CSV files using Python. In a Python list, a blank cell shows up as 'nan'.
im-pek/enumerate_using_python
A simple implementation of 'enumeration' in Python. In this case, we number webchats from one whole chunk of text filled with tons of webchat entries.
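A rough sketch of the idea, using Python's built-in `enumerate`. The function name is hypothetical, and the assumption that webchat entries are separated by blank lines is mine; the repository may split the text differently.

```python
def number_entries(text, separator="\n\n"):
    """Split a block of text into entries and prefix each with its number."""
    # Assumption: entries are separated by blank lines; empty chunks are dropped.
    entries = [chunk.strip() for chunk in text.split(separator) if chunk.strip()]
    # enumerate(..., start=1) yields (1, first), (2, second), ...
    return [f"{index}. {entry}" for index, entry in enumerate(entries, start=1)]


chats = "Hi, I need help.\n\nMy order is late.\n\n\n\nThanks!"
for line in number_entries(chats):
    print(line)
```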
im-pek/geographical_hexbins-projections
Data science applications in geographical data. Involves hexbins and projections.
im-pek/Mutual-Information
In probability theory and information theory, the mutual information of two random variables is a quantity that measures the mutual dependence of the two random variables. This script computes mutual information (MI) over discrete random variables.
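For discrete variables, mutual information is the sum over all observed pairs of p(x, y) * log(p(x, y) / (p(x) * p(y))). The sketch below estimates it from paired samples using empirical frequencies. The function name and the choice of base-2 logarithms (bits) are assumptions for illustration, not the repository's code.

```python
import math
from collections import Counter


def mutual_information(xs, ys, base=2.0):
    """Estimate I(X; Y) from paired samples of two discrete variables."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n == 0:
        return 0.0
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    count_x = Counter(xs)         # marginal counts of x
    count_y = Counter(ys)         # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) / (p(x) * p(y)) simplifies to c * n / (count_x * count_y)
        ratio = c * n / (count_x[x] * count_y[y])
        mi += (c / n) * math.log(ratio, base)
    return mi


# Identical variables share all their information; independent ones share none.
print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))
```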
im-pek/snownlp
Python library for processing Chinese text
im-pek/text-preprocessing
Converts a text file into a Python-readable form by first placing each line of text into a Python list, then splitting each line into individual words. Good for analysis that works by line and/or by word. Also removes stopwords such as 'the', 'a', 'but'. Finally, the result is consolidated in a CSV file.
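The steps described above could look roughly like this with the standard library only. The function names and the three-word stopword set are hypothetical stand-ins (the repository may use a fuller stopword list), and punctuation handling is left out for brevity.

```python
import csv
import io

# Tiny illustrative stopword set; a real list would be much longer.
STOPWORDS = frozenset({"the", "a", "but"})


def preprocess_lines(lines, stopwords=STOPWORDS):
    """Split each line into words and drop stopwords (case-insensitive)."""
    rows = []
    for line in lines:
        words = [word for word in line.split() if word.lower() not in stopwords]
        if words:  # skip lines that end up empty
            rows.append(words)
    return rows


def write_rows(rows, handle):
    """Write one CSV row per line of text, one cell per remaining word."""
    csv.writer(handle).writerows(rows)


sample = ["The cat sat on a mat", "but the dog barked"]
rows = preprocess_lines(sample)
buffer = io.StringIO()  # stands in for a file opened with newline=""
write_rows(rows, buffer)
print(buffer.getvalue())
```

To process a real file, the lines could be read with `open(path, encoding="utf-8")` and the rows written to a file opened with `newline=""`, as the `csv` module documentation recommends.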
im-pek/text-summarisation
Reduces essays / paragraphs to a few sentences to capture the gist of a large corpus of text.
im-pek/topic-ranking
Ranks topics / sentences based on their relative importance among the entries in a text corpus.
im-pek/wordcloud-using-python
Create a Word Cloud using Python.