Data science portfolio by Gim Seng Ng

This is a collection of data science project notebooks I created to learn and explore machine learning algorithms. Please check out the other projects on my GitHub page.

End-to-end projects

1. Gameplay Data Analysis

An end-to-end project is often both fun and challenging, so I decided to pick a subject I enjoy and have questions about. I play video games in my free time and often use the website howlongtobeat.com to track my backlog, so I decided to build a data science project using gameplay data. The motivation is to understand gamers' playing behavior and to leverage such data to provide guidance for game developers and publishers. A predictive model based on playtime, ratings, and sales could help the gaming industry balance profit against building a satisfied customer base.

All of the above, and hopefully more, will be explored and answered in the project's GitHub repo. The project covers data acquisition (through web scraping), data exploration and cleaning, data transformation, machine learning model selection and evaluation, and finally building visualizations for business analytics and insights.

2. Citation and Collaboration Connection Among Physicists

This project stems from a desire to learn graph databases and build powerful network visualizations. Since I worked in high energy physics for more than 5 years after my PhD, it is also a chance to explore connectivity among high energy physicists. The two primary ways to do so are through collaborations and citations. The data are freely available on arXiv, though we can actually get very far using the API and the data dump of inspirehep.net. The data I am using are publication data up to early 2020.

Interesting questions include: Which physicists are highly collaborative and productive? Does more collaboration yield more citations? There is also the potential to explore what influences the chances of getting a permanent faculty job, thanks to the job title and affiliation data provided by inspirehep.net.
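As a rough illustration of the first question, here is a minimal sketch of how "collaborative" and "productive" could be measured from a list of papers. It is not the project's actual code: the `papers` records, the author labels, and the `build_coauthor_graph` helper are all hypothetical placeholders, and it uses plain Python dictionaries rather than a graph database. Collaborativeness is approximated by the number of distinct co-authors, and productivity by the number of papers.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records: each paper is a list of author labels.
# Real records would come from the publication data; these are placeholders.
papers = [
    ["author_a", "author_b"],
    ["author_a", "author_c", "author_d"],
    ["author_b"],
    ["author_a", "author_b"],
]


def build_coauthor_graph(papers):
    """Return (collaborators, paper_counts) for a list of author lists."""
    collaborators = defaultdict(set)  # author -> set of distinct co-authors
    paper_counts = defaultdict(int)   # author -> number of papers
    for authors in papers:
        unique = sorted(set(authors))  # ignore duplicate names on one paper
        for author in unique:
            paper_counts[author] += 1
            collaborators[author]      # make sure solo authors get an entry
        # Every pair of authors on the same paper shares an edge.
        for a, b in combinations(unique, 2):
            collaborators[a].add(b)
            collaborators[b].add(a)
    return collaborators, paper_counts


collaborators, paper_counts = build_coauthor_graph(papers)
for author in sorted(paper_counts):
    # Columns: author, number of distinct co-authors, number of papers
    print(author, len(collaborators[author]), paper_counts[author])
```

The same adjacency structure could later be loaded into a graph database or a network library for visualization. Counting distinct co-authors is only one possible definition of "collaborative".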

The project is still in its early conception and data collection/cleaning phase. Stay tuned at the GitHub repo.