Python-big-data

Python and Pandas are known to have issues around scalability and efficiency. You will learn how to use libraries such as Modin, Dask, Ray, and Vaex to overcome the limitations of Pandas on large datasets.
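To make the scalability issue concrete, here is a minimal sketch that is not taken from the course materials. It assumes a tiny in-memory CSV (with hypothetical columns `state` and `population`) standing in for a file too large to load at once, and it uses only Pandas' own `chunksize` option to show the partition-then-combine idea that libraries such as Dask and Modin build on.

```python
import io

import pandas as pd

# Hypothetical CSV data; in practice this would be a large file on disk.
csv_text = "state,population\n" + "\n".join(
    f"state_{i % 3},{i}" for i in range(10)
)

# Baseline: load everything into a single DataFrame, then aggregate.
full = pd.read_csv(io.StringIO(csv_text))
expected = full.groupby("state")["population"].sum()

# Chunked: read 4 rows at a time, aggregate each chunk on its own,
# then combine the partial sums. Only one chunk is in memory at a time.
partials = [
    chunk.groupby("state")["population"].sum()
    for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4)
]
chunked = pd.concat(partials).groupby(level=0).sum()

print(chunked)
```

Both approaches should produce the same per-state totals. The chunked version trades a little bookkeeping for a bounded memory footprint, which is roughly the trade-off the parallel libraries automate.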



Don't forget to hit the ⭐ if you like this repo.

About Us

The information in this GitHub repository is part of the materials for the subject High Performance Data Processing (SECP3133). This folder contains general big data information as well as big data case studies using Malaysian datasets. The case studies were created by a Bachelor of Computer Science (Data Engineering) student at Universiti Teknologi Malaysia.

📚 Course: High Performance Data Processing

Contents:

Notes

- Big Data: Pandas
- Big Data: Alternatives to Pandas for Processing Large Datasets
- Modin
- Dask
- Datatable
- 🎖️ Comparison between libraries
- Big Data: Case study

Lab

- Pandas
- Modin
- Dask
- Comparison between libraries

Contribution 🛠️

Please create an Issue to suggest improvements or report errors in the content.

You can also contact me on LinkedIn with any other queries or feedback.
