Data Analytics in Python - From Data Collection to Machine Learning Algorithms

This repository contains data science assignments. Each assignment focuses on specific topics in data collection, visualization, database management, and machine learning. Below is an overview of each assignment:

Collecting & Visualizing Data, SQLite, D3 Warmup, OpenRefine

This assignment covers fundamental data science concepts: data collection, visualization, and basic tools such as SQLite for database management, D3 for graphing, and OpenRefine for data cleaning.

Tasks:

  • Data Collection: Gather and preprocess data from various sources.
  • SQLite: Utilize SQLite for database management tasks.
  • D3 Warmup: Create basic visualizations using D3.
  • OpenRefine: Clean and transform data using OpenRefine.
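
As a rough illustration of the SQLite task above, the sketch below loads a few rows into an in-memory database with Python's built-in sqlite3 module and runs an aggregate query. The table name (readings), its columns, the helper load_and_summarize, and the sample rows are all hypothetical and are not taken from the assignment itself.

```python
import sqlite3


def load_and_summarize(rows):
    """Insert (city, temperature) rows into an in-memory table
    and return the average temperature per city."""
    conn = sqlite3.connect(":memory:")  # temporary database, nothing is written to disk
    try:
        # Hypothetical schema, chosen only for this example.
        conn.execute(
            "CREATE TABLE readings (city TEXT NOT NULL, temperature REAL NOT NULL)"
        )
        # Parameterized inserts avoid building SQL strings by hand.
        conn.executemany(
            "INSERT INTO readings (city, temperature) VALUES (?, ?)", rows
        )
        conn.commit()
        cursor = conn.execute(
            "SELECT city, AVG(temperature) FROM readings GROUP BY city ORDER BY city"
        )
        return cursor.fetchall()
    finally:
        conn.close()


# Made-up sample data for illustration only.
sample_rows = [("Atlanta", 30.0), ("Atlanta", 32.0), ("Boston", 20.0)]
averages = load_and_summarize(sample_rows)
print(averages)
```

The same pattern should carry over to a file-backed database by passing a file path to sqlite3.connect instead of ":memory:".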

D3 Graphs and Visualization

Building on the D3 warmup, this assignment goes deeper into creating interactive and informative visualizations with D3.

Tasks:

  • Advanced D3 Graphs: Develop more complex and interactive visualizations.
  • Visualization Best Practices: Apply best practices for effective data representation.
  • Data Storytelling: Use visualizations to tell a compelling data story.

Spark, Docker, Databricks, Cloud Services (AWS, Azure, GCP)

This assignment explores distributed computing, containerization, and cloud services. Topics include Apache Spark, Docker, Databricks, and the cloud platforms AWS, Azure, and GCP.

Tasks:

  • Apache Spark: Implement data processing using Apache Spark.
  • Docker: Containerize applications for efficient deployment.
  • Databricks: Use Databricks for collaborative big data analytics.
  • Cloud Services: Deploy and manage services on AWS, Azure, and GCP.

PageRank, Random Forest, Scikit-Learn

This assignment covers graph ranking and machine learning: the PageRank algorithm, Random Forest, and the Scikit-Learn library.

Tasks:

  • PageRank: Implement and analyze the PageRank algorithm for graph ranking.
  • Random Forest: Explore and apply Random Forest for ensemble learning.
  • Scikit-Learn: Use Scikit-Learn for machine learning tasks and evaluation.
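
To make the PageRank task more concrete, here is a minimal sketch of the standard power-iteration formulation in plain Python. It is not the assignment's own implementation. The function name pagerank, the damping factor of 0.85, the convergence tolerance, and the toy graph are all assumptions chosen for illustration. The sketch also assumes that every link target appears as a key in the graph dictionary.

```python
def pagerank(graph, damping=0.85, tol=1e-10, max_iter=100):
    """Compute PageRank scores by power iteration.

    graph maps each node to a list of the nodes it links to.
    Assumption: every link target is also a key of graph.
    """
    nodes = list(graph)
    n = len(nodes)
    # Start from a uniform distribution over all nodes.
    ranks = {node: 1.0 / n for node in nodes}

    for _ in range(max_iter):
        # Rank held by nodes with no outgoing links is spread evenly.
        dangling = sum(ranks[node] for node in nodes if not graph[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_ranks = {node: base for node in nodes}

        # Each node passes an equal share of its rank to its targets.
        for node in nodes:
            targets = graph[node]
            if targets:
                share = damping * ranks[node] / len(targets)
                for target in targets:
                    new_ranks[target] += share

        # Stop once the scores change by less than the tolerance.
        delta = sum(abs(new_ranks[node] - ranks[node]) for node in nodes)
        ranks = new_ranks
        if delta < tol:
            break

    return ranks


# Made-up four-node graph for illustration only.
toy_graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = pagerank(toy_graph)
for node, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{node}: {score:.4f}")
```

In this toy graph, node C receives links from every other node and ends up with the highest score, while D has no incoming links and keeps only the baseline share. For larger graphs, a sparse-matrix formulation is likely a better fit than nested Python loops.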