This repository contains data science assignments. Each assignment focuses on a specific set of topics in data collection, visualization, database management, distributed computing, and machine learning. An overview of each assignment follows:
This assignment covers fundamental data science concepts, including data collection and visualization, along with basic tools: SQLite for database management, D3 for graphing, and OpenRefine for data cleaning.
- Data Collection: Gather and preprocess data from various sources.
- SQLite: Use SQLite for database management tasks.
- D3 Warmup: Create basic visualizations using D3.
- OpenRefine: Clean and transform data using OpenRefine.
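As a rough illustration of the kind of SQLite work listed above, here is a minimal sketch using Python's built-in `sqlite3` module and an in-memory database. The `movies` table, its columns, the sample rows, and the helper names `build_db` and `average_rating_by_year` are all hypothetical and are not taken from the assignment.

```python
import sqlite3

# Hypothetical sample rows: (id, title, year, rating). Not assignment data.
ROWS = [
    (1, "Alpha", 2010, 7.5),
    (2, "Beta", 2012, 6.0),
    (3, "Gamma", 2012, 8.1),
    (4, "Delta", 2015, 5.4),
]


def build_db() -> sqlite3.Connection:
    """Create an in-memory database holding a hypothetical `movies` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT, year INTEGER, rating REAL)"
    )
    # executemany with ? placeholders avoids building SQL by string concatenation.
    conn.executemany("INSERT INTO movies VALUES (?, ?, ?, ?)", ROWS)
    conn.commit()
    return conn


def average_rating_by_year(conn: sqlite3.Connection) -> list:
    """Return (year, average rating) pairs, ordered by year."""
    cur = conn.execute(
        "SELECT year, AVG(rating) FROM movies GROUP BY year ORDER BY year"
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = build_db()
    for year, avg in average_rating_by_year(conn):
        print(year, round(avg, 2))
    conn.close()
```

An in-memory database keeps the sketch self-contained; passing a file path to `sqlite3.connect` instead would persist the table to disk.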
Building on the D3 warmup, this assignment goes deeper into creating interactive, informative visualizations with D3.
- Advanced D3 Graphs: Develop more complex and interactive visualizations.
- Visualization Best Practices: Apply best practices for effective data representation.
- Data Storytelling: Use visualizations to tell a compelling data story.
This assignment explores distributed computing, containerization, and cloud services. Topics include Apache Spark, Docker, Databricks, and cloud platforms such as AWS, Azure, and GCP.
- Apache Spark: Implement data processing using Apache Spark.
- Docker: Containerize applications for efficient deployment.
- Databricks: Use Databricks for collaborative big data analytics.
- Cloud Services: Deploy and manage services on AWS, Azure, and GCP.
This assignment focuses on graph algorithms and machine learning techniques, including the PageRank algorithm, random forests, and the scikit-learn library.
- PageRank: Implement and analyze the PageRank algorithm for graph ranking.
- Random Forest: Explore and apply random forests, an ensemble learning method.
- scikit-learn: Use scikit-learn for machine learning tasks and model evaluation.
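For reference, PageRank can be computed by power iteration. The sketch below is one common formulation in plain Python, not necessarily the assignment's implementation. It assumes a directed graph stored as an adjacency dict and the commonly used damping factor of 0.85. Dangling nodes (nodes with no out-links) spread their rank uniformly over all nodes, which is one standard convention among several. The function name `pagerank`, its parameters, and the example graph are illustrative.

```python
def pagerank(graph, damping=0.85, tol=1e-10, max_iter=200):
    """Compute PageRank scores by power iteration.

    graph: dict mapping each node to a list of nodes it links to.
    Every node that appears as a link target must also be a key.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}  # start from the uniform distribution

    for _ in range(max_iter):
        # Rank held by dangling nodes is redistributed uniformly to all nodes.
        dangling = sum(rank[u] for u in nodes if not graph[u])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        # Each non-dangling node shares its rank equally among its out-links.
        for u in nodes:
            out = graph[u]
            if out:
                share = damping * rank[u] / len(out)
                for v in out:
                    new_rank[v] += share
        # Stop once the L1 change between iterations falls below tol.
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    # Hypothetical 4-node graph; "d" has no out-links (a dangling node).
    g = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": []}
    for node, score in sorted(pagerank(g).items(), key=lambda kv: -kv[1]):
        print(node, round(score, 4))
```

Each iteration preserves a total rank of 1, so the scores stay a probability distribution. In the example graph, `c` ends up highest because it receives links from both `a` and `b`. The dangling node `d` keeps only the uniform teleport share.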