Hello everyone, welcome to my 100 Days of Data Science journey!

100 Days of Data Science

Day 1: Introduction to Python Basics

Topics Covered:

0️⃣ Comments, Datatypes, Numbers, Casting:

  • Comments: Annotations in code for explanations, marked with #.
  • Datatypes: Categories of values, like int for integers, float for decimals, and str for strings.
  • Numbers: Integers (int) and floating-point numbers (float) for calculations.
  • Casting: Changing a value's datatype, e.g., str(5) converts integer 5 to string "5".
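
A minimal sketch of the points above. The variable names are illustrative and are not taken from the original notebooks.

```python
# A comment: Python ignores everything after the # on a line.
count = 5        # int
price = 19.99    # float
name = "Ada"     # str

# Casting converts a value from one datatype to another.
count_as_text = str(count)         # "5"
price_as_int = int(price)          # 19 (int() truncates toward zero)
number_from_text = float("3.5")    # 3.5

print(type(count).__name__, type(price).__name__, type(name).__name__)
print(count_as_text, price_as_int, number_from_text)
```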

1️⃣ String, Booleans, Operators:

  • String: Sequence of characters, like "Hello, World!".
  • Booleans: The values True and False, typically produced by logical comparisons.
  • Operators: Symbols like + for addition, - for subtraction, == for equality checks.

2️⃣ List, Tuples, Sets, Dictionaries:

  • List: Ordered collection enclosed in square brackets, e.g., my_list = [1, 2, 3].
  • Tuples: Ordered like lists but immutable, enclosed in parentheses, e.g., my_tuple = (1, 2, 3).
  • Sets: Unordered unique items enclosed in curly braces, e.g., my_set = {1, 2, 3}.
  • Dictionaries: Key-value pairs enclosed in curly braces, e.g., my_dict = {"key": "value"}.
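
The four collection types can be compared side by side. This is only a sketch; the sample values are made up for illustration.

```python
my_list = [1, 2, 3]
my_list.append(4)            # lists are mutable

my_tuple = (1, 2, 3)         # tuples cannot be changed after creation
tuple_is_immutable = False
try:
    my_tuple[0] = 99
except TypeError:
    tuple_is_immutable = True

my_set = {1, 2, 2, 3}        # duplicates are dropped
my_dict = {"key": "value"}
my_dict["other"] = 42        # add a new key-value pair

print(my_list, my_tuple, my_set, my_dict)
```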

3️⃣ If..Else, For Loops:

  • If..Else: Conditional statements to execute code based on conditions.
  • For Loops: Iterating over a sequence (like a list) to repeat code for each item.
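
A short sketch that combines both ideas: a for loop visits each item and an if..else decides what to do with it. The list contents are an arbitrary example.

```python
numbers = [3, 8, 5, 12]
labels = []

for n in numbers:
    if n % 2 == 0:
        labels.append("even")
    else:
        labels.append("odd")

print(labels)
```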

Day 2: Exploring Python Functions

Topics Covered:

📝 Python Function: A block of statements that performs a specific task and can return a value.

0️⃣ Types of Functions:

  • Built-in functions provided by Python.
  • User-defined functions created by programmers.

1️⃣ Types of Function Arguments:

  • Positional arguments passed in the order they are defined.
  • Keyword arguments identified by parameter names.
  • Default arguments with predefined values.

2️⃣ Docstring:

  • A string placed at the beginning of a function to provide documentation.
  • Describes the purpose and usage of the function.
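
The argument types and the docstring can be seen together in one small user-defined function. The function name describe and its parameters are hypothetical, chosen only for this sketch.

```python
def describe(name, greeting="Hello", punctuation="!"):
    """Return a greeting for name.

    greeting and punctuation are default arguments.
    """
    return f"{greeting}, {name}{punctuation}"


positional = describe("Ada", "Hi")                # arguments matched by position
keyword = describe(name="Ada", punctuation="?")   # arguments matched by name
default = describe("Ada")                         # defaults fill in the rest

# The docstring is stored on the function and shown by help(describe).
doc_first_line = describe.__doc__.splitlines()[0]
print(positional, keyword, default, doc_first_line, sep="\n")
```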

3️⃣ Anonymous Function:

  • Also known as lambda functions.
  • Used for short, simple operations without creating a formal function.
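
A minimal sketch of two common uses of lambda; the word list is made up.

```python
# A lambda bound to a name behaves like a one-line function.
square = lambda x: x * x

# More commonly, a lambda is passed directly as an argument.
words = ["pandas", "numpy", "seaborn"]
by_length = sorted(words, key=lambda w: len(w))

print(square(4), by_length)
```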

4️⃣ Nested Function:

  • A function defined within another function.
  • Encapsulation allows for better organization and reusability.
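
One way to picture a nested function is a counter whose state is hidden inside the outer function. The names make_counter and increment are hypothetical.

```python
def make_counter(start=0):
    count = start  # visible only to the nested function

    def increment(step=1):
        nonlocal count  # modify the variable of the enclosing function
        count += step
        return count

    return increment


counter = make_counter(10)
first = counter()     # 11
second = counter(5)   # 16
print(first, second)
```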

5️⃣ While Loop:

  • Executes a block of code repeatedly while a condition is true.
  • Useful for tasks where the number of iterations isn't known in advance.
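
A sketch of a loop whose iteration count depends on the input rather than being fixed up front. The function name is illustrative.

```python
def halvings_until_below_one(value):
    """Count how many times value must be halved to drop below 1."""
    steps = 0
    while value >= 1:
        value /= 2
        steps += 1
    return steps


steps = halvings_until_below_one(20)   # 20 -> 10 -> 5 -> 2.5 -> 1.25 -> 0.625
print(steps)
```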

Day 3: Exploring Advanced Python Concepts

Topics Covered:

🖥️ Map Functions:

  • map() is a built-in function that applies a function to every item in an iterable (e.g., tuple, list) and returns an iterator over the results.

🧹 Filter Function:

  • filter() returns an iterator over only those items of an iterable for which a function returns a true value.
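
A small sketch of map() and filter() on the same list. Both return lazy iterators, so list() is used here to see the results.

```python
numbers = [1, 2, 3, 4, 5, 6]

squares = map(lambda n: n * n, numbers)          # lazy iterator
evens = filter(lambda n: n % 2 == 0, numbers)    # lazy iterator

squares_list = list(squares)
evens_list = list(evens)

# An iterator can only be consumed once.
leftover = list(squares)

print(squares_list, evens_list, leftover)
```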

📚 Arrays:

  • Python has no built-in array type in the core language (the standard library's array module offers typed arrays), so lists are commonly used for similar functionality.

🔧 Array Methods:

  • Python lists come with built-in methods to manipulate and interact with data efficiently.

🧬 Class and Object:

  • Objects are data bundles containing variables and methods.
  • A class serves as a blueprint to create objects with specific attributes and behaviors.
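
A minimal sketch of a class used as a blueprint. The Dog class and its attributes are made up for illustration.

```python
class Dog:
    """Blueprint: every Dog has a name, an age, and a describe() method."""

    def __init__(self, name, age):
        self.name = name   # attribute (variable stored on the object)
        self.age = age

    def describe(self):    # method (behavior of the object)
        return f"{self.name} is {self.age} years old"


rex = Dog("Rex", 3)        # an object created from the class
print(rex.describe())
```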

Acknowledgments:

I'd like to extend my gratitude to W3Schools.com and GeeksforGeeks for providing valuable resources that assisted me in today's learning.

Day 4: Deepening Python Proficiency

Topics Explored:

  1. Exception Handling:

    • Understand how exceptions can disrupt normal program execution.
    • Learn to handle exceptions gracefully with try and except blocks.
  2. Custom Exceptions:

    • Explore the creation of custom exceptions to tailor error handling to specific needs.
  3. Inheritance:

    • Dive into inheritance, a key concept allowing one class to inherit attributes and methods from another.
  4. The Super() Function:

    • Uncover the power of super(), which lets a child class call methods of its parent class, such as the parent's __init__(), instead of re-implementing them.
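
The four topics above fit into one small sketch. The Account classes and the InsufficientFundsError exception are hypothetical examples, not code from the original notebooks.

```python
class InsufficientFundsError(Exception):
    """Custom exception raised when a withdrawal exceeds the balance."""

    def __init__(self, balance, amount):
        super().__init__(f"cannot withdraw {amount}; balance is {balance}")
        self.balance = balance
        self.amount = amount


class Account:
    def __init__(self, owner, balance=0):
        self.owner = owner
        self.balance = balance

    def withdraw(self, amount):
        if amount > self.balance:
            raise InsufficientFundsError(self.balance, amount)
        self.balance -= amount
        return self.balance


class SavingsAccount(Account):          # inherits withdraw() from Account
    def __init__(self, owner, balance=0, rate=0.02):
        super().__init__(owner, balance)  # reuse the parent's constructor
        self.rate = rate


savings = SavingsAccount("Ada", balance=100)
try:
    savings.withdraw(150)
except InsufficientFundsError as err:
    message = str(err)                  # handled instead of crashing

remaining = savings.withdraw(40)
print(message, remaining)
```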

Special Thanks:

I'd like to express my appreciation to W3Schools.com and GeeksforGeeks for providing invaluable insights that facilitated my learning today.

Day 5: Exploring Iterators, Iterables, and Polymorphism

Topics Covered:

  1. Iterables vs Iterators:

    • Grasped the distinction between iterators and iterables.
    • In Python, an iterable is any object that can be looped over, like lists or strings.
    • An iterator is a specific kind of iterable that provides an interface for retrieving elements sequentially using the __next__() method.
  2. Polymorphism:

    • Dived into polymorphism, a pivotal concept enhancing code flexibility and reusability.
    • Explored the ability to treat different objects as instances of a common superclass, simplifying code through a uniform interface.
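
A sketch of both ideas, assuming a hand-written Countdown iterator and a tiny Shape hierarchy (all names are illustrative).

```python
import math


class Countdown:
    """An iterator: it implements both __iter__() and __next__()."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


countdown = list(Countdown(3))

# A string is an iterable; iter() turns it into an iterator.
letters = iter("ab")
first_letter = next(letters)


class Shape:
    def area(self):
        raise NotImplementedError


class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2


class Circle(Shape):
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2


# Polymorphism: the same call works on every Shape subclass.
areas = [shape.area() for shape in (Square(2), Circle(1))]
print(countdown, first_letter, areas)
```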

Day 6: Further Exploring Python Programming

Topics Explored:

  1. Access Modifiers:

    • Explored access modifiers that control the visibility of variables and methods in classes.
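    • Learned Python's public, protected (_name), and private (__name) conventions; they rely on naming and name mangling rather than enforced restrictions, so they signal intent rather than provide security.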
  2. Encapsulation:

    • Learned about encapsulation, where I encapsulated data and methods within classes.
    • Discovered the benefits of encapsulation in promoting clean, organized, and modular code.
  3. Constructors and Destructors:

    • Dived into constructors, special methods that initialize objects during creation.
    • Uncovered destructors (__del__), which perform cleanup when an object is about to be garbage-collected.
  4. File Handling:

    • Ventured into the realm of file handling, mastering the art of reading and writing data to files.
    • Explored interactions with files to store and retrieve data effectively.
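
A sketch that touches the naming conventions, a constructor, and file handling. The Temperature class and the file name readings.txt are assumptions for illustration; a temporary directory is used so nothing is left on disk.

```python
import os
import tempfile


class Temperature:
    def __init__(self, celsius):         # constructor
        self._celsius = celsius          # "protected" by convention only
        self.__readings = [celsius]      # mangled to _Temperature__readings

    @property
    def celsius(self):
        return self._celsius

    def update(self, celsius):
        self._celsius = celsius
        self.__readings.append(celsius)

    def history(self):
        return list(self.__readings)     # hand out a copy, not the internals


t = Temperature(20.5)
t.update(22.0)
has_plain_name = hasattr(t, "__readings")                 # False
has_mangled_name = hasattr(t, "_Temperature__readings")   # True

# File handling: write the readings, then read them back.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "readings.txt")
    with open(path, "w", encoding="utf-8") as handle:
        for value in t.history():
            handle.write(f"{value}\n")
    with open(path, encoding="utf-8") as handle:
        lines = handle.read().splitlines()

print(has_plain_name, has_mangled_name, lines)
```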

Grateful for Assistance:

I owe a debt of gratitude to ChatGPT for its assistance on my learning journey.

Embrace Learning:

Remember, each day brings a wealth of new knowledge. Stay curious, stay enthusiastic, and happy learning! 😊

Day 7: Embarking on the Journey with Pandas

Topics Explored:

After dedicating 6 days to learning Python programming, today I immersed myself in the world of Pandas, a prominent Python library.

📘 Pandas Overview:

  • Delved into Pandas, an open-source Python library designed for data manipulation and analysis.
  • Widely employed in data science, machine learning, and various domains for efficient data handling and analysis.
  • The name "pandas" derives from "panel data", a term for multidimensional structured data sets.

📊 Pandas Data Structures:

  • Explored the fundamental data structures provided by Pandas: Series and DataFrame.

0️⃣ Pandas DataFrame:

  • Unveiled the versatile two-dimensional data structure, resembling a spreadsheet or SQL table.
  • Consisting of rows and columns, each column can have distinct data types.
  • Enables diverse operations like filtering, grouping, aggregating, merging, and more.

1️⃣ Pandas Series:

  • Introduced the concept of a one-dimensional labeled array.
  • Accompanied by an index to label each element, akin to a single column of data with labels.
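
A minimal sketch of both structures, assuming pandas is installed. The names, ages, and cities are made-up sample data.

```python
import pandas as pd

# Series: one-dimensional values with an index of labels.
ages = pd.Series([31, 45, 27], index=["ana", "ben", "cy"], name="age")

# DataFrame: a table whose columns can hold different data types.
people = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cy"],
        "age": [31, 45, 27],
        "city": ["Pune", "Oslo", "Pune"],
    }
)

over_30 = people[people["age"] > 30]                     # filtering
mean_age_by_city = people.groupby("city")["age"].mean()  # grouping + aggregating

print(ages["ben"])
print(over_30)
print(mean_age_by_city)
```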

🙏 Acknowledgments: Special thanks to Krish Naik's YouTube channel for providing valuable guidance on this learning path.

Day 8: Embracing the Power of NumPy

Topics Explored:

0️⃣ Exploring NumPy Arrays:

  • Discovered the remarkable efficiency and power of NumPy arrays compared to regular Python lists.
  • Explored various types of arrays, including multi-dimensional arrays that revolutionize handling complex data structures.

1️⃣ Dimensions and Shape:

  • Learned how to check dimensions and shape of arrays, crucial for effective data analysis.

2️⃣ Array Initialization:

  • Created arrays of zeros, ones, and empty values, setting the stage for efficient data manipulation.

3️⃣ Array Creation and Manipulation:

  • Gained hands-on experience with array creation using ranges, even filling diagonals with 1s.
  • Explored NumPy's flexibility in generating equally spaced values using linspace.
  • Experimented with random number generation, from uniform to normal distributions.

4️⃣ Arithmetic Operations, Reshaping, Broadcasting, and Indexing:

  • Delved into data types and conversion precision for accurate calculations.
  • Unleashed the full potential of NumPy through arithmetic operations, reshaping, broadcasting, indexing, slicing, and iteration.
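
The Day 8 topics can be sketched in a few lines, assuming NumPy is installed. The array contents and the seed are arbitrary.

```python
import numpy as np

grid = np.arange(6).reshape(2, 3)     # values 0..5 arranged as 2 rows x 3 columns
zeros = np.zeros((2, 2))              # array of zeros
identity = np.eye(3)                  # 1s on the diagonal
points = np.linspace(0.0, 1.0, 5)     # 5 equally spaced values from 0 to 1

# Broadcasting: the 1-D row is added to every row of the 2-D grid.
shifted = grid + np.array([10, 20, 30])

last_column = grid[:, -1]             # slicing: every row, last column

rng = np.random.default_rng(seed=0)
samples = rng.normal(loc=0.0, scale=1.0, size=4)   # normal distribution

print(grid.ndim, grid.shape)
print(shifted)
```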

Day 9: Advancing with NumPy Mastery

Topics Explored:

0️⃣ Copying and Viewing Arrays:

  • Explored the nuances between copying and viewing NumPy arrays.
  • Understood when to create new arrays and when to work with views for optimal memory usage.
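
A short sketch of the difference, assuming NumPy is installed: a slice is a view that shares memory with the original, while copy() allocates new memory.

```python
import numpy as np

original = np.array([1, 2, 3, 4])
view = original[1:3]        # slicing returns a view
copy = original.copy()      # independent copy

view[0] = 99                # also changes original[1]
copy[3] = -1                # leaves original untouched

print(original, view, copy)
```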

1️⃣ Joining and Splitting Arrays:

  • Mastered the art of joining and splitting arrays, a crucial skill for integrating diverse data sources.
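
Joining and splitting might look like this, assuming NumPy is installed. The two small arrays are arbitrary.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

joined = np.concatenate([a, b])   # one 1-D array with 6 elements
stacked = np.vstack([a, b])       # 2-D array with a and b as rows
parts = np.split(joined, 3)       # three equal pieces of 2 elements each

print(joined, stacked.shape, parts)
```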

2️⃣ NumPy Functions:

  • Unleashed the full potential of NumPy's functions.
  • Covered searching arrays, sorting, filtering, and transforming raw data into valuable insights.

3️⃣ Arithmetic Operations and Data Manipulation:

  • Dived into array functions including shuffling, finding unique elements, resizing, and flattening arrays.
  • Gained hands-on experience in array insertion, appending, and deletion.

4️⃣ Matrices and Linear Algebra:

  • Explored the realm of matrices and their advantages over arrays for specialized tasks.
  • Uncovered matrix functions such as transposition, swapping axes, finding inverses, and calculating determinants.
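
A sketch of the matrix functions listed above, assuming NumPy is installed. It uses a plain 2-D ndarray rather than np.matrix, which is a choice made for this example.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

A_T = A.T                       # transpose
swapped = np.swapaxes(A, 0, 1)  # swapping the two axes equals the transpose
det_A = np.linalg.det(A)        # 2*3 - 1*1 = 5
A_inv = np.linalg.inv(A)        # exists because det_A is nonzero

identity_check = A @ A_inv      # should be close to the identity matrix
print(det_A)
print(identity_check)
```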

Day 10: Embracing Data Formats with Python Pandas

Day 10 Recap: Python Pandas and Data Formats

0️⃣ Working with JSON:

  • Leveraged the power of Pandas to effortlessly read JSON files using read_json.
  • Mastered the art of converting data from DataFrames to JSON with to_json.
  • Explored JSON normalization for transforming nested data structures.
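
A sketch of the three JSON operations, assuming pandas is installed. The JSON text is made up and is read from an in-memory buffer instead of a file.

```python
import io
import json

import pandas as pd

raw = '[{"name": "Ana", "score": 91}, {"name": "Ben", "score": 78}]'

frame = pd.read_json(io.StringIO(raw))      # JSON -> DataFrame
as_json = frame.to_json(orient="records")   # DataFrame -> JSON string

# json_normalize flattens nested objects into dotted column names.
nested = [
    {"id": 1, "user": {"name": "Ana", "city": "Pune"}},
    {"id": 2, "user": {"name": "Ben", "city": "Oslo"}},
]
flat = pd.json_normalize(nested)

print(frame)
print(as_json)
print(flat.columns.tolist())
```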

1️⃣ Working with HTML:

  • Unveiled the magic of reading HTML tables directly from web pages into DataFrames using read_html.
  • Explored the flexibility of converting DataFrames back into HTML using to_html.

2️⃣ Working with XML:

  • Unearthed the secrets of reading XML data and transforming it into DataFrames.
  • Also converted DataFrames back to XML, completing the round trip.

Day 11: Illuminating Insights with Data Visualization

Day 11 Recap: Diving into Data Visualization

Data visualization is the representation of data through the use of common graphics, such as charts, plots, infographics, and even animations. These visual displays of information communicate complex data relationships and data-driven insights in a way that is easy to understand.

0️⃣ Data Visualization Essentials:

  • Explored the fundamental concepts of data visualization and its vital role in effective data analysis.

1️⃣ Seaborn and Matplotlib:

  • Discovered the distinctions between two prominent Python libraries, Seaborn and Matplotlib.
  • Embraced Seaborn's higher-level interface and visually pleasing styles for effortless plot creation.

2️⃣ Creating Various Plot Types with Seaborn:

  • Lineplot: Visualizing trends and relationships between continuous variables over time.
  • Barplot: Comparing categorical data through bar representations.
  • Histogram: Displaying the distribution of a single numerical variable.
  • Scatter Plot: Visualizing relationships between two continuous variables.
  • Heatmap: Depicting matrices of values using colors to reveal patterns and relationships.

Day 12: Deepening Data Visualization Skills with Seaborn

0️⃣ Count Plot in Seaborn:

  • Dived into the nuances of count plots, differentiating them from bar plots.
  • Explored how count plots excel in visualizing record counts per category.

1️⃣ Violin Plot Mastery:

  • Mastered the art of creating violin plots, which reveal data distribution across categorical variables.
  • Explored customization options like palette, linewidth, order, saturation, color, and inner styles.

2️⃣ Pair Plot Proficiency:

  • Enhanced my understanding of pair plots for uncovering relationships between variables and forming clusters.
  • Explored attributes like hue_order, palette, x_vars, y_vars, and kind for insightful pair plots.

3️⃣ Strip Plot Techniques:

  • Delved into strip plots, an engaging way to display data point distribution across categories.
  • Explored patterns and outliers using strip plots.

Day 13: Exploring Linear Algebra for Machine Learning

Day 14 - Linear Algebra

  • Solve a system of linear equations using the elimination method.

  • Use a matrix to represent a system of linear equations and solve it using matrix row reduction.

  • Solve a system of linear equations by reducing its matrix to row echelon form.

  • Calculate the rank of a system of linear equations and use the rank to determine the number of solutions of the system.

  • Resource: https://coursera.org/share/4ba5d65a3df9d87e99e10296a3030624
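
As a numerical counterpart to the hand methods above, here is a sketch that assumes NumPy is installed. Note that np.linalg.solve relies on a matrix factorization rather than the step-by-step row reduction done by hand, and count_solutions is a hypothetical helper that compares the rank of the coefficient matrix with the rank of the augmented matrix.

```python
import numpy as np

# System:  x + 2y = 5
#         3x -  y = 1
A = np.array([[1.0, 2.0],
              [3.0, -1.0]])
b = np.array([5.0, 1.0])
solution = np.linalg.solve(A, b)   # x = 1, y = 2


def count_solutions(A, b):
    """Classify a linear system as having none, a unique, or infinite solutions."""
    rank_a = np.linalg.matrix_rank(A)
    rank_augmented = np.linalg.matrix_rank(np.column_stack([A, b]))
    if rank_a < rank_augmented:
        return "none"
    if rank_a == A.shape[1]:
        return "unique"
    return "infinite"


# Singular matrix: the second row is twice the first.
S = np.array([[1.0, 2.0],
              [2.0, 4.0]])

print(solution)
print(count_solutions(A, b))
print(count_solutions(S, np.array([5.0, 10.0])))
print(count_solutions(S, np.array([5.0, 11.0])))
```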

Day 15 - Essential Linear Algebra Operations and Transformations

  • Performed common operations on vectors like sum, difference, and dot product.
  • Multiplied matrices and vectors.
  • Represented a system of linear equations as a linear transformation on a vector.
  • Calculated the inverse of a matrix, if it exists.
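
A brief sketch of these operations, assuming NumPy is installed. The vectors and the matrix are arbitrary examples.

```python
import numpy as np

u = np.array([1.0, 2.0])
v = np.array([3.0, -1.0])

total = u + v          # [4, 1]
difference = u - v     # [-2, 3]
dot = u @ v            # 1*3 + 2*(-1) = 1

# A matrix acts as a linear transformation on a vector.
M = np.array([[2.0, 0.0],
              [0.0, 3.0]])
transformed = M @ u    # [2, 6]

# The inverse transformation undoes it (M is invertible here).
recovered = np.linalg.inv(M) @ transformed

print(total, difference, dot, transformed, recovered)
```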

Day 16 - Exploring Matrix Determinants, Bases, and Eigenvalues - Linear Algebra

  • Interpret the determinant of a matrix as an area and calculate the determinant of an inverse of a matrix and of a product of matrices.
  • Determine the bases and span of vectors.
  • Find eigenbases for a special type of linear transformation commonly used in machine learning.
  • Calculate the eigenvalues and eigenvectors of a linear transformation (matrix).
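
The determinant properties and the eigenvalue definition can be checked numerically. This sketch assumes NumPy is installed and uses two small matrices picked for the example.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
B = np.array([[1.0, 2.0],
              [0.0, 1.0]])

det_A = np.linalg.det(A)                     # 3: A scales areas by a factor of 3
det_B = np.linalg.det(B)                     # 1
det_product = np.linalg.det(A @ B)           # det(A) * det(B)
det_inverse = np.linalg.det(np.linalg.inv(A))  # 1 / det(A)

# Columns of `eigenvectors` pair with the entries of `eigenvalues`.
eigenvalues, eigenvectors = np.linalg.eig(A)

# Definition check: A v = lambda v for every eigenpair.
lhs = A @ eigenvectors
rhs = eigenvectors * eigenvalues

print(det_A, det_product, det_inverse)
print(np.sort(eigenvalues))
```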

Day 17 - Calculus for Machine Learning and Data Science

  • Derivatives and tangents
  • Slopes, maxima and minima
  • Concept of Derivatives
  • Derivatives and their notation
  • Also computed derivatives of lines, quadratics, higher-degree polynomials, and other power functions
  • The inverse function and its derivative

Day 18 - Calculus for Machine Learning and Data Science

  • Derivative of trigonometric functions
  • Meaning of the exponential (e)
  • The derivative of e^x
  • The derivative of log(x)
  • Existence of the derivative
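
These derivatives can be sanity-checked numerically with a central difference. This is only a sketch: log here means the natural logarithm, the step size h is an arbitrary choice, and |x| at 0 is used as an assumed example of a point where the derivative does not exist.

```python
import math


def derivative(f, x, h=1e-6):
    """Approximate f'(x) with a central difference."""
    return (f(x + h) - f(x - h)) / (2 * h)


x = 2.0
d_sin = derivative(math.sin, x)   # expected: cos(x)
d_exp = derivative(math.exp, x)   # expected: e^x
d_log = derivative(math.log, x)   # expected: 1 / x

# |x| has no derivative at 0: the one-sided slopes disagree.
h = 1e-6
left_slope = (abs(0.0) - abs(0.0 - h)) / h    # -1
right_slope = (abs(0.0 + h) - abs(0.0)) / h   # +1

print(d_sin, d_exp, d_log, left_slope, right_slope)
```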

Day 19 - Calculus for Machine Learning and Data Science

  • Properties of the derivative: Multiplication by scalars
  • Properties of the derivative: The sum rule
  • Properties of the derivative: The product rule
  • Properties of the derivative: The chain rule
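
The product and chain rules can be verified numerically against a central difference. The functions sin, exp, and x^2 are arbitrary picks for this sketch.

```python
import math


def derivative(f, x, h=1e-6):
    """Approximate f'(x) with a central difference."""
    return (f(x + h) - f(x - h)) / (2 * h)


x = 0.7

# Product rule: (f * g)' = f' * g + f * g'   with f = sin, g = exp
product_numeric = derivative(lambda t: math.sin(t) * math.exp(t), x)
product_rule = math.cos(x) * math.exp(x) + math.sin(x) * math.exp(x)

# Chain rule: d/dx sin(x^2) = cos(x^2) * 2x
chain_numeric = derivative(lambda t: math.sin(t ** 2), x)
chain_rule = math.cos(x ** 2) * 2 * x

# Sum rule and multiplication by a scalar: (3f + g)' = 3f' + g'
linear_numeric = derivative(lambda t: 3 * math.sin(t) + math.exp(t), x)
linear_rule = 3 * math.cos(x) + math.exp(x)

print(product_numeric, product_rule)
print(chain_numeric, chain_rule)
print(linear_numeric, linear_rule)
```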

Day 20 - Calculus for Machine Learning and Data Science

  • Introduction to optimization

  • Optimization of squared loss - The one powerline problem

  • Optimization of squared loss - The two powerline problem

  • Optimization of squared loss - The three powerline problem

  • Optimization of log-loss

  • Gradients: The concept of gradients was a bit challenging but super interesting! I learned about tangent planes and partial derivatives, which are building blocks for understanding gradients. Gradients play a crucial role in finding maxima and minima points in functions, which is essential for optimization.

Day 21 - Calculus for Machine Learning and Data Science

Lesson 1 - Gradients

  • Optimization with gradients: An example
  • Optimization with gradients - Analytical method

Lesson 2 - Gradient Descent

  • Optimization using Gradient Descent in one Variable
  • Optimization using Gradient Descent in two variables
  • Optimization using Gradient Descent - Least squares
  • Optimization using Gradient Descent - Least squares with multiple observations
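
A minimal sketch of gradient descent for least squares with several observations. It assumes a model y = w*x + b, the loss (1/2n) * sum of squared errors, and made-up data generated from y = 2x + 1; the learning rate and step count are arbitrary choices that happen to converge for this data.

```python
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]   # generated from y = 2x + 1


def gradient(w, b):
    """Partial derivatives of the loss with respect to w and b."""
    n = len(xs)
    errors = [w * x + b - y for x, y in zip(xs, ys)]
    dw = sum(e * x for e, x in zip(errors, xs)) / n
    db = sum(errors) / n
    return dw, db


w, b = 0.0, 0.0
learning_rate = 0.1
for _ in range(2000):
    dw, db = gradient(w, b)
    w -= learning_rate * dw   # step against the gradient
    b -= learning_rate * db

print(round(w, 4), round(b, 4))
```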

Day 22 - Calculus for Machine Learning and Data Science

  1. Regression with a Perceptron:

    • Dived into the world of regression using a perceptron, a fundamental concept in machine learning.
    • Learned how a perceptron can be used to predict outcomes and model relationships between variables.
  2. Regression with Perceptron - Loss Function:

    • Explored the concept of loss function, a critical element in assessing the accuracy of our predictions.
    • Understood how the choice of a loss function impacts our model's performance and optimization process.
  3. Regression with Perceptron - Gradient Descent:

    • Discovered how Gradient Descent comes into play for optimizing regression models using a perceptron.
    • Learned how to adjust model parameters iteratively to minimize the loss function and enhance predictive power.

Day 23 - Calculus for Machine Learning and Data Science

  1. Classification with Perceptron:

    • Embarked on the journey of classification, a crucial aspect of machine learning, using the perceptron model.
    • Understood how the perceptron can be trained to categorize data into distinct classes.
  2. Classification with Perceptron - The Sigmoid Function:

    • Explored the role of the sigmoid function in classification, understanding its significance in mapping predictions to probabilities.
    • Learned how this function adds a layer of flexibility to our model's outputs.
  3. Classification with Perceptron - Gradient Descent:

    • Delved into the usage of Gradient Descent to optimize our classification model, just as we did in regression.
    • Gained insights into the iterative parameter adjustment process that leads to better classification accuracy.
  4. Classification with Perceptron - Calculating the Derivatives:

    • Explored the mathematics behind calculating derivatives for our classification model.
    • Understood how derivatives guide the optimization process, ensuring we reach the most accurate model configuration.
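
A sketch of a single sigmoid unit trained with gradient descent on log-loss. The one-dimensional data, the learning rate, and the number of steps are assumptions made for illustration. For this loss the derivatives simplify to the mean of (p - y) * x for w and the mean of (p - y) for b.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


xs = [-2.0, -1.0, 1.0, 2.0]
labels = [0, 0, 1, 1]


def log_loss(w, b):
    total = 0.0
    for x, y in zip(xs, labels):
        p = sigmoid(w * x + b)   # predicted probability of class 1
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(xs)


w, b = 0.0, 0.0
learning_rate = 0.5
initial_loss = log_loss(w, b)

for _ in range(500):
    errors = [sigmoid(w * x + b) - y for x, y in zip(xs, labels)]
    dw = sum(e * x for e, x in zip(errors, xs)) / len(xs)
    db = sum(errors) / len(xs)
    w -= learning_rate * dw
    b -= learning_rate * db

final_loss = log_loss(w, b)
predictions = [1 if sigmoid(w * x + b) >= 0.5 else 0 for x in xs]
print(initial_loss, final_loss, predictions)
```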

Day 24 - Calculus for Machine Learning and Data Science

  1. Classification with Neural Network
  2. Classification with a Neural Network - Minimizing log-loss
  3. Gradient Descent and Backpropagation

Day 25 - Calculus for Machine Learning and Data Science

  1. Newton's Method
  2. An example of Newton's method
  3. The second derivative and its concept
  4. The Hessian and concavity
  5. Newton's Method for two variables
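
A one-variable sketch of Newton's method used for optimization, that is, applied to the derivative so that it finds a point where f'(x) = 0. The function f(x) = e^x - 2x is an assumed example; its second derivative e^x is always positive, so the stationary point x = ln 2 is a minimum.

```python
import math


def newton_minimize(f_prime, f_second, x0, steps=20):
    """Repeat x <- x - f'(x) / f''(x) for a fixed number of steps."""
    x = x0
    for _ in range(steps):
        x -= f_prime(x) / f_second(x)
    return x


# f(x) = e^x - 2x, so f'(x) = e^x - 2 and f''(x) = e^x
x_min = newton_minimize(
    f_prime=lambda x: math.exp(x) - 2,
    f_second=lambda x: math.exp(x),
    x0=0.0,
)

print(x_min, math.log(2))
```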

Day 26 - Probability & Statistics for Machine Learning & Data Science

  • Fundamental concepts of probability and statistics.

Day 27 - Probability & Statistics for Machine Learning & Data Science

  1. Bayes Theorem - Intuition
  2. Bayes Theorem - Mathematical Formula
  3. Monty Hall Problem
  4. Bayes Theorem - Spam example
  5. Bayes Theorem - Prior and Posterior
  6. Bayes Theorem - The Naive Bayes Model
  7. Probability in Machine Learning
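
A small sketch of Bayes' theorem in the spirit of the spam example. All three probabilities are hypothetical numbers picked for illustration, not values from the course.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """P(H | E) = P(E | H) * P(H) / P(E), with P(E) from total probability."""
    evidence = prior * likelihood + (1 - prior) * likelihood_given_not
    return prior * likelihood / evidence


p_spam = 0.2               # prior: share of emails that are spam (assumed)
p_word_given_spam = 0.4    # the word appears in 40% of spam (assumed)
p_word_given_ham = 0.05    # the word appears in 5% of non-spam (assumed)

# Posterior: probability that an email containing the word is spam.
p_spam_given_word = posterior(p_spam, p_word_given_spam, p_word_given_ham)
print(round(p_spam_given_word, 4))
```

With these numbers the evidence is 0.2 * 0.4 + 0.8 * 0.05 = 0.12, so the posterior is 0.08 / 0.12, or about 0.667, which is much higher than the 0.2 prior.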

Day 28 - Probability & Statistics for Machine Learning & Data Science

  1. Random Variable: A variable whose possible values are outcomes of a random phenomenon.

  2. Probability Distribution (Discrete): Describes the likelihood of each possible outcome in a discrete random variable.

  3. Binomial Distribution: Models the number of successes in a fixed number of independent Bernoulli trials. Formula: P(X = k) = (n choose k) * p^k * (1-p)^(n-k)

  4. Binomial Coefficient (n choose k): Represents the number of ways to choose k items from a set of n distinct items. Formula: C(n, k) = n! / (k!(n-k)!)

  5. Bernoulli Distribution: Models a random experiment with two possible outcomes (usually denoted as success and failure). Formula: P(X = x) = p^x * (1-p)^(1-x)

  6. Probability Distribution (Continuous): Describes the likelihood of outcomes in a continuous random variable.

  7. Probability Density Function (PDF): Describes the relative likelihood of each value of a continuous random variable; the probability of an interval is the area under the PDF over that interval.

  8. Cumulative Distribution Function (CDF): Gives the probability that a continuous random variable takes a value less than or equal to a specific value.

  9. Uniform Distribution: Every outcome in the sample space is equally likely. Formula: f(x) = 1 / (b - a) for a ≤ x ≤ b

  10. Normal Distribution: Describes a continuous probability distribution characterized by a bell-shaped curve. Formula: f(x) = (1 / (σ√(2π))) * e^(-((x-μ)^2 / (2σ^2)))

  11. Chi-square Distribution: Used in hypothesis testing and modeling the variability in data.
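
The formulas above translate almost directly into code. This sketch uses only the standard library; the function names are my own, and the normal CDF is written with math.erf, which is a standard identity rather than something stated in the list.

```python
import math


def binomial_pmf(k, n, p):
    """P(X = k) = (n choose k) * p^k * (1 - p)^(n - k)"""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def bernoulli_pmf(x, p):
    """P(X = x) = p^x * (1 - p)^(1 - x) for x in {0, 1}"""
    return p ** x * (1 - p) ** (1 - x)


def uniform_pdf(x, a, b):
    """f(x) = 1 / (b - a) for a <= x <= b, and 0 elsewhere"""
    return 1 / (b - a) if a <= x <= b else 0.0


def normal_pdf(x, mu=0.0, sigma=1.0):
    coefficient = 1 / (sigma * math.sqrt(2 * math.pi))
    return coefficient * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))


def normal_cdf(x, mu=0.0, sigma=1.0):
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


two_heads_in_four = binomial_pmf(2, 4, 0.5)   # 6 * (1/2)^4 = 0.375
total = sum(binomial_pmf(k, 4, 0.5) for k in range(5))
print(two_heads_in_four, total, normal_pdf(0.0), normal_cdf(0.0))
```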

Day 29 - Probability & Statistics for Machine Learning & Data Science

  1. Measure of Central Tendency: Understanding how data clusters around a central value.
  2. Expected Value: Calculating the long-term average of a random variable.
  3. Expected Value of a Function: Extending expected value concepts to functions of random variables.
  4. Sum of Expectations: The expected value of a sum equals the sum of the expected values, E[X + Y] = E[X] + E[Y].
  5. Variance: Measuring the spread or variability in data.
  6. Standard Deviation: A widely used measure of data dispersion.
  7. Sum of Gaussians: The sum of independent Gaussian random variables is itself Gaussian, with the means and variances adding.
  8. Standardizing a Distribution: Making data comparable by transforming it into standard units.
  9. Skewness and Kurtosis: Exploring data asymmetry and the shape of probability distributions.
  10. Quantiles and Box-Plots: Visualizing data distribution and detecting outliers.
  11. Visualizing Data: Using Box-Plots, Kernel Density Estimation, Violin Plots, and QQ Plots to gain insights from data.
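
Several of these quantities can be computed by hand for a fair six-sided die and a small sample. Both the die and the sample values are assumptions chosen so the arithmetic is easy to follow.

```python
import math
import statistics

# Fair die: each outcome has probability 1/6.
outcomes = [1, 2, 3, 4, 5, 6]
probability = 1 / 6

expected_value = sum(x * probability for x in outcomes)         # 3.5
expected_square = sum(x ** 2 * probability for x in outcomes)   # E[X^2] = 91/6
variance = sum((x - expected_value) ** 2 * probability for x in outcomes)
std_dev = math.sqrt(variance)

# Standardizing a sample: subtract the mean, divide by the standard deviation.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
mean = statistics.mean(sample)        # 5
spread = statistics.pstdev(sample)    # 2.0 (population standard deviation)
z_scores = [(x - mean) / spread for x in sample]

print(expected_value, variance, std_dev)
print(z_scores)
```

After standardizing, the z-scores have mean 0 and standard deviation 1, which is what makes differently scaled data comparable.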

Day 30 - House Price Prediction

Day 31 - Deployment of House Price Prediction

Day 32 - SMS Spam Detection