datascience-cheatsheet

Cheatsheet for data science libraries in Python, such as NumPy, pandas, Matplotlib and scikit-learn.

Primary language: Jupyter Notebook. License: MIT.

Table of contents

  • Datatypes
  • Difference between '==' and 'is'
  • Some useful built-in functions
  • List comprehension
  • Lambda function
  • Unpacking
  • More about mutable and immutable datatypes
  • Why do we use NumPy?
  • Creating and inspecting arrays
  • Indexing and slicing
  • Operators
  • Mask and filter
  • Copy vs view
  • Array manipulation
  • Understanding axes
  • Broadcasting
  • Sparse matrix
  • What is Pandas?
  • Series, DataFrame and Index
  • Data indexing and selection
  • Operators
  • Doing some stuff!
    • Datatype
    • Change datatype
    • Describe
    • Rename columns
    • Rename rows
    • Replace values
    • Finding unique values
    • Deleting column
    • Deleting row
    • Remove duplicates
    • Mask and filter
    • Apply function to the data
    • Convert to numpy array
  • Handling missing data
  • Aggregation and grouping
  • Simple plot
  • Title, Ticks, Labels, and Legends
  • Colors, Markers, and Line Styles
  • Subplot, Subplots and subplot2grid
  • Histogram, bar, scatter, pie and heatmap
  • Saving Plots to File
  • What is machine learning?
  • Preprocessing
    • Identifying and handling missing values
    • Identifying outliers
    • Encoding the categorical data
    • Transforming the dataset
      • Feature scaling
        • Standardization
        • Normalization
      • Dimension reduction
        • Principal Component Analysis (PCA)
    • Train-test split
  • Feature selection
  • Model selection
    • Supervised learning
      • Classification
        • KNN
        • SVM
        • Random forest
        • Decision tree
        • Logistic regression
      • Regression
        • Linear regression
        • Polynomial regression
        • Non-linear regression
        • Multiple linear regression
    • Unsupervised learning
      • K-means
      • DBSCAN
    • Semi-supervised learning
  • Model evaluation
    • Classification
      • Jaccard
      • F1-score
      • Log loss
    • Clustering
      • Sum of Squared Error (SSE) score
      • Silhouette coefficient
  • Prediction