Summer-Analytics-2022

This repository contains the course notes, implementations of various ML models, and Hackathon submission notebooks for the Summer Analytics program.


Summer-Analytics-2022 course

Link to the course

Some tools and technologies for ML models:
1. for creating a UI for the model - Gradio
2. for hosting an ML model - Hugging Face (majorly used for NLP, but can be used here... read more on its website)
and many more ... still discovering :)

Must go through this Kaggle course (for revision; part of Week 4): LINK

Some questions I came across during the course:

  1. In a Pandas DataFrame, why do we use df[exp1 | exp2] or df[exp1 & exp2] instead of df[exp1 or exp2] / df[exp1 and exp2]?
    Ans. - See this question, and the sketch below.
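A minimal sketch of why `and`/`or` fail here (toy DataFrame as an assumption):

```python
import pandas as pd

df = pd.DataFrame({"age": [22, 35, 58], "score": [90, 75, 88]})

# Python's `and`/`or` try to reduce each Series to a single bool,
# which is ambiguous, so pandas raises a ValueError. The bitwise
# operators `&` and `|` are overloaded to compare element-wise.
subset = df[(df["age"] > 25) & (df["score"] > 80)]
print(subset)

# df[(df["age"] > 25) and (df["score"] > 80)]  # raises ValueError
```

Note that `&` and `|` bind tighter than comparisons, so each condition needs its own parentheses.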

  1. How to use pyplot.imshow() with pandas?
    Ans: - Link to Documentation, and the sketch below.
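A small sketch (random toy data) of rendering a DataFrame's values with imshow():

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# imshow() draws any 2D array as an image, so a DataFrame's
# underlying NumPy array works directly, e.g. as a quick heatmap.
df = pd.DataFrame(np.random.rand(10, 10))
plt.imshow(df.values, cmap="viridis")
plt.colorbar()
plt.show()
```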

  1. How to see documentation in the Jupyter notebook itself?
    Ans: inside the function's parentheses (), press SHIFT + TAB; pressing it 4 times
    opens a complete separate pane with the full docs.
    Or,
    you can type ?<function name, without parentheses> and run the cell.
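For example (IPython/Jupyter syntax, not plain Python; `pd.read_csv` is just an arbitrary function to inspect):

```python
import pandas as pd

pd.read_csv?    # opens the docstring pane in Jupyter
# pd.read_csv?? # two question marks also show the source code
```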

  1. Why do we regularize all parameters in the same way? For example, in the cost formula sketched below:

If we want the effect of the parameters $\theta_1$ and $\theta_2$ not to be neglected, how can we handle that? In the original formula we multiply $\lambda$ with all the weights from $\theta_1$ to $\theta_n$ (where n is the number of features), so how do we handle that?

Ans:

  • Read here 👉 LINK
  • Regularization Folder Link
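As a sketch (my own notation, not from the course material): the standard regularized cost shares one $\lambda$ across all weights; one way to keep $\theta_1$ and $\theta_2$ from being shrunk is a per-parameter $\lambda_j$:

```latex
% Standard form: one \lambda multiplies every weight \theta_1..\theta_n
J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)^2
          + \lambda \sum_{j=1}^{n} \theta_j^2\right]

% Possible fix: per-parameter weights, e.g. \lambda_1 = \lambda_2 = 0,
% so that \theta_1 and \theta_2 are not penalized at all
J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)^2
          + \sum_{j=1}^{n} \lambda_j \theta_j^2\right]
```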

  1. If we are getting features of higher power, like $x^2$ or above, in the Hypothesis of Linear Regression, how is it still linear?
    Ans:

It is linear because the hypothesis remains linear in the parameters $\theta$; only the features are non-linear transformations of $x$. To increase the complexity of the model, or to strengthen the effect of certain features, we apply higher degrees to them, like $x^2$, $x^3$ ... These are inserted into the model as:
$X_0$,
$X_1$,
$X_2 = X_0^2$,
$X_3 = X_1^4$
etc.

So, the new hypothesis may actually represent something like:

$h_\theta(x) = \theta_0 X_0 + \theta_1 X_1 + \theta_2 X_0^2 + \theta_3 X_1^4 + \dots$ (a polynomial in $x$, but still linear in the parameters $\theta$).
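A minimal scikit-learn sketch of the same idea (toy data; PolynomialFeatures generates the higher-degree columns):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([1.2, 4.1, 9.3, 15.8])

# Expand x into [1, x, x^2]; the model stays *linear in theta*,
# it is simply fit on non-linear transformations of the input.
X_poly = PolynomialFeatures(degree=2).fit_transform(X)
model = LinearRegression().fit(X_poly, y)
print(model.intercept_, model.coef_)
```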


  1. When should we use .predict() and .predict_proba() in Scikit-learn?
    Ans:
  • predict_proba() gives out the class probabilities, while predict() gives the class label.
  • The class label can be used wherever the evaluation metric is accuracy, recall, precision, etc.
  • The probabilities can be used wherever the evaluation metric scores the predicted probabilities themselves, e.g. AUC, ROC-AUC, or MSE / MAE / RMSE computed on the probabilities.
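A quick illustration on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)

print(clf.predict(X[:3]))        # hard class labels, e.g. [0 1 0]
print(clf.predict_proba(X[:3]))  # one probability per class; rows sum to 1
```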

  1. Is Logistic Regression a classification-based algorithm or a regression-based algorithm?
    Ans:
  • It is a classification-type algorithm.
  • The name "regression" is there because it applies the same underlying technique as Linear Regression (see the sketch below).
Note: We cannot use the evaluation metrics of classification-based algorithms on regression-based algorithms.

READ MORE HERE
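A sketch of how the linear-regression machinery carries over: logistic regression passes the linear combination $\theta^T x$ through the sigmoid to get a class probability, which is then thresholded into a class:

```latex
h_\theta(x) = \sigma(\theta^{T} x) = \frac{1}{1 + e^{-\theta^{T} x}},
\qquad
\hat{y} =
\begin{cases}
1 & \text{if } h_\theta(x) \ge 0.5 \\
0 & \text{otherwise}
\end{cases}
```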


  1. Cross-Validation, being (supposedly) an evaluation metric, ought to be applied on the Test Dataset, so why is it applied on the Train Dataset?
    Ans. IT IS NOT AN EVALUATION METRIC ⚠️

Cross-validation is used to verify the best parameters on which the model is trained. (The workflow diagram in the repo shows where cross-validation sits in the pipeline.)

Evaluation metrics, by contrast, are used in the final evaluation,
and hence are applied on the TEST DATASET.

Cross-validation is a technique for validating the model's efficiency by training it on a subset of the input data and testing it on a previously unseen subset of the input data. We can also say that it is a technique to check how a statistical model generalizes to an independent dataset.

Here, the input data is the Training Data, not the complete dataset. (The logic is that we use a train/test split to get our test data, but in a real-world situation only training data is given to the model; hence, we consider the input data to be the Train data.)
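A minimal sketch of this workflow (iris data as a stand-in):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Cross-validation runs on the TRAIN split only: it re-splits the
# training data into folds to compare models / hyperparameters.
scores = cross_val_score(LogisticRegression(max_iter=1000), X_train, y_train, cv=5)
print("CV mean accuracy:", scores.mean())

# The held-out test set is touched exactly once, for the final evaluation.
final_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Test accuracy:", final_model.score(X_test, y_test))
```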



  1. Why should we use SCIKIT-LEARN's fit_transform() method on the Training Dataset and transform() on the Test Dataset?
    ANS: fit_transform() learns the transformation parameters (e.g. the mean and standard deviation for scaling) from the training data and applies them; transform() then applies those same learned parameters to the test data, without re-fitting. This prevents DATA LEAKAGE (the model indirectly learning something from the test dataset, which is not allowed). Reference LINK to watch in complete detail.
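A minimal sketch with StandardScaler (toy arrays):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0], [2.0], [3.0]])
X_test = np.array([[4.0]])

scaler = StandardScaler()

# fit_transform: learn mean/std from the TRAIN data, then scale it
X_train_scaled = scaler.fit_transform(X_train)

# transform: reuse the TRAIN statistics on the test data; never
# re-fit here, or test-set information leaks into preprocessing
X_test_scaled = scaler.transform(X_test)
```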

  1. How to distinguish Categorical Data and Continuous Data?
    ANS:
  • Categorical Data: anything which can be used to categorize records into groups. It can be of any datatype, though floats and ints are less frequently used.
  • Continuous Data: data with no discrete boundaries; generally a float datatype is used here.

  1. What is the difference between the 2 methods of One-Hot Encoding: pd.get_dummies() and Scikit-learn's OneHotEncoder()?
    ANS:
  • For quick data cleaning and EDA, it makes a lot of sense to use pandas' get_dummies(). However, if you plan to transform a categorical column into multiple binary columns for machine learning, it's better to use OneHotEncoder().
  • READ THIS COMPLETE ARTICLE FOR FULL DETAILS
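A side-by-side sketch (note: the `sparse_output` argument is scikit-learn ≥ 1.2; older versions call it `sparse`):

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({"color": ["red", "green", "red"]})

# pandas: a quick one-liner, but it does not remember the categories
dummies = pd.get_dummies(df["color"])

# sklearn: fit once, reuse inside a pipeline; can ignore categories
# unseen during fit instead of crashing at transform time
enc = OneHotEncoder(handle_unknown="ignore", sparse_output=False)
encoded = enc.fit_transform(df[["color"]])
```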

  1. What is a Meta - Estimator?
    Ans: Meta Estimator: An estimator which takes another estimator as a parameter.
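For example (the parameter is named `estimator` in scikit-learn ≥ 1.2, `base_estimator` in older versions):

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# BaggingClassifier is a meta-estimator: it takes another estimator
# as a parameter and fits many copies of it on random data subsets.
bagging = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=10)
```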

  1. What are Feature importances and Permutation Feature Importances?
    ANS:
  • Feature Importance: Feature Importance refers to techniques that calculate a score for all the input features for a given model — the scores simply represent the “importance” of each feature. A higher score means that the specific feature will have a larger effect on the model that is being used to predict a certain variable.
  • Permutation Feature Importance: Permutation feature importance is a model inspection technique that can be used for any fitted estimator when the data is tabular. This is especially useful for non-linear or opaque estimators. The permutation feature importance is defined to be the decrease in a model score when a single feature value is randomly shuffled. This procedure breaks the relationship between the feature and the target, thus the drop in the model score is indicative of how much the model depends on the feature. This technique benefits from being model agnostic and can be calculated many times with different permutations of the feature.
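A short sketch comparing the two on toy data:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(random_state=0).fit(X, y)

# Impurity-based feature importances come for free with tree ensembles
print(model.feature_importances_)

# Permutation importance: shuffle one feature at a time and measure
# how much the score drops; works with any fitted estimator
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
print(result.importances_mean)
```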

  1. Why do we need to use fit_transform() with SimpleImputer rather than fit() alone?
    ANS:
  • When converting back to a DataFrame, we need the transformed 2D array.
  • The fit() method only learns the imputation parameters and returns the fitted imputer object, not the transformed dataset.
  • fit_transform(), on the other hand, applies the transformation and returns the 2D array, which can then be converted to a DataFrame.

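A minimal sketch of the difference (toy DataFrame with a missing value):

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({"a": [1.0, np.nan, 3.0]})
imputer = SimpleImputer(strategy="mean")

fitted = imputer.fit(df)            # returns the fitted imputer object itself
filled = imputer.fit_transform(df)  # returns the imputed 2D NumPy array

df_filled = pd.DataFrame(filled, columns=df.columns)
print(df_filled)
```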



  1. What is the difference between SVM and SVC?
    ANS: SVM (Support Vector Machine) is the general algorithm, and SVC (Support Vector Classifier) is its classification variant (it is also the name of scikit-learn's classifier class, alongside SVR for regression). In some write-ups the terms are split by the decision boundary: if the hyperplane classifies the dataset linearly, the algorithm is called SVC, and if it separates the dataset with a non-linear (kernel) approach, it is called SVM.

  1. What is the meaning of Cardinality?
    ANS: "Cardinality" means the number of unique values in a column.

  1. What do you mean by Ensemble (e.g., as in from sklearn.ensemble import RandomForestRegressor)?
    ANS: Ensemble methods combine the predictions of several models (e.g., several trees, in the case of random forests).
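For example (synthetic data):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=100, n_features=4, random_state=0)

# Each of the 100 trees predicts on its own; the forest's output is
# the average of those predictions -- that averaging is the ensemble.
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
print(forest.predict(X[:2]))
```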

  1. How to choose the number of layers in a neural network?
    ANS: Read this answer (the first and 2nd answers).

  1. What is the importance of Mean Normalization in Gradient Descent, i.e., how does normalization make Gradient Descent converge faster?
    ANS: READ THIS ANSWER
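A quick sketch of the formula, for reference:

```latex
x_j' = \frac{x_j - \mu_j}{s_j}
\qquad
\text{($\mu_j$ = mean of feature $j$; $s_j$ = its range or standard deviation)}
```

Bringing all features onto a comparable scale rounds out the elongated contours of the cost function, so gradient descent stops zig-zagging and converges in fewer steps.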

  1. Which Machine Learning algorithms need Feature Scaling or Normalization?
    ANS: READ THIS ARTICLE. The algorithms which consider distances between the data points usually require feature scaling or normalization.

  1. Are Neural Networks prone to Overfitting?
    ANS:

Certainly they can overfit the data, as other models do. No model is perfectly immune to overfitting. The reasons can be excessive complexity of the architecture (if we add many layers), or the model having too many parameters.

This overfitting problem is common to all algorithms; in fact, it is even more common with Deep Learning algorithms, as they are designed to capture (or memorize) more complex patterns.

This article describes the techniques by which we can prevent overfitting... and the fun thing is, they are largely the same as in classical Machine Learning. READ THE ARTICLE HERE
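A minimal Keras sketch of two of those techniques, dropout and early stopping (assumes TensorFlow is installed; layer sizes are arbitrary):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),  # randomly zero out units during training
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Stop training once the validation loss stops improving
early_stop = tf.keras.callbacks.EarlyStopping(patience=3, restore_best_weights=True)
# model.fit(X_train, y_train, validation_split=0.2, callbacks=[early_stop])
```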