ML-CaPsule

ML-capsule is a project for beginners and experienced data science enthusiasts who don't have a mentor or guidance and wish to learn machine learning. Using our repo, they can learn ML, DL, and many related technologies through real-world projects and become interview-ready.

Primary Language: Jupyter Notebook

Master Machine learning


Description

Machine learning is a data analysis technique that automates analytical model building. It is a branch of artificial intelligence based on the idea that systems can learn from data, identify patterns and make decisions with minimal human intervention.

Importance of Machine Learning

Machine learning is important because it gives enterprises a view of trends in customer behavior and business operational patterns, and supports the development of new products. Many of today's leading companies, such as Facebook, Google and Uber, make machine learning a central part of their operations. Machine learning has become a significant competitive differentiator for many companies.

🌱Pre-requisites

  • Python IDE: Install it using this link: python.org
  • If you are new to Python programming and want a fair knowledge of it before you start, you can learn it in a simplified way through this website

Topics

Extracting Data

Extraction is a general term for methods of constructing combinations of the variables to get around the problems of working with a large number of variables (such as high memory and computation cost) while still describing the data with sufficient accuracy.

  • Web scraping - Library used: Beautiful Soup, which extracts data from web pages.
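
As a rough illustration of the Beautiful Soup bullet above, here is a minimal sketch that parses an inline HTML string instead of a live page. It assumes the `beautifulsoup4` package is installed. The HTML snippet, the link paths and the `extract_links` helper are made up for this example and are not taken from the repo.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded web page.
HTML = """
<html><body>
  <h1>Projects</h1>
  <ul>
    <li><a href="/spam-mail-detection">Spam Mail Detection</a></li>
    <li><a href="/weather-prediction">Weather Prediction</a></li>
  </ul>
</body></html>
"""


def extract_links(html):
    """Return (text, href) pairs for every <a> tag in the document."""
    soup = BeautifulSoup(html, "html.parser")  # parser from the standard library
    return [(a.get_text(strip=True), a.get("href")) for a in soup.find_all("a")]


links = extract_links(HTML)
for text, href in links:
    print(f"{text} -> {href}")
```

For a real page, the HTML string would come from an HTTP client rather than a constant.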

Visualization

Data visualization is the discipline of trying to understand data by placing it in a visual context so that patterns, trends and correlations that might not otherwise be detected can be exposed. Python offers multiple great graphing libraries that come packed with lots of different features.

  • Libraries used to present data as graphs and other graphical representations: Seaborn, pandas, matplotlib, etc.
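
A small sketch of what a first plot might look like, assuming matplotlib is installed. The sales figures and the output file name `monthly_sales.png` are invented for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Made-up monthly sales figures, used only to have something to draw.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
sales = [120, 135, 128, 160, 172, 190]

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(months, sales, marker="o")  # line plot with a marker per data point
ax.set_title("Monthly sales (made-up data)")
ax.set_xlabel("Month")
ax.set_ylabel("Units sold")
fig.tight_layout()
fig.savefig("monthly_sales.png")  # hypothetical output file name
```

Placing the numbers on a line makes the upward trend visible at a glance, which is the kind of pattern the paragraph above refers to.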

Feature selection (Variable Selection)

Feature selection is the process of selecting a subset of relevant features for use in a model. Having irrelevant features in your data can decrease the accuracy of the models and make your model learn from irrelevant features.
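
One simple way to approach this is a correlation filter: keep only the features whose absolute Pearson correlation with the target exceeds a threshold. The sketch below uses only the standard library. The feature names, the data and the 0.5 threshold are assumptions chosen for illustration, and this is just one of many feature selection techniques.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def select_features(features, target, threshold=0.5):
    """Keep features whose |correlation| with the target meets the threshold."""
    return [
        name
        for name, values in features.items()
        if abs(pearson(values, target)) >= threshold
    ]


# Made-up data: exam scores explained by study hours, plus two weak features.
features = {
    "hours_studied": [1, 2, 3, 4, 5, 6],
    "shoe_size": [42, 38, 42, 38, 42, 38],
    "constant": [1, 1, 1, 1, 1, 1],
}
target = [52, 55, 61, 68, 70, 78]

selected = select_features(features, target)
print(selected)
```

A filter like this only detects linear relationships with the target, so it is a starting point rather than a complete method.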

Basic concepts of statistics

A). Understand the Types of Analytics

  • Descriptive Analytics tells us what happened in the past and helps a business understand how it is performing by providing context to help stakeholders interpret information.

  • Diagnostic Analytics takes descriptive data a step further and helps you understand why something happened in the past.

  • Predictive Analytics predicts what is most likely to happen in the future and provides companies with actionable insights based on the information.

  • Prescriptive Analytics provides recommendations regarding actions that will take advantage of the predictions and guide the possible actions toward a solution.

B). Probability

  • Conditional Probability
  • Independent Events
  • Mutually Exclusive Events
  • Bayes’ Theorem
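
The four ideas above can be checked by brute-force enumeration on a small sample space. The sketch below rolls two fair dice and uses exact fractions. The events chosen (first die is six, sum at least ten, and so on) are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice.
OUTCOMES = list(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event given as a predicate over (die1, die2)."""
    favourable = sum(1 for outcome in OUTCOMES if event(outcome))
    return Fraction(favourable, len(OUTCOMES))


def first_is_six(o):
    return o[0] == 6


def sum_at_least_ten(o):
    return o[0] + o[1] >= 10


def second_is_even(o):
    return o[1] % 2 == 0


p_a = prob(first_is_six)
p_b = prob(sum_at_least_ten)
p_a_and_b = prob(lambda o: first_is_six(o) and sum_at_least_ten(o))

# Conditional probability: P(A | B) = P(A and B) / P(B)
p_a_given_b = p_a_and_b / p_b
p_b_given_a = p_a_and_b / p_a

# Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B)
bayes = p_b_given_a * p_a / p_b

# Independent events: P(A and C) equals P(A) * P(C)
p_c = prob(second_is_even)
independent = prob(lambda o: first_is_six(o) and second_is_even(o)) == p_a * p_c

# Mutually exclusive events: "sum is 2" and "sum is 12" cannot both happen
p_both = prob(lambda o: sum(o) == 2 and sum(o) == 12)

print(p_a_given_b, bayes, independent, p_both)
```

Here Bayes' theorem gives the same value as the direct conditional probability, which is a useful sanity check when learning the formula.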

C). Central Tendency

  • Mean
  • Mode
  • Variance
  • Skewness
  • Kurtosis
  • Standard Deviation
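
A short sketch of how these quantities could be computed with Python's standard `statistics` module. It assumes population formulas (dividing by n). Skewness and kurtosis are computed by hand as standardized moments, because the standard library has no function for them. The data set is made up.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample

mean = statistics.mean(data)
mode = statistics.mode(data)
variance = statistics.pvariance(data)  # population variance
std_dev = statistics.pstdev(data)      # population standard deviation


def standardized_moment(values, k):
    """k-th standardized moment: mean of ((x - mu) / sigma) ** k."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return sum(((v - mu) / sigma) ** k for v in values) / len(values)


skewness = standardized_moment(data, 3)  # > 0 means a longer right tail
kurtosis = standardized_moment(data, 4)  # 3 for a normal distribution

print(mean, mode, variance, std_dev, skewness, kurtosis)
```

Libraries such as SciPy offer ready-made skewness and kurtosis functions, but their defaults (for example, reporting excess kurtosis) may differ from the plain definition used here.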

D). Variability

  • Range: The difference between the highest and lowest value in the dataset.
  • Percentiles: A measure that indicates the value below which a given percentage of observations in a group of observations falls.
  • Quartiles: Values that divide the data points into four more or less equal parts, or quarters.
  • Interquartile Range (IQR): A measure of statistical dispersion and variability based on dividing a data set into quartiles. IQR = Q3 - Q1
  • Variance: The average squared difference of the values from the mean to measure how spread out a set of data is relative to mean.
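
The measures in this list can be sketched with the standard library as follows. The data is made up, and the "inclusive" quantile method is an assumption: other interpolation conventions give slightly different quartiles.

```python
import statistics

data = [3, 7, 8, 5, 12, 14, 21, 13, 18]  # made-up observations

# Range: highest value minus lowest value
value_range = max(data) - min(data)

# Quartiles: three cut points that split the sorted data into four parts
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

# Interquartile range: spread of the middle half of the data
iqr = q3 - q1

# Percentiles: 99 cut points; index 89 is the 90th percentile
percentiles = statistics.quantiles(data, n=100, method="inclusive")
p90 = percentiles[89]

# Variance: average squared distance from the mean (population formula)
variance = statistics.pvariance(data)

print(value_range, q1, q2, q3, iqr, p90, variance)
```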

E). Relationship Between Variables

  • Causality: Relationship between two events where one event is affected by the other.
  • Covariance: A quantitative measure of the joint variability between two or more variables.
  • Correlation: A measure of the relationship between two variables that ranges from -1 to 1; it is the normalized version of covariance.
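
To make the link between covariance and correlation concrete, here is a minimal sketch using population formulas and made-up data. Correlation is the covariance divided by the product of the two standard deviations.

```python
from math import sqrt


def covariance(xs, ys):
    """Population covariance of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def correlation(xs, ys):
    """Pearson correlation: covariance normalized by both standard deviations."""
    # covariance(v, v) is the variance of v, so its square root is the std dev
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]   # rises with x
z = [10, 8, 6, 4, 2]   # falls as x rises

print(covariance(x, y), correlation(x, y), correlation(x, z))
```

Covariance depends on the units of the variables, while correlation does not, which is why correlation is easier to compare across data sets. Neither measure establishes causality.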

F). Probability Distribution

  • Probability Mass Function (PMF): A function that gives the probability that a discrete random variable is exactly equal to some value.
  • Probability Density Function (PDF): A function for continuous data where the value at any given sample can be interpreted as providing a relative likelihood that the value of the random variable would equal that sample.
  • Cumulative Distribution Function (CDF): A function that gives the probability that a random variable is less than or equal to a certain value.
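
The three functions can be illustrated with two familiar distributions: a binomial distribution for the discrete case and a standard normal distribution for the continuous case. The choice of 10 coin flips is an assumption made for the example.

```python
from math import comb
from statistics import NormalDist


def binomial_pmf(k, n, p):
    """P(X == k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# PMF: probability of exactly 3 heads in 10 fair coin flips
pmf_3_heads = binomial_pmf(3, 10, 0.5)

# CDF of a discrete variable: sum the PMF up to and including the value
cdf_3_heads = sum(binomial_pmf(k, 10, 0.5) for k in range(4))

# PDF and CDF of a continuous variable: the standard normal distribution
std_normal = NormalDist(mu=0, sigma=1)
pdf_at_0 = std_normal.pdf(0)  # a density (relative likelihood), not a probability
cdf_at_0 = std_normal.cdf(0)  # P(X <= 0)

print(pmf_3_heads, cdf_3_heads, pdf_at_0, cdf_at_0)
```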

G). Hypothesis Testing and Statistical Significance

  • Null and Alternative Hypothesis
  • Interpretation
  • Z-Test
  • T-Test
  • ANOVA (Analysis of Variance)
  • Chi-Square Test
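
As one example of the tests listed above, the following sketch runs a two-sided one-sample Z-test. It assumes the population standard deviation is known. The sample values, the population mean of 50 and the standard deviation of 4 are all invented for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean


def one_sample_z_test(sample, population_mean, population_std):
    """Two-sided Z-test; returns (z statistic, p-value)."""
    n = len(sample)
    standard_error = population_std / sqrt(n)
    z = (mean(sample) - population_mean) / standard_error
    # Probability of a result at least this extreme if the null hypothesis holds
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Null hypothesis: the sample comes from a population with mean 50.
# Alternative hypothesis: the population mean is not 50.
sample = [48, 56, 50, 54, 52, 52, 49, 55, 51, 53, 47, 57, 52, 52, 50, 54]

z, p_value = one_sample_z_test(sample, population_mean=50, population_std=4)
alpha = 0.05  # conventional significance level, chosen here as an assumption
reject_null = p_value < alpha

print(z, p_value, reject_null)
```

Interpretation: a p-value below the chosen significance level means the observed difference would be unlikely if the null hypothesis were true, so the null hypothesis is rejected at that level.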

H). Regression

  • Linear Regression

    Assumptions of Linear Regression:

        - Linear Relationship
        - Multivariate Normality
        - No or Little Multicollinearity
        - No or Little Autocorrelation
        - Homoscedasticity
    
  • Multiple Linear Regression
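
For the single-predictor case, ordinary least squares has a closed-form solution, sketched below with the standard library only. The data is made up and lies exactly on a line so the result is easy to verify by hand. Multiple linear regression is usually solved with a linear algebra library and is not covered by this sketch.

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predicted y for a new x under the fitted line."""
    return slope * x + intercept


# Made-up data generated from y = 2x + 1 with no noise
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_simple_linear_regression(xs, ys)
prediction = predict(6, slope, intercept)

print(slope, intercept, prediction)
```

With real data the points will not fall exactly on the line, and the residuals should be inspected against the assumptions listed above.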

Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data, and apply knowledge and actionable insights from data across a broad range of application domains.

Why is data science important?

In business, the goal of data science is to provide intelligence about consumers and campaigns and help companies create strong plans to engage their audience and sell their products.

Data scientists must rely on creative insights using big data, the large amounts of information collected through various collection processes, like data mining. On an even more fundamental level, big data analytics can help brands understand the customers who ultimately help determine the long-term success of a business or initiative. In addition to targeting the right audience, data science can be used to help companies control the stories of their brands. Because big data is a rapidly growing field, there are constantly new tools available, and those tools need experts who can quickly learn their applications. Data scientists can help companies create a business plan to achieve goals based on research and not just intuition.
Data science plays a very important role in security and fraud detection, because the massive amounts of information allow for drilling down to find slight irregularities in data that can expose weaknesses in security systems. It is also a driving force behind highly specialized user experiences created through personalization and customization. The analysis can be used to make customers feel seen and understood by a company.

What are the six major areas of data science?

The six major areas of data science include the following:

  • Multidisciplinary investigations. Considering large, complex systems with interconnected pieces, data scientists use varying methods to collect large amounts of data.
  • Models and methods for data. Data scientists need to rely on experience and intuition to decide which methods will work best for modeling their data, and they need to adjust those methods continuously to home in on the insights they seek.
  • Pedagogy. It is up to data scientists to work with companies and clients to determine the best ideologies to apply while collecting and analyzing information about their customers and products.
  • Computing with data. The biggest thing that all data science projects have in common is the necessity to use tools and software to analyze the involved algorithms and statistics, because the size of the pool of information they are working with is so massive.
  • Theory. Data science theory is an evolving and sophisticated professional arena with countless applications.
  • Tool evaluation. There are many tools available for data scientists to use to manipulate and study huge quantities of data, and it's important to always evaluate their effectiveness and keep trying new ones as they become available.

Get Started

  • This repo offers a good collection of machine learning with Python and data science material, including algorithms, projects and explanations from basic to advanced level.
  • It covers topics such as machine learning, deep learning, SQL, natural language processing, object detection, classification, recommendation systems, chatbots and much more.

Take a look at existing projects

Content List
Advanced Visualizations
Audio Classification
Automatic Summarization of Scientific Papers
Basics of ML and DL
Basics of Power Bi
Basics of the Python
Bitcoin Price Prediction Web App
Bitcoin Price Predictor
COVID_19-DATA-ANALYSIS
Cheat Sheets
Class Imbalance problem
Classification Algorithms
Covid19 forecasting with prophet
Covid_Third_Wave_Forecasting
Customer Segmentation using Machine Learning
Data Cleaning Techniques
Data Filling and Cleaning Techniques
Different types of Clustering
Different types of feature selection techniques
Different_types_of_scaling_Method
EDA-and-Perform-Modelling-on-Ionosphere-Dataset-main
Email Classifier
Emotion Recognition Based on NLP
Ensemble Methds in ML
Explaination and Example for P value with code
Exploratory-data-analysis
Extract_Text_from_PDF_using_Python
Fake_News_Detection
File of SQL Commands
Flight_delay_prediction_project
GUI-JARVIS
Handwritten Equation Solver using CNN
Heart_Predection
Medical Charges for Smokers and Non-smoker
Medical_Help_Chatbot
Meteorite Landing Data Analysis
Movie-Recommendation-System
NumPy - Basics
Object Detection
Ola Bike Ride Request Demand Forecast
Optical character recognition (OCR)
Plant Seedlings Classification
Random forest from scratch
Random forest test
Sentiment analysis for depression based on social media posts
Spam Mail Detection
Speech_Emotion_Recognition
Sports Analytics Project
Time-Series LSTM Model
Unique Chatbot
Various Plots using Matplot,Seaborn,Pandas
Vehicles and Pedestrian Detection
Weather Prediction
Web-Scraping-with-Beautiful-Soup-master
XgBoost_Algorithm
ensemble-methods-notebooks-master
heart failure
recommendation_system
Analysis_of_Temperature_Rise_in_PMSM.ipynb
Beautiful Soup.ipynb
Ensemble learning.docx
Ensemble-Learning (Stacking)
Machine Hack -1.ipynb
README.md updated file
Sql
Statistics- Basics.ipynb
Test Task_NIket.ipynb
Various_Plots_in_Matplotlib.ipynb
Visualization with Seaborn & Matplotlib.ipynb
buyer_s_time234.ipynb
random_forest.py

Note:

  • The project list above is updated automatically: whenever a new project is added to the repo, it is added to the list.

📖 Code Of Conduct:

You can find our Code of Conduct here.

📝 License

This project follows the MIT License.

Have a look

  • Give it a 🌟 if you ❤ this project.

  • Take a look at the Existing Issues.
  • Create your own Issues if you have a new idea that is not listed in the project.
  • Wait for the Issue to be assigned to you.
  • Fork the repository

  • Clone the repository using:

git clone https://github.com/Niketkumardheeryan/Hands-on-ML-Basic-to-Advance-

⚙️ Contribution Guidelines

Some awesome Contributors ✨


  • Niket kumar Dheeryan (Author) 💻
  • Abhishek Sharma 💻
  • Sakalya100 💻
  • Kaustav Roy 💻
  • Soumayan Pal 💻
  • Komal Gupta 💻
  • Manu Varghese 💻
  • Abhishek Panigrahi 💻
  • Padmini Rai 💻
  • psyduck1203 💻
  • Rutik Bhoyar 💻
  • Ayushi Shrivastava 💻
  • Anshul Srivastava 💻
  • RISHAV KUMAR 💻
  • Megha0606 💻
  • Jagannath8 💻
  • Harshita Nayak 💻
  • ayushgoyal9991 💻
  • SurajPawarstar 💻
  • Sumit11081996 💻
  • Tanvi Bugdani 💻
  • Suyash Singh 💻
  • Abhinav Dubey 💻
  • Nisha Yadav 💻
  • Neeraj Ap 💻
  • Nishi 💻
  • shivani rana 💻