Machine learning is a method of data analysis that automates analytical model building. It is a branch of artificial intelligence based on the idea that systems can learn from data, identify patterns, and make decisions with minimal human intervention.

### Importance of Machine Learning

Machine learning is important because it gives enterprises a view of trends in customer behavior and business operational patterns, and it supports the development of new products. Many of today's leading companies, such as Facebook, Google and Uber, make machine learning a central part of their operations, and it has become a significant competitive differentiator for many of them.
- Python IDE: install it from python.org
- If you are new to Python programming and want a working knowledge before you start, you can learn it in a simplified way through this website
Feature extraction is a general term for methods of constructing combinations of the variables to get around these problems while still describing the data with sufficient accuracy.
- Web scraping. Library used: Beautiful Soup, which extracts data from web pages.
Data visualization is the discipline of trying to understand data by placing it in a visual context so that patterns, trends and correlations that might not otherwise be detected can be exposed. Python offers multiple great graphing libraries that come packed with lots of different features.
- Libraries commonly used to present data as graphs and other graphical representations: Seaborn, pandas, matplotlib, etc.
Feature selection is the process of selecting a subset of relevant features for use in a model. Irrelevant features in your data can decrease the accuracy of a model and cause it to learn from signals that do not matter.
- Library commonly used for feature selection: scikit-learn
- Link - https://machinelearningmastery.com/feature-selection-with-real-and-categorical-data/
A). Understand the Types of Analytics
- Descriptive Analytics tells us what happened in the past and helps a business understand how it is performing by providing context to help stakeholders interpret information.
- Diagnostic Analytics takes descriptive data a step further and helps you understand why something happened in the past.
- Predictive Analytics predicts what is most likely to happen in the future and provides companies with actionable insights based on the information.
- Prescriptive Analytics provides recommendations regarding actions that will take advantage of the predictions and guide the possible actions toward a solution.
B). Probability
- Conditional Probability
- Independent Events
- Mutually Exclusive Events
- Bayes' Theorem
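The probability topics above can be tied together with a small worked example. The sketch below applies Bayes' theorem to a hypothetical diagnostic test. The function name `bayes_posterior` and all of the numbers (1% prevalence, 95% detection rate, 5% false-positive rate) are assumptions chosen for illustration, not values from this repo.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """Return P(A | B) using Bayes' theorem.

    prior               = P(A)
    likelihood          = P(B | A)
    false_positive_rate = P(B | not A)
    """
    # Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    # Bayes' theorem: P(A|B) = P(B|A)P(A) / P(B)
    return likelihood * prior / evidence


# Hypothetical numbers: 1% of people have a condition, the test detects it
# 95% of the time, and it wrongly flags 5% of healthy people.
posterior = bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(f"P(condition | positive test) = {posterior:.3f}")
```

With these assumed inputs the posterior is roughly 16%, which shows how a low prior can keep the conditional probability small even when the test itself looks accurate.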
C). Central Tendency
- Mean
- Mode
- Variance
- Skewness
- Kurtosis
- Standard Deviation
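As a rough illustration of the measures listed above, the sketch below computes each one with Python's standard `statistics` module. The sample `data` and the helper `standardized_moment` are hypothetical. Skewness and kurtosis are computed here as population standardized moments, which is one of several conventions in use.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

mean = statistics.mean(data)
mode = statistics.mode(data)
variance = statistics.pvariance(data)  # population variance
std_dev = statistics.pstdev(data)      # population standard deviation


def standardized_moment(values, k):
    """k-th standardized moment: the mean of ((x - mean) / std_dev) ** k."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return sum(((v - mu) / sigma) ** k for v in values) / len(values)


skewness = standardized_moment(data, 3)  # > 0 means a longer right tail
kurtosis = standardized_moment(data, 4)  # 3.0 for a normal distribution

print(mean, mode, variance, std_dev, skewness, kurtosis)
```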
D). Variability
- Range: The difference between the highest and lowest value in the dataset.
- Percentiles: A measure that indicates the value below which a given percentage of observations in a group of observations falls.
- Quartiles: Values that divide the number of data points into four more or less equal parts, or quarters.
- Interquartile Range (IQR): A measure of statistical dispersion and variability based on dividing a data set into quartiles. IQR = Q3 - Q1
- Variance: The average squared difference of the values from the mean, which measures how spread out a set of data is relative to the mean.
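The variability measures above can be sketched with the standard library alone. The sample `data` is hypothetical, and `method="inclusive"` is just one of the two interpolation methods that `statistics.quantiles` offers, so other tools may report slightly different cut points.

```python
import statistics

data = [1, 3, 5, 7, 9, 11, 13, 15]  # hypothetical sample

# Range: difference between the highest and lowest value.
data_range = max(data) - min(data)

# statistics.quantiles returns the n - 1 cut points that split the data
# into n groups; n=4 gives the three quartiles.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# 90th percentile: the 90th of the 99 cut points for n=100.
p90 = statistics.quantiles(data, n=100, method="inclusive")[89]

# Population variance: average squared deviation from the mean.
variance = statistics.pvariance(data)

print(data_range, q1, q2, q3, iqr, p90, variance)
```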
E). Relationship Between Variables
- Causality: Relationship between two events where one event is affected by the other.
- Covariance: A quantitative measure of the joint variability between two or more variables.
- Correlation: Measures the strength of the linear relationship between two variables. It ranges from -1 to 1 and is the normalized version of covariance.
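To make the link between covariance and correlation concrete, here is a minimal sketch in plain Python. The function names and the `hours`/`scores` data are hypothetical. The code uses the sample (n - 1) form of covariance and the Pearson definition of correlation.

```python
import math


def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


hours = [1, 2, 3, 4, 5]        # hypothetical study hours
scores = [52, 55, 61, 64, 68]  # hypothetical exam scores

cov = covariance(hours, scores)
corr = correlation(hours, scores)
print(f"covariance = {cov:.2f}, correlation = {corr:.3f}")
```

Covariance depends on the units of both variables, while the correlation stays within -1 and 1. Neither one establishes causality.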
F). Probability Distribution
- Probability Mass Function (PMF): A function that gives the probability that a discrete random variable is exactly equal to some value.
- Probability Density Function (PDF): A function for continuous data where the value at any given sample can be interpreted as providing a relative likelihood that the value of the random variable would equal that sample.
- Cumulative Distribution Function (CDF): A function that gives the probability that a random variable is less than or equal to a certain value.
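The three functions above can be evaluated with the standard library. This sketch assumes a binomial distribution for the discrete case and a standard normal distribution for the continuous case. Both choices, and the helper `binomial_pmf`, are illustrative assumptions.

```python
import math
from statistics import NormalDist


def binomial_pmf(k, n, p):
    """PMF: probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


# Discrete case: P(exactly 3 heads in 10 fair coin tosses).
pmf_value = binomial_pmf(3, 10, 0.5)

# Continuous case: standard normal distribution.
standard_normal = NormalDist(mu=0, sigma=1)
pdf_at_zero = standard_normal.pdf(0)  # a density (relative likelihood), not a probability
cdf_at_zero = standard_normal.cdf(0)  # P(X <= 0)

print(pmf_value, pdf_at_zero, cdf_at_zero)
```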
G). Hypothesis Testing and Statistical Significance
- Null and Alternative Hypothesis
- Interpretation
- Z-Test
- T-Test
- ANOVA (Analysis of Variance)
- Chi-Square Test
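As one example of the tests listed above, the sketch below runs a two-sided one-sample Z-test using only the standard library. The function name `one_sample_z_test`, the input numbers, and the 0.05 significance level are assumptions for illustration. A Z-test also assumes the population standard deviation is known.

```python
import math
from statistics import NormalDist


def one_sample_z_test(sample_mean, population_mean, population_sd, n):
    """Return the z statistic and two-sided p-value for a one-sample Z-test."""
    standard_error = population_sd / math.sqrt(n)
    z = (sample_mean - population_mean) / standard_error
    # Two-sided p-value: probability of a result at least this extreme
    # in either tail, assuming the null hypothesis is true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Null hypothesis: the population mean is 100 (hypothetical numbers).
z, p_value = one_sample_z_test(
    sample_mean=103, population_mean=100, population_sd=15, n=100
)

alpha = 0.05  # assumed significance level
reject_null = p_value < alpha
print(f"z = {z:.2f}, p = {p_value:.4f}, reject null: {reject_null}")
```

Interpretation follows the usual rule: if the p-value is below the chosen significance level, the null hypothesis is rejected in favor of the alternative.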
H). Regression
- Linear Regression
  - Assumptions of Linear Regression
    - Linear Relationship
    - Multivariate Normality
    - No or Little Multicollinearity
    - No or Little Autocorrelation
    - Homoscedasticity
- Multiple Linear Regression
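A minimal sketch of simple linear regression, fitted by ordinary least squares in plain Python, is shown below. The function name `fit_simple_linear_regression` and the data are hypothetical. In practice a library such as scikit-learn would typically be used, and multiple linear regression extends the same idea to several predictors.

```python
def fit_simple_linear_regression(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Slope = sum of cross-deviations / sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]  # hypothetical data lying exactly on y = 2x + 1

slope, intercept = fit_simple_linear_regression(xs, ys)
prediction = slope * 6 + intercept  # predicted y for a new x = 6
print(slope, intercept, prediction)
```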
Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data, and apply knowledge and actionable insights from data across a broad range of application domains.
In business, the goal of data science is to provide intelligence about consumers and campaigns and help companies create strong plans to engage their audience and sell their products.
Data scientists must rely on creative insights using big data, the large amounts of information collected through various collection processes, like data mining.
On an even more fundamental level, big data analytics can help brands understand the customers who ultimately help determine the long-term success of a business or initiative. In addition to targeting the right audience, data science can be used to help companies control the stories of their brands.
Because big data is a rapidly growing field, there are constantly new tools available, and those tools need experts who can quickly learn their applications. Data scientists can help companies create a business plan to achieve goals based on research and not just intuition.
Data science plays an important role in security and fraud detection, because the massive amounts of information allow for drilling down to find slight irregularities in data that can expose weaknesses in security systems. It is also a driving force behind highly specialized user experiences created through personalization and customization. The analysis can be used to make customers feel seen and understood by a company.
The six major areas of data science include the following:
- Multidisciplinary investigations. Considering large, complex systems with interconnected pieces, data scientists use varying methods to collect large amounts of data.
- Models and methods for data. Data scientists need to rely on experience and intuition to decide which methods will work best for modeling their data, and they need to adjust those methods continuously to home in on the insights they seek.
- Pedagogy. It is up to data scientists to work with companies and clients to determine the best ideologies to apply while collecting and analyzing information about their customers and products.
- Computing with data. The biggest thing that all data science projects have in common is the necessity to use tools and software to analyze the involved algorithms and statistics, because the size of the pool of information they are working with is so massive.
- Theory. Data science theory is an evolving and sophisticated professional arena with countless applications.
- Tool evaluation. There are many tools available for data scientists to use to manipulate and study huge quantities of data, and it's important to always evaluate their effectiveness and keep trying new ones as they become available.
- https://www.kdnuggets.com/2020/06/8-basic-statistics-concepts.html
- https://www.w3schools.com/python/python_ml_getting_started.asp
- https://www.freecodecamp.org/learn/machine-learning-with-python/
- This repo is a collection of machine learning with Python and data science material, including algorithms, projects, and explanations from basic to advanced level.
- It covers topics such as machine learning, deep learning, SQL, natural language processing, object detection, classification, recommendation systems, chatbots, and much more.
- The project list above is updated automatically: whenever a new project is added to the repo, it is added to the table above.
You can find our Code of Conduct here.
This project follows the MIT License.
- Give it a 🌟 if you ❤️ this project.
- Take a look at the Existing Issues.
- Create your own Issue if you have a new idea that is not listed in the project.
- Wait for the Issue to be assigned to you.
- Fork the repository
- Clone the repository using `git clone https://github.com/Niketkumardheeryan/Hands-on-ML-Basic-to-Advance-`
- Have a look at the Contributing Guidelines
- Niket kumar Dheeryan (Author) 💻
- Abhishek Sharma 💻
- Sakalya100 💻
- Kaustav Roy 💻
- Soumayan Pal 💻
- Komal Gupta 💻
- Manu Varghese 💻
- Abhishek Panigrahi 💻
- Padmini Rai 💻
- psyduck1203 💻
- Rutik Bhoyar 💻
- Ayushi Shrivastava 💻
- Anshul Srivastava 💻
- RISHAV KUMAR 💻
- Megha0606 💻
- Jagannath8 💻
- Harshita Nayak 💻
- ayushgoyal9991 💻
- SurajPawarstar 💻
- Sumit11081996 💻
- Tanvi Bugdani 💻
- Suyash Singh 💻
- Abhinav Dubey 💻
- Nisha Yadav 💻
- Neeraj Ap 💻
- Nishi 💻
- shivani rana 💻