Different-dataset-Analysis_Churn-Prediction_ML_training

This repository contains a notebook covering the basics of NumPy, Pandas, and visualisation on several datasets, plus ML training and analysis on a churn dataset.

Primary Language: Jupyter Notebook


This repository contains a notebook with basic NumPy and Pandas operations and visualisations of several datasets: the Pokemon, Diamond, and Iris datasets. Machine learning, ensemble techniques, featurization, and model tuning, along with visual analysis, are applied to a customer churn dataset. Libraries used: NumPy, SciPy, scikit-learn, Matplotlib, Seaborn, Numba, SHAP, and XGBoost.

Datasets

  • Pokemon dataset: This dataset contains information about all 809 Pokemon, including their name, type, stats, and abilities.
  • Diamond dataset: This dataset contains information about diamonds, including their price, carat, color, clarity, and cut.
  • Iris dataset: This dataset contains information about three species of iris flowers, including their sepal width, sepal length, petal width, and petal length.
  • Churn dataset: This telecom churn dataset tracks customer behavior and usage patterns over time. It can be used to identify factors that may contribute to customer churn, such as high monthly bills, poor customer service, or a lack of competitive features.
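
As an illustration of the kind of basic Pandas operations the notebook covers, here is a minimal sketch using the Iris dataset. It assumes the copy bundled with scikit-learn (loaded via `load_iris`), which may differ from the file the notebook actually reads; the `species` column and the 5.0 cm threshold are choices made for this example only.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Load the Iris data bundled with scikit-learn as a pandas DataFrame
# (assumption: the notebook may load the data from a different source).
iris = load_iris(as_frame=True)
df = iris.frame  # four measurement columns plus an integer "target" column

# Map the integer target to species names for readability.
df["species"] = df["target"].map(dict(enumerate(iris.target_names)))

# Basic Pandas operations: shape, a per-species mean, and a boolean filter.
print(df.shape)
species_means = df.groupby("species")["petal length (cm)"].mean()
print(species_means)
long_petals = df[df["petal length (cm)"] > 5.0]
print(len(long_petals))
```

The same pattern (load, inspect, group, filter) carries over to the other datasets once they are read into a DataFrame.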

Libraries

  • NumPy: NumPy is a library for scientific computing in Python. It provides a high-performance multidimensional array object, along with a suite of functions for working with arrays.
  • SciPy: SciPy is a collection of mathematical algorithms and functions for Python. It includes modules for numerical integration, optimization, linear algebra, signal processing, and more.
  • Scikit-learn: Scikit-learn is a machine learning library for Python. It provides a variety of algorithms for supervised and unsupervised learning.
  • Matplotlib: Matplotlib is a library for plotting data in Python. It provides a variety of plotting functions, as well as a high-level interface for creating figures.
  • Seaborn: Seaborn is a Python visualization library based on matplotlib. It provides a variety of statistical plots, as well as a high-level interface for creating attractive figures.
  • Numba: Numba is a just-in-time (JIT) compiler for Python. It compiles decorated Python functions, including those that use NumPy arrays, to machine code, which can significantly improve the performance of numerical loops.
  • SHAP: SHAP is a library for explaining the output of machine learning models. It provides a variety of methods for explaining the predictions of models, including local explanations and global explanations.
  • XGBoost: XGBoost is a gradient boosting library with a Python interface. It is popular in machine learning competitions and is known for its high performance.
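
To give a rough idea of what ensemble training with model tuning can look like in scikit-learn, the sketch below fits a random forest with a small grid search. It is not the notebook's actual pipeline: the data is a synthetic stand-in generated with `make_classification`, and the model choice, parameter grid, and scoring metric are assumptions made for this example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, imbalanced stand-in for a churn table
# (assumption: this is NOT the repository's churn data).
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Small hyperparameter grid searched with 3-fold cross-validation.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, cv=3, scoring="f1"
)
search.fit(X_train, y_train)

# Evaluate the refitted best model on the held-out split.
test_accuracy = accuracy_score(y_test, search.predict(X_test))
print(search.best_params_)
print(f"test accuracy: {test_accuracy:.3f}")
```

Scoring on F1 rather than accuracy during the search is one reasonable choice when the positive (churn) class is rare; other metrics may suit the real data better.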

Usage

To use this repository, you will need to have Python and the following libraries installed:

  • NumPy
  • SciPy
  • Scikit-learn
  • Matplotlib
  • Seaborn
  • Numba
  • SHAP
  • XGBoost

Once the required libraries are installed, open the notebook in Jupyter Notebook or in a Python IDE with notebook support. The notebook contains instructions on how to use the libraries and datasets.
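
Before opening the notebook, it can help to confirm that the libraries are importable. The following is a minimal sketch using only the standard library; the `REQUIRED` list and the `missing_packages` helper are hypothetical names introduced here, and the entries are the usual import names for the libraries listed above (scikit-learn imports as `sklearn`). Pandas, which the description mentions, could be appended as `"pandas"`.

```python
from importlib.util import find_spec

# Import names for the libraries listed above (hypothetical list for this sketch).
REQUIRED = [
    "numpy", "scipy", "sklearn", "matplotlib",
    "seaborn", "numba", "shap", "xgboost",
]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All required libraries are available.")
```

Any name reported as missing can then be installed with your package manager of choice.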

Contributing

If you would like to contribute to this repository, please fork the repository and create a pull request. Please make sure that your code is well-documented and that it passes all tests.