/ML_DA

Basic Concepts of Machine Learning, Quantum Computing, Data Analytics, GANs and NLP

Primary Language: Jupyter Notebook

Machine Learning Algorithms and Data Analysis

This repository contains the implementation of various Machine Learning Algorithms and Data Analysis techniques in Python.

Index

  • Basics - Contains the basic programs to get started with. It includes regression, classification, clustering, dimensionality reduction, etc., implemented from scratch.
  • DA - Contains the programs related to Data Analysis like data cleaning, data visualization, etc.
  • NLP - Contains the programs related to Natural Language Processing like Viterbi Algorithm, Hidden Markov Model, etc.
  • Colabs - Contains the programs that are implemented in Google Colaboratory, mostly related to Deep Learning and Computer Vision. It includes Image Captioning, Denoising, Enhancements, etc.
  • Minor - Contains the programs related to the Minor Project. It is a comparative study of various GAN models on the MNIST dataset.
  • Test - Contains the programs for automating data entry for the Konnexions Society of KIIT.
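
As an illustration of the kind of program the NLP entry above refers to, here is a minimal sketch of the Viterbi algorithm for a discrete Hidden Markov Model, written with NumPy. It is not taken from the repository: the function name `viterbi`, its parameters, and the toy Healthy/Fever model are assumptions made for this example only.

```python
import numpy as np


def viterbi(obs, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for a sequence of observation indices.

    start_p[i]    : probability of starting in state i
    trans_p[i, j] : probability of moving from state i to state j
    emit_p[i, k]  : probability of state i emitting observation k
    """
    n_states = trans_p.shape[0]
    n_steps = len(obs)

    # Work in log space so long sequences do not underflow.
    with np.errstate(divide="ignore"):
        log_start = np.log(start_p)
        log_trans = np.log(trans_p)
        log_emit = np.log(emit_p)

    score = np.empty((n_steps, n_states))          # best log-probability per state
    back = np.zeros((n_steps, n_states), dtype=int)  # best predecessor per state

    score[0] = log_start + log_emit[:, obs[0]]
    for t in range(1, n_steps):
        # cand[i, j]: best path ending in state i at t-1, then moving to state j
        cand = score[t - 1][:, None] + log_trans
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0) + log_emit[:, obs[t]]

    # Follow the back-pointers from the best final state.
    path = [int(score[-1].argmax())]
    for t in range(n_steps - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]


# Hypothetical toy model: states 0 = Healthy, 1 = Fever;
# observations 0 = normal, 1 = cold, 2 = dizzy.
start_p = np.array([0.6, 0.4])
trans_p = np.array([[0.7, 0.3],
                    [0.4, 0.6]])
emit_p = np.array([[0.5, 0.4, 0.1],
                   [0.1, 0.3, 0.6]])

best_path = viterbi([0, 1, 2], start_p, trans_p, emit_p)
print(best_path)  # [0, 0, 1], i.e. Healthy, Healthy, Fever
```

Working in log space is a design choice of this sketch; a version that multiplies raw probabilities gives the same path on short sequences but can underflow on long ones.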

Advanced Topics

  • GAN - Contains the programs related to Generative Adversarial Networks and to optimisation algorithms on non-convex loss functions.
  • Proctoring-AI - Contains the programs related to Proctoring AI. It includes Face Recognition, Eye Tracking, etc.
  • QuantumComp - Contains an optimisation algorithm for quantum gate insertion in quantum circuits, based on a genetic algorithm.

Pre-requisites

To run the programs in this repo, do the following:

  • create a virtual environment using conda or venv.
    • conda create -n <env_name> python=3.7
    • python -m venv <env_name>
  • activate the virtual environment
    • conda activate <env_name>
    • .\<env_name>\Scripts\activate (venv, Windows users)
    • source ./<env_name>/bin/activate (venv, macOS and Linux users)
  • install the requirements
    • pip install --upgrade pip (to upgrade pip)
    • pip install -r requirements.txt

Once the requirements have been installed, the programs should run.
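
To confirm that the installation worked before opening a notebook, a quick check along the following lines may help. This is a sketch, not part of the repository: the helper `missing_packages` is hypothetical, and the import names in `PACKAGES` are assumed from the libraries described in this README, so requirements.txt remains the authoritative list.

```python
import importlib.util

# Display name -> import name. Assumed from the libraries this README lists;
# adjust to match requirements.txt.
PACKAGES = {
    "NumPy": "numpy",
    "Pandas": "pandas",
    "Matplotlib": "matplotlib",
    "Seaborn": "seaborn",
    "Scikit-Learn": "sklearn",
    "TensorFlow": "tensorflow",
    "Keras": "keras",
    "NLTK": "nltk",
    "spaCy": "spacy",
    "hmmlearn": "hmmlearn",
    "OpenCV": "cv2",
    "Scikit-Image": "skimage",
    "Autograd": "autograd",
    "Openpyxl": "openpyxl",
    "PyAudio": "pyaudio",
    "SpeechRecognition": "speech_recognition",
}


def missing_packages(packages):
    """Return the display names whose import name cannot be found in this environment."""
    return [
        name
        for name, module in packages.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages(PACKAGES)
    print("Missing:", ", ".join(missing) if missing else "none")
```

The check only looks up each package without importing it, so it runs quickly even when heavy libraries such as TensorFlow are installed.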

Libraries Used and Their Descriptions

For Data Analysis

  • Numpy - NumPy is the fundamental package for scientific computing with Python.
  • Pandas - Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.
  • Matplotlib - Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.
  • Seaborn - Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.
  • Scikit-Learn - Scikit-learn is a free software machine learning library for the Python programming language.

For Machine Learning and Deep Learning

  • TensorFlow - TensorFlow is an end-to-end open source platform for machine learning.
  • Keras - Keras is a high-level neural networks API written in Python. Older multi-backend releases could run on top of TensorFlow, CNTK, or Theano.

For Natural Language Processing

  • NLTK - NLTK is a leading platform for building Python programs to work with human language data.
  • spaCy - spaCy is a library for industrial-strength Natural Language Processing (NLP), written in Python and Cython.
  • hmmlearn - Hidden Markov Models in Python, with scikit-learn like API.

For Computer Vision

  • OpenCV - OpenCV (Open Source Computer Vision Library) is an open source computer vision and machine learning software library.
  • Scikit-Image - Scikit-image is a collection of algorithms for image processing.

Others

  • Jupyter Notebook - Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text.
  • Autograd - Autograd is a Python library that provides automatic differentiation for numpy code.
  • Openpyxl - Openpyxl is a Python library to read/write Excel 2010 xlsx/xlsm/xltx/xltm files.
  • Keras Preprocessing - Keras Preprocessing is a collection of utilities for preprocessing data.
  • PyAudio - PyAudio provides Python bindings for PortAudio, the cross-platform audio I/O library.
  • SpeechRecognition - SpeechRecognition is a library for performing speech recognition, with support for several engines and APIs, online and offline.