100-daya-of-ML-Day-1-

6 most frequently used libraries for Machine Learning


i. Requests

I have used the Requests library a lot while doing my web scraping project. Requests lets you send HTTP/1.1 requests from Python. With it, you can add content like headers, form data, multipart files, and parameters via simple Python dictionaries, and you can access the response data in the same way. To get started, first install the Requests package (in the Anaconda prompt: pip install requests) and then import the requests library.
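
A minimal sketch of a GET request with Requests (the URL, header, and parameter values here are placeholders, not part of any real API):

```python
import requests

# Send a GET request with query parameters and a custom header
response = requests.get(
    "https://example.com/search",           # placeholder URL
    params={"q": "machine learning"},       # appended as ?q=machine+learning
    headers={"User-Agent": "my-scraper/0.1"},
    timeout=10,
)

print(response.status_code)                 # e.g. 200
print(response.headers["Content-Type"])     # response headers behave like a dict
print(response.text[:200])                  # first 200 characters of the body
```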

ii. Beautiful Soup

This is the best library for beginners to parse HTML/XML files or data. In order to use this library, first install the bs4 package (pip install beautifulsoup4) and then import the Beautiful Soup library.

This link is a complete guide for scraping web pages using Beautiful Soup: https://www.digitalocean.com/community/tutorials/how-to-scrape-web-pages-with-beautiful-soup-and-python-3
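
A minimal sketch of parsing HTML with Beautiful Soup (the HTML snippet is made up for illustration; in a real scraper it would come from the response.text of a Requests call):

```python
from bs4 import BeautifulSoup

html = """
<html><body>
  <h1>Libraries</h1>
  <ul>
    <li class="lib">Requests</li>
    <li class="lib">Beautiful Soup</li>
  </ul>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")    # parse with Python's built-in parser

print(soup.h1.get_text())                    # -> Libraries
for li in soup.find_all("li", class_="lib"): # all <li class="lib"> tags
    print(li.get_text())                     # -> Requests, Beautiful Soup
```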

iii. Numpy

Numpy is the core library for scientific computing in Python. It provides a high-performance multidimensional array object and tools for working with these arrays.

Numpy Arrays: A numpy array is a grid of values, all of the same type, indexed by a tuple of nonnegative integers. The number of dimensions is the rank of the array; the shape of an array is a tuple of integers giving the size of the array along each dimension.

For the complete guide please refer to http://cs231n.github.io/python-numpy-tutorial/#numpy
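
A small example of creating an array and inspecting its rank and shape:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])   # 2-dimensional array (rank 2)

print(a.shape)      # (2, 3): size along each dimension
print(a.ndim)       # 2: the rank (number of dimensions)
print(a.dtype)      # all values share one type, e.g. int64
print(a[1, 2])      # indexing with a tuple of nonnegative integers -> 6
print(a * 2)        # elementwise (vectorized) arithmetic
```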

iv. Pandas

Pandas basically provides fast, flexible and expressive data structures, which is very helpful when working with relational or labeled data. pandas is well suited for many different kinds of data:

a. Tabular data with heterogeneously-typed columns, as in an SQL table or Excel spreadsheet

b. Ordered and unordered (not necessarily fixed-frequency) time series data

c. Arbitrary matrix data (homogeneously typed or heterogeneous) with row and column labels

d. Any other form of observational / statistical data sets; the data actually need not be labeled at all to be placed into a pandas data structure

The two primary data structures of pandas, Series (1-dimensional) and DataFrame (2-dimensional), handle the vast majority of typical use cases in finance, statistics, social science, and many areas of engineering.

For the complete guide refer to https://pandas.pydata.org/pandas-docs/stable/10min.html & http://pandas.pydata.org/pandas-docs/stable/
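
A short sketch of the two primary data structures (the data here is invented for illustration):

```python
import pandas as pd

# Series: 1-dimensional labeled array
s = pd.Series([10, 20, 30], index=["a", "b", "c"])
print(s["b"])                      # 20

# DataFrame: 2-dimensional table with labeled rows and columns
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "age": [25, 32, 40],
    "city": ["Delhi", "Pune", "Mumbai"],
})

print(df.head())                   # first rows of the table
print(df.describe())               # summary statistics of numeric columns
print(df[df["age"] > 30])          # boolean filtering, like a SQL WHERE clause
```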

v. scikit-learn

Scikit-learn provides a range of supervised and unsupervised learning algorithms via a consistent interface in Python. It features various classification, regression and clustering algorithms, including support vector machines, random forests, gradient boosting and k-means, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy. I have also used scikit-learn for data imputation, scaling, feature selection, etc.

For reference this link can be used https://machinelearningmastery.com/a-gentle-introduction-to-scikit-learn-a-python-machine-learning-library/
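
A minimal sketch of a scikit-learn workflow that chains imputation, scaling and a random forest classifier into a Pipeline (the iris dataset is just a convenient built-in example, not part of the original text):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Load a toy dataset and split into train/test sets
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Chain preprocessing and the classifier so they share one fit/predict interface
model = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),   # fill missing values (iris has none; shown for illustration)
    ("scale", StandardScaler()),                  # standardize each feature
    ("clf", RandomForestClassifier(n_estimators=100, random_state=42)),
])

model.fit(X_train, y_train)
print(accuracy_score(y_test, model.predict(X_test)))
```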

vi. Matplotlib

Matplotlib is probably the single most used Python package for 2D graphics. It provides both a very quick way to visualize data from Python and publication-quality figures in many formats.

The tutorial below explores matplotlib in interactive mode, covering the most common cases: http://www.scipy-lectures.org/intro/matplotlib/matplotlib.html
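
A quick sketch of the typical plotting workflow (the plotted functions and labels are arbitrary):

```python
import numpy as np
import matplotlib.pyplot as plt

# Plot sine and cosine on the same axes
x = np.linspace(0, 2 * np.pi, 200)
plt.plot(x, np.sin(x), label="sin(x)")
plt.plot(x, np.cos(x), label="cos(x)")

plt.xlabel("x")
plt.ylabel("value")
plt.title("Quick matplotlib example")
plt.legend()
plt.show()          # or plt.savefig("sine_cosine.png") to write a file instead
```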