Dataset examples
This project is a small example of handwritten digit recognition, a classic machine learning classification problem: given an image of a handwritten digit as input, identify the digit it represents. It's one of the most interesting problems in the field, so I wanted to give it a shot. Here I work with a reduced version of the problem that uses only two labels for classification, the digits one and five.

The data was already split into two files, which are imported into the project as two datasets: one for training and one for testing. Each contains grayscale image data for digits from zero to nine, along with the label (the digit) for each image. I'll put both datasets in the repository.

I implemented three different models to analyze and compare them and see which one gives better results: the perceptron, the pocket algorithm, and a linear regression model. Instead of using the raw image data as training input, I computed two more representative features for each image and used those in its place: intensity and symmetry.
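As a rough illustration of the two features, here is a minimal sketch. It assumes each image is stored as 256 pixel values forming a 16x16 grid, that intensity is the mean pixel value, and that symmetry is the negative mean absolute difference between the image and its left-right mirror. These definitions and the function name `extract_features` are assumptions for the example, not necessarily what the notebook does.

```python
import numpy as np

def extract_features(pixels):
    """Map flattened 16x16 grayscale images to (intensity, symmetry) pairs.

    pixels: array-like of shape (n_images, 256). The 16x16 layout is assumed.
    """
    images = np.asarray(pixels, dtype=float).reshape(-1, 16, 16)
    # Intensity (assumed definition): mean pixel value of each image.
    intensity = images.mean(axis=(1, 2))
    # Symmetry (assumed definition): negative mean absolute difference
    # between the image and its left-right mirror; 0 means fully symmetric.
    mirrored = images[:, :, ::-1]
    symmetry = -np.abs(images - mirrored).mean(axis=(1, 2))
    return np.column_stack([intensity, symmetry])
```

Each row of the result is one image reduced from 256 values to 2, which also makes the data easy to draw as a scatter plot.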
- Perceptron Algorithm
- Pocket Algorithm
- Linear regression model
- Learning paradigm: Supervised
- Machine learning
- Dataset size: 9298 images (numbers from 0 to 9)
- Dataset file format: .csv (2 files)
- Initial number of characteristics: 256
- Train data size: 1561 images (images with 1 and 5)
- Test data size: 421 images (images with 1 and 5)
- Final number of characteristics: 2
- Characteristics: Intensity and symmetry of an image
- Output: 2 labels
- Iterations: 200
- Best result: 0.321 error (pocket algorithm)
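
The pocket algorithm listed above can be sketched as a perceptron that remembers the best weights it has seen. This is a minimal illustration, not the notebook's code: it assumes labels are encoded as +1 and -1 (for example +1 for the digit one and -1 for the digit five), and the function name `pocket` and the random choice of a misclassified point are choices made for the example.

```python
import numpy as np

def pocket(X, y, iterations=200, seed=0):
    """Run perceptron updates, keeping ("pocketing") the best weights so far.

    X: (n, d) feature matrix; y: labels in {-1, +1} (assumed encoding).
    Returns the best weights (bias first) and their training error.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    Xb = np.column_stack([np.ones(len(X)), X])  # prepend a bias column
    w = np.zeros(Xb.shape[1])
    best_w, best_err = w.copy(), np.mean(np.sign(Xb @ w) != y)
    for _ in range(iterations):
        wrong = np.flatnonzero(np.sign(Xb @ w) != y)
        if wrong.size == 0:
            break  # every training point is classified correctly
        i = rng.choice(wrong)          # pick one misclassified point
        w = w + y[i] * Xb[i]           # standard perceptron update
        err = np.mean(np.sign(Xb @ w) != y)
        if err < best_err:             # keep the weights only if they improve
            best_w, best_err = w.copy(), err
    return best_w, best_err
```

The plain perceptron is the same loop without the `best_w` bookkeeping; it simply returns the last `w`, which can be worse than an earlier one when the data is not linearly separable.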
Programming languages & tools: Python, Jupyter Notebook
Libraries & modules: NumPy, matplotlib, seaborn, scikit-learn
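
For reference, linear regression used as a classifier can be sketched with NumPy alone: fit the weights by least squares against the labels, then take the sign of the output. The +1/-1 label encoding and the function names `fit_linear_regression` and `predict` are assumptions for this example; the notebook may instead rely on scikit-learn.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Least-squares fit of labels y in {-1, +1} on features X (bias first)."""
    Xb = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])
    w, *_ = np.linalg.lstsq(Xb, np.asarray(y, dtype=float), rcond=None)
    return w

def predict(w, X):
    """Classify by the sign of the linear output (ties go to +1)."""
    Xb = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])
    return np.where(Xb @ w >= 0, 1, -1)
```

The classification error is then the fraction of predictions that differ from the true labels, e.g. `np.mean(predict(w, X_test) != y_test)`, where `X_test` and `y_test` are hypothetical names for the test features and labels.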