In this project, I analyzed a dataset containing 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa. It's an expanded and more complex version of the often cited Boston Housing dataset. The target is to predict the sale price of the houses.
- Simple linear regression using features with highest correlation with target variable
- Linear regression with Ridge/Lasso regularization
- Principal Component Analysis
- Gradient boosting decision trees using XGBoost
The project is at this link: House saleprices project. It can also be found in the folder 'House_price_advanced', containing also a .txt file with the description of the 79 features.
In this project, I analyzed the Sentiment140 dataset, containing 1.6M tweets. Each tweet is labelled as 'positive' or 'negative' based on its sentiment. The target of this project is to build a model able to predict whether a tweet has a 'positive' or 'negative' polarity.
- Simple Naive Bayes using td-idf
- Convolutional Neural Network
- Recurrent Neural Network using LSTM
- Recurrent Neural Network using LSTM and word embedding using Word2Vec
The project is at this link: NLP project. It can also be found in the folder 'NLP analysis sentiment_140'.
3) Image classification using Convolutional Neural Networks - Prediction of pneumonia from chest X-ray images comparing different CNN architectures
In this project, I analyzed a dataset containing 5783 chest X-ray images of pediatric patients. The goal is to build a model able to predict whether a patient has pneumonia or not.
- Simple CNN with 5 convolutional layers
- AlexNet
- Transfer learning using VGG-16
- Transfer learning using DenseNet121
The project is at this link: CNN project. It can also be found in the folder 'Image_classification_chestXray_pneumonia'.
4) Speech recognition & multiclass classification problem - Recognition of emotions from audio files using Toronto TESS dataset
In this project, I analyzed the Otto the TESS Toronto dataset, containing around 2800 audio files in WAV format.
The files contain 200 target words that were spoken in the phrase "Say the word_' by two different voices, and each of the recording portrays one of seven different emotions: anger, disgust, fear, happiness, pleasant surprise, sadness, and neutral. The goal is build a model able to predict the correct emotion of the recording.
- Support Vector Machine
- k-nearest neighbors
- Random forest
- Gradient boosting decision trees
The project is at this link: Speech emotion recognition project. It can also be found in the folder 'Speech_emotion'.
In this notebook, I have used a GAN to generate images that look like those of the celebA dataset, containing more than 200,000 pictures of face images of various celebrities.
The project is at this link: GAN project. It can also be found in the folder 'GAN project'.
6) Multiclass classification problem - Comparison of different machine learning algorithms on the categorization of products from the Otto dataset
In this project, I analyzed the Otto product dataset, containing 93 features of more than 200,000 products. There are nine categories for all products,
each of them representing one of the most important product categories (like fashion, electronics, etc.). The goal is to build a model able to predict in which category a given product belongs to.
- Logistic regression
- k-nearest neighbors
- Random forest
- Gradient boosting decision trees using XGBoost
- Dimensionality reduction: Principal component analysis and TSNE
- Stacking: combination of different algorithms
The project is at this link: Otto project. It can also be found in the folder 'Otto products'.
In this project I have analyzed a multivariate time series representing the total sales of different types of goods from 1993 to 2020 in the US (alcohol, groceries, household appliances and furniture).
The target is to create a model able to predict the future sales (12 months in the future) in the 4 different categories. This has been done by using Recurrent Neural Networks (RNN). In particular, two types of architecture have been used:
- LSTM (Long-Short Term Memory)
- GRU (Gated Recurrent Unit)
The project is at this link: RNN project. It can also be found in the folder 'RNN_project'.
In this project I trained a simple ML model to assess a chess position, returning the evaluation number (which tells who has the advantage, white or black, and by how much).
The target is to create a simple 'chess engine' by training a simple ML model, using positions from real games each labelled with the evaluation.
The project is at this link: Chess engine
In this project I built a fake news / real news classifier exploting a popular transformer model, BERT.
The target is to build a classifier able to distinguish fake news from real news.
The project is at this link: Fake news detection