These are my Jupyter notebooks on ML & DL.
- PyTorch: deep learning framework.
- fastai: for training fast and accurate neural networks using modern best practices.
- scikit-learn: for machine learning algorithms.
- pandas: for data manipulation.
- spaCy: for NLP.
- Notebook using Random Forests to predict income from tabular data.
- Notebook using deep neural networks to predict income from tabular data.
- Notebook using an MLP (Multi-Layer Perceptron) to classify images from the MNIST dataset, written in vanilla PyTorch.
- Notebook using a CNN (Convolutional Neural Network) to classify images from the CIFAR-10 dataset, written in vanilla PyTorch.
- Notebook using transfer learning to fine-tune a ResNet pre-trained on ImageNet to recognize Arabic handwritten characters, achieving a SOTA result of ~98% accuracy.
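As a rough illustration of the MLP item in the list above, the following minimal sketch defines a small multi-layer perceptron in plain PyTorch for 28×28 grayscale inputs such as MNIST digits. It is not the notebook's actual code: the class name `MLP`, the hidden size of 128, and the single hidden layer are assumptions chosen for brevity, and a random batch stands in for real data.

```python
import torch
from torch import nn


class MLP(nn.Module):
    """Hypothetical minimal MLP; layer sizes are illustrative assumptions."""

    def __init__(self, in_features=28 * 28, hidden=128, n_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),                    # (N, 1, 28, 28) -> (N, 784)
            nn.Linear(in_features, hidden),  # input layer -> hidden layer
            nn.ReLU(),
            nn.Linear(hidden, n_classes),    # hidden layer -> class logits
        )

    def forward(self, x):
        return self.net(x)


model = MLP()
# Random tensors stand in for a batch of four MNIST-sized images.
batch = torch.randn(4, 1, 28, 28)
logits = model(batch)           # one row of 10 class scores per image
predictions = logits.argmax(1)  # predicted class index per image
```

Training would add a loss such as `nn.CrossEntropyLoss` and an optimizer loop; those details are left to the notebook itself.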
This repository also publishes a SOTA pre-trained language model for Arabic, trained on ~800,000 Wikipedia articles following the ULMFiT paper (Universal Language Model Fine-tuning for Text Classification).
Simple transfer learning using just a single layer of weights (embeddings), such as the word2vec embeddings from Google, has been extremely popular for some years. However, full neural networks in practice contain many layers and can capture much more detail about the language, and many implementations of this idea have emerged in the last year, such as ULMFiT, ELMo, GLoMo, the OpenAI transformer, and BERT.
The published language model weights are available here and can be used for a variety of NLP tasks, such as sentiment analysis and text generation, and for any other task that requires the model to understand the semantics of the language.
Classification of HARD (Hotel Arabic Reviews Dataset):
- This dataset contains 93,700 hotel reviews in Arabic. The reviews were collected from the Booking.com website during June/July 2016 and are written in Modern Standard Arabic as well as dialectal Arabic.
- Notebook using the balanced reviews file (50% neg, 50% pos).
- Notebook using the unbalanced reviews file (13% neg, 19% ntl, 68% pos).
- Both notebooks achieve a better result (+4% in F1-score) than the one published in the associated paper.
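The F1 comparison above can be made concrete with a small sketch. The code below computes per-class and macro-averaged F1 from true and predicted labels in plain Python. Macro averaging is an assumption here (the averaging used in the notebooks and the paper is not stated in this README), and the function names `f1_per_class` and `macro_f1` and the toy labels are illustrative only.

```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """Return {label: F1} for two equal-length sequences of labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative
    scores = {}
    for label in set(y_true) | set(y_pred):
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores) if scores else 0.0


# Toy labels using the three classes of the unbalanced file (neg, ntl, pos).
y_true = ["pos", "pos", "pos", "neg", "ntl", "pos"]
y_pred = ["pos", "pos", "neg", "neg", "ntl", "pos"]
print(f1_per_class(y_true, y_pred))
print(macro_f1(y_true, y_pred))
```

On a file dominated by positive reviews, a macro average weights the rarer classes equally, which is one reason it is often preferred over plain accuracy for unbalanced data.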
Classification of BRAD (Books Reviews in Arabic Dataset):
- This dataset contains 510,600 book reviews in Arabic. The reviews were collected from the GoodReads.com website during June/July 2016. They are written mainly in Modern Standard Arabic, but there are reviews in dialectal Arabic as well.
- Notebook using the balanced reviews file (50% neg, 50% pos).
- Notebook using the unbalanced reviews file (9% neg, 12% ntl, 79% pos).
Sentiment Analysis for Arabic Tweets:
- This dataset contains a corpus of Arabic tweets (2,104,671 positive tweets, 2,313,457 negative tweets) categorized based on the appearance of certain emoji characters.
- Notebook. Although most of the tweets in this dataset are in dialectal Arabic while the language model is mostly trained on standard Arabic, the model achieves more than 90% classification accuracy. It can even recognize how emojis affect the sentiment of a tweet.
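A minimal sketch of how tweets could be labeled by emoji appearance, as described for the dataset above. The emoji sets, the function name `label_by_emoji`, and the rule of skipping tweets with mixed or missing emojis are all assumptions for illustration; the dataset's actual emoji lists and labeling rules are not given in this README.

```python
# Hypothetical emoji sets; the real dataset may use different ones.
POSITIVE_EMOJIS = {"😀", "😂", "😍", "😊"}
NEGATIVE_EMOJIS = {"😢", "😭", "😡", "😞"}


def label_by_emoji(text):
    """Return 'pos', 'neg', or None if emojis are absent or conflicting."""
    has_pos = any(ch in POSITIVE_EMOJIS for ch in text)
    has_neg = any(ch in NEGATIVE_EMOJIS for ch in text)
    if has_pos == has_neg:
        # Either no sentiment emoji at all, or both kinds: leave unlabeled.
        return None
    return "pos" if has_pos else "neg"


# Made-up example tweets ("a beautiful day", "a bad day").
tweets = ["يوم جميل 😍", "يوم سيئ 😢", "يوم عادي"]
labels = [label_by_emoji(t) for t in tweets]
print(labels)
```

Tweets that come back as `None` would simply be left out of a two-class corpus under this assumed rule.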
Text generation using previous tweets from the Twitter API:
- Notebook. This is a proof of concept: it needs more research from me on text generation and also more data, but it's a fun experiment to play with and can generate some fun results :D
- Python 3.6
- fastai 1.0.51.dev0: after a normal installation, run `pip install git+https://github.com/fastai/fastai.git` to get the bleeding-edge version needed for some QRNN fixes.
Every notebook contains links to download the dataset it uses. Create a data folder to store the downloaded files.
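One way to create that folder from Python is sketched below. It assumes the folder is literally named `data` and sits next to the notebooks; the helper name `ensure_data_dir` is hypothetical, and each notebook may expect a different location.

```python
from pathlib import Path


def ensure_data_dir(root="."):
    """Create <root>/data if it is missing and return its path."""
    data_dir = Path(root) / "data"
    # exist_ok=True makes the call safe to repeat on every notebook run.
    data_dir.mkdir(parents=True, exist_ok=True)
    return data_dir


if __name__ == "__main__":
    # Creates ./data relative to the current working directory.
    print(ensure_data_dir())
```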
Every notebook contains more details about the specific implementation of the model it uses.