May 3, 2021
Trying to find a passage within a long document can be very annoying when we don't recall it exactly. We either have to search for a single word and browse through what can be dozens of matches, or try to narrow the search down by trial and error with different combinations of words.
With word embeddings, we can do much better! We can use them to create a dense vector that encodes the meaning of a desired passage, and then search the document for an excerpt that matches our desired meaning.
In this article, I illustrate the concept by searching the classic Pride and Prejudice for passages that correspond to either simplified or translated parts of the book.
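As a rough illustration of the idea (not the code used in the article), the sketch below averages word vectors to encode a query and each candidate passage, then returns the passage with the highest cosine similarity. The 3-dimensional vectors in `TOY_VECTORS` are invented for the example, and the helper names `embed`, `cosine` and `best_match` are hypothetical; a real application would load pretrained embeddings instead.

```python
import math

# Toy 3-dimensional word vectors, invented for illustration only.
# A real application would load pretrained embeddings instead.
TOY_VECTORS = {
    "happy": [0.9, 0.1, 0.0],
    "joyful": [0.8, 0.2, 0.1],
    "sad": [-0.8, 0.1, 0.0],
    "walk": [0.0, 0.9, 0.1],
    "stroll": [0.1, 0.8, 0.2],
    "garden": [0.0, 0.2, 0.9],
    "park": [0.1, 0.3, 0.8],
}


def embed(text):
    """Encode a text as the average of the vectors of its known words."""
    words = [w.strip(".,;!?").lower() for w in text.split()]
    vectors = [TOY_VECTORS[w] for w in words if w in TOY_VECTORS]
    if not vectors:
        return [0.0, 0.0, 0.0]
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(query, passages):
    """Return the passage whose encoding is closest to the query's."""
    query_vector = embed(query)
    return max(passages, key=lambda p: cosine(query_vector, embed(p)))


passages = [
    "A sad walk.",
    "A joyful stroll in the park.",
]
result = best_match("happy walk garden", passages)
print(result)
```

Note that the query shares no words with the passage it retrieves: the match comes entirely from the similarity between the vectors, which is what an exact keyword search cannot provide.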
April 24, 2021
Aviation is a heavily regulated industry. There are tons of regulations and advisory material both at the global level (stemming mostly from ICAO, the UN's arm for civil aviation) and at the local level (from each country's own civil aviation authorities). Even though there is a huge effort put into interoperability, this sea of regulations is still far from standardized.
This vast variety makes the field perfect for the application of Natural Language Processing techniques.
This simple app showcases two such applications. In the first, the user is prompted to choose a requirement (either from a list taken from Brazilian regulations or from a free text input field) and a model tries to find the corresponding requirement in the US regulations. In the second, the user can select a requirement (either from the US or Brazilian regulations or from a free text input field) and a model determines whether that rule applies to the aircraft or to the operator.
In both applications, requirements are restricted to those applied to airlines, but the concept can easily be extended to text of any nature.
You can visit the web app at this link. You can also check the app code here, or the code for the comparison technique and classification model here.
April 1, 2021
Over the last decade, I spent a lot of time ensuring that aircraft, operations and airmen comply with civil aviation regulations from around the globe. I thought this topic would be an interesting pick for exploring the application of Recurrent Neural Networks to the creation of original text.
In this article, I explain how I trained a stack of Gated Recurrent Units on the full corpus of FAA regulations and discuss the results of using the model to generate new text following the same style. You can also see a teaser in the block below or visit this link for a 1-million-character sample of original regulatory text.
March 10, 2021
I'm sure by now everyone has seen examples of AI applications where an algorithm is left to train for some time on a large corpus of text or music and can then be used to create new content following the same style. This project is my take on training a recurrent neural network on the full set of JS Bach's four-part chorale pieces and then using it to create some new, original music.
This article gives a high-level description of the project and showcases some examples generated by the trained network. If you want to skip the details and just listen to some music, these are two of my favorite pieces:
<script src="https://cdn.jsdelivr.net/combine/npm/tone@14.7.58,npm/@magenta/music@1.21.0/es6/core.js,npm/focus-visible@5,npm/html-midi-player@1.1.1"></script>
Nov 26, 2020
A statistical model for predicting the Mechanism of Action of a drug based on gene expression and cell viability data.
A mechanism of action (MoA) is a label attributed to an agent to describe its biological activity. Once a molecule’s MoA has been properly identified, the molecule can be used in a targeted manner to obtain a desired cell response.
The final model is a weighted average of predictions from several individual models: Logistic Regression, K-nearest neighbors, Naive Bayes with loess smoothing, Support Vector Machine and Multi-class Penalized Mixture Discriminant Analysis.
Created as part of the capstone project for the HarvardX Data Science Professional Certificate and inspired by a Kaggle competition.
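The weighted-average step can be sketched in a few lines. This is only an illustration in Python, not the project's own code: the model names, the per-class probabilities and the weights below are all invented, and the summary does not say how the actual weights were chosen.

```python
def weighted_average(predictions, weights):
    """Combine per-model class probabilities into a single distribution.

    predictions: dict mapping model name -> list of class probabilities
    weights:     dict mapping model name -> non-negative weight
    """
    total_weight = sum(weights[name] for name in predictions)
    n_classes = len(next(iter(predictions.values())))
    combined = [0.0] * n_classes
    for name, probabilities in predictions.items():
        for i, p in enumerate(probabilities):
            combined[i] += weights[name] * p / total_weight
    return combined


# Invented probabilities for three classes from three hypothetical models.
predictions = {
    "logistic_regression": [0.7, 0.2, 0.1],
    "knn": [0.5, 0.4, 0.1],
    "naive_bayes": [0.2, 0.6, 0.2],
}
# Invented weights; a larger weight gives a model more say in the result.
weights = {"logistic_regression": 2.0, "knn": 1.0, "naive_bayes": 1.0}

ensemble = weighted_average(predictions, weights)
print(ensemble)
```

Dividing by the total weight keeps the combined values a valid probability distribution whenever each individual model outputs one.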
Sep 21, 2020
A model for predicting movie ratings on the MovieLens dataset. A series of models of increasing complexity is proposed, considering user and movie average ratings, user preferences for particular movie genres, movie age at the time of rating, and finally the application of Item-Based Collaborative Filtering.
Created as part of the capstone project for the HarvardX Data Science Professional Certificate.
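The simplest stage of such a model, based on user and movie averages, might look like the sketch below. It is an assumption about the general approach rather than the project's code (shown here in Python): the ratings are invented, the function names `fit_baseline` and `predict` are hypothetical, and the later stages (genres, movie age, collaborative filtering) are not covered.

```python
from collections import defaultdict

# Invented (user, movie, rating) triples, for illustration only.
ratings = [
    ("u1", "m1", 5.0),
    ("u1", "m2", 3.0),
    ("u2", "m1", 4.0),
    ("u2", "m2", 2.0),
    ("u3", "m1", 3.0),
]


def fit_baseline(ratings):
    """Fit a global mean plus per-movie and per-user average deviations."""
    mu = sum(r for _, _, r in ratings) / len(ratings)

    # Movie effect: how far each movie's ratings sit from the global mean.
    movie_residuals = defaultdict(list)
    for _, movie, r in ratings:
        movie_residuals[movie].append(r - mu)
    movie_effect = {m: sum(v) / len(v) for m, v in movie_residuals.items()}

    # User effect: what remains after removing the mean and the movie effect.
    user_residuals = defaultdict(list)
    for user, movie, r in ratings:
        user_residuals[user].append(r - mu - movie_effect[movie])
    user_effect = {u: sum(v) / len(v) for u, v in user_residuals.items()}

    return mu, movie_effect, user_effect


def predict(model, user, movie):
    """Predict a rating; unseen users or movies contribute no effect."""
    mu, movie_effect, user_effect = model
    return mu + movie_effect.get(movie, 0.0) + user_effect.get(user, 0.0)


model = fit_baseline(ratings)
prediction = predict(model, "u3", "m2")
print(prediction)
```

Here user `u3` has never rated movie `m2`, so the prediction combines the global mean with what is known about that movie and that user separately.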
A very simple web-based recommendation system based on a minimalist version of the model is hosted at https://fabio-a-oliveira.shinyapps.io/MovieRecommenderApp/. You can also use it in the frame below:
<iframe src="https://fabio-a-oliveira.shinyapps.io/MovieRecommenderApp/" width="100%" height="600"> </iframe>
Aug 4, 2020
A naive encoding of time series data, created before I knew anything about recurrent neural networks. A combination of a 1D conv-net for encoding and a recurrent network for inference would have been much better suited to the task. Cool visualizations though!
Jul 30, 2020
An investigation of the data from a 2015 PNAS research paper (used as a pretext for practicing R Markdown and data visualization with ggplot2).