The goal of this project was to build a named-entity recognition (NER) model for a corpus of text from Twitter. For each tweet, the model had to identify the spans of tokens that make up entity mentions. The data consisted of 2,394 pretokenized tweets in which each token was tagged B, I, or O: B marks the first token of an entity mention, I marks a token inside a mention other than the first, and O marks a token that is not part of any entity.
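To make the tagging scheme concrete, here is a minimal sketch of how a B/I/O tag sequence can be turned back into entity spans. It is an illustration only, not code from this project: the function name `bio_to_spans`, the sample tweet, and the choice to treat an `I` with no preceding `B` as the start of a new entity are all assumptions.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int]]:
    """Convert B/I/O tags into (start, end) token spans, end exclusive."""
    spans: List[Tuple[int, int]] = []
    start = None  # index where the currently open entity began, if any
    for i, tag in enumerate(tags):
        if tag == "B":
            # A new entity begins; close any entity that was still open.
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "I":
            # Assumption: a stray I with no open entity starts a new one.
            if start is None:
                start = i
        else:  # "O": outside any entity, so close the open one
            if start is not None:
                spans.append((start, i))
                start = None
    # Close an entity that runs to the end of the tweet.
    if start is not None:
        spans.append((start, len(tags)))
    return spans


# Hypothetical pretokenized tweet with one tag per token.
tokens = ["just", "landed", "in", "New", "York", "with", "Alice"]
tags = ["O", "O", "O", "B", "I", "O", "B"]

spans = bio_to_spans(tags)
entities = [" ".join(tokens[s:e]) for s, e in spans]
print(spans)     # [(3, 5), (6, 7)]
print(entities)  # ['New York', 'Alice']
```

Spans use an exclusive end index so that `tokens[start:end]` yields exactly the tokens of the mention.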
For details on the approach and the results, see the full report: https://github.com/viren-velacheri/NLP_FINAL_PROJECT/blob/master/NLP_Final_Project_Report.pdf