TFDistilBert-N-grams

Disaster Tweets – Deep NLP Analysis & Distributed Training DistilBert-N-grams-sst-en

Copyright [2022] [AI Engineer: Ahmed]

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

📖Overview

Twitter has become an important communication channel in times of emergency. The ubiquity of smartphones enables people to announce an emergency they are observing in real time. Because of this, more agencies (e.g. disaster relief organizations and news agencies) are interested in programmatically monitoring Twitter.

📝Acknowledgments

This dataset was created by the company Figure Eight and originally shared on their ‘Data For Everyone’ website.

Tweet source: https://twitter.com/AnyOtherAnnaK/status/629195955506708480

📝Proof of Work

I decided to use transfer learning in this project and fine-tune a pre-trained model, since I believe in not reinventing the wheel. We're going to use one of my favourite libraries, Hugging Face. It lets you start your model from a pre-trained model that has already been trained on billions of examples, instead of training from scratch.
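As a rough illustration of that transfer-learning setup, the sketch below loads a pre-trained DistilBERT checkpoint with a fresh two-class classification head and compiles it for fine-tuning. It is not the notebook's actual code. The checkpoint name `distilbert-base-uncased`, the learning rate, and the helper name `build_classifier` are assumptions made for illustration, and the sketch assumes a `transformers` release that still ships TensorFlow model classes.

```python
# Minimal sketch, not the project's actual training code.
# Assumptions: the checkpoint name, the learning rate, and the helper name
# `build_classifier` are illustrative choices, not taken from the notebook.

CHECKPOINT = "distilbert-base-uncased"  # assumed pre-trained checkpoint
NUM_LABELS = 2                          # target: 1 = real disaster, 0 = not


def build_classifier(checkpoint=CHECKPOINT, learning_rate=5e-5):
    """Return (tokenizer, model) ready to be fine-tuned on labelled tweets."""
    # Imported lazily so the module can be read without the heavy
    # dependencies installed; calling the function downloads the weights.
    import tensorflow as tf
    from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    # The pre-trained encoder weights are reused; only the classification
    # head on top is newly initialised.
    model = TFAutoModelForSequenceClassification.from_pretrained(
        checkpoint, num_labels=NUM_LABELS
    )
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        # The model outputs raw logits, hence from_logits=True.
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )
    return tokenizer, model
```

A typical next step would be to tokenize the tweet texts with `tokenizer(texts, padding=True, truncation=True, return_tensors="tf")` and pass the resulting encodings and the `target` labels to `model.fit`.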

📚Dictionary

| Variable | Definition |
| --- | --- |
| id | a unique identifier for each tweet |
| text | the text of the tweet |
| location | the location the tweet was sent from (may be blank) |
| keyword | a particular keyword from the tweet (may be blank) |
| target | in train.csv only, this denotes whether a tweet is about a real disaster (1) or not (0) |
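
To make the dictionary above concrete, here is a small sketch of reading rows with these columns using only the Python standard library. The `Tweet` dataclass, the `parse_tweets` helper, and the two sample rows are made up for illustration and are not taken from the dataset. Blank `location` and `keyword` values become `None`, and `target` is optional because it appears in train.csv only.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tweet:
    id: int
    text: str
    location: Optional[str]  # may be blank in the data
    keyword: Optional[str]   # may be blank in the data
    target: Optional[int]    # present in train.csv only: 1 = real disaster, 0 = not


def parse_tweets(handle):
    """Parse CSV rows into Tweet objects (hypothetical helper)."""
    tweets = []
    for row in csv.DictReader(handle):
        target = row.get("target")
        tweets.append(
            Tweet(
                id=int(row["id"]),
                text=row["text"],
                location=row.get("location") or None,
                keyword=row.get("keyword") or None,
                target=int(target) if target not in (None, "") else None,
            )
        )
    return tweets


# Invented sample rows, standing in for the real train.csv.
SAMPLE = """id,keyword,location,text,target
1,,,Made-up example about a forest fire near the town,1
2,ablaze,London,Made-up example saying this mixtape is ablaze,0
"""

tweets = parse_tweets(io.StringIO(SAMPLE))
```

For the real file, the same helper could be called on `open("train.csv", newline="", encoding="utf-8")`, assuming the file sits in the working directory.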