Projects by students of the NLP course
Name | Description | Team | Repository |
---|---|---|---|
Movie Poster Caption Generation | | @kazzand | https://github.com/kazzand/huaweiproject |
Chinese-Russian Machine Translation | | @RonanenkovN | https://github.com/RomanenkovN/HuaweiNLP |
Aspect-Based Sentiment Analysis in German | Identify aspect-level and document-level polarity of messages in German. This matters for German service providers such as railways. | @DrFirestream | https://github.com/DrFirestream/NLP |
Aspect Extraction with Capsule Networks | Topic modelling with CapsNet. Knowing what people are talking about and understanding their problems and opinions is highly valuable to businesses, administrators, and political campaigns, and it’s really hard to manually read through such large volumes of text and compile the topics. This calls for an automated algorithm that can read through the text documents and output the topics discussed. | @KirillKrasikov | https://github.com/KirillKrasikov/TopicModelingWithCapsNet |
Text Summarization in Russian | The project's goal is to summarize texts in Russian. I think that one of the most valuable and expensive things in a person's life is their time. Selecting the main points of a text lets you avoid reading news articles in their entirety and saves a lot of time. I planned to build a model that summarizes news about stock trading in Russian. To create my own dataset of texts and their summaries, I use short news posts in Telegram (as summaries) and full news articles about exchange trading (as texts) from the site https://quote.rbc.ru. | @medphisiker | https://github.com/medphisiker/Huawei-s-nlp-course-project |
BERT-based Aspect Extraction | The goal of my project is to solve the problem of aspect extraction from text data. To solve it, one must discover not only the author's opinion of an entity mentioned in the text but also opinions about specific properties of the entity, called aspects. Aspects are represented in texts via aspect terms. The practical importance of the problem lies in using the developed models to analyze social media: assessing users' perception of products, managing brand reputation, conducting political and social research, and so on. | @ulaelfray | https://bitbucket.org/ulaelfray/huawei-nlp-course/ |
Sentiment Analysis in Russian | | @alekxd | https://github.com/alekxd/project-NLP-sentiment-rus |
Text Summarization Task in Russian | The problem I am going to solve is text summarization in Russian. Nowadays we have a lot of information, and it is important to extract the main idea from a text. In my case, the model will help people generate headlines for news articles. | @alexvishnevskiy | https://github.com/alexvishnevskiy/Huawei-project |
Generation of news headlines | Summarization in Russian on a news dataset. | @germanjke, @kotyukov | https://github.com/germanjke/huaweiNLP |
Russian aspect-based sentiment analysis | BERT-based techniques to identify the sentiment toward a selected entity in the text. For example, in "In general I like the car but I hate its color", the sentiment toward "color" is negative. The most relevant dataset is https://github.com/songyouwei/ABSA-PyTorch/tree/master/datasets/semeval14 | @preduct0r | https://github.com/preduct0r/huawei |
Jigsaw Multilingual Toxic Comment Classification | "Jigsaw Multilingual Toxic Comment Classification" is a Kaggle competition: use TPUs to identify toxic comments across multiple languages. We have to predict the probability that a comment is toxic. https://www.kaggle.com/c/jigsaw-multilingual-toxic-comment-classification | LeonidMorozov, Mteterin | https://github.com/LeonidMorozov/jigsaw_toxic_classification |
Headline generation from news articles in Russian | Reading full texts is time-consuming. If the headline reflects the main idea of the original text, then reading it saves a lot of time. I will be working on the Rossiya Segodnya (RIA) corpus, which consists of pairs of long texts and headlines. I'm going to preprocess the data and then use pre-trained embeddings to build an attentive RNN model in PyTorch. | @vadimvvlasov | https://github.com/vadimvvlasov/nlp-project |
Text summarization by using the topic (aspect) of the text. | Our task is to hybridize topic modelling and summarization, in particular to use aspect vectors in the summary generation process and thereby steer the focus of the summary. A subtask is to check whether an aspect can influence the result of summarizing a text, e.g. to generate a different summary of the text by biasing it toward one or more of its topics: a text about a sports event summarized with attention to politics should (from our point of view) include more information about the famous people who attended the event than about the event itself. | @dmitriy.valetov @RomanButov | https://github.com/DmitriyValetov/nlp_course_project |
Authorship probability estimation | Assess the probability of authorship for disputed documents attributed to an author; single out the characteristic features of the author's works; develop an approach to classifying the periods of the author's creative work. | @dbadeev | https://github.com/dbadeev/nlp_huawei_project |
Chinese to Russian machine translation | The zh-ru translation pair is still pretty weak, even in the Yandex and Google translation systems. The main goal of this project is to practice with attention models and build a machine translation system that achieves a decent BLEU score. There is also a competition hosted by ML Bootcamp. | @averkij | https://github.com/averkij/ml-bootcamp-zh-ru-translation |
Search engine with topic document embeddings | Development of a search engine using a topic model built with the help of the TopicNet library. The search corpus is based on the open byweb-2007 collection. | @To-olak | https://github.com/Evgeny-Egorov-Projects/ROMIP-search |
Generate text with outside context changing. | In this project I want to try generating the next word given external context. If time permits, I also want to try solving the reverse text summarization task with reverse text generation (predicting the previous word). Metrics will be statistical, like those used at http://gltr.io/dist/index.html. The dataset will be collected from scratch to understand all aspects of such work. | @FrankShikhaliev | https://github.com/MindSetLib/MS-Education/tree/master/NLP/HuaweiProject |
Math word problem solver/explainer | | @Max Plevako | https://github.com/mplevako/tp-n2f |
Social networks' posts classification | The problem I am trying to solve is the classification of social network posts. It is important because it helps to detect inappropriate content in the network and hide it from users who are under the age limit. It also helps administrators of social networks to moderate their groups in an automated way. I will be working with data that I will collect myself from public resources. | @BorodinDmitriy | https://github.com/BorodinDmitriy/huawei-nlp-course |
Clustering learners’ essays on the basis of key words | | @Aniezka, @lkoteuka | https://github.com/Aniezka/huawei-nlp-project |
Topic modeling with Aspect Extraction using CapsNets for Russian | | @e-lderberry | https://gitlab.com/tatiana_sham/topic_modeling |
Document summarization on government's dataset with BART and PEGASUS | | @Greyss | https://github.com/GreySR/huawei-project |
PEGASUS fine-tuning on Russian sport text broadcasts | | @kotik_konstantin | https://github.com/kotikkonstantin/pegasus-in-russian |