TajaKuzman
PhD student in Computational Linguistics with an MA in Translation (FR, EN & SI). Main interests: large language models, language technologies and resources
Jožef Stefan Institute, Ljubljana, Slovenia
Pinned Repositories
Achademio
AI assistant, based on the GPT-3.5 model by OpenAI, designed to enhance your proficiency in writing research papers. Allows you to adapt your content to academic standards, transform bullet points into eloquent text, or enhance the quality of your writing through error detection.
AGILE-Automatic-Genre-Identification-Benchmark
A benchmark for evaluating robustness of automatic genre identification models to test their usability for the automatic enrichment of large text collections with genre information.
Applying-GENRE-on-MaCoCu-bilingual
Cross-Lingual-and-Cross-Dataset-Experiments-with-Genre-Datasets
Hate-Speech-Classification
Classification of hate speech and implicitness of hate speech, using Transformer language models (BERT). This repository can be used as an introduction to text classification with BERT-like models.
IPTC-Media-Topic-Classification
Development of a multilingual IPTC Media Topic classifier for single-label topic classification of the 17 top-level topic labels from the IPTC Media Topic hierarchical schema.
NER-recognition
An evaluation of various encoder Transformer-based large language models on the named entity recognition task. The models are compared on 6 datasets, manually annotated with named entities.
pandachat-rag-benchmark
PandaChat-RAG benchmark for evaluation of RAG systems on a non-synthetic Slovenian test dataset.
Parlamint-translation
A pipeline for machine translation (using OPUS-MT models) of parliamentary text collections in 30+ languages (ParlaMint corpora). The pipeline includes parsing TEI XML and CoNLL-U files, linguistic processing with the Stanza pipeline, machine translation and word alignment with the Eflomal tool.
Topic-Classification-FastText-Transformers
Training and evaluating topic classification models (fastText and Transformer-based language models) for topic classification of Slovenian news texts. The repository can be used as a tutorial to learn topic classification.
TajaKuzman's Repositories
TajaKuzman/Achademio
AI assistant, based on the GPT-3.5 model by OpenAI, designed to enhance your proficiency in writing research papers. Allows you to adapt your content to academic standards, transform bullet points into eloquent text, or enhance the quality of your writing through error detection.
TajaKuzman/AGILE-Automatic-Genre-Identification-Benchmark
A benchmark for evaluating robustness of automatic genre identification models to test their usability for the automatic enrichment of large text collections with genre information.
TajaKuzman/Topic-Classification-FastText-Transformers
Training and evaluating topic classification models (fastText and Transformer-based language models) for topic classification of Slovenian news texts. The repository can be used as a tutorial to learn topic classification.
TajaKuzman/IPTC-Media-Topic-Classification
Development of a multilingual IPTC Media Topic classifier for single-label topic classification of the 17 top-level topic labels from the IPTC Media Topic hierarchical schema.
TajaKuzman/Parlamint-translation
A pipeline for machine translation (using OPUS-MT models) of parliamentary text collections in 30+ languages (ParlaMint corpora). The pipeline includes parsing TEI XML and CoNLL-U files, linguistic processing with the Stanza pipeline, machine translation and word alignment with the Eflomal tool.
TajaKuzman/Hate-Speech-Classification
Classification of hate speech and implicitness of hate speech, using Transformer language models (BERT). This repository can be used as an introduction to text classification with BERT-like models.
TajaKuzman/Applying-GENRE-on-MaCoCu-bilingual
TajaKuzman/Cross-Lingual-and-Cross-Dataset-Experiments-with-Genre-Datasets
TajaKuzman/Crosslingual-Genre-Bias-Analysis
TajaKuzman/Genre-Datasets-Comparison
TajaKuzman/GINCO-Genre-Annotation-Guidelines
Genre Annotation Guidelines for GINCO corpora
TajaKuzman/NER-recognition
An evaluation of various encoder Transformer-based large language models on the named entity recognition task. The models are compared on 6 datasets, manually annotated with named entities.
TajaKuzman/pandachat-rag-benchmark
PandaChat-RAG benchmark for evaluation of RAG systems on a non-synthetic Slovenian test dataset.
TajaKuzman/Text-Representations-in-FastText
Analysing different text representations for genre identification. I parse CoNLL-U files and extract various representations of a text (running text, lemmas, part-of-speech tags), then train a fastText model on each to see which representation is the most beneficial for the genre identification task.
TajaKuzman/machinetranslate.org
Open resources and community for machine translation
TajaKuzman/notion_widgets
A set of HTML widgets that can be embedded into Notion.so (https://www.notion.so/) pages. For more, see https://blog.shorouk.dev/notion-widgets-gallery/
TajaKuzman/Objectivity_Prediction_Web_App
An ML web app that detects the objectivity of a text
TajaKuzman/semshift_esslli2023
Hands-on sessions for ESSLLI course "Computational approaches to semantic change detection"
TajaKuzman/Taja-Kuzman-Home-Page
Home page for Taja Kuzman's GitHub repository.
TajaKuzman/task7
Variety identification
TajaKuzman/tdm-notebooks
Example notebooks and tutorials from Constellate, the text analysis service from ITHAKA.
TajaKuzman/Transformers-GINCO-Experiments