TajaKuzman

PhD student in Computational Linguistics with an MA in Translation (FR, EN & SI). Main interests: large language models, language technologies and resources.

Jožef Stefan Institute, Ljubljana, Slovenia

Pinned Repositories

Achademio
AI assistant, based on the GPT-3.5 model by OpenAI, designed to enhance your proficiency in writing research papers. Allows you to adapt your content to academic standards, transform bullet points into eloquent text, or enhance the quality of your writing through error detection.
Language:Python28 2 09
AGILE-Automatic-Genre-Identification-Benchmark
A benchmark for evaluating robustness of automatic genre identification models to test their usability for the automatic enrichment of large text collections with genre information.
Language:Jupyter Notebook5 1 00
Applying-GENRE-on-MaCoCu-bilingual
Language:Jupyter Notebook0 1 01
Cross-Lingual-and-Cross-Dataset-Experiments-with-Genre-Datasets
Language:Jupyter Notebook0 1 00
Hate-Speech-Classification
Classification of hate speech and implicitness of hate speech, using Transformer language models (BERT). This repository can be used as an introduction to text classification with BERT-like models.
Language:Jupyter Notebook1 1 00
IPTC-Media-Topic-Classification
Development of a multilingual IPTC Media Topic classifier for single-label topic classification of the 17 top-level topic labels from the IPTC Media Topic hierarchical schema.
Language:Jupyter Notebook20
NER-recognition
An evaluation of various encoder Transformer-based large language models on the named entity recognition task. The models are compared on 6 datasets manually annotated with named entities.
Language:Jupyter Notebook00
pandachat-rag-benchmark
PandaChat-RAG benchmark for evaluation of RAG systems on a non-synthetic Slovenian test dataset.
Language:Python0 1 00
Parlamint-translation
A pipeline for machine translation (using OPUS-MT models) of parliamentary text collections in 30+ languages (ParlaMint corpora). The pipeline includes parsing TEI XML and CoNLL-U files, linguistic processing with the Stanza pipeline, machine translation, and word alignment with the Eflomal tool.
Language:Jupyter Notebook2 1 00
Topic-Classification-FastText-Transformers
Training and evaluating topic classification models (fastText and Transformer-based language models) for topic classification of Slovenian news texts. The repository can be used as a tutorial to learn topic classification.
Language:Jupyter Notebook4 1 00

TajaKuzman's Repositories

TajaKuzman/Achademio
AI assistant, based on the GPT-3.5 model by OpenAI, designed to enhance your proficiency in writing research papers. Allows you to adapt your content to academic standards, transform bullet points into eloquent text, or enhance the quality of your writing through error detection.
Language:Python28 2 09
TajaKuzman/AGILE-Automatic-Genre-Identification-Benchmark
A benchmark for evaluating robustness of automatic genre identification models to test their usability for the automatic enrichment of large text collections with genre information.
Language:Jupyter Notebook5 1 00
TajaKuzman/Topic-Classification-FastText-Transformers
Training and evaluating topic classification models (fastText and Transformer-based language models) for topic classification of Slovenian news texts. The repository can be used as a tutorial to learn topic classification.
Language:Jupyter Notebook4 1 00
TajaKuzman/IPTC-Media-Topic-Classification
Development of a multilingual IPTC Media Topic classifier for single-label topic classification of the 17 top-level topic labels from the IPTC Media Topic hierarchical schema.
Language:Jupyter Notebook20
TajaKuzman/Parlamint-translation
A pipeline for machine translation (using OPUS-MT models) of parliamentary text collections in 30+ languages (ParlaMint corpora). The pipeline includes parsing TEI XML and CoNLL-U files, linguistic processing with the Stanza pipeline, machine translation, and word alignment with the Eflomal tool.
Language:Jupyter Notebook2 1 00
TajaKuzman/Hate-Speech-Classification
Classification of hate speech and implicitness of hate speech, using Transformer language models (BERT). This repository can be used as an introduction to text classification with BERT-like models.
Language:Jupyter Notebook1 1 00
TajaKuzman/Applying-GENRE-on-MaCoCu-bilingual
Language:Jupyter Notebook0 1 01
TajaKuzman/Cross-Lingual-and-Cross-Dataset-Experiments-with-Genre-Datasets
Language:Jupyter Notebook0 1 00
TajaKuzman/Genre-Datasets-Comparison
Language:Jupyter Notebook0 1 00
TajaKuzman/GINCO-Genre-Annotation-Guidelines
Genre Annotation Guidelines for GINCO corpora
Language:JavaScript0 0 01
TajaKuzman/NER-recognition
An evaluation of various encoder Transformer-based large language models on the named entity recognition task. The models are compared on 6 datasets manually annotated with named entities.
Language:Jupyter Notebook00
TajaKuzman/pandachat-rag-benchmark
PandaChat-RAG benchmark for evaluation of RAG systems on a non-synthetic Slovenian test dataset.
Language:Python0 1 00
TajaKuzman/Text-Representations-in-FastText
Analysing different text representations for genre identification. I parse CoNLL-U files and extract various representations of a text (running text, lemmas, part-of-speech tags), then train a fastText model on each to see which representation is the most beneficial for the genre identification task.
Language:Jupyter Notebook0 1 00
TajaKuzman/machinetranslate.org
Open resources and community for machine translation
Language:HTML
TajaKuzman/notion_widgets
A set of HTML widgets that can be embedded into Notion.so (https://www.notion.so/) pages. For more, see https://blog.shorouk.dev/notion-widgets-gallery/
Language:HTML0 0
TajaKuzman/Objectivity_Prediction_Web_App
An ML web app that detects the objectivity of a text
Language:Jupyter Notebook
TajaKuzman/semshift_esslli2023
Hands-on sessions for ESSLLI course "Computational approaches to semantic change detection"
Language:Jupyter Notebook
TajaKuzman/Taja-Kuzman-Home-Page
Home page to Taja Kuzman's GitHub repository.
1 0
TajaKuzman/task7
Variety identification
Language:Jupyter Notebook0 0
TajaKuzman/tdm-notebooks
Example notebooks and tutorials from Constellate, the text analysis service from ITHAKA.
Language:Jupyter Notebook0 0
TajaKuzman/Transformers-GINCO-Experiments
Language:Jupyter Notebook
