Pinned Repositories
BERT_doc_classification
Document classification with BERT
bert_document_classification
Architectures and pre-trained models for long document classification.
BERT_NER
NER with BERT
cache-conda-envs
Speed up your builds by caching Anaconda environments on GitHub Actions
CVDD-PyTorch
A PyTorch implementation of Context Vector Data Description (CVDD), a method for Anomaly Detection on text.
Demo
Demo repo for tutorial articles on Opensource.com
diffgram
Training Data (Data Labeling, Annotation, Catalog, Workflow) for all Data Types (Image, Video, 3D, Text, Geo, Audio, more) at scale.
dkpro-cassis
UIMA CAS processing library written in Python
doc_classification_tfidf
DPR
Dense Passage Retriever (DPR) is a set of tools and models for the open-domain Q&A task.
ArneDefauw's Repositories
ArneDefauw/BERT_doc_classification
Document classification with BERT
ArneDefauw/BERT_NER
NER with BERT
ArneDefauw/cache-conda-envs
Speed up your builds by caching Anaconda environments on GitHub Actions
ArneDefauw/Demo
Demo repo for tutorial articles on Opensource.com
ArneDefauw/diffgram
Training Data (Data Labeling, Annotation, Catalog, Workflow) for all Data Types (Image, Video, 3D, Text, Geo, Audio, more) at scale.
ArneDefauw/dkpro-cassis
UIMA CAS processing library written in Python
ArneDefauw/doc_classification_tfidf
ArneDefauw/DPR
Dense Passage Retriever (DPR) is a set of tools and models for the open-domain Q&A task.
ArneDefauw/fake_news_semantics
Code for the paper "Do Sentence Interactions Matter ? Leveraging Sentence Level Representations for Fake News Classification"
ArneDefauw/FakeNewsCorpusSpanish
The Spanish Fake News Corpus contains 971 news items, divided into 491 real and 480 fake. The corpus covers 9 topics: Science, Sport, Economy, Education, Entertainment, Politics, Health, Security, and Society.
ArneDefauw/files2rouge
Calculating ROUGE score between two files (line-by-line)
ArneDefauw/ganbert-pytorch
Enhancing BERT training with semi-supervised Generative Adversarial Networks in PyTorch/HuggingFace
ArneDefauw/ilastik-napari
ilastik plugin for napari
ArneDefauw/Legal-Docs-Large-MLTC
Multi Label Text Classification for Legal documents. Work on mono-lingual and multilingual parallel data
ArneDefauw/lmtc-eurlex57k
Large-Scale Multi-Label Text Classification on EU Legislation
ArneDefauw/mlm-scoring
Python library & examples for Masked Language Model Scoring (ACL 2020)
ArneDefauw/multi-eurlex
MultiEURLEX - A multi-lingual and multi-label legal document classification dataset for zero-shot cross-lingual transfer
ArneDefauw/multilingual-fake-news
The code related to the paper
ArneDefauw/Multimodal-Toolkit
Multimodal model for text and tabular data with HuggingFace transformers as building block for text data
ArneDefauw/neural-document-aligner
Document aligner which uses neural technologies to search matches across bilingual documents
ArneDefauw/Nimbus
ArneDefauw/question_generator
An NLP system for generating reading comprehension questions
ArneDefauw/quick-tips
ArneDefauw/spatialdata
An open and universal framework for processing spatial omics data
ArneDefauw/spatialdata-io
ArneDefauw/TopicalChange
Code accompanying the submission "Structural Text Segmentation of Legal Documents" by Aumiller et al.
ArneDefauw/trafilatura
Web scraping library and command-line tool for text discovery and extraction (main content, metadata, comments)
ArneDefauw/Voice-Privacy-Challenge-2020
Baseline Recipe for VoicePrivacy Challenge 2020: https://www.voiceprivacychallenge.org/docs/VoicePrivacy_2020_Eval_Plan_v1_3.pdf
ArneDefauw/word2word
Easy-to-use word-to-word translations for 3,564 language pairs.
ArneDefauw/wordfreq
Access a database of word frequencies, in various natural languages.