malteos
NLP researcher. Currently working on: Document similarity, LLMs, scientific & legal document processing
Berlin, Germany
Pinned Repositories
aspect-document-similarity
Implementation, trained models and result data for the paper "Aspect-based Document Similarity for Research Papers" #COLING2020
awesome-document-similarity
A curated list of resources on document similarity measures (papers, tutorials, code, ...)
clp-transfer
Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning
legal-document-similarity
Legal document similarity - Code, data, and models for the ICAIL 2021 paper "Evaluating Document Representations for Content-based Legal Literature Recommendations"
llm-datasets
A collection of datasets for language model pretraining including scripts for downloading, preprocessing, and sampling.
pytorch-bert-document-classification
Enriching BERT with Knowledge Graph Embedding for Document Classification (PyTorch)
scincl
Neighborhood Contrastive Learning for Scientific Document Representations with Citation Embeddings (EMNLP 2022 paper)
semantic-document-relations
Implementation, trained models and result data for the paper "Pairwise Multi-Class Document Classification for Semantic Relations between Wikipedia Articles"
oldp
Open Legal Data Platform
malteos's Repositories
malteos/awesome-document-similarity
A curated list of resources on document similarity measures (papers, tutorials, code, ...)
malteos/scincl
Neighborhood Contrastive Learning for Scientific Document Representations with Citation Embeddings (EMNLP 2022 paper)
malteos/aspect-document-similarity
Implementation, trained models and result data for the paper "Aspect-based Document Similarity for Research Papers" #COLING2020
malteos/llm-datasets
A collection of datasets for language model pretraining including scripts for downloading, preprocessing, and sampling.
malteos/semantic-document-relations
Implementation, trained models and result data for the paper "Pairwise Multi-Class Document Classification for Semantic Relations between Wikipedia Articles"
malteos/clp-transfer
Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning
malteos/german-language-models
A collection of German GPT language models
malteos/awesome-contrastive-learning-for-nlp
A collection of papers about contrastive learning for natural language processing.
malteos/getting-started
malteos/emnlp2022-papers
malteos/finetune-evaluation-harness
malteos/chat-ui
Open source codebase powering the HuggingChat app
malteos/data-sourcing
malteos/NeMo
NeMo: a toolkit for conversational AI
malteos/qaleph
Search and browse documents and data; find the people and companies you look for.
malteos/turkish-lm-bias
Investigating Gender Bias in Turkish Language Models
malteos/arqmath
malteos/bigs-lm-evaluation-harness
A framework for few-shot evaluation of autoregressive language models.
malteos/bigscience-Megatron-DeepSpeed-oxw
Ongoing research training transformer language models at scale, including: BERT & GPT-2
malteos/documentation
malteos/followthemoney
Data model and processing tools for investigative entity data
malteos/gpt-neox-oxw
An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.
malteos/Megatron-LLM
Distributed trainer for LLMs
malteos/Megatron-LM
Ongoing research training transformer models at scale
malteos/mteb
MTEB: Massive Text Embedding Benchmark
malteos/Negative-Sampling-Paper
This repository collects 100 papers related to negative sampling methods.
malteos/Open-Assistant
OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
malteos/promptsource
Toolkit for creating, sharing and using natural language prompts.
malteos/sentence-transformers
Multilingual Sentence & Image Embeddings with BERT
malteos/testtest