Pinned Repositories
4lang
Concept dictionary using Eilenberg machines
Adam-experiments
Experiments with Adam/AdamW/amsgrad
asr-evaluation
Python module for evaluating ASR hypotheses (e.g. word error rate, word recognition rate).
cc_corpus
Tools for compiling corpora from Common Crawl
debug_pytorch_lm
PyTorch language modeling experiments
emBERT
emtsv module for pre-trained Transformer-based models
gensim
Python framework for efficient vector space modelling
graphchi-ltr
LTR framework based on GraphChi
lucene-solr
Mirror of Apache Lucene & Solr
zim_to_corpus
Scripts to extract (mostly) Wikipedia pages from .zim archives.
DavidNemeskey's Repositories
DavidNemeskey/cc_corpus
Tools for compiling corpora from Common Crawl
DavidNemeskey/zim_to_corpus
Scripts to extract (mostly) Wikipedia pages from .zim archives.
DavidNemeskey/emBERT
emtsv module for pre-trained Transformer-based models
DavidNemeskey/Adam-experiments
Experiments with Adam/AdamW/amsgrad
DavidNemeskey/asr-evaluation
Python module for evaluating ASR hypotheses (e.g. word error rate, word recognition rate).
DavidNemeskey/awd-lstm-lm
LSTM and QRNN Language Model Toolkit for PyTorch
DavidNemeskey/awesome-hungarian-nlp
A curated list of NLP resources for Hungarian
DavidNemeskey/bert
TensorFlow code and pre-trained models for BERT
DavidNemeskey/cc_emergency_corpus
Code to create emergency corpora
DavidNemeskey/commoncrawl-downloader
Simple Python command line tools for retrieving a list of URLs and specific files in bulk
DavidNemeskey/dep_search
Search back-end for dependency tree search. See the docs at https://fginter.github.io/dep_search/
DavidNemeskey/electra
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
DavidNemeskey/emLam
Preprocessing scripts for Hungarian Language Modeling
DavidNemeskey/emmorph2ud2
DavidNemeskey/emtokenpy
A Python wrapper for quntoken.
DavidNemeskey/emtsv
e-magyar text processing system -- inter-module communication via tsv + REST API
DavidNemeskey/google-research
Google AI Research
DavidNemeskey/humaze
Hungarian Transformer-based A-maze implementation
DavidNemeskey/jupyterhub-deploy-teaching
Reference deployment of JupyterHub and nbgrader on a single server
DavidNemeskey/model_card
DavidNemeskey/python-idzip
Seekable, gzip compatible, compression format
DavidNemeskey/pytorch_lm
PyTorch language modeling experiments
DavidNemeskey/quntoken
New Hungarian tokenizer based on quex, huntoken
DavidNemeskey/search_engine
Simple user interface for Whoosh
DavidNemeskey/sentencepiece
Unsupervised text tokenizer for Neural Network-based text generation.
DavidNemeskey/SlimeAnUTLE
DavidNemeskey/transformer-xl
DavidNemeskey/transformers
🤗 Transformers: State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0.
DavidNemeskey/warc3
Python 3 library for reading and writing warc files
DavidNemeskey/xtsv