Pinned Repositories
brat
brat rapid annotation tool (brat) - for all your textual annotation needs
annodoc
Annodoc annotation documentation support system
conlleval.py
Python version of the evaluation script from CoNLL'00-
conllu.js
CoNLL-U format library for JavaScript
ncbi-disease
NCBI disease corpus - related resources
nxml2txt
NLM .nxml to text format conversion
standoff2conll
Conversion from brat-flavored standoff to CoNLL format
wiki-bert-pipeline
Generate BERT vocabularies and pretraining examples from Wikipedias
wvlib
Word vector library
docs
Universal Dependencies online documentation
spyysalo's Repositories
spyysalo/lumi-llm-scaling
Scripts and documentation on scaling large language model training on the LUMI supercomputer
spyysalo/dl-binf-summer-school-2023
Material for 2023 Summer School on Applied Deep Learning in Bioinformatics
spyysalo/keras-bert-ner
Named entity recognition built on top of BERT and keras-bert.
spyysalo/warc-tools
Tools for working with Web ARChive files.
spyysalo/consensus-pipeline
Annotation consensus processing pipeline
spyysalo/finnish-natural-instructions
Tools and data for a Finnish machine translation of Natural Instructions (https://github.com/allenai/natural-instructions)
spyysalo/generative-lm-server
Simple generative language model service
spyysalo/instruction-finetune
Finetune language model on instruction data
spyysalo/lm-text-correction
Text correction using a language model
spyysalo/pdftools
Tools for working with PDF documents
spyysalo/string-db-tools
Tools for working with STRING database text mining data
spyysalo/torch-transformers-text-classifier
Simple text classifier using Transformers with the Torch backend.
spyysalo/bert-span-classifier
Text span classifier using BERT
spyysalo/Bert_classification
spyysalo/databricks-dolly-translation
Translation of Databricks Dolly instruction dataset
spyysalo/gendemo
Minimal text generation demo using transformers
spyysalo/gpt-neox
An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.
spyysalo/gutenberg-tools
Tools for working with Project Gutenberg texts (https://www.gutenberg.org/)
spyysalo/instruction-generation
Tools for generating instruction data
spyysalo/lumi-causal-lm-finetune
Tools for finetuning large causal language models on LUMI
spyysalo/Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
spyysalo/Megatron-LM
Ongoing research training transformer models at scale
spyysalo/mt-quality-assessment
Tools and resources for learning to predict machine translation quality
spyysalo/nanotron
Minimalistic large language model 3D-parallelism training
spyysalo/ni-to-chatml
Generate ChatML from Natural Instructions data
spyysalo/onion-tools
Tools for text deduplication using the onion (ONe Instance ONly) tool
spyysalo/paraphrase-generation
Tools and resources for training causal language models for paraphrase generation
spyysalo/suomi24-corpus
Tools for working with the Suomi24 corpus
spyysalo/taggedpdf
Tools for working with tagged PDF documents
spyysalo/xling-instructions
Generate instruction-formatted data from translation pairs