suhara's Stars
megagonlabs/napa
🍷 Code for Noisy Pairing and Partial Supervision for Stylized Opinion Summarization (Iso et al; INLG 2024)
abetlen/llama-cpp-python
Python bindings for llama.cpp
shizhediao/Post-Training-Data-Flywheel
We aim to provide the best references to search, select, and synthesize high-quality and large-quantity data for post-training your LLMs.
NVIDIA/RULER
This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?
llm-jp/awesome-japanese-llm
Overview of Japanese LLMs
karpathy/llm.c
LLM training in simple, raw C/CUDA
shure-dev/Awesome-LLM-Papers-Comprehensive-Topics
Awesome LLM Papers and repos on very comprehensive topics.
NVIDIA/NeMo-Curator
Scalable data preprocessing and curation toolkit for LLMs
ddhruvkr/CONTRADOC
NVIDIA/NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
NVIDIA/NeMo-Aligner
Scalable toolkit for efficient model alignment
daviddao/awful-ai
😈Awful AI is a curated list to track current scary usages of AI - hoping to raise awareness
hplt-project/sacremoses
Python port of Moses tokenizer, truecaser and normalizer
allenai/RL4LMs
A modular RL library to fine-tune language models to human preferences
megagonlabs/doduo
Annotating Columns with Pre-trained Language Models
gotutiyan/GEC-Info
Repository to collect and categorize Grammatical Error Correction papers.
gentaiscool/indonesian-nlp
A curated list of research papers and resources on Indonesian languages
meetdavidwan/factpegasus
PyTorch code for "FactPEGASUS: Factuality-Aware Pre-training and Fine-tuning for Abstractive Summarization" (NAACL 2022)
mhagiwara/xfspell
xfspell — the Transformer Spell Checker
abrazinskas/sigir2022-opinion-summarization-tutorial
This repository contains materials for the SIGIR 2022 tutorial on opinion summarization.
megagonlabs/cocosum
🥥 Code & Data for Comparative Opinion Summarization via Collaborative Decoding (Iso et al; Findings of ACL 2022)
ucinlp/autoprompt
AutoPrompt: Automatic Prompt Construction for Masked Language Models.
KalaniStanton/LSTMvBERT-NER
This repository contains multiple notebooks (created using Google Colab) that transform data from a doccano format for use in training a Bi-LSTM-CRF, fine-tuning a transformer using custom labels, and classifying using the fine-tuned bert-base-ner model
sfs0126/Lyric-Generator-fine-tuned-GPT-2
This project uses Huggingface transformers GPT-2 to fine-tune text generation models based on lyric data to specific music genres.
vivienneprince/bookcovers-ml-python
Judge a book by its cover. Data from Open Library
tmccormack165/McCormack_Final_IMDb
Final project for Topics in Computing
lashleyaq/CellSegmentation
Chowlett2/Auto_Colorizer
A Convolutional Autoencoder for image colorization in Pytorch
sarahaman/CIS6930_TweetSum_Summarization
Performing abstractive summarization on dialogue-based texts poses several potential challenges to SOTA deep-learning techniques, which are tested primarily on single-author texts. I compare the performance of three SOTA pre-trained abstractive text summarization models on the TweetSum (He et al., 2020) dataset. Final project for CIS6930: Special Topics in Computing.
HHousen/TransformerSum
Models to perform neural summarization (extractive and abstractive) using machine learning transformers and a tool to convert abstractive summarization datasets to the extractive task.