Pinned Repositories
bunkai
Sentence boundary disambiguation tool for Japanese texts (日本語文境界判定器)
ditto
Code for the paper "Deep Entity Matching with Pre-trained Language Models"
ginza
A Japanese NLP library built on the spaCy framework and based on Universal Dependencies
HappyDB
A corpus of 100,000 happy moments
jrte-corpus
Japanese Realistic Textual Entailment Corpus (NLP 2020, LREC 2020)
opiniondigest
OpinionDigest: A Simple Framework for Opinion Summarization (ACL 2020)
sato
Code and data for Sato https://arxiv.org/abs/1911.06311.
SubjQA
A question-answering dataset with a focus on subjective information
t5-japanese
Code to pre-train Japanese T5 models
vecscan
Megagon Labs's Repositories
megagonlabs/ginza
A Japanese NLP library built on the spaCy framework and based on Universal Dependencies
megagonlabs/ditto
Code for the paper "Deep Entity Matching with Pre-trained Language Models"
megagonlabs/bunkai
Sentence boundary disambiguation tool for Japanese texts (日本語文境界判定器)
megagonlabs/sato
Code and data for Sato https://arxiv.org/abs/1911.06311.
megagonlabs/opiniondigest
OpinionDigest: A Simple Framework for Opinion Summarization (ACL 2020)
megagonlabs/vecscan
megagonlabs/SubjQA
A question-answering dataset with a focus on subjective information
megagonlabs/asdc
Accommodation Search Dialog Corpus (宿泊施設探索対話コーパス)
megagonlabs/starmie
Resources for PVLDB 2023 submission
megagonlabs/cocosum
🥥 Code & Data for Comparative Opinion Summarization via Collaborative Decoding (Iso et al.; Findings of ACL 2022)
megagonlabs/zett
🙈 Code for Zero-shot Triplet Extraction by Template Infilling (Kim et al.; IJCNLP-AACL 2023)
megagonlabs/llm-longeval
💵 Code for Less is More for Long Document Summary Evaluation by LLMs (Wu*, Iso* et al.; EACL 2024)
megagonlabs/meganno-client
megagonlabs/xatu
🕊️ Code and Data for XATU: A Fine-grained Instruction-based Benchmark for Explainable Text Updates (Zhang et al.; LREC-COLING 2024)
megagonlabs/holobench
🫧 Code for Holistic Reasoning with Long-Context LMs: A Benchmark for Database Operations on Massive Textual Data (Maekawa*, Iso* et al.; Oct 2024)
megagonlabs/witqa
megagonlabs/ambignlg
🐶 Data for AmbigNLG: Addressing Task Ambiguity in Instruction for NLG (Niwa and Iso; EMNLP 2024)
megagonlabs/CMDBench
Data and Code for CMDBench experiments
megagonlabs/Hallucination_MDS
megagonlabs/magneton
Repository of the Magneton framework for authoring interaction-aware and customizable widgets.
megagonlabs/watchog
Code for the SIGMOD 2024 paper "Watchog: A Light-weight Contrastive Learning based Framework for Column Annotation"
megagonlabs/MCR
megagonlabs/napa
🍷 Code for Noisy Pairing and Partial Supervision for Stylized Opinion Summarization (Iso et al.; INLG 2024)
megagonlabs/pilota
✈ SCUD generator (解釈文生成器)
megagonlabs/autotemplate
🧩 Code for AutoTemplate: A Simple Recipe for Lexically Constrained Text Generation (Iso; INLG 2024)
megagonlabs/meganno-service
megagonlabs/meganno-ui
megagonlabs/Megatron-LM
Ongoing research training transformer models at scale
megagonlabs/rjdb
megagonlabs/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.