ewfian's Stars
tqdm/tqdm
⚡ A Fast, Extensible Progress Bar for Python and CLI
MuiseDestiny/zotero-style
Ethereal Style for Zotero
LDNOOBW/List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words
List of Dirty, Naughty, Obscene, and Otherwise Bad Words
castorini/pyserini
Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.
Future-Scholars/paperlib
An open-source academic paper management tool.
llm-jp/awesome-japanese-llm
Overview of Japanese LLMs
aozorabunko/aozorabunko
oh-my-ocr/text_renderer
stephenmk/Jitendex
A free, offline, and openly licensed Japanese-to-English dictionary. Updates weekly!
ikegami-yukino/neologdn
Japanese text normalizer for mecab-neologd
suguru03/made-in-japan
🇯🇵 A list of great developers and cool projects made in Japan 🍙
rskmoi/namedivider-python
A tool for dividing the Japanese full name into a family name and a given name.
WorksApplications/chiVe
Japanese word embedding with Sudachi and NWJC 🌿
chakki-works/chABSA-dataset
chakki's Aspect-Based Sentiment Analysis dataset
stockmarkteam/ner-wikipedia-dataset
A Japanese named entity recognition dataset built from Wikipedia
himkt/awesome-bert-japanese
📝 A list of pre-trained BERT models for Japanese with word/subword tokenization + vocabulary construction algorithm information
MaxKinny/TabRecSet
A large scale camera-taken table detection and recognition dataset.
biswassanket/synth_doc_generation
Official PyTorch Implementation of DocSynth: A Layout Guided Approach for Controllable Document Image Synthesis - ICDAR 2021
retarfi/language-pretraining
Pre-training Language Models for Japanese
prime-slam/line-detection-association-dockers
A collection of dockers for line detection and association algorithms
Greatdane/Convert-Numbers-to-Japanese
Converts Arabic numerals, or 'western' style numbers, to a Japanese context.
tanreinama/Japanese-BPEEncoder_V2
Japanese-BPEEncoder Version 2
takumakanari/japanese-numbers-python
A parser for Japanese numbers (kanji, Arabic) in natural language.
chongzhangFDU/Token-Path-Prediction-Datasets
This is the official repository of the revised datasets FUNSD-r and CORD-r, introduced in the EMNLP 2023 paper "Reading Order Matters: Information Extraction from Visually-rich Documents by Token Path Prediction".
H-Ambrose/NTable
A dataset for camera-based table detection
shigashiyama/nlp_survey
JapanExchangeGroup/FinancialResultsHTML-DataExtraction
ricardobnjunior/Extended-Smartdoc-Dataset
Extended Smartdoc Dataset, a new dataset!
DungLe13/bidding-dataset
Bidding documents for the paper "CinBidding: A Dataset for Domain-specific Information Extraction with Limited Data"
retarfi/jptranstokenizer
Japanese Tokenizer for transformers library