ChenghaoMou
🖖 NLP enthusiast | 👨🏻‍💻 Docusign MLE | ⚔️ Proud Trojan | 🍵 Tea person
Docusign | California, US
Pinned Repositories
awesome-data-deduplication
An awesome list of data deduplication use cases, papers, tools, and methods.
chenghaomou.github.io
Personal Blog
deduplicate-text-datasets
A modified version of Google's tool for pure text files
embeddings
zero-vocab or low-vocab embeddings
karafuru
Traditional Chinese colors in your terminal
pytorch-pQRNN
Implementation of pQRNN in PyTorch
simhash
Simhash in C++
text-dedup
All-in-one text de-duplication
touchbar-lyric
Show synced lyric in the touch-bar with BetterTouchTool and NetEase APIs
transformer-pointer-generator
Transformer with pointer generator for machine translation
ChenghaoMou's Repositories
ChenghaoMou/text-dedup
All-in-one text de-duplication
ChenghaoMou/touchbar-lyric
Show synced lyric in the touch-bar with BetterTouchTool and NetEase APIs
ChenghaoMou/pytorch-pQRNN
Implementation of pQRNN in PyTorch
ChenghaoMou/embeddings
zero-vocab or low-vocab embeddings
ChenghaoMou/awesome-data-deduplication
An awesome list of data deduplication use cases, papers, tools, and methods.
ChenghaoMou/chenghaomou.github.io
Personal Blog
ChenghaoMou/deduplicate-text-datasets
A modified version of Google's tool for pure text files
ChenghaoMou/karafuru
Traditional Chinese colors in your terminal
ChenghaoMou/simhash
Simhash in C++
ChenghaoMou/lightning-grid-template
A minimal template for pytorch-lightning and grid.ai
ChenghaoMou/mini-vae
Minimal GMM VAE model for NLP
ChenghaoMou/ai.robots.txt
A list of AI agents and robots to block.
ChenghaoMou/awesome-nlp
📖 A curated list of resources dedicated to Natural Language Processing (NLP)
ChenghaoMou/bender-ruler
Bender Rule analysis for NLP papers
ChenghaoMou/bigcode-analysis
Repository for analysis notebooks and experiments of the BigCode project.
ChenghaoMou/bigcode-dataset
ChenghaoMou/blog
Public repo for HF blog posts
ChenghaoMou/chenghaomou
ChenghaoMou/closedapi
Tired of seeing not-so-open apis behind paywalls.
ChenghaoMou/data_tooling
Tools for managing datasets for governance and training.
ChenghaoMou/edgar-crawler
SEC EDGAR Exhibit Downloader
ChenghaoMou/file-explorer-markdown-titles
Obsidian plugin that adds the Markdown title within your notes to the file explorer
ChenghaoMou/go-wordninja
Probabilistically split concatenated words using NLP based on English Wikipedia unigram frequencies.
ChenghaoMou/open-source-mac-os-apps
🚀 Awesome list of open source applications for macOS. https://t.me/s/opensourcemacosapps
ChenghaoMou/paper2audio
Convert research papers to audio files.
ChenghaoMou/presidio
Context aware, pluggable and customizable data protection and de-identification SDK for text and images
ChenghaoMou/pytorch-dice-loss
Dice loss for data-imbalanced NLP tasks
ChenghaoMou/quartz
🌱 a fast, batteries-included static-site generator that transforms Markdown content into fully functional websites
ChenghaoMou/star-classification
A tool for the projects you starred on GitHub
ChenghaoMou/table-transformer-doclaynet
Table Transformer Fine-tuned with DocLayNet Dataset