Pinned Repositories
cookbot
minimal crawlers for scraping recipes from various recipe sites
findjatbar
find Jason and Terry on Yelp
gastroglot
machine translation for recipes
github_lda
GitHub LDA - Collaborative Topic Modeling for Recommending GitHub Repos
pyjvm
yet another jvm, written in pure Python
python-api-wrapper
MuseScore.com Python API Wrapper
tabebot
tabelog crawler
tsukurepo-predictor
predict COOKPAD's Tsukurepo count
mrorii's Repositories
mrorii/ramenbot
Crawler for Ramen Database
mrorii/RedPajama-Data
mrorii/airflow
Airflow is a system to programmatically author, schedule and monitor data pipelines.
mrorii/cc_net
Tools to download and clean up Common Crawl data
mrorii/data-juicer
A one-stop data processing system to make data higher-quality, juicier, and more digestible for LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
mrorii/data-preparation
Code used for sourcing and cleaning the BigScience ROOTS corpus
mrorii/django-sandbox
mrorii/do-not-answer
Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs
mrorii/doremi
PyTorch implementation of DoReMi, a method for optimizing the data mixture weights in language modeling datasets
mrorii/dotfiles
my dotfiles
mrorii/dps
Data processing system for polyglot
mrorii/evals
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
mrorii/FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
mrorii/GroundingDINO
Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
mrorii/gtalkbot
GIRL'S TALK crawler
mrorii/instructor
structured outputs for llms
mrorii/japanese-llm-ranking
mrorii/java-sandbox
mrorii/llm-jp
LLM evaluation project for Japanese tasks
mrorii/marker
Convert PDF to markdown quickly with high accuracy
mrorii/mecab-ipadic-neologd
Neologism dictionary based on the language resources on the Web for mecab-ipadic
mrorii/micrometer
An application metrics facade for the most popular monitoring tools. Think SLF4J, but for metrics.
mrorii/open-instruct
mrorii/presto
Distributed SQL query engine for big data
mrorii/reactor-core
Non-Blocking Reactive Foundation for the JVM
mrorii/scrapy
Scrapy, a fast high-level web crawling & scraping framework for Python.
mrorii/scrapy-deltafetch
Scrapy spider middleware to ignore requests to pages containing items seen in previous crawls
mrorii/smart_open
Utils for streaming large files (S3, HDFS, gzip, bz2...)
mrorii/wikipedia-utils
Utility scripts for preprocessing Wikipedia texts for NLP
mrorii/WizardLM
LLMs built upon Evol-Instruct: WizardLM, WizardCoder, WizardMath