Pinned Repositories
beccabai
beccabai.github.io
GitHub Pages template for academic personal websites, forked from mmistakes/minimal-mistakes
Data-centric_multimodal_LLM
Survey on Data-centric Large Language Models
efficient-tlbo
Efficient transfer-learning-based Bayesian optimization for hyperparameter optimization
mindware
An efficient open-source AutoML system for automating the machine learning lifecycle, including feature engineering, neural architecture search, and hyper-parameter tuning.
multi-agent-data-selection
This is the repo for the paper "Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining".
open-box
Generalized and Efficient Blackbox Optimization System.
ProX
Official Repo for "Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale"
Data-Centric_LLM_Studies
A list of papers about data quality in Large Language Models (LLMs)
RedPajama-Data
The RedPajama-Data repository contains code for preparing large datasets for training large language models.
beccabai's Repositories
beccabai/Data-centric_multimodal_LLM
Survey on Data-centric Large Language Models
beccabai/multi-agent-data-selection
This is the repo for the paper "Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining".
beccabai/beccabai
beccabai/beccabai.github.io
GitHub Pages template for academic personal websites, forked from mmistakes/minimal-mistakes
beccabai/efficient-tlbo
Efficient transfer-learning-based Bayesian optimization for hyperparameter optimization
beccabai/mindware
An efficient open-source AutoML system for automating the machine learning lifecycle, including feature engineering, neural architecture search, and hyper-parameter tuning.
beccabai/open-box
Generalized and Efficient Blackbox Optimization System.
beccabai/ProX
Official Repo for "Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale"