Pinned Repositories
Crawl4LLM
Official repository for "Crawl4LLM: Efficient Web Crawling for LLM Pretraining"
ED-Copilot
embedding-scope
Interpret and control dense embeddings via a sparse autoencoder.
esae
FactMM-RAG
Official repository for FactMM-RAG: Fact-Aware Multimodal Retrieval Augmentation for Accurate Medical Radiology Report Generation [NAACL 2025]
InContextDataAttribution
LongEmbeddingAnalysis
MATES
Official repository for MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models [NeurIPS 2024]
Montessori-Instruct
Official repository for Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning [ICLR 2025]
RAGViz
Official repository for RAGViz: Diagnose and Visualize Retrieval-Augmented Generation [EMNLP 2024]
cxcscmu's Repositories
cxcscmu/Crawl4LLM
Official repository for "Crawl4LLM: Efficient Web Crawling for LLM Pretraining"
cxcscmu/RAGViz
Official repository for RAGViz: Diagnose and Visualize Retrieval-Augmented Generation [EMNLP 2024]
cxcscmu/MATES
Official repository for MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models [NeurIPS 2024]
cxcscmu/Montessori-Instruct
Official repository for Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning [ICLR 2025]
cxcscmu/ED-Copilot
cxcscmu/FactMM-RAG
Official repository for FactMM-RAG: Fact-Aware Multimodal Retrieval Augmentation for Accurate Medical Radiology Report Generation [NAACL 2025]
cxcscmu/embedding-scope
Interpret and control dense embeddings via a sparse autoencoder.
cxcscmu/LongEmbeddingAnalysis
cxcscmu/InContextDataAttribution
cxcscmu/esae