HQCM Dataset

Question

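In the paper "Understanding Code Changes Practically with Small-Scale Language Models" (ASE 2024) and the presentation "Application Practice of Ant CodeFuse: Code Change Understanding Techniques in Application Environments" (original title: 蚂蚁CodeFuse 的应用实践：应用环境下的代码变更理解技术, ChinaSoft 2024), you mention the HQCM dataset, but I could not find it in any of the CodeFuse repositories.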
How can I access it or a part of it, please?

Contributions. Our major contributions are:
• Dataset: We provide a small yet high-quality dataset called HQCM for augmenting the current era’s SLMs with an improved understanding of code changes. It is a dataset that has been reviewed, revised, and validated by human experts.
• Models: We finetune SLMs Llama2-7b, CodeLlama-7b, and CCT5-220m using the HQCM dataset, all of which are confirmed to have competitive capabilities than state-of-the-art baselines and LLMs in change summarization, change classification, and code refinement.
• Benchmark: We package the HQCM dataset with our scripts to finetune SLMs into the HQCM benchmark to facilitate future research in code change understanding: https://github.com/codefuse-ai/codefuse-hqcm.

Answer 1 · 2024-11-15T07:17:18.000Z

This paper comes from another team in the codefuse project. While we have notified them of your inquiry, you would probably get a faster response if you email the authors directly.