salesforce/CodeRL

CodeT5 input for APPS/MBPP problems

ysymyth opened this issue · 2 comments

Hi, I wonder what the exact input formats are for APPS/MBPP problems fed into CodeT5-large-ntp-py or CodeT5-finetuned_CodeRL? I tried """{Problem}""" but it doesn't work well: the model generates a lot of comments or natural-language output instead of code.

Would appreciate an example for each dataset as they are not found in repo/paper. Thanks!

We followed the default formats that are used in these benchmarks. For APPS, the format is defined here in the code:

q_str = "\nQUESTION:\n" + q_str + "\n" + s_str + "\n" + answer_type + "\nANSWER:\n"

where q_str is the question description, s_str is the starter code (if any), and answer_type is the type of problem in APPS (e.g. call-based or standard-input).
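As a rough illustration of the format above, here is a minimal sketch that assembles the prompt string. The helper name `build_apps_prompt` and the `call_based` flag are hypothetical, and the two answer-type strings are an assumption based on the APPS reference code, so please check them against the benchmark scripts before relying on them.

```python
def build_apps_prompt(question: str, starter_code: str = "", call_based: bool = False) -> str:
    """Assemble an APPS-style prompt following the concatenation quoted above.

    Hypothetical helper: the name and signature are not from the CodeRL repo.
    """
    # Assumption: these two strings mirror the APPS reference scripts;
    # verify the exact wording against the benchmark code.
    answer_type = "Use Call-Based format" if call_based else "Use Standard Input format"

    # Same concatenation order as the line quoted in this thread:
    # question, starter code (may be empty), answer type, then the ANSWER marker.
    return (
        "\nQUESTION:\n" + question + "\n" + starter_code + "\n" + answer_type + "\nANSWER:\n"
    )


# Example with a made-up problem statement and no starter code.
example_prompt = build_apps_prompt("Read two integers and print their sum.")
```

The resulting string would then be tokenized and passed to the model as the source sequence; when there is no starter code, `s_str` is simply empty and the prompt contains a blank line in its place.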

For MBPP, please refer to the original paper for the input format.

Hi, where is the original paper for MBPP? I want to test the model on MBPP, but I don't know how to get it.