facebookresearch/CodeGen

Code clustering

shaileshj2803 opened this issue · 1 comment

Can I use the DOBF model for code clustering to find similar code patterns? If so, could you advise which model to start with, or point me to any examples?

brozi commented

Hi. You can use our released DOBF models for that (they use the RoBERTa tokenizer and architecture). You could start with the DOBF + DAE version, for instance. I would expect it to produce similar representations for code with similar semantics, since such code is likely to have similar variable names, and recovering variable names is what DOBF is trained to do. https://github.com/facebookresearch/CodeGen/blob/main/docs/dobf.md#release
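Once you have one vector per snippet from the DOBF encoder, the clustering step itself is generic. Below is a minimal sketch, assuming you have already extracted per-token hidden states for each snippet with the CodeGen repo's own loading code (not shown here, since that is repo-specific). Mean pooling, cosine similarity, threshold-based single-linkage clustering and the 0.9 threshold are all illustrative choices, not something the CodeGen authors prescribe, and the function names `mean_pool`, `cosine` and `cluster_by_similarity` are hypothetical. The toy vectors in the usage section stand in for real model outputs.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_pool(token_states: Sequence[Sequence[float]]) -> Vector:
    """Average per-token hidden states into one snippet-level vector.

    Assumption: mean pooling; using the first (<s>) token's state is another common choice.
    """
    n = len(token_states)
    dim = len(token_states[0])
    return [sum(tok[i] for tok in token_states) / n for i in range(dim)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def cluster_by_similarity(vectors: Sequence[Sequence[float]], threshold: float = 0.9) -> List[int]:
    """Single-linkage clustering at a fixed threshold.

    Two snippets are linked when their cosine similarity is >= threshold, and each
    connected component becomes one cluster. Returns one cluster id per snippet.
    Pairwise comparison is O(n^2), so this only suits small collections.
    """
    n = len(vectors)
    parent = list(range(n))  # union-find forest over snippet indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)  # merge the two components

    # Relabel component roots as consecutive ids in order of first appearance.
    ids = {}
    labels = []
    for i in range(n):
        root = find(i)
        if root not in ids:
            ids[root] = len(ids)
        labels.append(ids[root])
    return labels


if __name__ == "__main__":
    # Toy per-token states standing in for real DOBF encoder outputs (hypothetical data).
    snippets = {
        "sum_loop": [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]],
        "sum_builtin": [[0.95, 0.05, 0.0], [1.0, 0.1, 0.0]],
        "read_file": [[0.0, 1.0, 0.2], [0.1, 0.9, 0.3]],
    }
    vecs = [mean_pool(states) for states in snippets.values()]
    for name, label in zip(snippets, cluster_by_similarity(vecs, threshold=0.9)):
        print(name, label)
```

With these toy inputs, the two summation snippets land in one cluster and the file-reading snippet in another. For larger corpora, a dedicated library such as scikit-learn's clustering estimators will scale better than the pairwise loop above, and the threshold should be tuned on real DOBF embeddings.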