Code clustering
shaileshj2803 opened this issue · 1 comments
shaileshj2803 commented
Can I use the DOBF model for code clustering to find similar code patterns? If so, could you suggest which model to start with, or point me to any examples?
brozi commented
Hi. You can use our released DOBF models to do that (they use the RoBERTa tokenizer and architecture). You can start with the DOBF + DAE version, for instance. I would expect it to give similar representations to code with similar semantics, since such code is likely to have similar variable names. The released checkpoints are listed here: https://github.com/facebookresearch/CodeGen/blob/main/docs/dobf.md#release
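For the clustering step itself, here is a minimal sketch, not something taken from the CodeGen repo. It assumes you already have a function that turns a code snippet into a fixed-size vector (for example by pooling the encoder outputs of a DOBF checkpoint); that function is passed in as the hypothetical `embed` argument, and loading the model is left out. The names `cosine_similarity`, `cluster_by_similarity`, and `cluster_snippets`, and the threshold of 0.8, are illustrative choices only. The grouping is a simple greedy pass over cosine similarities, not a tuned clustering algorithm.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_by_similarity(vectors: Sequence[Vector],
                          threshold: float = 0.8) -> List[List[int]]:
    """Greedy single-pass clustering over vector indices.

    Each vector joins the first cluster whose representative (its first
    member) is at least `threshold` similar; otherwise it starts a new one.
    """
    clusters: List[List[int]] = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine_similarity(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def cluster_snippets(snippets: Sequence[str],
                     embed: Callable[[str], Vector],
                     threshold: float = 0.8) -> List[List[str]]:
    """Embed each snippet with the caller-supplied `embed` and group them.

    `embed` is a placeholder (assumption): plug in whatever produces a
    vector from your model for a given piece of code.
    """
    vectors = [embed(s) for s in snippets]
    groups = cluster_by_similarity(vectors, threshold)
    return [[snippets[i] for i in members] for members in groups]
```

With real embeddings you would likely want to tune the threshold on a few snippets you know to be similar, or swap the greedy pass for a standard algorithm such as k-means or agglomerative clustering once the vectors are in hand.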