Code embeddings
Is there any information on whether this is also recommended for extracting embeddings from code snippets? In particular JavaScript and Solidity?
Hi @rragundez, maybe you can try WhereIsAI/UAE-Code-Large-V1. It was trained on the github-issue-similarity dataset, which contains some JavaScript code.
from angle_emb import AnglE
angle = AnglE.from_pretrained('WhereIsAI/UAE-Code-Large-V1').cuda()
angle.encode("YOUR CODE")
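To judge how well the resulting embeddings separate code snippets, one common approach is to compare vectors with cosine similarity. Below is a minimal sketch in plain Python, assuming each embedding is available as a flat list of floats (for example after converting the output of angle.encode). The helper name cosine_similarity is made up for illustration and is not part of the library.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors of floats."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Values close to 1 mean the two snippets are embedded in nearly the same direction; values near 0 mean the embeddings are unrelated.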
Let me try it and I'll report back here with the results.
It did work, but the results on Solidity code are not very good. Thanks.
I am going to try an LLM trained on Solidity code, but it has GGUF files. How would I use those in this library? For example:
Maybe you can use its base model: https://huggingface.co/andrijdavid/Solidity-Llama3-8b
Would this work out of the box by just passing the model name as the argument?
Yes. For LLM inference, you can check the documentation: https://angle.readthedocs.io/en/latest/notes/quickstart.html#infer-llm-based-models
Since this model hasn't been trained for sentence embedding, it is recommended to use a prompt to improve performance. You can specify a prompt with angle.encode(..., prompt="Here is a prompt: {text}.").
There is no need to specify a pretrained_lora_path; just set model_name_or_path directly to andrijdavid/Solidity-Llama3-8b.
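Putting the advice above together, a rough sketch might look like the following. It assumes the angle_emb package and torch are installed and a CUDA GPU is available. The is_llm, pooling_strategy, and torch_dtype arguments follow the linked quickstart for LLM-based models and may differ between library versions. Passing dicts with a text key, the prompt wording, and the helper names load_solidity_encoder and embed_snippets are assumptions made up for illustration.

```python
# Hypothetical prompt: only the {text} placeholder comes from the thread,
# the wording itself is an assumption.
SOLIDITY_PROMPT = "Here is a Solidity snippet: {text}."


def load_solidity_encoder(model_name="andrijdavid/Solidity-Llama3-8b"):
    """Load the base model directly by name, without a pretrained_lora_path."""
    # Imported lazily so the heavy dependencies are only needed when loading.
    import torch
    from angle_emb import AnglE

    return AnglE.from_pretrained(
        model_name,
        is_llm=True,              # assumption: as in the quickstart for LLM-based models
        pooling_strategy="last",  # assumption: last-token pooling, as in the quickstart
        torch_dtype=torch.float16,
    ).cuda()


def embed_snippets(encoder, snippets, prompt=SOLIDITY_PROMPT):
    """Encode code snippets, wrapping each one in the prompt template."""
    # Assumption: with a prompt set, each input is a dict whose keys fill the template.
    return encoder.encode([{"text": s} for s in snippets], prompt=prompt)
```

Usage would be along the lines of encoder = load_solidity_encoder() followed by embed_snippets(encoder, ["contract A {}"]). Whether this particular prompt helps on Solidity code is untested here.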