Semantic Code & Doc Search
Experiments with neural network (NN) embeddings for code.
- Code search over intellij-community/platform using GPT-3 embeddings
- Documentation Q&A over intellij-community
Acquire the data from intellij-community
Clone, parse, and extract function declarations for Java and Kotlin.
Get the embeddings
Embed all the functions using
- OpenAI API for Embeddings, using .jsonl and request_parallel_processor.py
- CodeGen running locally on GPU
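A minimal sketch of preparing the .jsonl input for the OpenAI route. It assumes the parallel request script reads one JSON request body per line and passes an optional `metadata` field through to the output, as the OpenAI Cookbook's parallel processor does. The model name, the `id`/`code` field names, and the sample functions are placeholders, not taken from this repo.

```python
import json


def build_embedding_requests(functions, model="text-embedding-ada-002"):
    """Build one embeddings request body per extracted function.

    `functions` is a list of dicts with hypothetical keys "id" and "code".
    The model name is an assumption; use whichever embedding model you run.
    """
    return [
        {"model": model, "input": fn["code"], "metadata": {"id": fn["id"]}}
        for fn in functions
    ]


def to_jsonl(requests):
    """Serialize requests as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(request) + "\n" for request in requests)


# Toy input standing in for the parsed Java and Kotlin declarations.
functions = [
    {"id": "Foo.bar", "code": "fun bar(): Int = 42"},
    {"id": "Foo.baz", "code": "void baz() {}"},
]
requests = build_embedding_requests(functions)
jsonl_text = to_jsonl(requests)
```

Writing `jsonl_text` to a file gives an input the request script can consume. Keeping the function id in `metadata` makes it possible to match each returned embedding to its function, since parallel responses may arrive out of order.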
Build an Index
- Annoy
- FAISS
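Annoy and FAISS both answer nearest-neighbor queries over embedding vectors. The sketch below is not their API: it is an exact, brute-force cosine-similarity search in plain Python, which can serve as a small reference to sanity-check an approximate index against. The function names and the toy 3-dimensional vectors are illustrative assumptions.

```python
import math


def normalize(vector):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def build_index(embeddings):
    """Turn a dict of id -> vector into a list of (id, unit vector) pairs."""
    return [(key, normalize(vector)) for key, vector in embeddings.items()]


def search(index, query, k=3):
    """Return the k ids with the highest cosine similarity to the query."""
    unit_query = normalize(query)
    scored = [
        (sum(a * b for a, b in zip(unit_query, vector)), key)
        for key, vector in index
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [(key, score) for score, key in scored[:k]]


# Toy embeddings; real ones would come from the embedding step.
embeddings = {
    "readFile": [1.0, 0.1, 0.0],
    "parseJson": [0.0, 1.0, 0.2],
    "sortList": [0.1, 0.0, 1.0],
}
index = build_index(embeddings)
results = search(index, [0.9, 0.2, 0.0], k=2)
```

Brute force scans every vector per query, so it only suits small collections. An approximate index trades some recall for much faster lookups on a corpus the size of intellij-community.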
Code clustering
Code Search
Interactive queries over intellij-community
Documentation Q&A
Evaluation on CodeSearchNet Java
- Embed (OpenAI, CodeGen)
- Cluster
- Query rephrasing with in-context learning (few-shot)
- Run evaluation (nDCG)
- IR baseline
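For the nDCG step, a minimal sketch of the metric for a single query. It assumes graded relevance labels listed in the order the system ranked the results, and it uses linear gain (relevance divided by a log2 rank discount). Some implementations use exponential gain instead, so this may not match a given benchmark's official scorer exactly.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with linear gain and a log2 discount."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(ranked_relevances, k=None):
    """Normalized DCG for one query.

    `ranked_relevances` holds the relevance label of each judged result,
    in the order the system ranked them. `k` optionally truncates the list.
    """
    ranked = ranked_relevances if k is None else ranked_relevances[:k]
    # The ideal ranking sorts all judged results by relevance, best first.
    ideal = sorted(ranked_relevances, reverse=True)[: len(ranked)]
    ideal_dcg = dcg(ideal)
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


perfect = ndcg([3, 2, 1, 0])   # results already in ideal order
swapped = ndcg([0, 3])         # the only relevant result is ranked second
```

Averaging `ndcg` over all evaluation queries gives a single score that can be compared across the embedding models and the IR baseline.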