Pinned Repositories
bigcode-dataset
bigcode-tools
A set of tools for working with "Big Code"
code-bert-score
CodeBERTScore: an automatic metric for code generation, based on BERTScore
code-transformer
Implementation of the paper "Language-agnostic representation learning of source code from structure and context".
code2seq
Code for the model presented in the paper: "code2seq: Generating Sequences from Structured Representations of Code"
code_tokenize
Fast tokenization and structural analysis of any programming language
CodeSearchNet
Datasets, tools, and benchmarks for representation learning of code.
CodeTF
CodeTF: One-stop Transformer Library for State-of-the-art Code LLM
CodeTrans
Pretrained Language Models for Source code
FineTuner
veerumehta's Repositories
veerumehta/FineTuner
veerumehta/bigcode-dataset
veerumehta/code-bert-score
CodeBERTScore: an automatic metric for code generation, based on BERTScore
veerumehta/code-transformer
Implementation of the paper "Language-agnostic representation learning of source code from structure and context".
veerumehta/code_tokenize
Fast tokenization and structural analysis of any programming language
veerumehta/CodeTF
CodeTF: One-stop Transformer Library for State-of-the-art Code LLM
veerumehta/developer
With 100k context windows on the way, it's now feasible for every dev to have their own smol developer
veerumehta/dmls-book
Summaries and resources for Designing Machine Learning Systems book (Chip Huyen, O'Reilly 2022)
veerumehta/ggml
Tensor library for machine learning
veerumehta/gpt-neox
An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.
veerumehta/graph4code
GraphGen4Code: a toolkit for creating code knowledge graphs based on WALA code analysis and extraction of documentation and forum content.
veerumehta/human-eval
Code for the paper "Evaluating Large Language Models Trained on Code"
veerumehta/jemma
JEMMA: An Extensible Java dataset for Many ML4Code Applications
veerumehta/langchain
⚡ Building applications with LLMs through composability ⚡
veerumehta/llm-numbers
Numbers every LLM developer should know
veerumehta/magicoder
Magicoder: Source Code Is All You Need
veerumehta/ml4code.github.io
Website for "A Survey of Machine Learning for Big Code and Naturalness"
veerumehta/multilang-code-datasets
Multiple language-specific code datasets
veerumehta/odex
veerumehta/pseudo-code-instructions
Pseudo-code Instructions dataset
veerumehta/Replication-Package-ICSE-NIER-2023-Unsupervised-ML
veerumehta/scikit-llm
Seamlessly integrate powerful language models like ChatGPT into scikit-learn for enhanced text analysis tasks.
veerumehta/starcoder
Home of StarCoder: fine-tuning & inference!
veerumehta/StarCoderEx
Extension for using an alternative to GitHub Copilot (StarCoder API) in VS Code
veerumehta/summarization_tf
Implementation of "Automatic Source Code Summarization with Extended Tree-LSTM"
veerumehta/tree-sitter-parsers
veerumehta/TreeTransformer
veerumehta/unlimiformer
Public repo for the preprint "Unlimiformer: Long-Range Transformers with Unlimited Length Input"
veerumehta/XLCoST
Code and data for XLCoST: A Benchmark Dataset for Cross-lingual Code Intelligence
veerumehta/zep
Zep: A long-term memory store for LLM / Chatbot applications