iNeil77's Stars
ibm-granite/dolomite-engine
A highly efficient library for large scale distributed training
bigcode-project/bigcodebench
BigCodeBench: The Next Generation of HumanEval
jzhang38/EasyContext
Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.
RLHFlow/RLHF-Reward-Modeling
Recipes to train reward models for RLHF.
OpenRLHF/OpenRLHF
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)
saltudelft/ml4se
A curated list of papers, theses, datasets, and tools related to the application of Machine Learning for Software Engineering
nomic-ai/contrastors
Train Models Contrastively in PyTorch
bitextor/neural-document-aligner
Document aligner that uses neural techniques to find matches across bilingual documents
UKPLab/acl2024-ircoder
Data creation, training and eval scripts for the IRCoder paper
davidfraser/pyan
pyan is a Python module that performs static analysis of Python code to determine a call dependency graph between functions and methods. This is different from running the code and seeing which functions are called and how often; various tools generate a call graph that way, usually using debugger or profiling trace hooks, for example https://pycallgraph.readthedocs.org/. This code was originally written by Edmund Horner and then modified by Juha Jeronen. See the README for the original blog posts and links to their repositories.
iNeil77/vllm-code-harness
Run code inference-only benchmarks quickly using vLLM
BK-SCOSS/sctokenizer
A Source Code Tokenizer
ASE-REEF/REEF-data
s2e-lab/SecurityEval
Repository for "SecurityEval Dataset: Mining Vulnerability Examples to Evaluate Machine Learning-Based Code Generation Techniques" published in MSR4P&S'22.
FormAI-Dataset/FormAI-dataset
OpenLMLab/MOSS-RLHF
MOSS-RLHF
theblackcat102/evol-dataset
Evol-augment any dataset online
codefuse-ai/Awesome-Code-LLM
A curated list of language modeling research for code and related datasets.
NL2Code/NL2Code.github.io
Large Language Models Meet NL2Code: A Survey
tobymao/sqlglot
Python SQL Parser and Transpiler
FSoft-AI4Code/CodeText-parser
⚒️ Custom Tree-sitter toolkit for extracting functions and classes from raw source files
stas00/ml-engineering
Machine Learning Engineering Open Book
mbasso/awesome-wasm
😎 Curated list of awesome things regarding the WebAssembly (wasm) ecosystem.
Toloka/crowd-kit
Control the quality of your labeled data with the Python tools you already know.
dmarx/anthology-of-modern-ml
Collection of important articles to be treated as a textbook
TQRG/security-patches-dataset
☠️ Ground-truth dataset for vulnerability prediction (includes known research datasets and data sources such as NVD, CVE Details, and OSV); tools to automatically update the data are provided.
Jur1cek/gcj-dataset
Collected solutions from the Google Code Jam programming competition (2008-2020).
arpitbbhayani/system-design-questions
Problem statements on System Design and Software Architecture as part of Arpit's System Design Masterclass
AgileRL/AgileRL
Streamlining reinforcement learning with RLOps. State-of-the-art RL algorithms and tools.
ManifoldRG/NEKO_Archive
The NEKO Project is an open source effort to build a model equivalent in scale and capability to the one reported in DeepMind’s 2022 paper, "A Generalist Agent".