rdnfn

ML researcher who likes to build software. PhD student in Cambridge.

University of Cambridge

rdnfn's Stars

jonathan-roberts1/GRAB
61
dmg-illc/JUDGE-BENCH
Language:Jupyter Notebook175
BenTenmann/bio-data-harmoniser
Automatically ingest and harmonise biological data from different sources.
Language:TypeScript21
google-deepmind/dangerous-capability-evaluations
Language:Python392
cambridgeltl/zepo
Fairer Preferences Elicit Improved Human-Aligned Large Language Model Judgments (Zhou et al.)
Language:Python8
confident-ai/deepeval
The LLM Evaluation Framework
Language: Python, 3k stars, 216 forks
UKGovernmentBEIS/inspect_ai
Inspect: A framework for large language model evaluations
Language:Python53981
HannahKirk/prism-alignment
The Prism Alignment Project
Language:Jupyter Notebook321
bminixhofer/zett
Code for Zero-Shot Tokenizer Transfer
Language:Python1097
signalstickers/signalstickers
🖥📱 An unofficial gallery of stickers for Signal, the secure messenger!
Language:TypeScript332436
google-deepmind/long-form-factuality
Benchmarking long-form factuality in large language models. Original code for our paper "Long-form factuality in large language models".
Language:Python52455
HowieHwong/MetaTool
[ICLR 2024] MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use
Language:Python608
swe-bench/experiments
Open sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task.
Language:Shell7965
lm-sys/arena-hard-auto
Arena-Hard-Auto: An automatic LLM benchmark.
Language:Jupyter Notebook41553
thunlp/ChatEval
Codes for our paper "ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate"
Language:Python22114
meta-llama/llama3
The official Meta Llama 3 GitHub site
Language: Python, 26.1k stars, 2.9k forks
s-orellana/UKB_CM_Brain
Publication code
Language:R1
princeton-nlp/SWE-agent
SWE-agent takes a GitHub issue and tries to automatically fix it, using GPT-4, or your LM of choice. It solves 12.47% of bugs in the SWE-bench evaluation set and takes just 1 minute to run.
Language: Python, 13.3k stars, 1.3k forks
tatsu-lab/alpaca_eval
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
Language: Jupyter Notebook, 1.4k stars, 224 forks
getcursor/cursor
The AI Code Editor
23.1k stars, 1.5k forks
allenai/WildBench
Benchmarking LLMs with Challenging Tasks from Real Users
Language:Python18027
huggingface/lighteval
LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally with the recently released LLM data processing library datatrove and LLM training library nanotron.
Language:Python64971
justinchiu/openlogprobs
Extract full next-token probabilities via language model APIs
Language:Python22614
killiansheriff/LovelyPlots
Matplotlib style sheets to nicely format figures for scientific papers, thesis and presentations while keeping them fully editable in Adobe Illustrator.
Language:Python83533
segment-any-text/wtpsplit
Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way.
Language:Python67739
HeyPuter/puter
🌐 The Internet OS! Free, Open-Source, and Self-Hostable.
Language: JavaScript, 24.7k stars, 1.6k forks
mlcommons/modelgauge
Make it easy to automatically and uniformly measure the behavior of many AI Systems.
Language:Python257
marqo-ai/marqo
Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai
Language: Python, 4.5k stars, 184 forks
beartype/beartype
Unbearably fast near-real-time hybrid runtime-static type-checking in pure Python.
Language: Python, 2.6k stars, 55 forks
citadel-ai/langcheck
Simple, Pythonic building blocks to evaluate LLM applications.
Language:Python18316
