techandy42
🎓 CS @ U of Waterloo | 🤖 AI Student Researcher @ WAT.ai x hamming.ai | 🏆 4x Hackathon Winner | LLM Enjoyer
Waterloo Ontario, Canada
techandy42's Stars
QwenLM/Qwen2.5
Qwen2.5 is the large language model series developed by the Qwen team, Alibaba Cloud.
Devinterview-io/llms-interview-questions
🟣 LLMs interview questions and answers to help you prepare for your next machine learning and data science interview in 2024.
booydar/babilong
BABILong is a benchmark for LLM evaluation using the needle-in-a-haystack approach.
waterhorse1/ChessGPT
(NeurIPS 2023) ChessGPT - Bridging Policy Learning and Language Modeling
Arize-ai/LLMTest_NeedleInAHaystack
Doing simple retrieval from LLM models at various context lengths to measure accuracy
infinigence/LVEval
Repository of LV-Eval Benchmark
HammingHQ/bug-in-the-code-stack
A new benchmark for measuring LLMs' capability to detect bugs in large codebases.
techandy42/FinancialBERT
Stock price prediction model built using BERT and a regression model trained on textual financial news data.
HammingHQ/hamming-examples
Various examples on how to use Hamming for evals + observability
nonsequitoria/simplekit
SimpleKit
techandy42/awesome-llm-metrics
An open-source framework that makes evaluating LLMs & prompt engineering 10x easier!
techandy42/bug_in_the_code_stack
A new benchmark for measuring LLMs' capability to detect bugs in large codebases.
techandy42/CrafterGPT
Leveraging Language Models to Play Procedurally Generated Survival Games.
techandy42/ExchangeAgent
Training a stock exchange agent with Reinforcement Learning algorithms and a Decision Transformer.
techandy42/GreenTechGuardians
A Circular Economy business idea evaluator tool built using Gen-AI.
genai-genesis-2024/web-agent
bing1100/hamming_m3
m3 dataset with hamming
paulpark6/WildFire
SYS-NG/Goose_Guru_HTN2024
techandy42/babilong
BABILong is a benchmark for LLM evaluation using the needle-in-a-haystack approach.
techandy42/bug_in_the_code_stack_v2
Can LLMs find bugs that compilers can't?: A benchmark for measuring LLMs' capabilities in debugging large source code.
techandy42/Codegen_Challenge_Submission
A Python import visualization program.
techandy42/crafter
Benchmarking the Spectrum of Agent Capabilities
techandy42/debugger_llm
Open-source datasets & models for LLM Judges to find and describe bugs in LLM-generated code.
techandy42/eccc-hail-forecasting-project
Open-source ECCC repository for notebooks and documentation for the Hail Forecasting project by Hokyung (Andy) Lee.
techandy42/eccc-webcam-project
Open-source ECCC repository for notebooks and documentation for the Webcam project by Hokyung (Andy) Lee.
techandy42/LVEval
Repository of LV-Eval Benchmark
techandy42/racecar_gym
A gym environment for a miniature racecar using the pybullet physics engine.
techandy42/RagTagTeam
Startup co-founder matching platform built using Cohere for the WAT.AI RAG Challenge hackathon.
techandy42/rank_llm
Repository for prompt-decoding using LLMs (GPT3.5, GPT4, and Vicuna)