Pinned Repositories
best-of-n-sampling
Toy example for best-of-n-sampling
chatgpt
Analysis of OpenAI's ChatGPT
github-copilot
Analysis of the GitHub Copilot extension
LearningFromDemonstration
This project is a simplified implementation of the learning from demonstration algorithm developed by OpenAI.
MonteCarloTreeSearch
This project applies Monte Carlo Tree Search (MCTS) to a simple grid world.
PopulationBasedTraining
Asynchronous optimisation algorithm to optimise a population of models and their hyperparameters.
self-delusion
Delusions in sequence models for interaction and control
SwiftReinforce
Implementation of the REINFORCE algorithm using Swift for TensorFlow.
TemporalDifferenceLearning
Temporal-difference learning is a method for computing the values of all states by sampling the environment. It updates the current estimate of a state's value using previously learned estimates (bootstrapping).
tiny-chatgpt
Researching the reinforcement learning algorithm of ChatGPT
saschaschramm's Repositories
saschaschramm/github-copilot
Analysis of the GitHub Copilot extension
saschaschramm/chatgpt
Analysis of OpenAI's ChatGPT
saschaschramm/MonteCarloTreeSearch
This project applies Monte Carlo Tree Search (MCTS) to a simple grid world.
saschaschramm/SwiftReinforce
Implementation of the REINFORCE algorithm using Swift for TensorFlow.
saschaschramm/best-of-n-sampling
Toy example for best-of-n-sampling
saschaschramm/PopulationBasedTraining
Asynchronous optimisation algorithm to optimise a population of models and their hyperparameters.
saschaschramm/LearningFromDemonstration
This project is a simplified implementation of the learning from demonstration algorithm developed by OpenAI.
saschaschramm/TemporalDifferenceLearning
Temporal-difference learning is a method for computing the values of all states by sampling the environment. It updates the current estimate of a state's value using previously learned estimates (bootstrapping).
saschaschramm/tiny-chatgpt
Researching the reinforcement learning algorithm of ChatGPT
saschaschramm/autopilot-for-code
ChatGPT can develop, set up, and run a complete web application
saschaschramm/chatgpt-eval-plugin
Very simple example of a ChatGPT plugin
saschaschramm/diff-gpt
Incremental algorithm for program synthesis
saschaschramm/MoveToBeacon
Application of reinforcement learning to StarCraft.
saschaschramm/Pong
Application of different reinforcement learning algorithms to the Atari game Pong.
saschaschramm/sc2-evals
Evaluation of GPT-4 on StarCraft II
saschaschramm/slowloris
saschaschramm/self-delusion
Delusions in sequence models for interaction and control
saschaschramm/A2C
Synchronous implementation of the A3C algorithm.
saschaschramm/codex
Evaluating the Codex language model from OpenAI
saschaschramm/language-models
Language models
saschaschramm/LSTM
Shows how the BasicLSTMCell is implemented internally in TensorFlow.
saschaschramm/mabuc
Bandits with unobserved confounders
saschaschramm/mlflow
Open source platform for the machine learning lifecycle
saschaschramm/msteams-tts
Text-to-Speech for Microsoft Teams
saschaschramm/MultiArmedBandits
Application of the stochastic gradient ascent algorithm on the multi-armed bandit problem.
saschaschramm/PolicyGradientMethods
Reinforcement learning methods that learn a parameterized policy. These methods learn by approximating the gradient of a performance measure with respect to its policy parameters.
saschaschramm/pysc2
StarCraft II Learning Environment
saschaschramm/QLearning
Implementation of the Q-Learning algorithm.
saschaschramm/ReinforcementLearningBasics
Basics of Reinforcement Learning.
saschaschramm/unobserved-confounders
Simple example of unobserved confounders and language models
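
The TemporalDifferenceLearning description above mentions bootstrapping: updating a state's value from previously learned estimates. As a rough illustration only, and not the code from that repository, here is a minimal TD(0) sketch in Python. The environment (a five-state random walk), the function name `td0_random_walk`, and all parameter defaults are assumptions chosen for this example.

```python
import random


def td0_random_walk(num_states=5, episodes=2000, alpha=0.05, gamma=1.0, seed=0):
    """Estimate state values of a simple random walk with TD(0).

    Hypothetical toy environment: states 0..num_states-1 are non-terminal.
    Stepping left of state 0 ends the episode with reward 0; stepping right
    of the last state ends it with reward 1.
    """
    rng = random.Random(seed)
    values = [0.5] * num_states  # initial guesses for every state
    for _ in range(episodes):
        state = num_states // 2  # each episode starts in the middle
        while True:
            next_state = state + rng.choice((-1, 1))
            if next_state < 0:
                reward, next_value, done = 0.0, 0.0, True
            elif next_state >= num_states:
                reward, next_value, done = 1.0, 0.0, True
            else:
                reward, next_value, done = 0.0, values[next_state], False
            # Bootstrapping: the target relies on the current estimate
            # of the next state instead of a full episode return.
            td_target = reward + gamma * next_value
            values[state] += alpha * (td_target - values[state])
            if done:
                break
            state = next_state
    return values


values = td0_random_walk()
print([round(v, 2) for v in values])
```

In this toy setup the estimates should rise from the leftmost to the rightmost state, since states closer to the right edge are more likely to end with reward 1.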