Social Foundations of Computation

Max Planck Institute for Intelligent Systems, Tübingen

Germany

Pinned Repositories

benchbench
BenchBench is a Python package to evaluate multi-task benchmarks.
Language: Python
causal-features
Code to reproduce the paper "Do causal predictors generalize better to new domains?"
Language: Python
error-parity
Achieve error-rate fairness between societal groups for any score-based classifier.
Language: Python
folktables
Datasets derived from US census data
Language: Python
folktexts
Get classification risk scores on tabular tasks using LLMs
Language: Jupyter Notebook
lawma
Lawma: A lightly fine-tuned Llama model for legal classification tasks.
Language: Jupyter Notebook
surveying-language-models
Code to reproduce the paper "Questioning the Survey Responses of Large Language Models"
Language: Jupyter Notebook
training-on-the-test-task
Code to reproduce the experiments in the paper "Training on the Test Task Confounds Evaluation and Emergence"
Language: Jupyter Notebook
tttlm
Test-time-training on nearest neighbors for large language models
Language: Python
whynot
A Python sandbox for decision making in dynamics
Language: Python

Social Foundations of Computation's Repositories

socialfoundations/whynot
A Python sandbox for decision making in dynamics
Language: Python
socialfoundations/folktables
Datasets derived from US census data
Language: Python
socialfoundations/tttlm
Test-time-training on nearest neighbors for large language models
Language: Python
socialfoundations/error-parity
Achieve error-rate fairness between societal groups for any score-based classifier.
Language: Python
socialfoundations/folktexts
Get classification risk scores on tabular tasks using LLMs
Language: Jupyter Notebook
socialfoundations/lawma
Lawma: A lightly fine-tuned Llama model for legal classification tasks.
Language: Jupyter Notebook
socialfoundations/benchbench
BenchBench is a Python package to evaluate multi-task benchmarks.
Language: Python
socialfoundations/surveying-language-models
Code to reproduce the paper "Questioning the Survey Responses of Large Language Models"
Language: Jupyter Notebook
socialfoundations/training-on-the-test-task
Code to reproduce the experiments in the paper "Training on the Test Task Confounds Evaluation and Emergence"
Language: Jupyter Notebook
socialfoundations/causal-features
Code to reproduce the paper "Do causal predictors generalize better to new domains?"
Language: Python
socialfoundations/backward_baselines
Code for "Is your model predicting the past?"
Language: Jupyter Notebook
socialfoundations/lm-evaluation-harness
A framework for few-shot evaluation of language models.
Language: Python
socialfoundations/twitter-predictability
Language: Jupyter Notebook
