nushib's Stars
allenai/noncompliance
This repository contains data, code, and models for contextual noncompliance.
ruizheliUOA/Awesome-Interpretability-in-Large-Language-Models
This repository collects all relevant resources about interpretability in LLMs
mlcommons/modelgauge
Make it easy to automatically and uniformly measure the behavior of many AI systems.
mlcommons/modelbench
Run safety benchmarks against AI models and view detailed reports showing how well they performed.
microsoft/mechanistic-error-probe
A mechanistic approach for understanding and detecting factual errors of large language models.
koulanurag/maze-world
Random maze environments of varying size and complexity for reinforcement learning research.
microsoft/OptiGuide
Large Language Models for Supply Chain Optimization
harshakokel/PlanBench
An extensible benchmark for evaluating large language models on planning
BiDAlab/FairCVtest
FairCVtest: Testbed for Fair Automatic Recruitment and Multimodal Bias Analysis
microsoft/sammo
A library for prompt engineering and optimization (SAMMO = Structure-aware Multi-Objective Metaprompt Optimization)
microsoft/promptbase
All things prompt engineering
EdinburghNLP/awesome-hallucination-detection
List of papers on hallucination detection in LLMs.
microsoft/iglu-datasets
karthikv792/LLMs-Planning
An extensible benchmark for evaluating large language models on planning
microsoft/VISOR
microsoft/FLAML
A fast library for AutoML and tuning. Join our Discord: https://discord.gg/Cppx2vSPVP.
feldera/feldera
Feldera Continuous Analytics Platform
microsoft/greenlands
Platform to run interactive Reinforcement Learning agents in a Minecraft Server
fairlearn/fairlearn
A Python package to assess and improve fairness of machine learning models.
microsoft/responsible-ai-toolbox-tracker
A JupyterLab extension for tracking, managing, and comparing Responsible AI mitigations and experiments.
microsoft/vision-explanation-methods
Methods for creating saliency maps for computer vision models.
LeoGrin/tabular-benchmark
microsoft/responsible-ai-toolbox-mitigations
Python library for implementing Responsible AI mitigations.
HoloClean/holoclean
A Machine Learning System for Data Enrichment.
stanford-crfm/helm
Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110). This framework is also used to evaluate text-to-image models in Holistic Evaluation of Text-to-Image Models (HEIM) (https://arxiv.org/abs/2311.04287).
edirgarcia/stable_diffusion_aml
Code for deploying a Stable Diffusion managed online endpoint on Azure Machine Learning
cvlab-columbia/CT4Recognition
jmchn1994/HINT
Code for HINT: Integration Testing for AI-based features with Humans in the Loop
microsoft/TOXIGEN
This repo contains the code for generating the ToxiGen dataset, published at ACL 2022.
microsoft/Exp-HAIC
Experimental platform for human-AI collaboration - Code for the paper "Who Goes First? Influences of Human-AI Workflow on Decision Making in Clinical Imaging" published at FAccT 2022