nushib's Stars
allenai/noncompliance
This repository contains data, code, and models for contextual noncompliance.
ruizheliUOA/Awesome-Interpretability-in-Large-Language-Models
This repository collects all relevant resources about interpretability in LLMs
mlcommons/modelgauge
Make it easy to automatically and uniformly measure the behavior of many AI systems.
mlcommons/modelbench
Run safety benchmarks against AI models and view detailed reports showing how well they performed.
microsoft/mechanistic-error-probe
A mechanistic approach for understanding and detecting factual errors of large language models.
koulanurag/maze-world
Random maze environments of varying size and complexity for reinforcement learning research.
microsoft/OptiGuide
Large Language Models for Supply Chain Optimization
harshakokel/PlanBench
An extensible benchmark for evaluating large language models on planning
BiDAlab/FairCVtest
FairCVtest: Testbed for Fair Automatic Recruitment and Multimodal Bias Analysis
microsoft/sammo
A library for prompt engineering and optimization (SAMMO = Structure-aware Multi-Objective Metaprompt Optimization)
microsoft/promptbase
All things prompt engineering
EdinburghNLP/awesome-hallucination-detection
List of papers on hallucination detection in LLMs.
microsoft/iglu-datasets
karthikv792/LLMs-Planning
An extensible benchmark for evaluating large language models on planning
microsoft/VISOR
microsoft/FLAML
A fast library for AutoML and tuning. Join our Discord: https://discord.gg/Cppx2vSPVP.
feldera/feldera
Feldera Continuous Analytics Platform
microsoft/greenlands
Platform to run interactive Reinforcement Learning agents in a Minecraft Server
fairlearn/fairlearn
A Python package to assess and improve fairness of machine learning models.
microsoft/responsible-ai-toolbox-tracker
A JupyterLab extension for tracking, managing, and comparing Responsible AI mitigations and experiments.
microsoft/vision-explanation-methods
Methods for creating saliency maps for computer vision models.
LeoGrin/tabular-benchmark
microsoft/responsible-ai-toolbox-mitigations
Python library for implementing Responsible AI mitigations.
HoloClean/holoclean
A Machine Learning System for Data Enrichment.
stanford-crfm/helm
Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110). This framework is also used to evaluate text-to-image models in Holistic Evaluation of Text-to-Image Models (HEIM) (https://arxiv.org/abs/2311.04287).
edirgarcia/stable_diffusion_aml
Code for deploying a Stable Diffusion managed online endpoint on Azure Machine Learning
cvlab-columbia/CT4Recognition
jmchn1994/HINT
Code for HINT: Integration Testing for AI-based features with Humans in the Loop
microsoft/TOXIGEN
This repo contains the code for generating the ToxiGen dataset, published at ACL 2022.
microsoft/Exp-HAIC
Experimental platform for human-AI collaboration - Code for the paper "Who Goes First? Influences of Human-AI Workflow on Decision Making in Clinical Imaging" published at FAccT 2022