argilla-io/distilabel
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
Python, Apache-2.0
Issues
[BUG] ❌ Failed to load step 'text_generation_0': Step load failed: No module named 'distilabel.models.openai'
#1130 opened by Tavish9 - 1
[BUG] Maybe this error has to do with outlines?
#1133 opened by makrse - 0
[BUG] `TextGeneration` always processes at a fixed interval and does not match the throughput of the LLM
#1132 opened by observerw - 0
[BUG] Bad request: Not allowed to GET status/meta-llama/Llama-3.2-1B-Instruct for provider hf-inference
#1131 opened by ytan101 - 1
Failed to load all the steps. Could not run pipeline.
#1047 opened by yuqie - 0
[BUG] AttributeError in AzureOpenAILLM.load: Missing attribute openai in distilabel.models
#1125 opened by FarrelRamdhani - 2
[FEATURE] `sglang` integration
#1001 opened by gabrielmbmb - 2
[BUG] ['distilabel.pipeline'] ❌ Failed to load step 'exam_generation': Step load failed: 'InferenceClient' object has no attribute '_resolve_url'
#1117 opened by xenova - 0
[BUG] Patched async LiteLLM client for structured output generation is not callable
#1107 opened by rolshoven - 0
[BUG] GenerateSentencePair(...) always returns None positive and negative pairs
#1068 opened by caesar-one - 0
[FEATURE] Implement a rate limiter for API calls
#1058 opened by plaguss - 2
[FEATURE] Add support for Google Gemini API
#1005 opened by boapps - 0
[FEATURE] Integrate `llm-swarm`
#1002 opened by gabrielmbmb - 0
[FEATURE] Do not pass rows that contain `Step.inputs` with `None` values
#1035 opened by gabrielmbmb - 2
[BUG] Pipeline serialization/caching issue when including `RoutingBatchFunction`
#1070 opened by liamcripwell - 2
[FEATURE] Update to `outlines>0.1.0`
#1081 opened by gabrielmbmb - 2
[FEATURE] `mlx-lm` integration
#995 opened by gabrielmbmb - 2
[FEATURE] Trim inputs
#1030 opened by arthrod - 1
[FEATURE] Add pipeline to `DatasetCard` during `Distiset.push_to_hub`
#1071 opened by davidberenstein1957 - 3
Receiving error: The number of required GPUs exceeds the total number of available GPUs in the placement group
#1044 opened by saurabhbbjain - 0
[FEATURE] Update artifact upload to use `upload_large_folder` instead of `upload_folder`
#1067 opened by davidberenstein1957 - 0
[BUG] pipeline not-recoverable from cache
#1065 opened by davidberenstein1957 - 0
[DOCS] Update basic guides of steps and tasks
#1064 opened by plaguss - 1
[DOCS] The example on how to use a Step no longer works
#1056 opened by wwymak - 1
[BUG] OpenAI JSON format
#1048 opened by tinyrolls - 0
[FEATURE] Compute the input/output tokens of a dataset
#1046 opened by plaguss - 0
[FEATURE] Refactor `llms` and `embeddings` subpackages into `models` subpackage
#1040 opened by gabrielmbmb - 5
CUDA_VISIBLE_DEVICES does not work with distilabel code
#1042 opened by yuqie - 1
[FEATURE] `task` decorator
#1026 opened by gabrielmbmb - 0
[DOCS] Add example of text clustering pipeline
#979 opened by plaguss - 0
[BUG] Review integration test failing due to timeout
#990 opened by plaguss - 0
[BUG] Error when wrapping the step
#1020 opened by sdiazlor - 1
[BUG] Allow `TextClassification` to generate text without structured generation
#991 opened by plaguss - 0
[FEATURE] `ArgillaLabeller` `Task`
#985 opened by davidberenstein1957 - 0
[BUG] `TypeError: issubclass() arg 1 must be a class` when using `llm.generate()` with structured output
#1010 opened by javimosa - 0
[FEATURE] Add Self-Taught Evaluator
#1004 opened by Josephrp - 0
[DOCS] Add entry with developer documentation
#996 opened by plaguss - 1