huggingface/datasets
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
Python · Apache-2.0
Issues
Dataset viewer displays wrong statistics
#7289 opened by speedcell4 - 0
Support for identifier-based automated split construction
#7287 opened by alex-hh - 3
Memory leak when streaming
#7269 opened by Jourdelune - 1
File not found error
#7281 opened by MichielBbal - 0
load_dataset
#7275 opened by santiagobp99 - 1
load_from_disk
#7268 opened by ghaith-mq - 1
Cannot load the cache when mapping the dataset
#7261 opened by zhangn77 - 0
Cache can't be cleaned or disabled
#7260 opened by charliedream1 - 1
mismatch for datatypes when providing `Features` with `Array2D` and user specified `dtype` and using with_format("numpy")
#7254 opened by Akhil-CM - 0
`push_to_hub` overwrite argument
#7241 opened by ceferisbarov - 2
ModuleNotFoundError: No module named 'datasets.tasks'
#7248 opened by shoowadoo - 0
How to debug
#7249 opened by ShDdu - 0
Composite (multi-column) features
#7228 opened by alex-hh - 0
Huggingface GIT returns null as Content-Type instead of application/x-git-receive-pack-result
#7225 opened by padmalcom - 1
Fallback to arrow defaults when loading dataset with custom features that aren't registered locally
#7223 opened by alex-hh - 0
Iterable dataset map with explicit features causes slowdown for Sequence features
#7215 opened by alex-hh - 0
Add with_rank to Dataset.from_generator
#7213 opened by muthissar - 0
Datasets conflicts with fsspec 2024.9
#7190 opened by cw-igormorgado - 0
Describe only selected fields in README
#7211 opened by alozowski - 1
Iterable dataset.filter should not override features
#7208 opened by alex-hh - 0
`from_parquet` return type annotation
#7202 opened by saiden89 - 0
`load_dataset()` of images from a single directory where `train.png` image exists
#7201 opened by SagiPolaczek - 1
ConnectionError: Couldn't reach 'allenai/c4' on the Hub (ConnectionError): the dataset won't download, what's going on?
#7197 opened by Mrgengli - 3
Add support for 3D datasets
#7195 opened by severo - 0
concatenate_datasets does not preserve shuffling state
#7196 opened by alex-hh - 2
Add repeat() for iterable datasets
#7192 opened by alex-hh - 2
pinning `dill<0.3.9` without pinning `multiprocess`
#7186 opened by shubhbapna - 0