NVIDIA/NeMo-Curator

KeyError in Map Buckets

yyu22 opened this issue · 2 comments

Describe the bug

In map_buckets.py, the text_field argument is not passed when _MapBuckets is constructed, so the class falls back to its default of "text". This causes a KeyError whenever the dataset's text field is named something other than "text".

Steps/Code to reproduce bug

    map_buckets = _MapBuckets(
        id_fields=["dataset_id", "doc_id"],
        bucket_field=input_bucket_field,
    )

Expected behavior

Pass input_text_field as the text_field argument when constructing _MapBuckets:

    map_buckets = _MapBuckets(
        id_fields=["dataset_id", "doc_id"],
        bucket_field=input_bucket_field,
        text_field=input_text_field,
    )
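The failure mode can be illustrated without NeMo Curator installed. The sketch below uses a hypothetical stand-in class, ToyMapBuckets, which is not the real _MapBuckets and does not reflect its internals; it only assumes a constructor whose text_field defaults to "text" and a method (text_lengths, also made up for illustration) that looks the column up by name. The record layout and the field names "content" and "bucket" are likewise assumptions.

```python
class ToyMapBuckets:
    """Hypothetical stand-in for _MapBuckets; not the NeMo Curator implementation."""

    def __init__(self, id_fields, bucket_field, text_field="text"):
        self.id_fields = id_fields
        self.bucket_field = bucket_field
        self.text_field = text_field  # silently defaults to "text"

    def text_lengths(self, records):
        # Looks up the text column by name; raises KeyError if the key is absent.
        return [len(record[self.text_field]) for record in records]


# A dataset whose text column is named "content" rather than "text".
records = [{"dataset_id": 0, "doc_id": 1, "content": "hello world", "bucket": 7}]
input_text_field = "content"
input_bucket_field = "bucket"

# Mirrors the buggy call: text_field is omitted, so the default "text" is used.
buggy = ToyMapBuckets(
    id_fields=["dataset_id", "doc_id"],
    bucket_field=input_bucket_field,
)
try:
    buggy.text_lengths(records)
    raised = False
except KeyError:
    raised = True

# Mirrors the expected call: the caller's text field is forwarded explicitly.
fixed = ToyMapBuckets(
    id_fields=["dataset_id", "doc_id"],
    bucket_field=input_bucket_field,
    text_field=input_text_field,
)
lengths = fixed.text_lengths(records)
```

In this toy version the first call raises KeyError because no "text" key exists, while the second succeeds once text_field is forwarded.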

Follow up with @yyu22 and @ryantwolf

yyu22 commented

Fixed in PR #196.