COM Kitchens: An Unedited Overhead-view Video Dataset as a Vision-Language Benchmark
Primary language: Python. License: MIT.