Using a transformer-based object detector for 3D visual grounding.
Primary Language: Python