A systematic survey of multi-modal and multi-task visual understanding foundation models for driving scenarios