202 People - Multi-angle Lip Multimodal Video Data. The data was collected with cellphones in indoor natural light scenes and indoor fluorescent lamp scenes. It covers multiple scenes, different ages, and 13 shooting angles. The language is Mandarin Chinese. The recording content is general-domain and unrestricted. The data can be used for research on multimodal learning algorithms in the speech and image fields.
For more details, please refer to the link: https://www.nexdata.ai/datasets/1298?source=Github
Data size: 202 people; for each person, audio and video data from 13 different angles plus 1 txt document
Population distribution: race: Asian (Indonesia); gender: 89 males, 113 females; age: 165 people aged 18-30, 32 people aged 31-45, and 5 people aged 46-60
Collection environment: indoor natural light scenes, indoor fluorescent lamp scenes
Data diversity: multiple scenes, different ages, different shooting angles
Device: cellphone, resolution 1,920*1,080
Shooting angles: front face, 3 left-side-face angles, 3 right-side-face angles, looking down, looking up, left side face down, right side face down, left side face up, and right side face up; audio and video for all 13 angles were collected at the same time
Recording content: general field, unlimited content
Language: Mandarin Chinese; each video is longer than 20 seconds
Data format: video in .mp4; audio sampling rate of at least 16 kHz, 16-bit; frame rate of 25-30 fps
Accuracy: sentence accuracy rate is more than 95%
License: Commercial License
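
The figures above can be turned into a small sanity check when receiving the data. The following is a minimal sketch, not an official tool: the angle identifiers, the `ClipInfo` record, and the function names are hypothetical and chosen only for illustration, and the clip metadata is assumed to have been read beforehand with a tool of your choice. It derives the expected file counts from the stated 202 people and 13 angles, and checks one clip against the stated resolution, frame rate, duration, and audio sampling rate.

```python
from dataclasses import dataclass

# The 13 angles listed in the description. The short identifiers are
# hypothetical; the actual file naming in the delivery may differ.
ANGLES = [
    "front",
    "left_1", "left_2", "left_3",
    "right_1", "right_2", "right_3",
    "down", "up",
    "left_down", "right_down",
    "left_up", "right_up",
]

NUM_PEOPLE = 202


def expected_counts(num_people=NUM_PEOPLE, angles=ANGLES):
    """Expected number of videos and txt documents: one video per
    person per angle, plus one txt document per person."""
    return {"videos": num_people * len(angles), "txt": num_people}


@dataclass
class ClipInfo:
    """Metadata of one clip (hypothetical record, filled in by the caller)."""
    width: int
    height: int
    fps: float
    duration_s: float
    audio_sample_rate_hz: int


def spec_violations(clip):
    """Return a list of human-readable mismatches against the stated specs."""
    problems = []
    # Assumption: portrait clips (1,080*1,920) are accepted as well,
    # since cellphone footage may be stored in either orientation.
    if sorted((clip.width, clip.height)) != [1080, 1920]:
        problems.append(f"resolution {clip.width}*{clip.height}")
    if not 25 <= clip.fps <= 30:
        problems.append(f"frame rate {clip.fps} fps")
    if clip.duration_s <= 20:
        problems.append(f"duration {clip.duration_s} s")
    if clip.audio_sample_rate_hz < 16000:
        problems.append(f"audio sampling rate {clip.audio_sample_rate_hz} Hz")
    # Bit depth is not checked in this sketch.
    return problems


if __name__ == "__main__":
    print(expected_counts())
    print(spec_violations(ClipInfo(1920, 1080, 30.0, 25.0, 44100)))
```

An empty list from `spec_violations` means the clip matches the stated specifications under the assumptions noted in the comments.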