202-People-Multi-angle-Lip-Multimodal-Video-Data

Description

202 People - Multi-angle Lip Multimodal Video Data. The data was collected with cellphones in indoor scenes under natural light and under fluorescent lamps. It covers multiple scenes, different ages, and 13 shooting angles. The language is Mandarin Chinese. The recording content is general-domain, with no restriction on what is said. The data can be used for research on multimodal learning algorithms in the speech and image fields.

For more details, please refer to the link: https://www.nexdata.ai/datasets/1298?source=Github

Specifications

Data size

202 people; for each person, audio and video data from 13 different angles plus 1 txt document

People distribution

race distribution: Asian (Indonesia); gender distribution: 89 males, 113 females; age distribution: 165 people aged 18-30, 32 people aged 31-45, and 5 people aged 46-60

Collecting environment

indoor natural light scenes, indoor fluorescent lamp scenes

Data diversity

including multiple scenes, different ages, different shooting angles

Device

cellphone, the resolution is 1,920*1,080

Collecting angle

audio and video data were collected simultaneously from 13 different angles: front face, 3 left side face angles, 3 right side face angles, looking down, looking up, left side face down, right side face down, left side face up, and right side face up
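
As a rough illustration of how the 13 angles listed above add up, the sketch below enumerates them in Python. The label strings are hypothetical (the dataset does not document its file or angle naming), and the recording count assumes exactly one recording per person per angle, which the description implies but does not state outright.

```python
# Hypothetical labels for the 13 shooting angles; the real dataset's
# naming scheme is not documented here.
ANGLES = (
    ["front"]
    + [f"left_side_{i}" for i in range(1, 4)]    # 3 left side face angles
    + [f"right_side_{i}" for i in range(1, 4)]   # 3 right side face angles
    + [
        "looking_down",
        "looking_up",
        "left_side_down",
        "right_side_down",
        "left_side_up",
        "right_side_up",
    ]
)

PEOPLE = 202  # number of participants stated in the specification


def expected_recordings(people: int = PEOPLE, angles: int = len(ANGLES)) -> int:
    """Total recordings, assuming one recording per person per angle."""
    return people * angles


if __name__ == "__main__":
    print(len(ANGLES), "angles,", expected_recordings(), "recordings expected")
```

Under that assumption the collection would hold 202 × 13 recordings plus 202 txt documents.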

Recording content

general field, unlimited content

Language

Mandarin Chinese; each video is longer than 20 seconds

Data format

the video format is .mp4; the audio is 16 kHz or higher, 16-bit; the frame rate is 25-30 fps
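
The Device, Language, and Data format entries above can be turned into a simple sanity check on a clip's metadata. The sketch below is only an illustration: the function name `spec_violations` and the dictionary keys are hypothetical, and it assumes the metadata has already been extracted with a tool of your choice. It also assumes that 1,920*1,080 may appear in either orientation, since cellphones record in both portrait and landscape.

```python
def spec_violations(meta: dict) -> list:
    """Return a list of human-readable problems; empty if the clip fits the spec.

    `meta` is a hypothetical dictionary with the keys used below.
    """
    problems = []
    if meta["container"].lower() != "mp4":
        problems.append("container is not .mp4")
    # Assumption: accept 1920x1080 in either portrait or landscape orientation.
    if sorted((meta["width"], meta["height"])) != [1080, 1920]:
        problems.append("resolution is not 1920x1080")
    if not 25 <= meta["fps"] <= 30:
        problems.append("frame rate is outside 25-30 fps")
    if meta["sample_rate_hz"] < 16000:
        problems.append("audio sample rate is below 16 kHz")
    if meta["bit_depth"] != 16:
        problems.append("audio is not 16-bit")
    if meta["duration_s"] <= 20:
        problems.append("video is not longer than 20 seconds")
    return problems


if __name__ == "__main__":
    clip = {
        "container": "mp4",
        "width": 1920,
        "height": 1080,
        "fps": 30,
        "sample_rate_hz": 16000,
        "bit_depth": 16,
        "duration_s": 24.5,
    }
    print(spec_violations(clip))
```

Returning a list of problems rather than a single boolean makes it easier to see which part of the specification a given clip fails.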

Accuracy rate

the sentence accuracy rate is more than 95%

Licensing Information

Commercial License
