Video Segment Toolkit

Target: split long videos into segments according to each video's transcript, so that the segments can then be labeled with a labeling toolkit. We provide three segment-selection approaches.

  • type 1:

    • 1 filter by length (keep videos whose length is in [min_len, max_len])
    • 2 filter by the fraction of frames that contain a face (min_score)
  • type 2:

    • 1 merge consecutive segments that contain the same person
    • 2 filter by the fraction of frames that contain a face (min_score)
    • 3 filter by length (keep videos whose length is in [min_len, max_len])
  • type 3:

    • 1 filter by length (keep videos whose length is in [min_len, max_len]); dlib is not needed

The main pipeline is:

  • 1. split the video into segments using the corresponding transcript
  • 2. select segments with one of the three selection approaches (choose one)
  • 3. generate the selected segments

Install

  • dlib
  • tqdm
  • argparse (ships with the Python standard library)
  • cv2 (the opencv-python package)
  • ffmpeg
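For reference, one way to install the dependencies, assuming pip and a Debian/Ubuntu system for the ffmpeg binary (adjust for your platform):

pip install dlib tqdm opencv-python
# ffmpeg is a system binary rather than a pip package
sudo apt-get install ffmpeg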

Files

  • model/: stores dlib_face_recognition_resnet_model_v1.dat and shape_predictor_68_face_landmarks.dat (used by dlib)
  • video/: stores <video, transcript> pairs. For example: <1911.Revolution.2011.BluRay.iPad.720p.AAC.2Audio.x264-HDSPad.mp4, 1911.Revolution.2011.BluRay.iPad.720p.AAC.2Audio.x264-HDSPad.ass>
  • dlib_utils.py: all dlib-related processing (a sketch of its likely structure follows this list)
  • video_seg_lian.py: extracts video segments from the original transcript or from selected transcripts
  • video_select.py: selects video segments with the different selection methods (1, 2, 3)
  • run_all.sh: the main pipeline; combines video_seg_lian.py and video_select.py
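dlib_utils.py itself is not reproduced in this README. As a rough sketch, the two model files above are typically loaded and used like this (face_descriptors is a hypothetical name, not necessarily the toolkit's API):

import dlib

# Models from the model/ folder described above.
detector = dlib.get_frontal_face_detector()
shape_predictor = dlib.shape_predictor('model/shape_predictor_68_face_landmarks.dat')
face_encoder = dlib.face_recognition_model_v1('model/dlib_face_recognition_resnet_model_v1.dat')

def face_descriptors(frame_rgb):
    """Return a 128-D dlib descriptor for each face found in an RGB frame."""
    descriptors = []
    for rect in detector(frame_rgb, 1):          # detect faces (1 = upsample once)
        shape = shape_predictor(frame_rgb, rect)  # 68 facial landmarks
        descriptors.append(face_encoder.compute_face_descriptor(frame_rgb, shape))
    return descriptors

By dlib convention, two descriptors with Euclidean distance below roughly 0.6 are treated as the same person, which is the kind of test the type-2 merge step needs.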

Input Data Format

  • All <video, transcript> pairs are saved in ./video
  • the transcript must be in .ass format (you can convert subtitles to .ass with the Aegisub toolkit)
  • the transcript must be UTF-8 encoded (you can convert the encoding with Notepad)
  • the video and the transcript must share the same file name, and the name must not contain spaces; a sketch of how the .ass timestamps are read follows this list
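For orientation, the Dialogue lines of an .ass file carry the start/end timestamps that drive the segmentation. A minimal parser sketch, not the toolkit's actual code (parse_ass is a hypothetical name):

import re

def parse_ass(path):
    """Return a list of (start, end, text) for each Dialogue line of an .ass file.

    Timestamps stay in their original H:MM:SS.cc form, which ffmpeg
    accepts directly for -ss/-to.
    """
    segments = []
    with open(path, encoding='utf-8') as f:
        for line in f:
            if not line.startswith('Dialogue:'):
                continue
            # Dialogue: Layer,Start,End,Style,Name,MarginL,MarginR,MarginV,Effect,Text
            fields = line.split(',', 9)
            start, end, text = fields[1], fields[2], fields[9].strip()
            text = re.sub(r'\{.*?\}', '', text)  # drop ASS override tags like {\pos(10,10)}
            segments.append((start, end, text))
    return segments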

Main process

## original data: ./video, <video, transcript> pairs
## intermediate folder: ./video_sub
## final output: ./video_sub_sub
sh run_all.sh
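run_all.sh itself is not reproduced in this README; assembled from the commands documented below, it plausibly amounts to the following (an assumption; the real script may pass different arguments):

# 1. split every video into transcript-aligned segments
python video_seg_lian.py --data_root='./video' --save_root='./video_sub' --max_len_one_video=-1
# 2. select segments (here with select type 1)
python video_select.py --data_root='./video_sub' --gene_trans_file='./video_sub/trans_gene.txt' --select_type=1 --min_len=1 --max_len=10 --min_score=0.5
# 3. regenerate only the selected segments
python video_seg_lian.py --data_root='./video' --save_root='./video_sub_sub' --gene_trans_file='./video_sub/trans_gene.txt' --max_len_one_video=-1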

Video Segment

video_seg_lian.py: the main segmentation script

--data_root: input data root

--save_root: root directory where generated data is saved

--max_len_one_video: maximum number of segments to extract per video; -1 means unlimited

# extract only 100 sub-videos from each original video
python video_seg_lian.py --data_root='./video' --save_root='./video_sub' --max_len_one_video=100

# extract all sub-videos from each original video
python video_seg_lian.py --data_root='./video' --save_root='./video_sub' --max_len_one_video=-1
  • ffmpeg: comparison of segment-extraction commands (results in the table below)
## the best choice
video_subpath = os.path.join(video_save_root, video_subname+'.mp4')
cmd = 'ffmpeg -i %s -acodec copy -ss %s -to %s %s' %(video_path, start, end, video_subpath)

## not clear
#video_subpath = os.path.join(video_save_root, video_subname+'.avi')
#cmd = 'ffmpeg -i %s -ss %s -to %s %s' %(video_path, start, end, video_subpath)
target format | -acodec | -vcodec | video size
------------- | ------- | ------- | -------------------------------------
avi           | None    | None    | 408 KB (not clear)
avi           | None    | yes     | 0
avi           | yes     | None    | 619 KB
avi           | yes     | yes     | 255 KB
mp4           | None    | None    | 0
mp4           | None    | yes     | 0
mp4           | yes     | None    | 853 KB (clear)
mp4           | yes     | yes     | 593 KB (clear, but the beginning is wrong)
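For reference, the preferred command can be run from Python without shell-quoting problems by passing the arguments as a list (a sketch; cut_segment is a hypothetical helper, not the toolkit's code):

import os
import subprocess

def cut_segment(video_path, start, end, video_save_root, video_subname):
    """Cut [start, end] from video_path with the preferred command above.

    start and end are 'H:MM:SS.cc' strings as read from the .ass transcript.
    """
    video_subpath = os.path.join(video_save_root, video_subname + '.mp4')
    subprocess.run(['ffmpeg', '-i', video_path, '-acodec', 'copy',
                    '-ss', start, '-to', end, video_subpath], check=True)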

Segment Selection

video_select.py: generates --gene_trans_file from each video's original transcript.txt

--data_root: root directory of the segmented videos

--gene_trans_file: path where the generated transcript file is saved

--select_type: selection type (1, 2, or 3)

--min_len and --max_len: only keep videos whose length is in [min_len, max_len]

--min_score: minimum face-rate score, i.e. the fraction of sampled frames that contain a face (see the sketch after the examples below)
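These flags map onto a straightforward argparse setup; a sketch of the interface, with defaults taken from the example commands below (the real script may differ):

import argparse

parser = argparse.ArgumentParser(description='Select video segments.')
parser.add_argument('--data_root', type=str, default='./video_sub')
parser.add_argument('--gene_trans_file', type=str, default='./video_sub/trans_gene.txt')
parser.add_argument('--select_type', type=int, choices=[1, 2, 3], default=1)
parser.add_argument('--min_len', type=float, default=1)   # length bounds for [min_len, max_len]
parser.add_argument('--max_len', type=float, default=10)
parser.add_argument('--min_score', type=float, default=0.5)
args = parser.parse_args()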

  • type 1:
    • 1 filter by length (keep videos whose length is in [min_len, max_len])
    • 2 filter by the fraction of frames that contain a face (min_score)
python video_select.py --data_root='./video_sub' --gene_trans_file='./video_sub/trans_gene.txt' --select_type=1 --min_len=1 --max_len=10 --min_score=0.5 
  • type 2:
    • 1 merge consecutive segments that contain the same person
    • 2 filter by the fraction of frames that contain a face (min_score)
    • 3 filter by length (keep videos whose length is in [min_len, max_len])
python video_select.py --data_root='./video_sub' --gene_trans_file='./video_sub/trans_gene.txt' --select_type=2 --min_len=1 --max_len=10 --min_score=0.5 
  • type 3:
    • 1 filter by length (keep videos whose length is in [min_len, max_len]); dlib is not needed
python video_select.py --data_root='./video_sub' --gene_trans_file='./video_sub/trans_gene.txt' --select_type=3 --min_len=1 --max_len=10
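The min_score used by types 1 and 2 is a face-rate score. A minimal sketch of how such a score can be computed with cv2 and dlib (face_rate and frame_step are assumptions, not necessarily what video_select.py does):

import cv2
import dlib

detector = dlib.get_frontal_face_detector()

def face_rate(video_path, frame_step=10):
    """Fraction of sampled frames that contain at least one face."""
    cap = cv2.VideoCapture(video_path)
    sampled = with_face = idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % frame_step == 0:          # sample every frame_step-th frame
            sampled += 1
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # dlib expects RGB
            if detector(rgb, 0):           # any face in this frame?
                with_face += 1
        idx += 1
    cap.release()
    return with_face / sampled if sampled else 0.0

A segment passes the filter when its face rate is at least min_score.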

Generate new sub-videos after selection

video_seg_lian.py: generates sub-videos according to gene_trans_file

--data_root: input data root

--save_root: root directory where generated data is saved

--max_len_one_video: maximum number of segments to extract per video; -1 means unlimited

--gene_trans_file: path of the generated transcript file

python video_seg_lian.py --data_root='./video' --save_root='./video_sub_sub' --gene_trans_file='./video_sub/trans_gene.txt' --max_len_one_video=-1

Labeling Toolkit for each segment (unfinished)