Video Segment Toolkit

Target: split long videos into segments according to each video's transcript, so that the segments can then be labeled with a labeling toolkit. We provide three segment-selection approaches.

  • type 1:

    • 1 filter by length (keep videos whose length is in [min_len, max_len])
    • 2 filter by the fraction of frames that contain a face (min_score)
  • type 2:

    • 1 merge consecutive segments that contain the same person
    • 2 filter by the fraction of frames that contain a face (min_score)
    • 3 filter by length (keep videos whose length is in [min_len, max_len])
  • type 3:

    • 1 filter by length (keep videos whose length is in [min_len, max_len]); dlib is not needed

The main pipeline is:

  • 1. split the video into segments using the corresponding transcript
  • 2. select segments with one of the three selection approaches (choose one)
  • 3. generate the selected segments

Install

  • dlib
  • tqdm
  • argparse (ships with the Python standard library)
  • cv2 (the opencv-python package)
  • ffmpeg
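For reference, one way to install the dependencies, assuming pip and a Debian/Ubuntu system for the ffmpeg binary (adjust for your platform):

pip install dlib tqdm opencv-python
# ffmpeg is a system binary rather than a pip package
sudo apt-get install ffmpeg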

Files

  • model/: stores dlib_face_recognition_resnet_model_v1.dat and shape_predictor_68_face_landmarks.dat (used by dlib)
  • video/: stores <video, transcript> pairs. For example: <1911.Revolution.2011.BluRay.iPad.720p.AAC.2Audio.x264-HDSPad.mp4, 1911.Revolution.2011.BluRay.iPad.720p.AAC.2Audio.x264-HDSPad.ass>
  • dlib_utils.py: all dlib-related processing (a sketch of its likely structure follows this list)
  • video_seg_lian.py: extracts video segments from the original transcript or from selected transcripts
  • video_select.py: selects video segments with the different selection methods (1, 2, 3)
  • run_all.sh: the main pipeline; combines video_seg_lian.py and video_select.py
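dlib_utils.py itself is not reproduced in this README. As a rough sketch, the two model files above are typically loaded and used like this (face_descriptors is a hypothetical name, not necessarily the toolkit's API):

import dlib

# Models from the model/ folder described above.
detector = dlib.get_frontal_face_detector()
shape_predictor = dlib.shape_predictor('model/shape_predictor_68_face_landmarks.dat')
face_encoder = dlib.face_recognition_model_v1('model/dlib_face_recognition_resnet_model_v1.dat')

def face_descriptors(frame_rgb):
    """Return a 128-D dlib descriptor for each face found in an RGB frame."""
    descriptors = []
    for rect in detector(frame_rgb, 1):          # detect faces (1 = upsample once)
        shape = shape_predictor(frame_rgb, rect)  # 68 facial landmarks
        descriptors.append(face_encoder.compute_face_descriptor(frame_rgb, shape))
    return descriptors

By dlib convention, two descriptors with Euclidean distance below roughly 0.6 are treated as the same person, which is the kind of test the type-2 merge step needs.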

Input Data Format

  • All <video, transcript> pairs are saved in ./video
  • the transcript must be in .ass format (you can convert subtitles to .ass with the Aegisub toolkit)
  • the transcript must be UTF-8 encoded (you can convert the encoding with Notepad)
  • the video and the transcript must share the same file name, and the name must not contain spaces; a sketch of how the .ass timestamps are read follows this list
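For orientation, the Dialogue lines of an .ass file carry the start/end timestamps that drive the segmentation. A minimal parser sketch, not the toolkit's actual code (parse_ass is a hypothetical name):

import re

def parse_ass(path):
    """Return a list of (start, end, text) for each Dialogue line of an .ass file.

    Timestamps stay in their original H:MM:SS.cc form, which ffmpeg
    accepts directly for -ss/-to.
    """
    segments = []
    with open(path, encoding='utf-8') as f:
        for line in f:
            if not line.startswith('Dialogue:'):
                continue
            # Dialogue: Layer,Start,End,Style,Name,MarginL,MarginR,MarginV,Effect,Text
            fields = line.split(',', 9)
            start, end, text = fields[1], fields[2], fields[9].strip()
            text = re.sub(r'\{.*?\}', '', text)  # drop ASS override tags like {\pos(10,10)}
            segments.append((start, end, text))
    return segments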

Main process

## original data: ./video, <video, transcript> pairs
## intermediate folder: ./video_sub
## final output: ./video_sub_sub
sh run_all.sh
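run_all.sh itself is not reproduced in this README; assembled from the commands documented below, it plausibly amounts to the following (an assumption; the real script may pass different arguments):

# 1. split every video into transcript-aligned segments
python video_seg_lian.py --data_root='./video' --save_root='./video_sub' --max_len_one_video=-1
# 2. select segments (here with select type 1)
python video_select.py --data_root='./video_sub' --gene_trans_file='./video_sub/trans_gene.txt' --select_type=1 --min_len=1 --max_len=10 --min_score=0.5
# 3. regenerate only the selected segments
python video_seg_lian.py --data_root='./video' --save_root='./video_sub_sub' --gene_trans_file='./video_sub/trans_gene.txt' --max_len_one_video=-1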

Video Segment

video_seg_lian.py: the main segmentation script

--data_root: input data root

--save_root: root directory where generated data is saved

--max_len_one_video: maximum number of segments to extract per video; -1 means unlimited

# extract only 100 sub-videos from each original video
python video_seg_lian.py --data_root='./video' --save_root='./video_sub' --max_len_one_video=100

# extract all sub-videos from each original video
python video_seg_lian.py --data_root='./video' --save_root='./video_sub' --max_len_one_video=-1
  • ffmpeg: comparison of segment-extraction commands (results in the table below)
## the best choice
video_subpath = os.path.join(video_save_root, video_subname+'.mp4')
cmd = 'ffmpeg -i %s -acodec copy -ss %s -to %s %s' %(video_path, start, end, video_subpath)

## not clear
#video_subpath = os.path.join(video_save_root, video_subname+'.avi')
#cmd = 'ffmpeg -i %s -ss %s -to %s %s' %(video_path, start, end, video_subpath)
target format | -acodec | -vcodec | video size
------------- | ------- | ------- | -------------------------------------
avi           | None    | None    | 408 KB (not clear)
avi           | None    | yes     | 0
avi           | yes     | None    | 619 KB
avi           | yes     | yes     | 255 KB
mp4           | None    | None    | 0
mp4           | None    | yes     | 0
mp4           | yes     | None    | 853 KB (clear)
mp4           | yes     | yes     | 593 KB (clear, but the beginning is wrong)
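For reference, the preferred command can be run from Python without shell-quoting problems by passing the arguments as a list (a sketch; cut_segment is a hypothetical helper, not the toolkit's code):

import os
import subprocess

def cut_segment(video_path, start, end, video_save_root, video_subname):
    """Cut [start, end] from video_path with the preferred command above.

    start and end are 'H:MM:SS.cc' strings as read from the .ass transcript.
    """
    video_subpath = os.path.join(video_save_root, video_subname + '.mp4')
    subprocess.run(['ffmpeg', '-i', video_path, '-acodec', 'copy',
                    '-ss', start, '-to', end, video_subpath], check=True)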

Segment Selection

video_select.py: generates --gene_trans_file from each video's original transcript.txt

--data_root: root directory of the segmented videos

--gene_trans_file: path where the generated transcript file is saved

--select_type: selection type (1, 2, or 3)

--min_len and --max_len: only keep videos whose length is in [min_len, max_len]

--min_score: minimum face-rate score, i.e. the fraction of sampled frames that contain a face (see the sketch after the examples below)
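These flags map onto a straightforward argparse setup; a sketch of the interface, with defaults taken from the example commands below (the real script may differ):

import argparse

parser = argparse.ArgumentParser(description='Select video segments.')
parser.add_argument('--data_root', type=str, default='./video_sub')
parser.add_argument('--gene_trans_file', type=str, default='./video_sub/trans_gene.txt')
parser.add_argument('--select_type', type=int, choices=[1, 2, 3], default=1)
parser.add_argument('--min_len', type=float, default=1)   # length bounds for [min_len, max_len]
parser.add_argument('--max_len', type=float, default=10)
parser.add_argument('--min_score', type=float, default=0.5)
args = parser.parse_args()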

  • type 1:
    • 1 filter by length (keep videos whose length is in [min_len, max_len])
    • 2 filter by the fraction of frames that contain a face (min_score)
python video_select.py --data_root='./video_sub' --gene_trans_file='./video_sub/trans_gene.txt' --select_type=1 --min_len=1 --max_len=10 --min_score=0.5 
  • type 2:
    • 1 merge consecutive segments that contain the same person
    • 2 filter by the fraction of frames that contain a face (min_score)
    • 3 filter by length (keep videos whose length is in [min_len, max_len])
python video_select.py --data_root='./video_sub' --gene_trans_file='./video_sub/trans_gene.txt' --select_type=2 --min_len=1 --max_len=10 --min_score=0.5 
  • type 3:
    • 1 filter by length (keep videos whose length is in [min_len, max_len]); dlib is not needed
python video_select.py --data_root='./video_sub' --gene_trans_file='./video_sub/trans_gene.txt' --select_type=3 --min_len=1 --max_len=10
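The min_score used by types 1 and 2 is a face-rate score. A minimal sketch of how such a score can be computed with cv2 and dlib (face_rate and frame_step are assumptions, not necessarily what video_select.py does):

import cv2
import dlib

detector = dlib.get_frontal_face_detector()

def face_rate(video_path, frame_step=10):
    """Fraction of sampled frames that contain at least one face."""
    cap = cv2.VideoCapture(video_path)
    sampled = with_face = idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % frame_step == 0:          # sample every frame_step-th frame
            sampled += 1
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # dlib expects RGB
            if detector(rgb, 0):           # any face in this frame?
                with_face += 1
        idx += 1
    cap.release()
    return with_face / sampled if sampled else 0.0

A segment passes the filter when its face rate is at least min_score.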

Generate new sub-videos after selection

video_seg_lian.py: generates sub-videos according to gene_trans_file

--data_root: input data root

--save_root: root directory where generated data is saved

--max_len_one_video: maximum number of segments to extract per video; -1 means unlimited

--gene_trans_file: path of the generated transcript file

python video_seg_lian.py --data_root='./video' --save_root='./video_sub_sub' --gene_trans_file='./video_sub/trans_gene.txt' --max_len_one_video=-1

Labeling Toolkit for each segment (unfinished)