This is the source code for SeeHow.
We implement a TOOL.
You can download the dataset Here, or process your own data with this code by following the instructions below.
If you use our dataset, you can run `python3 extract_workflow.py` directly to extract the workflows.
Note: you need to set your own paths for each step.
- Decode videos to frames by `python3 clip_video_ffmpeg.py`. This step uses ffmpeg; you can install it with `apt-get install ffmpeg`. We also process captions in this step, but that part is optional. (See the sketch after this list.)
- Action region detection by `python3 compare_image.py`.
- Action category classification by `python3 extract_action.py`. This step uses ActionNet; please refer to ActionNet for details. The model can be downloaded Here.
- Text detection by EAST and text recognition by CRNN. You can refer to the two methods to process your own data. Refer to `connect_box.py` for connecting word-level boxes into text lines (a sketch of the idea also follows this list).
- Extract the workflow by `python3 extract_workflow.py`. We store the results in a MySQL database; you can build a database to store the results or use any other form (a self-contained stand-in follows this list).
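For illustration, here is a minimal sketch of the first two steps: decoding a video to frames with ffmpeg and locating the changed (action) region by differencing consecutive frames. This is not the actual logic of `clip_video_ffmpeg.py` or `compare_image.py`, just the underlying idea; the file names, frame rate, and threshold are hypothetical, and the sketch assumes OpenCV (`pip install opencv-python`).

```python
import subprocess
from pathlib import Path

import cv2

def decode_frames(video_path, out_dir, fps=1):
    """Decode a video into JPEG frames with ffmpeg (one frame per second here)."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["ffmpeg", "-i", str(video_path), "-vf", f"fps={fps}",
         str(Path(out_dir) / "frame_%05d.jpg")],
        check=True,
    )

def action_region(prev_path, curr_path, thresh=25):
    """Return the bounding box (x, y, w, h) of the region that changed
    between two consecutive frames, or None if nothing changed."""
    prev = cv2.imread(str(prev_path), cv2.IMREAD_GRAYSCALE)
    curr = cv2.imread(str(curr_path), cv2.IMREAD_GRAYSCALE)
    diff = cv2.absdiff(prev, curr)
    _, mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
    coords = cv2.findNonZero(mask)
    return None if coords is None else cv2.boundingRect(coords)

if __name__ == "__main__":
    decode_frames("video.mp4", "frames")  # hypothetical input file
    frames = sorted(Path("frames").glob("*.jpg"))
    for prev, curr in zip(frames, frames[1:]):
        box = action_region(prev, curr)
        if box is not None:
            print(curr.name, box)
```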
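The word-level boxes from EAST/CRNN have to be merged into text lines before workflow extraction. Here is a sketch of one common way to do that, in the spirit of `connect_box.py`; the (x, y, w, h) box format and the vertical-overlap criterion are assumptions for illustration:

```python
def connect_boxes(boxes, min_overlap=0.5):
    """Group word-level boxes (x, y, w, h) into text lines.

    A box joins an existing line when its vertical overlap with the
    line's last box is at least `min_overlap` of the shorter height.
    """
    lines = []
    for box in sorted(boxes, key=lambda b: (b[1], b[0])):  # top-to-bottom
        x, y, w, h = box
        for line in lines:
            lx, ly, lw, lh = line[-1]
            overlap = min(y + h, ly + lh) - max(y, ly)
            if overlap >= min_overlap * min(h, lh):
                line.append(box)
                break
        else:
            lines.append([box])
    # order the words within each line left to right
    return [sorted(line, key=lambda b: b[0]) for line in lines]

words = [(120, 10, 40, 18), (10, 12, 50, 16), (65, 11, 45, 17), (12, 40, 80, 18)]
for line in connect_boxes(words):
    print(line)  # the first three boxes form one line, the last its own line
```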
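Finally, the extracted coding steps need to be stored. The repository writes them to MySQL, but any store works; as a self-contained stand-in, here is a sketch using Python's built-in `sqlite3`. The table and column names (and the sample action label) are assumptions for illustration, not the repository's actual schema:

```python
import sqlite3

conn = sqlite3.connect("workflow.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS coding_steps (
        video_id   TEXT,
        step_index INTEGER,
        start_time REAL,  -- seconds into the video
        end_time   REAL,
        action     TEXT,  -- hypothetical action label, e.g. enter_text
        code_line  TEXT   -- recognized active text line(s)
    )
""")
conn.execute(
    "INSERT INTO coding_steps VALUES (?, ?, ?, ?, ?, ?)",
    ("demo_video", 1, 12.5, 18.0, "enter_text", "int main() {"),
)
conn.commit()
for row in conn.execute("SELECT * FROM coding_steps ORDER BY step_index"):
    print(row)
conn.close()
```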
We also consider captions in this work; they are used to generate text summaries that describe the coding steps. Please refer to our TOOL for the results.
- Download the CRNN project Here.
- To build the project, first install the latest versions of Torch7, fblualib, and LMDB; please follow their respective installation instructions. On Ubuntu, LMDB can be installed with `apt-get install liblmdb-dev`.
- Then go to `src/` and execute `sh build_cpp.sh` to build the C++ code. If successful, a file named `libcrnn.so` should be produced in the `src/` directory.
- Run the demo. A demo program can be found in `src/demo.lua`. Before running it, download a pretrained model from here and put the downloaded model file `crnn_demo_model.t7` into the directory `model/crnn_demo/`. Then launch the demo with `th demo.lua`. The demo reads an example image and recognizes its text content.
In order to obtain the text summary, you need to:
- Parse the vtt file, which has been done in `clip_video_ffmpeg.py` (a parsing sketch follows this list).
- Punctuation restoration by `segment_punctuation.py`, as the captions do not have any punctuation.
- Caption grouping by `next_sentence.py`. This step groups related sentences together (see the sketch after this list).
- Caption summarization by `summarize.py`. This step summarizes long sentences into short ones. Please refer to This method.
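For the first step, a minimal parsing sketch; the repository handles this inside `clip_video_ffmpeg.py`, whereas this stand-in assumes the third-party `webvtt-py` package (`pip install webvtt-py`) and a hypothetical file name:

```python
import webvtt

# Parse a WebVTT caption file into (start, end, text) triples.
captions = [(c.start, c.end, c.text.replace("\n", " "))
            for c in webvtt.read("tutorial.en.vtt")]
for start, end, text in captions[:5]:
    print(f"{start} --> {end}: {text}")
```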
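And a sketch of the grouping and summarization ideas using Hugging Face `transformers`. The repository's `next_sentence.py` and `summarize.py` may differ; the model choices, the 0.5 probability threshold, and the 40-word cutoff below are illustrative assumptions:

```python
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction, pipeline

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
nsp = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")
summarizer = pipeline("summarization")  # default distilbart checkpoint

def is_next(sent_a, sent_b):
    """True if BERT judges sent_b a likely continuation of sent_a."""
    enc = tokenizer(sent_a, sent_b, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = nsp(**enc).logits  # index 0: isNext, index 1: notNext
    return torch.softmax(logits, dim=1)[0, 0].item() > 0.5

def group_captions(sentences):
    """Group consecutive caption sentences that read as one explanation."""
    groups = [[sentences[0]]]
    for prev, curr in zip(sentences, sentences[1:]):
        if is_next(prev, curr):
            groups[-1].append(curr)
        else:
            groups.append([curr])
    return groups

sentences = ["First we create the class.",
             "Then we add a constructor to it.",
             "Now let's run the program."]
for group in group_captions(sentences):
    text = " ".join(group)
    if len(text.split()) > 40:  # summarize only long groups
        text = summarizer(text, max_length=30, min_length=5)[0]["summary_text"]
    print(text)
```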
We provide the high-resolution figures for the paper.
- Fig. 1: An Example of Programming Workflow
- Fig. 2: Main Steps of Our Approach
- Fig. 3: An example of scroll-content in between coding actions
- Fig. 4: Example of text line detection
- Fig. 5: Illustration of coding step identification
- Fig. 6: An example of locating active text line(s)
- Fig. 7: Distribution of coding-step length and total duration
- Fig. 8: Coding-step distribution at different IoU and time offset thresholds
- Fig. 9: Per-video F1 distribution at different IoU and time offset thresholds
- Fig. 10: Distribution of videos in different F1 ranges for each playlist
- Fig. 11: Correctness ratings by human evaluation