This is the source code for SeeHow.
We implement a TOOL.
You can download the dataset Here, or process your own data with this code by following the instructions below.
If you use our dataset, you can run `python3 extract_workflow.py` directly to extract the workflows.
Note: you need to set your own paths for each step.
- Decode videos to frames by `python3 clip_video_ffmpeg.py`. This step uses ffmpeg; you can install it with `apt-get install ffmpeg`. We also process captions in this step, but that part is optional. (See the sketch after this list.)
- Action region detection by `python3 compare_image.py`.
- Action category classification by `python3 extract_action.py`. This step uses ActionNet; please refer to ActionNet for details. The model can be downloaded Here.
- Text detection by EAST and text recognition by CRNN. You can refer to the two methods to process your own data. Refer to `connect_box.py` for connecting word-level boxes into text lines (a sketch of the idea also follows this list).
- Extract the workflow by `python3 extract_workflow.py`. We store the results in a MySQL database; you can build a database to store the results or use any other form (a self-contained stand-in follows this list).
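For illustration, here is a minimal sketch of the first two steps: decoding a video to frames with ffmpeg and locating the changed (action) region by differencing consecutive frames. This is not the actual logic of `clip_video_ffmpeg.py` or `compare_image.py`, just the underlying idea; the file names, frame rate, and threshold are hypothetical, and the sketch assumes OpenCV (`pip install opencv-python`).

```python
import subprocess
from pathlib import Path

import cv2

def decode_frames(video_path, out_dir, fps=1):
    """Decode a video into JPEG frames with ffmpeg (one frame per second here)."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["ffmpeg", "-i", str(video_path), "-vf", f"fps={fps}",
         str(Path(out_dir) / "frame_%05d.jpg")],
        check=True,
    )

def action_region(prev_path, curr_path, thresh=25):
    """Return the bounding box (x, y, w, h) of the region that changed
    between two consecutive frames, or None if nothing changed."""
    prev = cv2.imread(str(prev_path), cv2.IMREAD_GRAYSCALE)
    curr = cv2.imread(str(curr_path), cv2.IMREAD_GRAYSCALE)
    diff = cv2.absdiff(prev, curr)
    _, mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
    coords = cv2.findNonZero(mask)
    return None if coords is None else cv2.boundingRect(coords)

if __name__ == "__main__":
    decode_frames("video.mp4", "frames")  # hypothetical input file
    frames = sorted(Path("frames").glob("*.jpg"))
    for prev, curr in zip(frames, frames[1:]):
        box = action_region(prev, curr)
        if box is not None:
            print(curr.name, box)
```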
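The word-level boxes from EAST/CRNN have to be merged into text lines before workflow extraction. Here is a sketch of one common way to do that, in the spirit of `connect_box.py`; the (x, y, w, h) box format and the vertical-overlap criterion are assumptions for illustration:

```python
def connect_boxes(boxes, min_overlap=0.5):
    """Group word-level boxes (x, y, w, h) into text lines.

    A box joins an existing line when its vertical overlap with the
    line's last box is at least `min_overlap` of the shorter height.
    """
    lines = []
    for box in sorted(boxes, key=lambda b: (b[1], b[0])):  # top-to-bottom
        x, y, w, h = box
        for line in lines:
            lx, ly, lw, lh = line[-1]
            overlap = min(y + h, ly + lh) - max(y, ly)
            if overlap >= min_overlap * min(h, lh):
                line.append(box)
                break
        else:
            lines.append([box])
    # order the words within each line left to right
    return [sorted(line, key=lambda b: b[0]) for line in lines]

words = [(120, 10, 40, 18), (10, 12, 50, 16), (65, 11, 45, 17), (12, 40, 80, 18)]
for line in connect_boxes(words):
    print(line)  # the first three boxes form one line, the last its own line
```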
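Finally, the extracted coding steps need to be stored. The repository writes them to MySQL, but any store works; as a self-contained stand-in, here is a sketch using Python's built-in `sqlite3`. The table and column names (and the sample action label) are assumptions for illustration, not the repository's actual schema:

```python
import sqlite3

conn = sqlite3.connect("workflow.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS coding_steps (
        video_id   TEXT,
        step_index INTEGER,
        start_time REAL,  -- seconds into the video
        end_time   REAL,
        action     TEXT,  -- hypothetical action label, e.g. enter_text
        code_line  TEXT   -- recognized active text line(s)
    )
""")
conn.execute(
    "INSERT INTO coding_steps VALUES (?, ?, ?, ?, ?, ?)",
    ("demo_video", 1, 12.5, 18.0, "enter_text", "int main() {"),
)
conn.commit()
for row in conn.execute("SELECT * FROM coding_steps ORDER BY step_index"):
    print(row)
conn.close()
```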
We also consider captions in this work; they are used to generate text summaries that describe the coding steps. Please refer to our TOOL for the results.
- Download the CRNN project Here.
- To build the project, first install the latest versions of Torch7, fblualib, and LMDB; please follow their respective installation instructions. On Ubuntu, LMDB can be installed with `apt-get install liblmdb-dev`.
- Then go to `src/` and execute `sh build_cpp.sh` to build the C++ code. If successful, a file named `libcrnn.so` should be produced in the `src/` directory.
- Run the demo. A demo program can be found in `src/demo.lua`. Before running it, download a pretrained model from here and put the downloaded model file `crnn_demo_model.t7` into the directory `model/crnn_demo/`. Then launch the demo with `th demo.lua`. The demo reads an example image and recognizes its text content.
In order to obtain the text summary, you need to:
- Parse the vtt file, which has been done in `clip_video_ffmpeg.py` (a parsing sketch follows this list).
- Punctuation restoration by `segment_punctuation.py`, as the captions do not have any punctuation.
- Caption grouping by `next_sentence.py`. This step groups related sentences together (see the sketch after this list).
- Caption summarization by `summarize.py`. This step summarizes long sentences into short ones. Please refer to This method.
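For the first step, a minimal parsing sketch; the repository handles this inside `clip_video_ffmpeg.py`, whereas this stand-in assumes the third-party `webvtt-py` package (`pip install webvtt-py`) and a hypothetical file name:

```python
import webvtt

# Parse a WebVTT caption file into (start, end, text) triples.
captions = [(c.start, c.end, c.text.replace("\n", " "))
            for c in webvtt.read("tutorial.en.vtt")]
for start, end, text in captions[:5]:
    print(f"{start} --> {end}: {text}")
```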
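And a sketch of the grouping and summarization ideas using Hugging Face `transformers`. The repository's `next_sentence.py` and `summarize.py` may differ; the model choices, the 0.5 probability threshold, and the 40-word cutoff below are illustrative assumptions:

```python
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction, pipeline

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
nsp = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")
summarizer = pipeline("summarization")  # default distilbart checkpoint

def is_next(sent_a, sent_b):
    """True if BERT judges sent_b a likely continuation of sent_a."""
    enc = tokenizer(sent_a, sent_b, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = nsp(**enc).logits  # index 0: isNext, index 1: notNext
    return torch.softmax(logits, dim=1)[0, 0].item() > 0.5

def group_captions(sentences):
    """Group consecutive caption sentences that read as one explanation."""
    groups = [[sentences[0]]]
    for prev, curr in zip(sentences, sentences[1:]):
        if is_next(prev, curr):
            groups[-1].append(curr)
        else:
            groups.append([curr])
    return groups

sentences = ["First we create the class.",
             "Then we add a constructor to it.",
             "Now let's run the program."]
for group in group_captions(sentences):
    text = " ".join(group)
    if len(text.split()) > 40:  # summarize only long groups
        text = summarizer(text, max_length=30, min_length=5)[0]["summary_text"]
    print(text)
```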
We provide the high-resolution figures for the paper.
- Fig. 1: An Example of Programming Workflow
- Fig. 2: Main Steps of Our Approach
- Fig. 3: An example of scroll-content in between coding actions
- Fig. 4: Example of text line detection
- Fig. 5: Illustration of coding step identification
- Fig. 6: An example of locating active text line(s)
- Fig. 7: Distribution of coding-step length and total duration
- Fig. 8: Coding-step distribution at different IoU and time offset thresholds
- Fig. 9: Per-video F1 distribution at different IoU and time offset thresholds
- Fig. 10: Distribution of videos in different F1 ranges for each playlist
- Fig. 11: Correctness ratings by human evaluation