RATV Demo

This is the demo for the paper "Shot Retrieval and Assembly with Text Script for Video Montage Generation". We provide five video montages generated by our proposed RATV below, each shown with the text script used as input.

Video 1 click here to view the video

``Jungles are the richest places on Earth, because of one remarkable fact, they make their own weather. Every day, water rises from the surface of the leaves as vapour.''

Video 2 click here to view the video

``Spinners are the most vocal of all the dolphins. They use echolocation, a kind of sonar, to find their prey. Each hunter sends out a series of clicks, and then listens for returning echoes, allowing them to scan for distant prey, hundreds of metres away.''

Video 3 click here to view the video

``I go and lie down where the wood drake rests in his beauty on the water, and the great heron feeds. I come into the peace of wild things who do not tax their lives with forethought of grief. I come into the presence of still water. And I feel above me the day-blind stars waiting for their light.''

Video 4 click here to view the video

``We embark from Argentina to the Antarctic, and have a wonderful journey in this winter vacation. When the ship travels on the vast sea, it rolls in the waves heavily, we thus have to stay in the room to have a rest. As time goes on, it becomes cold gradually, and we can see icebergs floating on the sea now. There are many penguins walking on the ice.''

Video 5 click here to view the video

``You raise me up so I can stand on mountains. You raise me up to walk on stormy seas.''

VSPD Dataset

We release our VSPD dataset, which contains 4365 script-video pairs comprising 19613 shots in total. All shots and annotation files can be downloaded from the anonymous Google Drive. All released shot videos are resized to 480 × 270, which does not affect training or generation.
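
As a quick sanity check on the figures above, the following minimal sketch computes the average number of shots per script-video pair and the aspect ratio of the released resolution. The constants are taken from the text; the function names are hypothetical helpers for illustration only and are not part of the RATV code or the VSPD annotation format.

```python
from fractions import Fraction

# Figures quoted in the dataset description above.
NUM_PAIRS = 4365      # script-video pairs
NUM_SHOTS = 19613     # shots across all pairs
WIDTH, HEIGHT = 480, 270  # resolution of the released shot videos


def average_shots_per_pair(num_shots: int, num_pairs: int) -> float:
    """Mean number of shots per script-video pair (hypothetical helper)."""
    return num_shots / num_pairs


def aspect_ratio(width: int, height: int) -> Fraction:
    """Aspect ratio as a reduced fraction (hypothetical helper)."""
    return Fraction(width, height)


if __name__ == "__main__":
    avg = average_shots_per_pair(NUM_SHOTS, NUM_PAIRS)
    ratio = aspect_ratio(WIDTH, HEIGHT)
    print(f"average shots per pair: {avg:.2f}")
    print(f"aspect ratio: {ratio.numerator}:{ratio.denominator}")
```

On these numbers, each pair contains roughly 4.5 shots on average, and the released resolution has a 16:9 aspect ratio.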