This is the demo page for the paper Shot Retrieval and Assembly with Text Script for Video Montage Generation. We provide five video montages generated by our proposed RATV here.
Video 1 click here to view the video
``Jungles are the richest places on Earth, because of one remarkable fact, they make their own weather. Every day, water rises from the surface of the leaves as vapour.''
Video 2 click here to view the video
``Spinners are the most vocal of all the dolphins. They use echolocation, a kind of sonar, to find their prey. Each hunter sends out a series of clicks, and then listens for returning echoes, allowing them to scan for distant prey, hundreds of metres away.''
Video 3 click here to view the video
``I go and lie down where the wood drake rests in his beauty on the water, and the great heron feeds. I come into the peace of wild things who do not tax their lives with forethought of grief. I come into the presence of still water. And I feel above me the day-blind stars waiting for their light.''
Video 4 click here to view the video
``We embark from Argentina to the Antarctic, and have a wonderful journey in this winter vacation. When the ship travels on the vast sea, it rolls in the waves heavily, we thus have to stay in the room to have a rest. As the time goes on, it becomes cold gradually, and we can see icebergs floating on the sea now. There are many penguins walking on the ice.''
Video 5 click here to view the video
``You raise me up so I can stand on mountains. You raise me up to walk on stormy seas.''
We release our constructed VSPD dataset, which contains 4365 script-video pairs comprising 19613 shots in total. All shots and annotation files can be downloaded from the anonymous Google Drive. All released shot videos are resized to 480 × 270, which has no influence on training or generation.
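As a quick sanity check on the figures quoted above, the following minimal sketch derives the average number of shots per script-video pair and the aspect ratio of the released shots. The constant and function names are hypothetical and chosen for illustration only; they are not part of the released annotation files.

```python
from fractions import Fraction

# Figures quoted in the release note; the names below are illustrative only.
NUM_PAIRS = 4365        # script-video pairs in VSPD
NUM_SHOTS = 19613       # total shots across all pairs
WIDTH, HEIGHT = 480, 270  # resolution of the released shot videos


def mean_shots_per_pair(num_shots: int, num_pairs: int) -> float:
    """Average number of shots per script-video pair."""
    return num_shots / num_pairs


def aspect_ratio(width: int, height: int) -> Fraction:
    """Width-to-height ratio reduced to lowest terms."""
    return Fraction(width, height)


if __name__ == "__main__":
    mean = mean_shots_per_pair(NUM_SHOTS, NUM_PAIRS)
    ratio = aspect_ratio(WIDTH, HEIGHT)
    print(f"about {mean:.2f} shots per script-video pair")
    print(f"aspect ratio {ratio.numerator}:{ratio.denominator}")
```

Under these figures, each script is paired with roughly four to five shots on average, and the released shots have a 16:9 aspect ratio.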