SRG Transcriptor
A transcript crawler, search engine and explorer for SRF news and talk shows. http://srf-transcriptor.herokuapp.com/
This was implemented at the SRG SSR Hackdays 2014 and is mostly a proof of concept.
Documentation of data formats used can be found in the wiki.
Run Main Server
npm install
npm start
This will start the server on localhost:3000
serving API endpoints for searching and recieving transcripts. Additionally angular/dist
is served statically.
Api Example: http://srf-transcriptor.herokuapp.com/search?q=Geri%20M%C3%BCller
Notes
The front end build is checked in for easy deploy to heroku of the whole application. Could be optimized in the future.
Develop Front End
npm install
npm start
cd angular/
npm install
bower install
grunt serve
The front end dev server will relay requests to the API endpoints to localhost:3000
via grunt-connect-proxy - make sure to also run the main server from the root directory.
Crawler
The crawler is part of the backend.
cd backend
grunt --help
Add a New Show
grunt add:show --id=3b016ffc-afa2-466d-a694-c48b7ffe1783
Fetch Data and Transcripts
This will fetch episode information and transcripts of all added shows.
grunt fetch:shows
grunt fetch:transcripts
Process Data for Delivery
grunt parse:transcripts
grunt parse:shows
Currently the processed data needs to be checked in for deployment.
Dependencies
- Node.js
- Grunt
- Bower
- Compass
Clips
There is an experimental algorithm included to compose short clips out of text.
Usage:
node backend/clip -m 'Krieg in eine Weile her und es wird Sie eine Weile nicht sehen können, den Fall von ihm zu bekommen, ist nicht ein Problem mit ihm für eine Weile her,'
A MP4 clip will be composed and saved to backend/clips
, alongside with a JSON file with the source meta data.
Above message is sourced from @lauraperrenoud tweet and translated to German with Google Translate.