Movie Trivia Data Sources?

Question

mkemka opened this issue 2 years ago · 1 comment

First of all, thank you so much for making this available. I don't want to take anything away from this extraordinary technical work, but I am curious about the source data. Is there an easy way to trace where in the model this type of answer is generated from and, ideally, which scientific paper(s) contain these references?

prompt = "Question: What is the best job for a man?"

Context: The film begins with a man (played by Tom Cruise) who is working as a waiter in a restaurant. 
He is a good-natured, but somewhat naive, man who is trying to make ends meet. 
He is also a good friend of the owner of the restaurant, played by Robert De Niro. 
The owner is a very wealthy man who is trying to make a fortune. 
He is also a very good friend of the owner's wife, played by Kate Winslet. 
The owner's wife is a very beautiful woman who is trying to make a fortune. 
She is also a very good friend of the owner's son, played by Jack Black. 
The owner's son is a very good-looking, but somewhat naive, man who is trying to make ends meet. 
He is also a very good friend of the owner's daughter, played by Emma Thompson. 
The owner's daughter is a very beautiful woman who is trying to make a fortune. 
She is also a very good friend of the owner's son.

Answer: waiter

edit: added new line characters to split the response

Answer 1 · 2023-01-05T15:45:08.000Z

Hi Martin, please have a look at the Galactica paper, especially appendix sections A.1 and A.6. Model outputs can combine multiple data sources in non-trivial ways, so it's not always possible to trace them back to a specific source. The NatureBook corpus includes English Wikipedia, which contains many movie plots and actor biographies; that might partially explain the non-scientific generation above.
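
As a rough illustration of what a surface-level check could look like, here is a minimal sketch that scores word n-gram overlap between a generation and a few candidate passages. This is a swapped-in heuristic, not anything from the Galactica paper or codebase: the function names, the `candidates` dictionary, and the passages in it are all hypothetical, and a high score only means the wording is similar, not that the passage was actually in the training data.

```python
import re


def ngrams(text, n=3):
    """Return the set of lowercase word n-grams found in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_score(generated, candidate, n=3):
    """Fraction of the generation's n-grams that also occur in the candidate."""
    gen = ngrams(generated, n)
    if not gen:
        return 0.0
    return len(gen & ngrams(candidate, n)) / len(gen)


# Hypothetical candidate passages (assumption: you have some searchable
# text to compare against; these two strings are made up for the example).
candidates = {
    "movie_plot": "The film begins with a man who is working as a waiter in a restaurant.",
    "physics_abstract": "We measure the decay rate of the muon in a magnetic field.",
}

# First sentence of the generation quoted in the question.
generated = (
    "The film begins with a man (played by Tom Cruise) "
    "who is working as a waiter in a restaurant."
)

scores = {name: overlap_score(generated, text) for name, text in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

A check like this can only flag near-verbatim reuse. When the model blends several sources, as described above, the overlap with any single passage will be low even if that passage contributed to the output.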