/movies

data mining


course: ENGR 5775G Data Mining

assignment 1: Data collection task

PART 1

automatically collect metadata of the 100 titles and store them in a CSV file

automated Python script: movie_meta_info.py
results are stored in a CSV file: movie_meta_info.csv
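A minimal sketch of the final storage step, writing one CSV row per movie with Python's standard `csv.DictWriter`. It does not reproduce movie_meta_info.py. The column set (`title`, `year`, `director`, `genre`) and the function name `write_metadata_csv` are hypothetical, and the sample records are placeholders, not scraped data.

```python
import csv
import sys
from typing import Dict, Iterable, TextIO

# Hypothetical column set; the actual movie_meta_info.csv may use other fields.
FIELDS = ["title", "year", "director", "genre"]


def write_metadata_csv(records: Iterable[Dict[str, str]], stream: TextIO) -> int:
    """Write a header plus one row per movie to `stream`; return the row count."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    count = 0
    for record in records:
        # Keys not listed in FIELDS are dropped; missing keys become empty cells.
        writer.writerow(record)
        count += 1
    return count


if __name__ == "__main__":
    # Placeholder records standing in for the scraped metadata of each title.
    sample = [
        {"title": "Movie A", "year": "2001", "director": "Director A", "genre": "Drama"},
        {"title": "Movie B", "year": "2002", "source_url": "ignored"},
    ]
    write_metadata_csv(sample, sys.stdout)
```

To produce a file instead of printing, open it with `open("movie_meta_info.csv", "w", newline="", encoding="utf-8")` and pass the handle as `stream`; `newline=""` is what the `csv` module expects so that it controls line endings itself.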

PART 2

search for, find, and download the screenplay of each title, and store each screenplay as semi-structured data

automated Python script: screenplay.py. The script provides two methods for finding the screenplays, one for each of two websites.
semi-structured data is saved in the file scripts_simply.json (method 1). This method requires a web browser driver to be installed first, because the website's content is generated by JavaScript.
semi-structured data is saved in the file scripts_daily.json (method 2).
executing the method 'batch_download(filename)' in main.py downloads all the movies' screenplays into the current directory. The downloaded screenplays come in three file types: txt, html, and pdf.
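The README does not describe the schema of scripts_simply.json or scripts_daily.json. The following is a minimal sketch of one way to turn a plain-text screenplay into semi-structured JSON, assuming scene headings begin with the common `INT.`, `EXT.` or `INT/EXT.` convention. The function name `screenplay_to_record` and the fields `title`, `scene_count` and `scenes` are hypothetical and are not the project's actual format.

```python
import json
import re
from typing import Dict, List

# Assumed convention: a scene heading starts with INT., EXT. or INT/EXT.
SCENE_HEADING = re.compile(r"^(INT\.|EXT\.|INT/EXT\.)")


def screenplay_to_record(title: str, text: str) -> Dict:
    """Split raw screenplay text into scenes, each with a heading and its body lines."""
    scenes: List[Dict] = []
    current: Dict = {"heading": None, "lines": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines carry no content
        if SCENE_HEADING.match(line):
            # Close the previous scene (or untitled preamble) before starting a new one.
            if current["heading"] is not None or current["lines"]:
                scenes.append(current)
            current = {"heading": line, "lines": []}
        else:
            current["lines"].append(line)
    if current["heading"] is not None or current["lines"]:
        scenes.append(current)
    # scene_count counts only real scenes, not a heading-less preamble.
    return {
        "title": title,
        "scene_count": sum(1 for s in scenes if s["heading"] is not None),
        "scenes": scenes,
    }


if __name__ == "__main__":
    sample = (
        "FADE IN:\n\n"
        "INT. KITCHEN - NIGHT\nANNA\nWho's there?\n\n"
        "EXT. STREET - CONTINUOUS\nRain falls.\n"
    )
    print(json.dumps([screenplay_to_record("Example Title", sample)], indent=2))
```

Keeping each scene as a heading plus a free-form list of lines is what makes the result semi-structured: the scene boundaries are explicit, but the body text is not forced into a fixed schema. HTML and PDF downloads would first need to be converted to plain text, which this sketch does not attempt.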