Environment: Python 3 Jupyter Notebook.
Part I - Web crawler
Primary task: Write a script that parses the HTML files in the HTML data directory, extracts the artist, works, currency, and price amount, and outputs them to stdout.
Output format: A JSON array of objects
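A minimal sketch of the parsing step, using only the standard library. The markup of the HTML files is not described here, so the class names in FIELD_CLASSES, the output key names, and the "data" directory name are all assumptions to be replaced after inspecting a real file. The price amount is kept as text rather than converted to a number.

```python
import json
from html.parser import HTMLParser
from pathlib import Path

# ASSUMPTION: hypothetical mapping from an element's CSS class to the
# output key. The real pages may use entirely different markup.
FIELD_CLASSES = {
    "artist": "artist",
    "work": "works",
    "currency": "currency",
    "price": "amount",
}


class LotParser(HTMLParser):
    """Collects the first text found inside elements listed in FIELD_CLASSES."""

    def __init__(self):
        super().__init__()
        self.record = {}
        self._field = None  # output key currently being read, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in FIELD_CLASSES:
                self._field = FIELD_CLASSES[name]

    def handle_data(self, data):
        if self._field and data.strip():
            self.record[self._field] = data.strip()
            self._field = None

    def handle_endtag(self, tag):
        # Stop waiting for text once any element closes.
        self._field = None


def parse_html(text):
    """Return one dict of extracted fields for a single HTML document."""
    parser = LotParser()
    parser.feed(text)
    parser.close()
    return parser.record


def parse_directory(directory):
    """Parse every .html file in the directory, in sorted filename order."""
    return [
        parse_html(path.read_text(encoding="utf-8"))
        for path in sorted(Path(directory).glob("*.html"))
    ]


def main(directory="data"):  # ASSUMPTION: hypothetical directory name
    """Print the JSON array of objects to stdout."""
    print(json.dumps(parse_directory(directory), indent=2))
```

In a notebook cell, calling main() with the path of the HTML directory prints the array. If the pages do not mark fields with distinct classes, the mapping approach will need to be replaced with selectors that match the actual structure.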
Part II - Predictive Model
Primary task: Train a machine learning model that predicts the price of a work of art given its 19 variables, including artist_name, auction_date, location, size (depth, height, width), etc.
Target variable: hammer_price
Metric: Root mean squared error (RMSE)
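For reference, RMSE is the square root of the mean of the squared differences between the true and predicted hammer_price values. A small standard-library sketch follows; the function name rmse and its argument names are illustrative, not part of the task.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one value is required")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Because the errors are squared before averaging, a few large misses on expensive works can dominate the score.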
Final file: "model.py", containing an importable predict function.
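One possible shape for "model.py" is sketched below. The signature of predict is not specified here, so this sketch assumes it takes a list of dicts keyed by variable name and returns a list of prices. The per-artist median model and the train helper are placeholders chosen only to keep the example self-contained; they are not the intended model and use just one of the 19 variables.

```python
"""model.py: a baseline sketch exposing an importable predict()."""
from statistics import median


class ArtistMedianModel:
    """PLACEHOLDER model: predicts the median hammer_price seen per artist."""

    def fit(self, rows):
        prices_by_artist = {}
        for row in rows:
            prices_by_artist.setdefault(row["artist_name"], []).append(
                row["hammer_price"]
            )
        self.by_artist = {a: median(p) for a, p in prices_by_artist.items()}
        # Artists not seen in training fall back to the overall median.
        self.fallback = median(row["hammer_price"] for row in rows)
        return self

    def predict(self, rows):
        return [
            self.by_artist.get(row["artist_name"], self.fallback) for row in rows
        ]


_model = None  # set by train(); a real model.py might load a saved model instead


def train(rows):
    """ASSUMPTION: hypothetical helper that fits the placeholder model."""
    global _model
    _model = ArtistMedianModel().fit(rows)


def predict(rows):
    """Return one predicted hammer_price per input row (list of dicts)."""
    if _model is None:
        raise RuntimeError("no model available; call train() first")
    return _model.predict(rows)
```

With this layout, a notebook or grader can run `from model import predict` and call it on new rows. Any trained regressor can replace the placeholder as long as predict keeps the same interface.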