ANZ Data Science Program
Task 1:
Load the transaction dataset below into an analysis tool of your choice (Excel, R, SAS, Tableau, or similar).
Start by doing some basic checks – are there any data issues? Does the data need to be cleaned?
Gather some interesting overall insights about the data. For example: what is the average transaction amount? How many transactions do customers make each month, on average?
Segment the dataset by transaction date and time. Visualise transaction volume and spending over the course of an average day or week. Consider the effect of any outliers that may distort your analysis.
For a challenge – what insights can you draw from the location information provided in the dataset?
Put together 2-3 slides summarising your most interesting findings for ANZ management.
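As a rough illustration of the checks and summaries in Task 1, here is a minimal Python sketch using only the standard library. The rows are made-up toy data, and the field names (customer_id, timestamp, amount) are hypothetical; the real dataset's columns may differ. Whether exact duplicate rows are errors is also an assumption to verify against the data.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, median

# Toy rows standing in for the real dataset (hypothetical field names).
rows = [
    {"customer_id": "C1", "timestamp": "2018-08-01 09:15:00", "amount": "12.50"},
    {"customer_id": "C1", "timestamp": "2018-08-01 09:15:00", "amount": "12.50"},  # exact duplicate
    {"customer_id": "C1", "timestamp": "2018-08-03 18:40:00", "amount": "40.00"},
    {"customer_id": "C1", "timestamp": "2018-09-02 09:05:00", "amount": "7.50"},
    {"customer_id": "C2", "timestamp": "2018-08-02 09:30:00", "amount": "2000.00"},  # outlier
    {"customer_id": "C2", "timestamp": "2018-09-10 18:10:00", "amount": ""},  # missing amount
    {"customer_id": "C2", "timestamp": "2018-09-11 18:20:00", "amount": "20.00"},
]

# Basic checks: count rows with a missing amount and exact duplicates,
# and keep only the remaining rows with parsed types.
seen = set()
clean = []
n_missing = n_duplicate = 0
for row in rows:
    if row["amount"] == "":
        n_missing += 1
        continue
    key = (row["customer_id"], row["timestamp"], row["amount"])
    if key in seen:
        n_duplicate += 1
        continue
    seen.add(key)
    clean.append({
        "customer_id": row["customer_id"],
        "when": datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S"),
        "amount": float(row["amount"]),
    })

# Overall insights: the mean is pulled up by the outlier, the median is not.
amounts = [r["amount"] for r in clean]
avg_amount = mean(amounts)
median_amount = median(amounts)

# Transactions per customer per month, averaged over customer-months
# that have at least one transaction (inactive months are not counted).
per_customer_month = defaultdict(int)
for r in clean:
    per_customer_month[(r["customer_id"], r["when"].year, r["when"].month)] += 1
avg_monthly = mean(per_customer_month.values())

# Volume and spending by hour of day, with mean and median side by side.
by_hour = defaultdict(list)
for r in clean:
    by_hour[r["when"].hour].append(r["amount"])
hourly = {
    hour: {"count": len(vals), "mean": mean(vals), "median": median(vals)}
    for hour, vals in sorted(by_hour.items())
}

print(n_missing, n_duplicate, avg_amount, median_amount, avg_monthly)
for hour, stats in hourly.items():
    print(hour, stats)
```

Reporting the median next to the mean is one simple way to see how much a single large transaction distorts an hourly or daily average; trimming or capping outliers are alternatives.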
Task 2:
For this task, you’ll likely need to use statistical software such as R, SAS, or Python.
Using the same transaction dataset, identify the annual salary of each customer.
Explore correlations between annual salary and various customer attributes. These attributes could be readily available in the data (e.g. age) or ones that you construct or derive yourself (e.g. those relating to purchasing behaviour). Visualise any interesting correlations using a scatter plot.
Build a simple regression model to predict the annual salary for each customer using the attributes you identified above.
How accurate is your model? Should ANZ use it to segment customers (for whom it does not have this data) into income brackets for reporting purposes?
For a challenge: build a decision-tree-based model to predict salary. Does it perform better? How would you accurately test the performance of this model?