For my MSc dissertation, I explored whether deep learning models could replicate traditional Value-at-Risk (VaR) calculations within the UK property industry.
Specifically, I used UK House Price Index (HPI) data to calculate VaR with a traditional method (Filtered Historical Simulation VaR using GJR-GARCH), and then trained an Artificial Neural Network (ANN), a Recurrent Neural Network (RNN) and an RNN with an LSTM layer on the resulting VaR figures. The plot below compares a predicted VaR (95% confidence level) against the traditionally calculated VaR and the actual returns, marking the breaches of both the calculated and the predicted VaR.
The table below shows the general structure of the dataset, restricted to the columns used in this research, with example rows from the East Midlands region.
Date | RegionName | AveragePrice | DetachedPrice | SemiDetachedPrice | TerracedPrice | FlatPrice |
---|---|---|---|---|---|---|
01/01/1995 | East Midlands | 45544.52 | 68923.94 | 41227.5 | 32870.49 | 30954.76 |
01/02/1995 | East Midlands | 46051.57 | 68634.75 | 42051.34 | 33423.75 | 31600.06 |
01/03/1995 | East Midlands | 45383.82 | 67658.6 | 41388.96 | 33005.72 | 30958.9 |
This plot shows the difference in average detached property prices across regions of the UK. Region clearly has a large impact on property prices, with average prices in London (green) more than double the national average (red).
VaR calculations need the returns from period to period, in this case the return in the HPI from one month to the next. The plot below therefore shows the returns for each region, using the average property type as an example. The differences in volatility clustering and in the return distributions are clear. To keep the project in scope we used a subset of these regions, namely London, South West, West Midlands and Yorkshire and The Humber, as well as the English national averages. This was intended to represent England fairly, both geographically and statistically, whilst keeping the work manageable in terms of model training times. The results need to be considered in this context, and may improve when the full dataset is used.
To optimise the models for each region/property type combination, we carried out a random search cross-validation process over the hyperparameters below.
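As a minimal sketch of the returns step, the snippet below computes month-over-month returns from a price series. The write-up does not say whether simple or log returns were used, so both are shown; the function names are hypothetical, and the three prices are the East Midlands `AveragePrice` values from the table above.

```python
import math


def simple_returns(prices):
    """Simple returns: r_t = P_t / P_{t-1} - 1 (assumed definition)."""
    return [curr / prev - 1.0 for prev, curr in zip(prices, prices[1:])]


def log_returns(prices):
    """Log returns: r_t = ln(P_t / P_{t-1}) (alternative assumed definition)."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


# East Midlands AveragePrice for Jan, Feb and Mar 1995 (from the table above)
east_midlands_avg = [45544.52, 46051.57, 45383.82]

returns = simple_returns(east_midlands_avg)
log_rets = log_returns(east_midlands_avg)

for r in returns:
    print(f"{r:.4%}")
```

A series of n prices yields n - 1 returns, so the first month of each region has no return and drops out of the modelling data.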
Hyperparameter | Values |
---|---|
Hidden nodes | [32, 64, 128] |
Learning rate | [0.01, 0.001] |
No. of hidden layers | [3, 4] |
Dropout rate | [0.05, 0.1, 0.2] |
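The random search over this grid can be sketched as follows. This is an illustration under stated assumptions, not the dissertation's code: `SEARCH_SPACE` mirrors the table above, while `sample_configs`, `random_search`, the `n_iter` budget and the `score_fn` callback (standing in for a cross-validated RMSE) are hypothetical names.

```python
import random

# Hyperparameter grid copied from the table above
SEARCH_SPACE = {
    "hidden_nodes": [32, 64, 128],
    "learning_rate": [0.01, 0.001],
    "hidden_layers": [3, 4],
    "dropout_rate": [0.05, 0.1, 0.2],
}


def sample_configs(space, n_iter, seed=0):
    """Draw up to n_iter distinct configurations uniformly at random."""
    rng = random.Random(seed)
    max_unique = 1
    for values in space.values():
        max_unique *= len(values)
    seen, configs = set(), []
    while len(configs) < min(n_iter, max_unique):
        config = {name: rng.choice(values) for name, values in space.items()}
        key = tuple(sorted(config.items()))
        if key not in seen:  # skip configurations already drawn
            seen.add(key)
            configs.append(config)
    return configs


def random_search(space, score_fn, n_iter=10, seed=0):
    """Return the sampled configuration with the lowest score.

    score_fn is a placeholder for e.g. the mean validation RMSE
    obtained by cross-validating a model built from the configuration.
    """
    return min(sample_configs(space, n_iter, seed), key=score_fn)
```

The full grid contains 3 × 2 × 2 × 3 = 36 combinations, so a random search with a smaller `n_iter` trades exhaustive coverage for shorter training time per region/property type combination.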
The maximum number of epochs was fixed at 200 for every model, but early stopping was used to reduce computation time where possible. The plot below shows a run that stopped after only 124 epochs, once the loss had converged on an approximate optimum. The data was also scaled within the ML modelling pipeline using StandardScaler, which transforms the data to have a mean of 0 and a standard deviation of 1.
Model performance varied across regions and property types. The best RMSE achieved for each region/property type combination is shown in the table below. The closest replication of the VaR calculation was for the average property type in London, with an RMSE of 0.277. London in general gave the closest predictions to the calculated VaR, with a mean RMSE of 0.3381.
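The two mechanisms in this paragraph can be illustrated in plain Python. This is a sketch, not the pipeline itself: `fit_standardizer` and `standardize` reproduce what StandardScaler computes (mean and population standard deviation, fitted on training data only), and `stopping_epoch` shows a common patience-based early-stopping rule. The `patience` and `min_delta` values are assumptions, since the write-up does not state them.

```python
import statistics


def fit_standardizer(train_values):
    """Learn the mean and population std (ddof=0, as StandardScaler does)."""
    return statistics.fmean(train_values), statistics.pstdev(train_values)


def standardize(values, mean, std):
    """Apply z = (x - mean) / std using statistics from the training data."""
    return [(v - mean) / std for v in values]


def stopping_epoch(val_losses, patience=10, min_delta=0.0):
    """Return the 1-based epoch at which training would stop.

    Training stops once the loss has failed to improve by more than
    min_delta for `patience` consecutive epochs; otherwise it runs
    through every epoch supplied.
    """
    best, wait = float("inf"), 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses)


train = [1.0, 2.0, 3.0, 4.0]
mean, std = fit_standardizer(train)
scaled_train = standardize(train, mean, std)
```

Fitting the scaler inside the pipeline matters because the mean and standard deviation are then learned from each training fold only, so no information from the validation data leaks into the scaling.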
region | average | detached | flat | semi_detached | terraced |
---|---|---|---|---|---|
England | 0.3273 | 0.3700 | 0.5400 | 0.4043 | 0.4350 |
London | 0.2770 | 0.3973 | 0.3680 | 0.2977 | 0.3503 |
South West | 0.4693 | 0.4833 | 0.4653 | 0.4150 | 0.3650 |
West Midlands | 0.8390 | 0.4957 | 0.4860 | 0.7590 | 0.6337 |
Yorkshire and The Humber | 0.4313 | 0.3943 | 0.5493 | 0.4933 | 0.4113 |
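For reference, the metric in this table can be sketched as below, together with a check of the London mean quoted above. The `rmse` helper is a generic textbook definition rather than the dissertation's code; the London values are copied from the table.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# London row of the RMSE table above
london_rmse = {
    "average": 0.2770,
    "detached": 0.3973,
    "flat": 0.3680,
    "semi_detached": 0.2977,
    "terraced": 0.3503,
}

london_mean = sum(london_rmse.values()) / len(london_rmse)
print(round(london_mean, 4))  # 0.3381
```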
To quantify the usability of the VaR models, we used the Kupiec test to backtest both the calculated (actual) VaR figures and the predicted ones. In short, we are looking for a p-value > 0.05 to support a model's usability. In that case, the observed number of exceptions is roughly consistent with what would be expected at the 95% confidence level. However, a p-value greater than 0.05 doesn't mean the model is perfect. It simply indicates that it's within an acceptable range of accuracy. On that basis, the NN models produced an acceptable model on average for every region, whereas the actual VaR calculations using Filtered Historical Simulation were unsatisfactory on average for London and West Midlands.
region | act_pval | pred_pval |
---|---|---|
England | 0.1840 | 1.8118 |
London | 0.0045 | 0.0678 |
South West | 0.5875 | 0.1816 |
West Midlands | 0.0088 | 1.8087 |
Yorkshire and The Humber | 0.4893 | 0.3903 |