Example time-series analysis portfolio, using Python (NumPy, Matplotlib)
We collect weather data from NOAA's integrated GHCN-Daily database, which holds daily climate summaries from stations across the globe. The variables include minimum and maximum temperature, precipitation, snowfall, and so on. We download the list of stations and use it to locate temperature data for different cities. We handle missing values and smooth the time series to bring the signal out of the noise. Finally, we create some visualisations of daily temperatures.
Plotting each station's latitude against its longitude gives a feel for the global coverage of the database.
Global coverage of weather stations
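A coverage plot of this kind could be produced roughly as follows. This is a minimal sketch, not the code behind the figure: the column positions are an assumption based on the fixed-width layout described in the GHCN-Daily readme and should be checked against the current documentation, the helper names parse_station_line and plot_coverage are hypothetical, and the three sample records are made up.

```python
import numpy as np

# Column slices assumed from the GHCN-Daily readme (1-based columns 1-11,
# 13-20 and 22-30); verify them against the current documentation.
ID_COLS = slice(0, 11)
LAT_COLS = slice(12, 20)
LON_COLS = slice(21, 30)


def parse_station_line(line):
    """Return (station_id, latitude, longitude) from one fixed-width record."""
    return line[ID_COLS].strip(), float(line[LAT_COLS]), float(line[LON_COLS])


def plot_coverage(latitudes, longitudes):
    """Scatter longitude (x) against latitude (y), one small dot per station."""
    import matplotlib.pyplot as plt  # imported lazily so parsing works without it

    fig, ax = plt.subplots(figsize=(10, 5))
    ax.scatter(longitudes, latitudes, s=1)
    ax.set_xlabel("Longitude (degrees)")
    ax.set_ylabel("Latitude (degrees)")
    ax.set_title("Global coverage of weather stations")
    return fig, ax


# Made-up records laid out in the assumed column positions (not real stations).
sample_lines = [
    f"{'XX000000001':<11} {41.8:>8.4f} {12.6:>9.4f}",
    f"{'XX000000002':<11} {-33.9:>8.4f} {151.2:>9.4f}",
    f"{'XX000000003':<11} {34.1:>8.4f} {-118.4:>9.4f}",
]

records = [parse_station_line(line) for line in sample_lines]
latitudes = np.array([lat for _, lat, _ in records])
longitudes = np.array([lon for _, _, lon in records])
```

Calling plot_coverage(latitudes, longitudes) on the full station list would then draw one dot per station; with tens of thousands of stations, a small marker size keeps dense regions readable.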
Next, we face the problem of missing values in the database. This is a common issue in data analysis, and in some cases we could simply ignore them. If we do need an uninterrupted series of numbers, we could set the missing entries, for instance, to the average of the respective column. A more sophisticated way to restore missing values is interpolation, which takes the "good" data points and returns estimated values for the missing ones by fitting straight segments between the existing data points. Here we follow this approach, which is rather conservative, since every estimated value lies between the two valid measurements that bracket it.
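The linear interpolation described above can be sketched with np.interp. This is an illustration under assumptions: missing entries are taken to be already marked as NaN (the raw files may use a different sentinel value), and the function name fill_missing and the sample temperatures are made up.

```python
import numpy as np


def fill_missing(values):
    """Replace NaN entries by linear interpolation between valid neighbours."""
    values = np.asarray(values, dtype=float)
    positions = np.arange(len(values))
    good = ~np.isnan(values)  # mask selecting the "good" data points
    # np.interp evaluates the piecewise-linear curve through the good points
    # at every position; good points are returned unchanged.
    return np.interp(positions, positions[good], values[good])


# Hypothetical daily maximum temperatures with three gaps.
tmax = np.array([10.0, np.nan, 14.0, np.nan, np.nan, 20.0])
filled = fill_missing(tmax)
print(filled)  # [10. 12. 14. 16. 18. 20.]
```

One design point to be aware of: for gaps at the very start or end of the series, np.interp repeats the nearest valid value rather than extrapolating.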
A time series is a sequence of values organised chronologically, usually at regular intervals. Looking directly at the data is informative, but one may see a lot of noise in the form of rapid variations between one day and the next, which can hide underlying trends. To limit the noise, one can smooth the data. The premise of data smoothing is that one is measuring a variable that is both slowly varying and corrupted by random noise, so that smoothing (i) increases the signal-to-noise ratio and (ii) exposes the slow, long-term behaviour underneath the oscillations. We follow a simple, direct approach to smoothing by replacing each value with the average of a set of its neighbours. Since nearby points measure very nearly the same underlying value, averaging can reduce the level of noise without (much) biasing the value obtained. Here we use a so-called "box filter": a smoothing mask with equal, positive entries, normalised so that they add up to 1. Applying the mask to the series amounts to computing their cross-correlation.
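A box filter applied by cross-correlation can be sketched with np.correlate. This is a minimal illustration rather than the exact code used for the figures: the function name smooth, the window length of 21 days, and the synthetic noisy seasonal signal are all assumptions made for the example.

```python
import numpy as np


def smooth(values, window=21):
    """Box-filter a series: each output is the mean of `window` adjacent values."""
    mask = np.ones(window) / window  # equal positive entries that sum to 1
    # mode="valid" keeps only positions where the mask fully overlaps the data,
    # so the result has len(values) - window + 1 entries.
    return np.correlate(values, mask, mode="valid")


# Synthetic stand-in for a year of daily temperatures: slow seasonal cycle + noise.
rng = np.random.default_rng(0)
days = np.arange(365)
signal = 15.0 + 10.0 * np.sin(2.0 * np.pi * days / 365.0)
noisy = signal + rng.normal(scale=3.0, size=days.size)

window = 21
smoothed = smooth(noisy, window)

# Day indices that the "valid" output corresponds to (window centres).
half = window // 2
smoothed_days = days[half:-half]
```

Because the "valid" mode drops half a window at each end, the smoothed series should be plotted against smoothed_days rather than days. A longer window removes more noise but also flattens genuine short-lived features.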
We try this out on several years of data for Roma Ciampino, an international airport just outside Rome, to check whether the climate is stable from year to year. It largely is.
Climate in Rome for three different years
Climate in four US cities in 2019