info-theory

The purpose of this project is to build up a library to illustrate ideas from conversations with George Judge.

Examining entropy from a de-meaned series

The first application compares the distribution of the dynamic sequences in a raw time series with their distribution in the demeaned time series. We generate the time series with the following function:

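(require '[incanter.stats :as s])   ; s/sample-normal draws N(0,1) by default

(defn mean-dgp
  "returns a series of length T with mean 5 and standard normal errors"
  [T]
  (let [e (s/sample-normal T)]
    (map (partial + 5) e)))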

This returns a series of length T with mean 5 and standard normal errors. We then apply the permutation-count function, which returns a hash-map from each permutation sequence to its frequency, and do the same for the de-meaned version of the series. The following image displays the count histogram of each sequence for a series of length T = 400; with D = 4 there are 4! = 24 possible sequences. The sequences are arbitrarily ordered.
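The project's permutation-count function is not shown in this README. As a minimal sketch of the idea, assuming that a "permutation sequence" is the ordinal pattern of each overlapping window of length D, the counts can be computed in plain Clojure. The names ordinal-pattern and permutation-counts-sketch are hypothetical and may not match the library's implementation.

```clojure
(defn ordinal-pattern
  "Hypothetical helper: the ordinal pattern of a window, i.e. the positions
  of its elements listed from smallest to largest value. sort-by is
  stable, so ties are broken by position."
  [window]
  (->> (map-indexed vector window)   ; [[0 x0] [1 x1] ...]
       (sort-by second)              ; order the pairs by value
       (mapv first)))                ; keep only the original positions

(defn permutation-counts-sketch
  "Hypothetical stand-in for permutation-count: a map from each ordinal
  pattern of length D to the number of overlapping windows showing it."
  [D xs]
  (frequencies (map ordinal-pattern (partition D 1 xs))))

;; With D = 4 there are 4! = 24 possible patterns.
(def n-patterns (reduce * (range 1 5)))   ; 1 * 2 * 3 * 4
```

For example, `(permutation-counts-sketch 3 [1 3 2 4])` returns `{[0 2 1] 1, [1 0 2] 1}`: the window `[1 3 2]` rises then falls, and `[3 2 4]` falls then rises. A series of length T yields T - D + 1 windows.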

The functions below generate the time series and test for differences in the empirical distribution functions of the permutation counts.

(defn retrieve-diff
  "accepts the length D of the permutation sequences and two series,
  a reference series and a new series; returns the K-S test statistic
  associated with the comparison of the permutation entropy
  distributions of the two series."
  [D e-ref e-new]
  {:pre [(= (count e-ref) (count e-new))]}   ; both series must have length T
  (let [m-ref (permutation-count D e-ref)
        m-new (permutation-count D e-new)]
    (apply ks-stat
           (map empirical-dist (key-counts m-new m-ref)))))

(defn demean-illustration
  "compares the residuals from a series and the demeaned series"
  [D T]
  (let [y (mean-dgp T)]
    (retrieve-diff D y (demean y))))
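The ks-stat and empirical-dist helpers are not shown either. For reference, a two-sample Kolmogorov-Smirnov statistic is the largest vertical gap between two empirical distribution functions, and the sketch below computes it directly. ecdf and ks-stat-sketch are hypothetical names, and this is not necessarily how the project pairs up the permutation counts before comparing them.

```clojure
(defn ecdf
  "Hypothetical helper: empirical distribution function of samples,
  returned as a function x -> share of samples <= x."
  [samples]
  (let [n (count samples)]
    (fn [x] (/ (count (filter #(<= % x) samples)) n))))

(defn ks-stat-sketch
  "Two-sample K-S statistic: sup_x |F_a(x) - F_b(x)|. Both ECDFs are step
  functions that only jump at sample points, so checking the pooled
  sample points is enough to find the supremum."
  [a b]
  (let [fa (ecdf a)
        fb (ecdf b)]
    (apply max (map #(Math/abs (double (- (fa %) (fb %))))
                    (concat a b)))))
```

Identical samples give `0.0`, and fully separated samples such as `[1 2]` and `[3 4]` give `1.0`, the two extremes of the statistic.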

For this application, the Kolmogorov-Smirnov test statistic is always 0: demeaning only shifts the time series by a constant, so it does not change the ordering of relative values and leaves every permutation count unchanged. This can be seen in the following line graphs.
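A quick way to check this claim is to compare the ordinal-pattern counts of a series and its demeaned version directly. The sketch below uses integer input so that the mean is kept as an exact ratio and subtracting it cannot introduce rounding ties. demean-sketch, pattern-counts, and y-example are hypothetical names, not the project's demean and permutation-count.

```clojure
(defn demean-sketch
  "Hypothetical stand-in for demean: subtracts the sample mean."
  [xs]
  (let [m (/ (reduce + xs) (count xs))]   ; exact ratio for integer input
    (map #(- % m) xs)))

(defn pattern-counts
  "Counts the ordinal patterns of the overlapping windows of length D."
  [D xs]
  (frequencies
    (map (fn [w] (mapv first (sort-by second (map-indexed vector w))))
         (partition D 1 xs))))

(def y-example [53 41 60 55 49 58])

;; Shifting every value by the same constant preserves all pairwise
;; orderings, so the two count maps are identical.
(= (pattern-counts 3 y-example)
   (pattern-counts 3 (demean-sketch y-example)))
;; => true
```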

Linear model

Consider, now, a random variable generated by a linear model: a linear combination of a constant, a single covariate x, and a standard normal error. In the code below, y_t = 5 + x_t + e_t.

(defn linear-dgp
  "returns y_t = 5 + x_t + e_t, where e_t is standard normal"
  [T x]
  (let [e (s/sample-normal T)]
    (map (partial + 5) x e)))
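linear-dgp relies on Incanter's sample-normal. If a dependency-free, reproducible variant is useful for experiments, the sketch below draws the errors from a seeded java.util.Random, whose nextGaussian returns standard normal draws. linear-dgp-sketch and its generator argument are assumptions, not part of the project; T is implied by the length of x.

```clojure
(defn linear-dgp-sketch
  "Hypothetical variant of linear-dgp: y_t = 5 + x_t + e_t with
  e_t ~ N(0,1), drawn from the supplied seeded generator."
  [^java.util.Random rng x]
  (mapv (fn [xi] (+ 5.0 xi (.nextGaussian rng))) x))   ; eager, one draw per x_t

;; The same seed reproduces the same series.
(= (linear-dgp-sketch (java.util.Random. 42) [1.0 2.0 3.0])
   (linear-dgp-sketch (java.util.Random. 42) [1.0 2.0 3.0]))
;; => true
```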

As in the previous example, we can compute the Kolmogorov-Smirnov test statistic, using the raw time series and the residuals from the linear regression as the base distributions.

(defn linear-illustration
  "compares the linear DGP with the residuals from a linear model"
  [D T]
  (let [x (s/sample-normal T :mean 3)
        y (linear-dgp T x)]
    (retrieve-diff D y (linear-residuals y x))))
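linear-residuals is not shown in this README. Assuming it returns the residuals from an OLS regression of y on a constant and x, the one-covariate case has a closed form: slope b = S_xy / S_xx and intercept a = mean(y) - b * mean(x). The sketch below implements that; mean-of and linear-residuals-sketch are hypothetical names, with the argument order mirroring `(linear-residuals y x)`.

```clojure
(defn mean-of
  "Arithmetic mean; exact (a ratio) for integer input."
  [xs]
  (/ (reduce + xs) (count xs)))

(defn linear-residuals-sketch
  "Hypothetical stand-in for linear-residuals: residuals from an OLS
  regression of y on a constant and a single covariate x."
  [y x]
  (let [mx  (mean-of x)
        my  (mean-of y)
        sxy (reduce + (map (fn [xi yi] (* (- xi mx) (- yi my))) x y))
        sxx (reduce + (map (fn [xi] (let [d (- xi mx)] (* d d))) x))
        b   (/ sxy sxx)            ; OLS slope
        a   (- my (* b mx))]       ; OLS intercept
    (map (fn [xi yi] (- yi a (* b xi))) x y)))
```

For instance, `(linear-residuals-sketch [7 9 11 13] [1 2 3 4])` returns `(0 0 0 0)`, since those points lie exactly on y = 5 + 2x. With an intercept in the model, the residuals always sum to zero.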

We can run the same Kolmogorov-Smirnov comparison repeatedly, collecting the test statistic from each run and plotting its histogram. The test statistic appears to follow the Kolmogorov distribution and stays well within its usual range under the null, so we cannot reject the hypothesis that the two distributions are the same. The linear regression leaves no dynamic pattern in the unexplained variation, which makes sense, since we constructed the data that way.

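(require '[incanter.core :as i]
         '[incanter.charts :as c])

(defn hist-diffstat
  "displays a histogram of the K-S statistic from a Monte Carlo
  simulation of f (e.g. linear-illustration), iterated B times."
  [B f D T]
  (let [dgp-fn (fn [_] (f D T))]   ; the iteration index is ignored
    (i/view (c/histogram (pmap dgp-fn (range B))
                         :nbins 20))))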
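hist-diffstat hands the plotting to Incanter. If only the numbers are needed, for instance to compare runs without a display, the same Monte Carlo loop and a binned count can be sketched in plain Clojure. mc-stats and bin-counts are hypothetical helpers, and the equal-width binning is an assumption that need not match the exact bin edges of c/histogram.

```clojure
(defn mc-stats
  "Runs the zero-argument simulation f B times in parallel (pmap) and
  returns the vector of results."
  [B f]
  (vec (pmap (fn [_] (f)) (range B))))

(defn bin-counts
  "Counts of xs in nbins equal-width bins spanning [min, max]; the
  maximum value is placed in the last bin."
  [nbins xs]
  (let [lo    (apply min xs)
        hi    (apply max xs)
        width (double (/ (- hi lo) nbins))]
    (frequencies
      (map (fn [x]
             (if (zero? width)
               0                                         ; all values equal: one bin
               (min (dec nbins) (int (/ (- x lo) width)))))
           xs))))
```

With the project's functions loaded, something like `(bin-counts 20 (mc-stats 500 #(linear-illustration 4 400)))` would tabulate the statistics that hist-diffstat plots.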

Instrumental Variables

Now we use the data generating process described in my section notes for the Berkeley applied econometrics sequence. We can produce the same histograms as above for the IV model and for the standard linear model, whose estimates are biased. The permutation entropy approach does not differentiate between the two models.

[Figure panel: LINEAR MODEL]
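The DGP from the section notes is not reproduced here. As a hedged illustration of the IV side of the comparison only, the just-identified IV estimator with a single instrument z replaces the OLS slope with cov(z, y) / cov(z, x). cov-of and iv-residuals-sketch are hypothetical names; residuals like these could then be passed to the same K-S comparison as the linear residuals.

```clojure
(defn cov-of
  "Sample covariance (dividing by n); exact for integer input."
  [a b]
  (let [n  (count a)
        ma (/ (reduce + a) n)
        mb (/ (reduce + b) n)]
    (/ (reduce + (map (fn [ai bi] (* (- ai ma) (- bi mb))) a b)) n)))

(defn iv-residuals-sketch
  "Hypothetical helper: residuals of y on a constant and x, where the
  slope is the just-identified IV estimate cov(z,y) / cov(z,x) for a
  single instrument z, and the intercept is mean(y) - slope * mean(x)."
  [y x z]
  (let [n (count y)
        b (/ (cov-of z y) (cov-of z x))                   ; IV slope
        a (- (/ (reduce + y) n) (* b (/ (reduce + x) n)))] ; intercept
    (map (fn [xi yi] (- yi a (* b xi))) x y)))
```

The divisor n cancels in the slope ratio, so using n - 1 instead would give the same estimate.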

Distributed under the Eclipse Public License, the same as Clojure.

danhammer/info-theory
