How to treat large/complex systems
cwehmeyer opened this issue · 5 comments
We need to address how dealing with large/complex systems differs from the tutorial cases, e.g., using source instead of load, convergence issues, etc.
This should go into the manuscript, but there are also notebooks where such explanations might be a good fit.
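For reference, a minimal sketch of the source() vs. load() distinction (file names, features, and lag time below are just placeholders, not part of the tutorial data):

```python
import pyemma

# Placeholder file names -- substitute the actual topology/trajectories.
top = 'topology.pdb'
trajs = ['traj-01.xtc', 'traj-02.xtc']

feat = pyemma.coordinates.featurizer(top)
feat.add_backbone_torsions()

# load() reads all featurized frames into memory at once -- fine for the small
# tutorial systems, but not for large/complex ones.
data_in_memory = pyemma.coordinates.load(trajs, features=feat)

# source() only sets up a streaming reader; frames are processed lazily in
# chunks, so downstream estimators (TICA, clustering, ...) can work on data
# that does not fit into memory.
reader = pyemma.coordinates.source(trajs, features=feat)
tica = pyemma.coordinates.tica(reader, lag=10)
```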
In the tutorial, we have already mentioned using source() instead of load(). So I suggest the following approach:
a) We add some citations about complex systems to the manuscript and mention that there are differences.
b) We add an example that explains what happens if we operate at the edge of poor sampling, i.e., partially converged ITS and a CK-test that breaks down after a certain number of lag times. I'm already trying to compile an example for #140. That could go into NB08 (a rough sketch of such a workflow follows below the list).
c) add a paragraph on the importance of dimension reduction before clustering and implications of density distributions for k-means in NB02
d) discuss ITS convergence in more detail in NB03 / di-ala section
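To make b)-d) a bit more concrete, here is a rough sketch of the kind of workflow such an example could build on (assuming PyEMMA; the lag times, number of cluster centers, and number of metastable sets are placeholders that need tuning per system, and tica_output is assumed to come from a preceding TICA step):

```python
import pyemma

# c) dimension reduction before clustering: 'tica_output' is assumed to be the
# list of per-trajectory arrays from, e.g.,
# pyemma.coordinates.tica(reader, lag=10).get_output()
cluster = pyemma.coordinates.cluster_kmeans(tica_output, k=100, max_iter=50)
dtrajs = cluster.dtrajs

# b)/d) implied timescales with Bayesian error bars; for poorly sampled systems
# the ITS typically do not level off within the accessible range of lag times.
its = pyemma.msm.its(dtrajs, lags=[1, 2, 5, 10, 20, 50], errors='bayes')
pyemma.plots.plot_implied_timescales(its)

# b) Chapman-Kolmogorov test; with insufficient sampling, predictions and
# estimates start to disagree beyond a certain lag time.
msm = pyemma.msm.estimate_markov_model(dtrajs, lag=10)
ck = msm.cktest(4)
pyemma.plots.plot_cktest(ck)
```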
That sounds very reasonable!
One point that I mentioned in the notebooks as well as in the manuscript concerning large systems is that clustering becomes difficult in high-dimensional spaces. My question is whether we need a citation for this or whether we can just claim this as part of our daily experience.
If I understand this correctly, a paper that seems to fit this purpose would be Aggarwal et al., 2001, "On the Surprising Behavior of Distance Metrics in High Dimensional Space".
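One thing that paper describes, and that a notebook could demonstrate in a few lines, is distance concentration: the contrast between near and far neighbors shrinks as the dimension grows, which is one reason distance-based clustering degrades. A minimal numpy sketch (purely illustrative, uniform random data):

```python
import numpy as np

rng = np.random.default_rng(42)
n_points = 1000

# Relative contrast between the nearest and farthest neighbor of the first
# point; for i.i.d. uniform data this shrinks towards zero as dimension grows.
for dim in [2, 10, 100, 1000]:
    x = rng.uniform(size=(n_points, dim))
    d = np.linalg.norm(x[1:] - x[0], axis=1)
    print(dim, (d.max() - d.min()) / d.min())
```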
I don't know if we need a citation - my comments were more to the point that the current mentions of clustering becoming difficult in high dimensions don't make it clear why that's the case, i.e., whether it's computationally difficult, or computationally fine but we are less confident in the resulting model.
In one of my papers I showed that models consistently achieve better VAC/GMRQ scores for lower-dimensional spaces; see here, sec. V C.
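If we want to show this in a notebook, one option would be to compare VAMP2 scores of MSMs built on different numbers of TICA dimensions. A rough sketch (assuming PyEMMA, a streaming reader as above, and placeholder lag times / cluster counts; for a real comparison one would of course use cross-validation and held-out data rather than the training score):

```python
import pyemma

scores = {}
for dim in [2, 5, 10, 20]:
    # 'reader' is assumed to be a pyemma.coordinates.source(...) object.
    tica = pyemma.coordinates.tica(reader, lag=10, dim=dim)
    cluster = pyemma.coordinates.cluster_kmeans(tica.get_output(), k=100, max_iter=50)
    msm = pyemma.msm.estimate_markov_model(cluster.dtrajs, lag=10)
    # VAMP2 score (closely related to GMRQ); score_k fixes the number of
    # processes that enter the score so the comparison across dims is fair.
    scores[dim] = msm.score(cluster.dtrajs, score_method='VAMP2', score_k=5)
print(scores)
```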