YingfanWang/PaCMAP

Default n_neighbors parameter does not match documentation in README

Closed this issue · 1 comments

The documentation in README.md says the default number of neighbors scales up with the number of samples comprising the dataset.

n_neighbors: the number of neighbors considered in the k-Nearest Neighbor graph. Default to 10 for dataset whose sample size is smaller than 10000. For large dataset whose sample size (n) is larger than 10000, the default value is: 10 + 15 * (log10(n) - 4).

But this documentation is incorrect! While this behaviour can be enabled by supplying n_neighbors=None to PaCMAP, that is not the default value of the parameter and hence not the default behaviour. Instead, the default number of neighbors is always n_neighbors=10, irrespective of the size of the dataset.

If the automatic scaling behaviour should actually be the default behaviour, the default value for n_neighbors in the PaCMAP class should be changed to n_neighbors=None. On the otherhand, if the current behaviour (n_neighbors=10 for all dataset sizes) is desired, the README should be corrected to reflect the actual default behaviour, whilst still indicating that automatic scaling can optionally be enabled by explicitly supplying n_neighbors=None.

If the default parameter is changed to n_neighbors=None, I recommend this change is done with great care as it would constitute a breaking change to the API (all code which does not specify n_neighbors explicitly would change its behaviour with the new release). i.e. IMHO Such a change would need to be made in a new major release of pacmap, and be well-documented.

Additionally, I think it is confusing for new users that the example code-block in the README uses the n_neighbors=None behaviour, when this is not the default behaviour.

embedding = pacmap.PaCMAP(n_components=2, n_neighbors=None, MN_ratio=0.5, FP_ratio=2.0) 

Hi there, thank you for your interest in PaCMAP, and my sincerest apologies to any inconvenience you've experienced. The autoselection of n_neighbors feature was brought into PaCMAP in v0.2.0, and at that time we also set it as the default selection. This change was retracted only a few days later in v0.4.1, as we received complaints on the breaking change. Previously, the release notes was displayed at the very beginning of the README file, so the users will be notified of the change in the default value, but it was refactored into a standalone file in an update about one month ago, which leads to the situation. I'm so sorry for any problems you've experienced and I'm working on a hotfix to solve this discrepancy. The default value for n_neighbors will still be set to 10, and we will fix the documentation to ensure this is displayed properly.