SmartCorrelatedSelection may not replicate the same result
WhyYouNeedMyUsername opened this issue · 1 comments
Hi,
I found this issue while running experiments on some datasets.
Describe the bug
SmartCorrelatedSelection may not reproduce the same result for some datasets.
This is because set is an unordered data structure, so .add() does not preserve insertion order.
When several features receive the same score from the selection method, the result can differ between runs.
It only happened when I restarted my development environment (otherwise, the result stayed the same).
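A likely reason the order only changes after a restart: CPython randomizes string hashes per process unless PYTHONHASHSEED is fixed, so a set of feature names can iterate in a different order in each new interpreter. The sketch below uses made-up feature names, starts fresh interpreters with different seeds, and prints the resulting order. It illustrates the mechanism only and is not code from the library.

```python
import os
import subprocess
import sys

# Hypothetical feature names, used only for illustration.
SNIPPET = "print(list({'var_a', 'var_b', 'var_c', 'var_d', 'var_e', 'var_f'}))"


def set_order_for_seed(seed: int) -> str:
    """Run SNIPPET in a fresh interpreter with a fixed string-hash seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Each seed stands in for one "restart" of the environment.
orders = {seed: set_order_for_seed(seed) for seed in range(1, 9)}
for seed, order in orders.items():
    print(seed, order)
```

The same seed gives the same order every time, while different seeds generally do not. Fixing PYTHONHASHSEED before starting Python may work as a temporary workaround, but it does not remove the dependence on set order.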
I solved this by turning the set into a list, _temp_set = list(set([feature])), and replacing .add(f2) with _temp_set.append(f2) in SmartCorrelatedSelection.py.
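To show why a list helps, here is a minimal sketch of order-preserving grouping. It is not the library's implementation: group_correlated, the correlation lookup keyed by frozenset pairs, and the sample values are all hypothetical. Because each group is a list filled in column order, a tie on variance is always resolved in favor of the feature that comes first.

```python
def group_correlated(features, corr, threshold=0.8):
    """Group features whose absolute correlation exceeds the threshold.

    Groups are lists, so members keep the order of `features`.
    """
    groups = []
    assigned = set()  # used for membership checks only, never iterated
    for f1 in features:
        if f1 in assigned:
            continue
        group = [f1]
        assigned.add(f1)
        for f2 in features:
            if f2 in assigned:
                continue
            # Hypothetical lookup: missing pairs count as uncorrelated.
            if abs(corr.get(frozenset((f1, f2)), 0.0)) > threshold:
                group.append(f2)
                assigned.add(f2)
        if len(group) > 1:
            groups.append(group)
    return groups


# Made-up data: "b" and "c" tie on variance.
features = ["a", "b", "c", "d"]
corr = {frozenset(("a", "b")): 0.95, frozenset(("a", "c")): 0.90}
variance = {"a": 1.0, "b": 2.0, "c": 2.0, "d": 0.5}

groups = group_correlated(features, corr)
# max() returns the first maximal element, so the tie is broken by list order.
kept = max(groups[0], key=variance.__getitem__)
print(groups, kept)
```

A set is still used here, but only to test membership. Nothing ever iterates over it, so its internal order cannot leak into the result.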
To Reproduce
Steps to reproduce the behavior:
- Run SmartCorrelatedSelection on a dataset that has plenty of correlated features. (In my case, I set selection_method="variance".)
- Record features_to_keep_.
- Restart your environment (including Python).
- Run again and you will see a different features_to_keep_ result.
Expected behavior
Drop the same features whenever the given parameters are the same.
Desktop (please complete the following information):
NAME="CentOS Linux"
VERSION="8"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="CentOS Linux 8"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:8"
HOME_URL="https://centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-8"
CENTOS_MANTISBT_PROJECT_VERSION="8"
Thank you for developing this wonderful tool! 🌟
@WhyYouNeedMyUsername thanks for raising this!
I will look into it over Christmas :)
Feel free to make a PR otherwise with the suggested changes! That would be most welcome.