/feature_selection

Feature selection using a random forest.

Feature_selection

Use a random forest to find the important features.

1. Feature importance ranking:

  • Top 10 features by importance ranking:

    1. TB_9a
    2. TB_a9
    3. Img0.1
    4. ent_p_5
    5. TB_b1
    6. TB_71
    7. ent_p_8
    8. TB_ce
    9. GetStringTypeA
    10. ExitProcess

2. Useless feature ranking:

  • Top 10 useless features from the feature ranking list:

  • GetLastActivePopup
  • ImageList_Add
  • GlobalDeleteAtom
  • IsBadReadPtr
  • SelectPalette
  • GetMenuState
  • ExitThread
  • AdjustWindowRectEx
  • GetEnvironmentVariableA
  • SHGetFileInfoA

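3. Decision method

  • Uses RandomForestClassifier from sklearn: the top 10 features in the importance ranking are taken as important features, and the bottom 10 as useless features.
  • "Useless" only means relatively poor; it does not mean these features contribute nothing to classification. Likewise, "important" only means relatively better.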
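
The selection step described above can be sketched roughly as follows. This is a minimal sketch, not the repository's actual code: the synthetic dataset, the `feat_` column names, the helper name `rank_features`, and the hyperparameters (`n_estimators=100`, `random_state=0`) are all assumptions made for illustration.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def rank_features(X, y, k=10, random_state=0):
    """Return (top_k, bottom_k) feature importances as pandas Series.

    Hypothetical helper: fits a random forest and sorts the features by
    the impurity-based `feature_importances_` attribute.
    """
    model = RandomForestClassifier(n_estimators=100, random_state=random_state)
    model.fit(X, y)
    importances = pd.Series(model.feature_importances_, index=X.columns)
    ranked = importances.sort_values(ascending=False)
    return ranked.head(k), ranked.tail(k)


# Synthetic stand-in data (assumption): 30 features, 5 of them informative.
X_arr, y = make_classification(
    n_samples=300, n_features=30, n_informative=5, random_state=0
)
X = pd.DataFrame(X_arr, columns=[f"feat_{i}" for i in range(X_arr.shape[1])])

important, useless = rank_features(X, y, k=10)
print("Important features:", list(important.index))
print("Useless features:", list(useless.index))
```

Because the ranking is relative, a feature in the bottom 10 may still carry some signal; the cutoff `k` is a choice, not a property of the data.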

4. Packages

  • RandomForestClassifier from the sklearn package for feature selection
  • pandas and numpy for data handling
  • matplotlib.pyplot for plotting
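
As a rough illustration of how the data and plotting packages fit together, the sketch below builds a small pandas Series of importance scores with numpy and draws it as a horizontal bar chart with matplotlib.pyplot. The scores are randomly generated placeholders (not results from this project), and the feature names, the `Agg` backend, and the output file name `importances.png` are assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Placeholder scores (assumption): random values normalised to sum to 1.
rng = np.random.default_rng(0)
raw = rng.random(10)
scores = pd.Series(raw / raw.sum(), index=[f"feat_{i}" for i in range(10)])
scores = scores.sort_values()  # ascending, so the largest bar is drawn on top

fig, ax = plt.subplots(figsize=(6, 4))
ax.barh(scores.index, scores.values)
ax.set_xlabel("Importance")
ax.set_title("Feature importance (placeholder data)")
fig.tight_layout()
fig.savefig("importances.png")
```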

5. Suggestion