Data analysis

Primary language: Python

data_analysis

1. scripts/stats.py (data analysis script)

#### 1) Usage
python scripts/stats.py   <CSV file containing the input dataframe>   <CSV file for the output dataframe>   <feature variable to analyze>

#### 2) Example
python scripts/stats.py   'd:\age.csv'   'd:\output.csv'   'age'

#### 3) Parameters
Input dataframe

| UserId | age |
| --- | --- |
| 027226838 | 22 |
| 017126415 | 34 |
| 047526334 | 56 |
| ... | ... |

![Column name explanation](scripts/explain.png "Column name explanation")

Output dataframe
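
| age | cnt_rec | cnt_target | %target | %cnt_rec | %cnt_target | %cum_cnt_rec | %cum_cnt_target | cnt_nontarget | %cum_nontarget | %cum_target-%cum_nontarget |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 18 | 110 | 9.0 | 8.18% | 0.36% | 0.53% | 0.36% | 0.53% | 101.0 | 0.35% | 0.18% |
| ... | | | | | | | | | | |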
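
The output columns listed above can be approximated with a few pandas group-by operations. The following is a minimal sketch, not the actual contents of `scripts/stats.py`: the function name `value_stats` and the 0/1 column name `target` are assumptions made for illustration (the README does not show how the target is supplied), the column meanings are inferred from their names, and shares are kept as fractions instead of formatted percentages.

```python
import pandas as pd


def value_stats(df, feature, target="target"):
    """Per-value counts and cumulative shares for one feature.

    `target` is a hypothetical 0/1 column name (an assumption, not from the README).
    Shares are returned as fractions rather than formatted percentages.
    """
    counts = (
        df.groupby(feature)[target]
        .agg(cnt_rec="count", cnt_target="sum")
        .sort_index()
    )
    cnt_nontarget = counts["cnt_rec"] - counts["cnt_target"]

    out = counts.copy()
    out["%target"] = counts["cnt_target"] / counts["cnt_rec"]
    out["%cnt_rec"] = counts["cnt_rec"] / counts["cnt_rec"].sum()
    out["%cnt_target"] = counts["cnt_target"] / counts["cnt_target"].sum()
    out["%cum_cnt_rec"] = out["%cnt_rec"].cumsum()
    out["%cum_cnt_target"] = out["%cnt_target"].cumsum()
    out["cnt_nontarget"] = cnt_nontarget
    out["%cum_nontarget"] = (cnt_nontarget / cnt_nontarget.sum()).cumsum()
    out["%cum_target-%cum_nontarget"] = out["%cum_cnt_target"] - out["%cum_nontarget"]
    return out


# Made-up sample data, only to exercise the function.
sample = pd.DataFrame({"age": [18, 18, 19, 19, 19, 20], "target": [1, 0, 0, 1, 1, 0]})
stats = value_stats(sample, "age")
print(stats)
```

The last column is the gap between the cumulative target share and the cumulative non-target share at each feature value, which is the quantity a KS-style comparison looks at.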

2. scripts/woe.py (script for computing WOE and IV)

#### 1) Usage
python scripts/woe.py   <CSV file containing the input dataframe>   <feature variable to analyze>   <bin cut points (comma-separated)>   <y variable>

#### 2) Example
python scripts/woe.py   "age.csv" "age" "20,30,45" "is_dft"
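
To illustrate how a cut-point string such as "20,30,45" could map onto bins, here is a small sketch built on `pandas.cut`. The helper names `parse_cut_points` and `assign_bins` are hypothetical, and the lower edge of 0 and the open-ended last bin are assumptions chosen for the example; the script itself may handle the edges differently.

```python
import numpy as np
import pandas as pd


def parse_cut_points(expr):
    """Turn a string such as "20,30,45" into a sorted list of floats."""
    return sorted(float(part) for part in expr.split(",") if part.strip())


def assign_bins(values, expr):
    """Assign each value to a right-closed interval built from the cut points.

    Assumes a lower edge of 0 and an unbounded last bin (both are assumptions).
    Values at or below 0, and missing values, come back as NaN.
    """
    edges = [0.0] + parse_cut_points(expr) + [np.inf]
    return pd.cut(values, bins=edges, right=True)


# Made-up ages, only to exercise the helpers.
ages = pd.Series([18, 20, 21, 30, 44, 46])
binned = assign_bins(ages, "20,30,45")
print(binned.value_counts(sort=False))
```

Because the intervals are right-closed, a value equal to a cut point (for example 20) falls into the lower bin.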

#### 3) Parameters
Input dataframe

| UserId | age | is_dft |
| --- | --- | --- |
| 027226838 | 22 | 1 |
| 027242238 | 21 | 0 |
| ... | ... | ... |

Output

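| | class | good | bad | %good | %bad | all | woe | iv |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | (0,20.0] | 76 | 1519 | 4.76% | 95.24% | 1595 | -6.34584 | 0.048765 |
| 1 | (20,30] | 895 | 17129 | 4.97% | 95.03% | 18024 | -8.75549 | 0.561679 |
| 2 | (30,45] | 673 | 10021 | 6.29% | 93.71% | 10694 | 7.31007 | 0.372628 |
| 3 | (45...) | 75 | 869 | 7.94% | 92.06% | 944 | 2.57974 | 0.036832 |
| 4 | NA | 0 | 0 | nan% | nan% | 0 | NaN | 0.000000 |
| | | 1688 | 28819 | 5.53% | 94.47% | 30507 | | 1.019905 |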
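
For readers unfamiliar with WOE and IV, the sketch below computes them per bin using the common textbook definitions: WOE = ln(share of goods in the bin / share of bads in the bin) and IV = sum of (share of goods − share of bads) × WOE. It is not taken from `scripts/woe.py`, whose scaling and conventions may differ. The function name `woe_iv`, the choice to count `y == 1` as `bad`, the lower edge of 0, and the handling of empty classes (WOE left as NaN, IV contribution set to 0) are all assumptions.

```python
import numpy as np
import pandas as pd


def woe_iv(df, feature, cut_points, y):
    """Per-bin WOE and IV using the textbook definitions.

    Assumptions: y is 0/1, rows with y == 1 count as "bad",
    and bins run from 0 to +inf with right-closed intervals.
    """
    edges = [0.0] + sorted(cut_points) + [np.inf]
    bins = pd.cut(df[feature], bins=edges)

    bad = df[y].groupby(bins, observed=False).sum()
    total = df[y].groupby(bins, observed=False).count()
    table = pd.DataFrame({"good": total - bad, "bad": bad, "all": total})

    dist_good = table["good"] / table["good"].sum()
    dist_bad = table["bad"] / table["bad"].sum()

    # WOE is undefined when a bin has no goods or no bads; leave it as NaN.
    valid = (table["good"] > 0) & (table["bad"] > 0)
    ratio = (dist_good / dist_bad).where(valid)
    table["woe"] = np.log(ratio)
    table["iv"] = ((dist_good - dist_bad) * table["woe"]).fillna(0.0)
    return table, table["iv"].sum()


# Made-up data, only to exercise the function.
sample = pd.DataFrame(
    {"age": [18, 19, 25, 26, 27, 28, 40, 50], "is_dft": [0, 1, 0, 0, 0, 1, 1, 1]}
)
table, total_iv = woe_iv(sample, "age", [20, 30, 45], "is_dft")
print(table)
print("IV:", total_iv)
```

The total IV is the sum of the per-bin `iv` column, which matches how the last row of the output above adds up the `iv` values.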