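When building a model, we evaluate the predictive power of each individual variable, mainly using the IV (Information Value) metric. IV values are usually read as follows:
IV <= 0.02: no predictive power;
0.02 - 0.1: weak predictive power;
0.1 - 0.3: medium predictive power;
0.3 - 0.5: strong predictive power;
greater than 0.5: very strong predictive power;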
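The bands above can be expressed as a small lookup. This is only a sketch: the function name iv_predictive_power is hypothetical, and because the list does not say which band a boundary value (0.1, 0.3, 0.5) belongs to, the code assumes each boundary falls into the lower band.

```python
def iv_predictive_power(iv: float) -> str:
    """Map an IV value to the predictive-power label from the list above.

    Assumption: a value exactly on a boundary belongs to the lower band.
    """
    if iv <= 0.02:
        return "none"
    if iv <= 0.1:
        return "weak"
    if iv <= 0.3:
        return "medium"
    if iv <= 0.5:
        return "strong"
    return "very strong"


# Print the label for a few sample IV values.
for value in (0.01, 0.05, 0.2, 0.4, 0.8):
    print(value, iv_predictive_power(value))
```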
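The IV calculation uses the following quantities:
Pgood_section_total: good users in the bin divided by all good users;
Pbad_section_total: bad users in the bin divided by all bad users;
Pgood_section: good users in the bin divided by all users in that bin;
Pbad_section: bad users in the bin divided by all users in that bin;
Pgood_total: good users divided by all users;
Pbad_total: bad users divided by all users;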
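Written out with these quantities, the WOE and IV formulas take the form below. This is a reconstruction from the definitions above and from the Python code that follows, not the author's original formula; the bin index i is added notation.

```latex
\mathrm{WOE}_i
  = \ln\frac{P_{\text{good\_section},i} / P_{\text{bad\_section},i}}
            {P_{\text{good\_total}} / P_{\text{bad\_total}}}
  = \ln\frac{P_{\text{good\_section\_total},i}}{P_{\text{bad\_section\_total},i}}

\mathrm{IV}
  = \sum_i \left( P_{\text{good\_section\_total},i}
                - P_{\text{bad\_section\_total},i} \right) \mathrm{WOE}_i
```

The two forms of WOE agree because the bin size cancels in the numerator and the total user count cancels in the denominator, leaving (good in bin / all good) divided by (bad in bin / all bad).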
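Python implementation:
import numpy as np
import pandas as pd

# data: DataFrame with the binned feature 'type' and the binary label 'SeriousDlqin2yrs'
d1 = pd.DataFrame({'X': data['type'], 'Y': data['SeriousDlqin2yrs']})
d2 = d1.groupby('X')  # one group per bin of the feature

# Overall totals; rows labelled 1 are counted as "good" in this code
good = data['SeriousDlqin2yrs'].sum()
bad = data['SeriousDlqin2yrs'].count() - good

# Per-bin counts
d3 = pd.DataFrame({'good_count': d2['Y'].sum(), 'total_count': d2['Y'].count()})
d3['rate'] = d3['good_count'] / d3['total_count']                  # Pgood_section
d3['goodall_rate'] = d3['good_count'] / good                       # Pgood_section_total
d3['jzdwdm_rate'] = (d3['total_count'] - d3['good_count']) / bad   # Pbad_section_total

# WOE per bin; a bin with no good or no bad rows gives an infinite WOE
p_good_total = good / (bad + good)
p_bad_total = bad / (bad + good)
d3['woe'] = np.log((d3['rate'] / (1 - d3['rate'])) / (p_good_total / p_bad_total))

# IV contribution per bin, then the total IV of the feature
d3['IV'] = (d3['goodall_rate'] - d3['jzdwdm_rate']) * d3['woe']
IV = d3['IV'].sum()
print(d3)
print('IV=', IV)

Excel implementation: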
The Excel workbook with the exact formulas can be downloaded from the resources: