cut(x, bins, right=True, labels=None, retbins=False, precision=3, include_lowest=False,
count()['total_point']
Equal-frequency binning: df['point_bins_f'] = pd
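append(PSI_cal(5, X, y, cats))
As the results show, clustering-based binning performs considerably worse than supervised binning, but its stability is indeed high, and the more clusters are used, the higher the IV value. I may try adding it in later when time permits.
quantile, equal-width, merge
iv = []
PSIs = [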
hist(x, bins=None, range=None, density=False, weights=None, cumulative=False, bottom=None, histtype='bar', align='mid', orientati
plot_roc(vali_y, vali_proba_df, plot_micro=False, figsize=(6, 6), plot_macro=False)
def plot_model_ks(y_label, y_pred):
    """Plot k
strftime("%Y-%m-%d"))
dt_train = df[(df[split_date] >= strat_day) & (df[split_date] < apply_day)]
df_test = df[(df
Binning: KBinsDiscretizer(n_bins=5, encode='onehot', strategy='quantile') takes three parameters: n_bins, encode, and strategy. n_bins: the number of bins. encode: the encoding method
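As a minimal sketch of how these three parameters interact, the example below runs scikit-learn's KBinsDiscretizer on a small made-up column. The toy array `X` and the variable names are assumptions for illustration only and do not come from the original post.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Hypothetical toy data: ten evenly spaced values in one feature column.
X = np.arange(10, dtype=float).reshape(-1, 1)

# encode='onehot' (as in the text) returns a sparse one-hot matrix,
# one column per bin.
onehot_est = KBinsDiscretizer(n_bins=5, encode='onehot', strategy='quantile')
onehot = onehot_est.fit_transform(X)

# encode='ordinal' returns the bin index itself, which is easier to inspect.
ordinal_est = KBinsDiscretizer(n_bins=5, encode='ordinal', strategy='quantile')
bin_ids = ordinal_est.fit_transform(X).ravel().astype(int)

# bin_edges_ holds the learned cut points for each feature.
edges = ordinal_est.bin_edges_[0]

print(onehot.shape)   # rows x n_bins
print(bin_ids)        # bin index per sample
print(edges)          # n_bins + 1 edges
```

With strategy='quantile' each bin receives roughly the same number of samples, so on this evenly spaced column every bin holds two values.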
cut(d_cut['number'], 4, labels=False)
d_cut
As we can see, the labels of cut_group above have changed from half-open intervals to integers.
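The difference between the default labels and labels=False can be sketched as follows. The values in `d_cut['number']` here are invented for illustration, and the column name `cut_interval` is an assumption; only `cut_group` and `d_cut['number']` appear in the original.

```python
import pandas as pd

# Hypothetical data standing in for the original d_cut['number'] column.
d_cut = pd.DataFrame({'number': [1, 5, 10, 15, 20, 25, 30, 40]})

# Default labels: each value is tagged with a right-closed interval.
d_cut['cut_interval'] = pd.cut(d_cut['number'], 4)

# labels=False: each value is tagged with the integer index of its bin.
d_cut['cut_group'] = pd.cut(d_cut['number'], 4, labels=False)

print(d_cut)
```

Passing an integer as `bins` splits the value range into equal-width bins, so the integer codes run from 0 to 3 here.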