
Machine Learning Model Tuning with Optuna


Contents
1. Introduction to Optuna
2. LGBM and XGBoost Tuning Summary
   2.1 LGBM
       2.1.1 Defining the Objective
       2.1.2 Running the Trials
       2.1.3 Plotting
       2.1.4 Best Parameters
   2.2 XGBoost
       2.2.1 Defining the Objective
       2.2.2 Running the Trials
       2.2.3 Plotting
       2.2.4 Best Parameters

1. Introduction to Optuna

While competing on Kaggle I kept running into the same question: the models in everyone's Kernels come with a long list of hyperparameter settings, but how were those values chosen? I asked this in the Discussion forum and received answers from many experienced competitors.

Summarizing the answers, I found that nearly all of them mentioned the optuna library. So what is Optuna? Optuna is a hyperparameter optimization framework written in Python. A minimal Optuna program involves only three core concepts: the objective function (objective), a single trial (trial), and the study (study). The objective defines the function to be optimized and specifies the search range of each hyperparameter; a trial corresponds to one execution of the objective; and the study manages the optimization as a whole, deciding how to optimize, how many trials to run in total, and how to record the results.
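The study is also where those management decisions are configured. As a minimal sketch (not from the original post; the study name and storage path are hypothetical), this is where you would pick the sampling algorithm and where trial results get recorded:

import optuna

study = optuna.create_study(
    study_name='demo',                            # hypothetical name
    direction='minimize',
    sampler=optuna.samplers.TPESampler(seed=42),  # TPE is Optuna's default sampler
    storage='sqlite:///optuna_demo.db',           # persist trial records to a SQLite file
    load_if_exists=True,                          # resume the study if it already exists
)
# study.optimize(objective, n_trials=100)         # objective: a function of one trial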
A complete toy example should make all of this concrete:

Let x, y ∈ (−10, 10). For what values of x and y does f(x, y) = (x + y)² attain its maximum?

import optuna

def objective(trial):
    # sample x and y uniformly from (-10, 10)
    # (suggest_uniform is the historical API; newer Optuna versions use suggest_float)
    x = trial.suggest_uniform('x', -10, 10)
    y = trial.suggest_uniform('y', -10, 10)
    return (x + y) ** 2

# we want to maximize f(x, y), so direction='maximize'
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100)

print(study.best_params)  # best (x, y) found
print(study.best_value)   # corresponding value of f(x, y)

2. LGBM and XGBoost Tuning Summary

2.1 LGBM

2.1.1 Defining the Objective

from lightgbm import LGBMRegressor
import optuna
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

def objective(trial):
    # data/target (feature matrix and label) are assumed to be defined beforehand;
    # note that train_size=0.3 keeps only 30% of the rows for training.
    # A cross-validated variant of this objective is sketched after 2.1.4.
    X_train, X_test, y_train, y_test = train_test_split(data, target, train_size=0.3)
    param = {
        'metric': 'rmse',
        'random_state': 48,
        'n_estimators': 20000,
        'reg_alpha': trial.suggest_loguniform('reg_alpha', 1e-3, 10.0),
        'reg_lambda': trial.suggest_loguniform('reg_lambda', 1e-3, 10.0),
        'colsample_bytree': trial.suggest_categorical('colsample_bytree', [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]),
        'subsample': trial.suggest_categorical('subsample', [0.4, 0.5, 0.6, 0.7, 0.8, 1.0]),
        'learning_rate': trial.suggest_categorical('learning_rate', [0.006, 0.008, 0.01, 0.014, 0.017, 0.02]),
        'max_depth': trial.suggest_categorical('max_depth', [5, 7, 9, 11, 13, 15, 17, 20, 50]),
        'num_leaves': trial.suggest_int('num_leaves', 1, 1000),
        'min_child_samples': trial.suggest_int('min_child_samples', 1, 300),
        'cat_smooth': trial.suggest_int('cat_smooth', 1, 100),
    }
    lgb = LGBMRegressor(**param)
    # early_stopping_rounds/verbose follow the older LightGBM fit() signature;
    # LightGBM >= 4.0 uses callbacks=[lightgbm.early_stopping(100)] instead
    lgb.fit(X_train, y_train, eval_set=[(X_test, y_test)],
            early_stopping_rounds=100, verbose=False)
    pred_lgb = lgb.predict(X_test)
    rmse = mean_squared_error(y_test, pred_lgb, squared=False)
    return rmse

2.1.2 Running the Trials

# lower RMSE is better, so minimize
study = optuna.create_study(direction='minimize')
n_trials = 50  # run 50 trials
study.optimize(objective, n_trials=n_trials)

2.1.3 Plotting

optuna.visualization.plot_optimization_history(study)  # objective value per trial
optuna.visualization.plot_parallel_coordinate(study)   # hyperparameters vs. score
optuna.visualization.plot_param_importances(study)     # parameter importances

2.1.4 Best Parameters

params = study.best_params
params['metric'] = 'rmse'  # 'metric' was fixed rather than tuned, so add it back
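A single random split makes the reported RMSE depend on which rows happen to land in the validation set, so the ranking of trials can be noisy. As a minimal sketch (not from the original post, and assuming data and target are the same pandas feature matrix and label used above), the objective can instead average RMSE over K folds:

import numpy as np
from lightgbm import LGBMRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

def objective_cv(trial):
    # same idea as the objective in 2.1.1; the search space is shortened here for brevity
    param = {
        'metric': 'rmse',
        'random_state': 48,
        'n_estimators': 20000,
        'reg_alpha': trial.suggest_loguniform('reg_alpha', 1e-3, 10.0),
        'reg_lambda': trial.suggest_loguniform('reg_lambda', 1e-3, 10.0),
        'num_leaves': trial.suggest_int('num_leaves', 1, 1000),
    }
    scores = []
    for train_idx, valid_idx in KFold(n_splits=5, shuffle=True, random_state=48).split(data):
        X_tr, X_va = data.iloc[train_idx], data.iloc[valid_idx]
        y_tr, y_va = target.iloc[train_idx], target.iloc[valid_idx]
        model = LGBMRegressor(**param)
        model.fit(X_tr, y_tr, eval_set=[(X_va, y_va)],
                  early_stopping_rounds=100, verbose=False)  # older LightGBM fit() signature, as above
        pred = model.predict(X_va)
        scores.append(mean_squared_error(y_va, pred, squared=False))
    return np.mean(scores)  # average validation RMSE across the 5 folds

# usage: study.optimize(objective_cv, n_trials=50)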
2.2 XGBoost

2.2.1 Defining the Objective

import xgboost as xgb

def objective(trial):
    # 'train' is assumed to be a DataFrame whose last column is the target
    data = train.iloc[:, :-1]
    target = train.target
    train_x, test_x, train_y, test_y = train_test_split(data, target, test_size=0.3, random_state=42)
    param = {
        'lambda': trial.suggest_loguniform('lambda', 1e-3, 10.0),
        'alpha': trial.suggest_loguniform('alpha', 1e-3, 10.0),
        'colsample_bytree': trial.suggest_categorical('colsample_bytree', [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]),
        'subsample': trial.suggest_categorical('subsample', [0.4, 0.5, 0.6, 0.7, 0.8, 1.0]),
        'learning_rate': trial.suggest_categorical('learning_rate', [0.008, 0.009, 0.01, 0.012, 0.014, 0.016, 0.018, 0.02]),
        'n_estimators': 4000,
        'max_depth': trial.suggest_categorical('max_depth', [5, 7, 9, 11, 13, 15, 17, 20]),
        'random_state': trial.suggest_categorical('random_state', [24, 48, 2020]),
        'min_child_weight': trial.suggest_int('min_child_weight', 1, 300),
    }
    model = xgb.XGBRegressor(**param)
    model.fit(train_x, train_y, eval_set=[(test_x, test_y)],
              early_stopping_rounds=100, verbose=False)
    preds = model.predict(test_x)
    rmse = mean_squared_error(test_y, preds, squared=False)
    return rmse

2.2.2 Running the Trials

study = optuna.create_study(direction='minimize')
n_trials = 1  # only 1 trial here for demonstration; use far more for a real search
study.optimize(objective, n_trials=n_trials)

print('Number of finished trials:', len(study.trials))
print("------------------------------------------------")
print('Best trial:', study.best_trial.params)
print("------------------------------------------------")
print(study.trials_dataframe())
print("------------------------------------------------")

2.2.3 Plotting

# optimization history: objective value per trial
optuna.visualization.plot_optimization_history(study).show()
# plot_parallel_coordinate: interactively visualizes the hyperparameters and scores
optuna.visualization.plot_parallel_coordinate(study).show()
# plot_slice: shows the evolution of the search, i.e. where in the hyperparameter
# space the search went and which parts of the space were explored more
optuna.visualization.plot_slice(study).show()
# pairwise contour plots of selected parameters
optuna.visualization.plot_contour(study, params=['alpha',
                                                 # 'max_depth',
                                                 'lambda',
                                                 'subsample',
                                                 'learning_rate']).show()
# visualize parameter importances
optuna.visualization.plot_param_importances(study).show()
# visualize the empirical distribution function of the objective
optuna.visualization.plot_edf(study).show()

2.2.4 Best Parameters

params = study.best_params
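Fixed settings such as n_estimators were hard-coded in the objective, so they are not recorded in best_params and have to be added back before refitting. As a minimal sketch (not from the original post, assuming train holds the full training data as in 2.2.1 and test is a hypothetical hold-out set):

import xgboost as xgb

best_params = dict(study.best_params)
best_params['n_estimators'] = 4000  # hard-coded in the objective, so absent from best_params

final_model = xgb.XGBRegressor(**best_params)
final_model.fit(train.iloc[:, :-1], train.target)  # refit on the full training set
# preds = final_model.predict(test)                # 'test' is a hypothetical hold-out set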
