[MLOps] 9. Optuna Hands-On

mlslly 2024. 4. 22. 15:15

What is Optuna?

Optuna is one of the leading frameworks for hyperparameter optimization. For the concept of hyperparameter optimization itself, see the earlier post: https://ysryuu.tistory.com/15

Official Optuna GitHub documentation: https://github.com/optuna/optuna/blob/master/README.md#key-features

Using Optuna follows three steps: 1. define the objective, 2. create a study, 3. run the parameter search.

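import optuna
import sklearn.datasets
import sklearn.ensemble
import sklearn.metrics
import sklearn.model_selection
import sklearn.svm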

# Define an objective function to be minimized.
def objective(trial):

    # Invoke suggest methods of a Trial object to generate hyperparameters.
    regressor_name = trial.suggest_categorical('regressor', ['SVR', 'RandomForest'])
    if regressor_name == 'SVR':
        svr_c = trial.suggest_float('svr_c', 1e-10, 1e10, log=True)
        regressor_obj = sklearn.svm.SVR(C=svr_c)
    else:
        rf_max_depth = trial.suggest_int('rf_max_depth', 2, 32)
        regressor_obj = sklearn.ensemble.RandomForestRegressor(max_depth=rf_max_depth)

    X, y = sklearn.datasets.fetch_california_housing(return_X_y=True)
    X_train, X_val, y_train, y_val = sklearn.model_selection.train_test_split(X, y, random_state=0)

    regressor_obj.fit(X_train, y_train)
    y_pred = regressor_obj.predict(X_val)

    error = sklearn.metrics.mean_squared_error(y_val, y_pred)

    return error  # An objective value linked with the Trial object.

study = optuna.create_study()  # Create a new study.
study.optimize(objective, n_trials=100)  # Invoke optimization of the objective function.

Optuna hands-on with the train.py classification model

1. Install Optuna (terminal)

pip install optuna

2. Write the model file with Optuna included

All that is needed is to add the Optuna parts to the train.py file. The complete file looks like this.

<Contents of optuna_tutorial.py>

import optuna
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

def objective(trial):
    # search space: both bounds are inclusive
    n_estimators = trial.suggest_int('n_estimators', 100, 1000, step=100)
    max_depth = trial.suggest_int('max_depth', 3, 10, step=1)

    # load data
    iris = load_iris(as_frame=True)
    X, y = iris['data'], iris['target']

    X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.3, random_state=2024)

    # train the model with the sampled hyperparameters
    clf = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth, random_state=2024)
    clf.fit(X_train, y_train)

    # evaluate on the validation set
    y_pred = clf.predict(X_valid)
    acc_score = accuracy_score(y_valid, y_pred)
    return acc_score

if __name__ == '__main__':
    # study 
    sampler = optuna.samplers.RandomSampler(seed=2024) 
    study = optuna.create_study(sampler=sampler, study_name = 'hpo-tutorial', direction='maximize')
    
    # optimize
    study.optimize(objective, n_trials=5)

Let's go through just the Optuna-related parts one at a time.

2-1) Import Optuna

import optuna

2-2) Create the objective function

def objective(trial):
    n_estimators = trial.suggest_int('n_estimators', 100, 1000, step=100)
    max_depth = trial.suggest_int('max_depth', 3, 10, step=1)

These calls declare the ranges of the parameters that RandomForestClassifier will use.

n_estimators ranges from 100 to 1000 in steps of 100, and max_depth ranges from 3 to 10. Both bounds are inclusive.

2-3) study: run the objective function

if __name__ == '__main__':
    # study
    sampler = optuna.samplers.RandomSampler(seed=2024)
    study = optuna.create_study(sampler=sampler, study_name='hpo-tutorial', direction='maximize')

    # optimize
    study.optimize(objective, n_trials=5)

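if __name__ == '__main__': makes the code written above actually run. It is a conditional that executes its block only when the Python script is run directly (that is, as the main program rather than imported as a module).
sampler: RandomSampler draws parameter values at random. Passing seed=2024 fixes those draws so that repeated runs sample the same values.
study: creates an Optuna study and sets its name and optimization direction. Higher accuracy is better, so the direction is 'maximize'.
study.optimize(objective, n_trials=5): applies objective to the study created above, drawing random hyperparameters for n_trials trials and keeping track of the best result.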

Results

Running the script shows the hyperparameters explored in each trial and lets you check the best params.

1. Run optuna_tutorial.py from the terminal

$ python3 optuna_tutorial.py

2. Check the output

Hyperparameter optimization ran. Of the trials, Trial 1 scored highest, at 0.91...

* Written with reference to the MakinaRocks MLOps course on Programmers.
