- 2022-01-27 21:22
- Machine learning
- Adaboost
- Python
- Algorithm

title: Adaboost

date: 2021-12-21 01:27:40

tags: machine learning

## About AdaBoost

The AdaBoost algorithm is an ensemble learning algorithm for binary classification and the best-known representative of the boosting family. A learner whose accuracy is only slightly better than random guessing is called a weak learner; a learner with very high accuracy is called a strong learner. Weak learners are much easier to find than strong ones, so the idea is to boost a weak learner into a strong one. AdaBoost first selects a weak learner and then trains it over multiple rounds. After each round, the weights of the training samples are adjusted according to the current error rate: correctly predicted samples are down-weighted and misclassified samples are up-weighted, so each round concentrates on the samples that the previous round predicted poorly. Combining the learners from all rounds, each weighted by how accurate it was, yields the strong learner.
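To make the reweighting concrete, here is a small numeric sketch of a single round. It assumes five equally weighted samples of which only the first is misclassified (hypothetical data), and uses the textbook formula $\alpha = \frac{1}{2}\ln\frac{1-\epsilon}{\epsilon}$ without the 0.01 smoothing term that the post's code adds later.

```python
import numpy as np

# Five training samples, equally weighted to start (1/n each)
w = np.full(5, 1 / 5)
# Hypothetical result of one round: only sample 0 is misclassified
misclassified = np.array([True, False, False, False, False])

# Weighted error rate of this round's weak learner
eps = w[misclassified].sum()            # 0.2
# Learner weight from the textbook AdaBoost formula
alpha = 0.5 * np.log((1 - eps) / eps)   # ln 2, about 0.693

# Multiply mistakes by e^alpha and correct samples by e^-alpha
w = w * np.where(misclassified, np.exp(alpha), np.exp(-alpha))
w = w / w.sum()  # renormalize so the weights sum to 1
print(w)
```

With an error rate of 0.2, $\alpha = \ln 2$, so the single mistake ends up carrying half of the total weight (0.5) in the next round, while each correct sample drops to 0.125.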

## Implementation

First, here is the theoretical description of the algorithm.
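For reference, the standard discrete AdaBoost procedure for labels $y_i \in \{-1, +1\}$ can be written as follows. This is the textbook formulation and may differ in notation from the original algorithm figure; the code in this post additionally adds 0.01 to the denominator when computing $\alpha_t$.

$$
\begin{aligned}
&\text{Input: } \{(x_i, y_i)\}_{i=1}^{N},\; y_i \in \{-1,+1\},\; T \text{ rounds} \\
&w_{1,i} = \tfrac{1}{N}, \quad i = 1,\dots,N \\
&\text{for } t = 1,\dots,T: \\
&\quad h_t = \text{weak learner trained with sample weights } w_t \\
&\quad \epsilon_t = \sum_{i=1}^{N} w_{t,i}\,\mathbb{1}[h_t(x_i) \neq y_i] \\
&\quad \alpha_t = \tfrac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t} \\
&\quad w_{t+1,i} = \frac{w_{t,i}\exp\big(-\alpha_t\, y_i\, h_t(x_i)\big)}{Z_t}, \quad Z_t = \sum_{j=1}^{N} w_{t,j}\exp\big(-\alpha_t\, y_j\, h_t(x_j)\big) \\
&H(x) = \operatorname{sign}\Big(\sum_{t=1}^{T} \alpha_t\, h_t(x)\Big)
\end{aligned}
$$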

The code implementation follows these steps.

The first step is to initialize the sample weights. Initially all weights are equal: each of the n training samples gets weight 1/n. The code is shown below.

```python
n_train, n_test = len(X_train), len(X_test)
W = np.ones(n_train) / n_train  # Sample weight initialization
```

Then, for the specified number of training rounds, the weak classifier is trained with the current sample weights.

```python
Weak_clf.fit(X_train, Y_train, sample_weight=W)
```
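The boosting loop relies on the weak learner respecting `sample_weight`. The sketch below uses a toy dataset (an assumption, not from the post) to show that scikit-learn's `DecisionTreeClassifier.fit` changes its prediction once one sample's weight is increased; `stump_prediction` is a hypothetical helper.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy data: three samples share x = 0, but one of them has label 1
X = np.array([[0], [0], [0], [1]])
y = np.array([0, 0, 1, 1])


def stump_prediction(weights):
    """Fit a depth-1 tree with the given sample weights and predict at x = 0."""
    stump = DecisionTreeClassifier(max_depth=1, random_state=0)
    stump.fit(X, y, sample_weight=weights)
    return stump.predict([[0]])[0]


uniform = stump_prediction(np.ones(4) / 4)              # label 0 holds the majority at x = 0
boosted = stump_prediction(np.array([1, 1, 5, 1]) / 8)  # sample 2 now dominates that leaf
print(uniform, boosted)
```

With uniform weights the x = 0 leaf predicts the majority label 0; once sample 2 carries most of the weight, the same leaf predicts 1. This is exactly the effect AdaBoost exploits when it up-weights misclassified samples.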

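Next, predict on the training and test sets, and mark which training samples are misclassified. The sum of the weights of the misclassified samples is the error rate of this round.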

```python
# Mark misclassified training samples (1 = wrong, 0 = correct)
miss = [int(x) for x in (pred_train_i != Y_train)]
# Calculate the error rate under the current weights
miss_w = np.dot(W, miss)
```
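The list comprehension and dot product above can also be written with NumPy's vectorized comparison. A minimal sketch with made-up weights and predictions (hypothetical values):

```python
import numpy as np

W = np.array([0.1, 0.2, 0.3, 0.4])     # current sample weights (sum to 1)
Y_train = np.array([0, 0, 1, 1])       # true labels
pred_train_i = np.array([0, 1, 1, 0])  # this round's predictions

miss = (pred_train_i != Y_train).astype(int)  # 1 where the prediction is wrong
miss_w = np.dot(W, miss)                      # weighted error rate
print(miss, miss_w)
```

Only the weights of the misclassified samples (indices 1 and 3) contribute, giving an error rate of 0.6. An error rate above 0.5 makes alpha negative in the formula that follows.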

Based on these predictions, the learner weight alpha is computed from the weighted error rate. Then the weights of correctly predicted samples are decreased and the weights of misclassified samples are increased, producing the sample weights used to train the next learner. The code is shown below.

```python
# Calculate alpha (0.01 is added to the denominator to avoid division by zero)
alpha = 0.5 * np.log(float(1 - miss_w) / float(miss_w + 0.01))
# Weight factor: +1 for misclassified samples, -1 for correct ones
factor = [x if x == 1 else -1 for x in miss]
# Update the sample weights
W = np.multiply(W, np.exp([float(x) * alpha for x in factor]))
W = W / sum(W)  # normalization
```
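When labels are mapped to {-1, +1}, the `factor` list equals $-y_i h_t(x_i)$, so the whole update collapses into one vectorized expression. The function `update_weights` below is a hypothetical helper, not part of the post; it keeps the post's 0.01 smoothing term so that its output matches the code above.

```python
import numpy as np


def update_weights(W, y, pred):
    """One AdaBoost weight update, vectorized (hypothetical helper).

    y and pred hold labels in {0, 1}, as in the post. They are mapped to
    {-1, +1} so that -y*h is +1 for a mistake and -1 for a correct
    prediction, which matches the `factor` list above.
    """
    y_pm = np.where(y == 1, 1, -1)
    h_pm = np.where(pred == 1, 1, -1)
    miss_w = np.dot(W, (y != pred).astype(int))           # weighted error rate
    alpha = 0.5 * np.log((1 - miss_w) / (miss_w + 0.01))  # same smoothing as the post
    W = W * np.exp(-alpha * y_pm * h_pm)
    return W / W.sum(), alpha


rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=10)
pred = y.copy()
pred[:3] = 1 - pred[:3]  # flip three predictions to create mistakes
W0 = np.ones(10) / 10

W1, alpha = update_weights(W0, y, pred)
print(alpha, W1)
```

The three flipped samples end up with larger weights than the seven correct ones, and the result is identical to the list-based version in the post.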

The final output H(x) is obtained by multiplying each round's prediction by its alpha and adding it to the running sum. Predictions are first mapped from labels {0, 1} to {-1, +1}.

```python
# Predict: map labels from {0, 1} to {-1, +1}
pred_train_i = [1 if x == 1 else -1 for x in pred_train_i]
pred_test_i = [1 if x == 1 else -1 for x in pred_test_i]
# Accumulate the alpha-weighted votes
pred_test = pred_test + np.multiply(alpha, pred_test_i)
```

Finally, a summed score greater than 0 is predicted as label 1, and a score less than or equal to 0 as label 0.

```python
pred_test = (pred_test > 0) * 1
return pred_test
```
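The weighted vote and the final thresholding can be traced on made-up numbers. This sketch assumes three hypothetical rounds with alphas 0.9, 0.3 and 0.2 and predictions already mapped to {-1, +1}; it shows that one strong learner can outvote a majority of weaker ones.

```python
import numpy as np

# Hypothetical alphas from three boosting rounds
alphas = [0.9, 0.3, 0.2]
# Each round's predictions on two test samples, already mapped to {-1, +1}
round_preds = [np.array([1, -1]), np.array([-1, 1]), np.array([-1, 1])]

score = np.zeros(2)
for alpha, h in zip(alphas, round_preds):
    score = score + alpha * h  # accumulate alpha-weighted votes, as pred_test does

labels = (score > 0) * 1  # positive score -> label 1, otherwise label 0
print(score, labels)
```

Sample 0 receives two negative votes out of three but is still labeled 1 (score 0.4), because its single positive vote carries the largest alpha; sample 1 gets score -0.4 and label 0.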

This completes the AdaBoost training process.

Next is my own ten-fold cross-validation. It uses scikit-learn's KFold to split the dataset into ten parts; each part takes a turn as the test set, and the model is trained ten times with the same weak learner. The reported metrics are the averages over the ten runs. The code is shown below.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from tqdm import tqdm

weak_clf = DecisionTreeClassifier(criterion='entropy', max_depth=2)

# Ten-fold cross-validation
acc, pre, rec, f1 = [], [], [], []
Data = data.copy()  # `data`: NumPy array with 8 feature columns and the label in column 8
kf = KFold(n_splits=10, shuffle=True, random_state=0)  # 10 folds
for train_index, test_index in tqdm(kf.split(Data)):  # split the data into 10 folds
    train_data = Data[train_index]  # rows selected for the training set
    test_data = Data[test_index]  # rows selected for the test set
    x_train, y_train = train_data[:, :8], train_data[:, 8]
    x_test, y_test = test_data[:, :8], test_data[:, 8]
    # Standardize: fit the scaler on the training fold only and reuse it for the test fold
    scaler = StandardScaler()
    scaler.fit(x_train)
    x_train = scaler.transform(x_train)
    x_test = scaler.transform(x_test)
    # `epoch` (number of boosting rounds) is not defined in this snippet
    pred_test = my_adaboost(weak_clf, x_train, x_test, y_train, y_test, epoch)
    # Accuracy, precision, recall and F1 on the test fold
    acc.append(accuracy_score(y_test, pred_test))
    pre.append(precision_score(y_test, pred_test))
    rec.append(recall_score(y_test, pred_test))
    f1.append(f1_score(y_test, pred_test))

print("My Adaboost outcome in test set with {} epoch:".format(epoch))
print("ACC:", sum(acc) / 10)
print("PRE: ", sum(pre) / 10)
print("REC: ", sum(rec) / 10)
print("F1: ", sum(f1) / 10)
```
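Because the post's dataset is not included, here is a self-contained sketch of the same ten-fold procedure on synthetic data. `make_classification` stands in for the real 8-feature dataset (an assumption), and `compact_adaboost` is a hypothetical vectorized rewrite of `my_adaboost` that keeps its update rule and 0.01 smoothing term.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def compact_adaboost(weak_clf, X_train, X_test, y_train, epochs):
    """Hypothetical vectorized version of my_adaboost; labels are in {0, 1}."""
    W = np.ones(len(X_train)) / len(X_train)  # uniform initial weights
    score = np.zeros(len(X_test))             # accumulated alpha-weighted votes
    y_pm = np.where(y_train == 1, 1, -1)
    for _ in range(epochs):
        weak_clf.fit(X_train, y_train, sample_weight=W)
        h_train = np.where(weak_clf.predict(X_train) == 1, 1, -1)
        h_test = np.where(weak_clf.predict(X_test) == 1, 1, -1)
        miss_w = np.dot(W, (h_train != y_pm).astype(float))   # weighted error
        alpha = 0.5 * np.log((1 - miss_w) / (miss_w + 0.01))  # smoothing as in the post
        W = W * np.exp(-alpha * y_pm * h_train)
        W = W / W.sum()
        score += alpha * h_test
    return (score > 0).astype(int)


# Synthetic stand-in for an 8-feature binary dataset
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
weak_clf = DecisionTreeClassifier(criterion="entropy", max_depth=2)
acc, f1 = [], []
for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    scaler = StandardScaler().fit(X[train_idx])  # fit on the training fold only
    x_train, x_test = scaler.transform(X[train_idx]), scaler.transform(X[test_idx])
    pred = compact_adaboost(weak_clf, x_train, x_test, y[train_idx], epochs=20)
    acc.append(accuracy_score(y[test_idx], pred))
    f1.append(f1_score(y[test_idx], pred))

print("ACC:", np.mean(acc), "F1:", np.mean(f1))
```

Each of the ten folds serves as the test set exactly once, and the scaler only ever sees training-fold statistics, so no information from the test fold leaks into preprocessing.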

The complete code is as follows:

```python
import numpy as np


# Self-implemented AdaBoost
def my_adaboost(Weak_clf, X_train, X_test, Y_train, Y_test, Epoch):
    """
    :param Weak_clf: weak learner whose fit() accepts sample_weight
    :param X_train: training features
    :param X_test: test features
    :param Y_train: training labels in {0, 1}
    :param Y_test: test labels (not used inside the function)
    :param Epoch: number of boosting rounds
    :return: predicted test labels in {0, 1}
    """
    n_train, n_test = len(X_train), len(X_test)
    W = np.ones(n_train) / n_train  # Sample weight initialization
    pred_train, pred_test = [np.zeros(n_train), np.zeros(n_test)]
    for i in range(Epoch):
        # Train the classifier with the current sample weights
        Weak_clf.fit(X_train, Y_train, sample_weight=W)
        pred_train_i = Weak_clf.predict(X_train)
        pred_test_i = Weak_clf.predict(X_test)
        # Mark misclassified training samples (1 = wrong, 0 = correct)
        miss = [int(x) for x in (pred_train_i != Y_train)]
        # Calculate the error rate under the current weights
        miss_w = np.dot(W, miss)
        # Calculate alpha (0.01 avoids division by zero)
        alpha = 0.5 * np.log(float(1 - miss_w) / float(miss_w + 0.01))
        # Weight factor: +1 for misclassified samples, -1 for correct ones
        factor = [x if x == 1 else -1 for x in miss]
        # Update and normalize the sample weights
        W = np.multiply(W, np.exp([float(x) * alpha for x in factor]))
        W = W / sum(W)
        # Predict: map labels to {-1, +1} and accumulate alpha-weighted votes
        pred_train_i = [1 if x == 1 else -1 for x in pred_train_i]
        pred_test_i = [1 if x == 1 else -1 for x in pred_test_i]
        pred_test = pred_test + np.multiply(alpha, pred_test_i)
        pred_train = pred_train + np.multiply(alpha, pred_train_i)
    # A positive score means label 1, otherwise label 0
    pred_train = (pred_train > 0) * 1
    pred_test = (pred_test > 0) * 1
    return pred_test
```
