date: 2021-12-21 01:27:40
tags: machine learning

AdaBoost is an ensemble learning algorithm for binary classification problems and the best-known representative of the boosting family. A learner whose accuracy is only slightly better than random guessing is called a weak learner; a learner whose accuracy is very high is called a strong learner. Weak learners are much easier to find, so the goal is to boost a weak learner into a strong one. AdaBoost first selects a weak learner and then trains it for multiple rounds. After every round, the weights of the training samples are adjusted according to the current error rate: the weights of correctly predicted samples are reduced and the weights of misclassified samples are increased. Each round therefore concentrates on the samples that were predicted poorly in the previous round, and the weighted combination of all rounds forms a strong learner.

Implementation

First, the theoretical description of the algorithm.
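As a hedged summary, the standard AdaBoost procedure for labels $y_i \in \{-1, +1\}$ can be written as follows. The symbols $w_i$, $h_t$, $\epsilon_t$, $\alpha_t$ and $T$ are notation chosen for this sketch. The code in this post also adds 0.01 to the denominator of $\alpha_t$, which the textbook form does not.

```latex
\begin{align*}
&\text{Initialize: } w_i^{(1)} = \frac{1}{n}, \quad i = 1, \dots, n \\
&\text{For } t = 1, \dots, T: \\
&\quad \text{fit the weak learner } h_t \text{ on the training set with weights } w^{(t)} \\
&\quad \epsilon_t = \sum_{i=1}^{n} w_i^{(t)} \, \mathbb{1}\left[ h_t(x_i) \neq y_i \right] \\
&\quad \alpha_t = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t} \\
&\quad w_i^{(t+1)} = \frac{w_i^{(t)} \exp\left( -\alpha_t \, y_i \, h_t(x_i) \right)}
                         {\sum_{j=1}^{n} w_j^{(t)} \exp\left( -\alpha_t \, y_j \, h_t(x_j) \right)} \\
&\text{Output: } H(x) = \operatorname{sign}\left( \sum_{t=1}^{T} \alpha_t \, h_t(x) \right)
\end{align*}
```

Since $y_i h_t(x_i)$ is $+1$ for a correct prediction and $-1$ for a wrong one, the exponent is $-\alpha_t$ for correct samples and $+\alpha_t$ for misclassified ones.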

The code implementation follows the steps above.

The first step is to initialize the sample weights. Initially all weights are equal: with n training samples, each weight is 1/n. The code is shown below.
    n_train, n_test = len(X_train), len(X_test)
    W = np.ones(n_train) / n_train  # Sample weight initialization
Then, in each of the specified number of training rounds, the weak classifier is trained under the current sample weights.
Weak_clf.fit(X_train, Y_train, sample_weight=W)
Then predict on the training and test sets, and mark which training samples were predicted incorrectly.
    pred_train_i = Weak_clf.predict(X_train)
    pred_test_i = Weak_clf.predict(X_test)
    # Mark the misclassified training samples (1 = wrong, 0 = correct)
    miss = [int(x) for x in (pred_train_i != Y_train)]
    # Calculate the error rate under the current weights
    miss_w = np.dot(W, miss)
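To make the weighted error rate concrete, here is a small sketch on made-up data. The arrays `y_true`, `y_pred` and `W_skewed` are hypothetical and only serve to show that the same mistakes cost more when the misclassified samples carry more weight.

```python
import numpy as np

# Hypothetical toy data: five samples with 0/1 labels.
y_true = np.array([1, 0, 1, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1])  # the samples at index 2 and 4 are wrong

miss = (y_pred != y_true).astype(int)  # 1 where misclassified, else 0

# Uniform weights: the weighted error equals the plain error rate.
W_uniform = np.ones(len(y_true)) / len(y_true)
err_uniform = float(np.dot(W_uniform, miss))

# Skewed weights (assumed values): the same mistakes now weigh more.
W_skewed = np.array([0.1, 0.1, 0.5, 0.1, 0.2])
err_skewed = float(np.dot(W_skewed, miss))

print("miss vector:       ", miss.tolist())
print("uniform-weight err:", round(err_uniform, 3))
print("skewed-weight err: ", round(err_skewed, 3))
```

With uniform weights the result is simply the fraction of wrong predictions; once the weights are skewed toward hard samples, a learner that keeps missing them is penalized more heavily.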
Next, the sample weights are updated according to the prediction results: the weights of correctly predicted samples are reduced and the weights of misclassified samples are increased, ready for the next round of training. The code for the formula is shown below.
    # Calculate alpha; the 0.01 in the denominator avoids division by zero
    alpha = 0.5 * np.log(float(1 - miss_w) / float(miss_w + 0.01))
    # Weight coefficient: +1 for misclassified samples, -1 for correct ones
    factor = [x if x == 1 else -1 for x in miss]
    # Update the sample weights
    W = np.multiply(W, np.exp([float(x) * alpha for x in factor]))
    W = W / sum(W)  # normalization
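The effect of the update can be checked on a tiny example. This sketch uses the textbook form of alpha (without the 0.01 offset used in this post) so that the numbers are easy to verify by hand; the `miss` vector is made up.

```python
import numpy as np

# Hypothetical result of one round: the samples at index 2 and 4 were misclassified.
miss = np.array([0, 0, 1, 0, 1])
W = np.ones(5) / 5
miss_w = float(np.dot(W, miss))  # weighted error under uniform weights

# Textbook alpha (assumption: no 0.01 offset, unlike the code in this post).
alpha = 0.5 * np.log((1 - miss_w) / miss_w)

# +alpha for misclassified samples, -alpha for correct ones.
factor = np.where(miss == 1, 1.0, -1.0)
W_new = W * np.exp(alpha * factor)
W_new = W_new / W_new.sum()  # renormalize so the weights sum to 1

print("alpha:      ", round(float(alpha), 4))
print("new weights:", np.round(W_new, 4).tolist())
print("weight on misclassified samples:", round(float(W_new[miss == 1].sum()), 4))
```

With the textbook alpha, the misclassified samples end up carrying exactly half of the total weight after normalization, which is why the next weak learner cannot succeed by repeating the same mistakes.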
The final output H(x) is obtained by multiplying each round's prediction by its alpha and adding the product to the accumulated result.
    # predict: map the labels from {0, 1} to {-1, +1} before the weighted vote
    pred_train_i = [1 if x == 1 else -1 for x in pred_train_i]
    pred_test_i = [1 if x == 1 else -1 for x in pred_test_i]
    pred_test = pred_test + np.multiply(alpha, pred_test_i)
Finally, an accumulated value greater than 0 is predicted as label 1, and any other value is predicted as label 0.
    pred_test = (pred_test > 0) * 1
    return pred_test
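The weighted vote can be illustrated with a short sketch. The per-round predictions and the alpha values are invented for the example; only the accumulation and thresholding logic mirrors the steps described above.

```python
import numpy as np

# Hypothetical 0/1 predictions of three weak learners on four test samples.
round_preds = [
    np.array([1, 0, 1, 0]),
    np.array([1, 1, 0, 0]),
    np.array([0, 1, 1, 0]),
]
alphas = [0.8, 0.3, 0.4]  # assumed per-round weights

score = np.zeros(4)
for alpha, pred in zip(alphas, round_preds):
    signed = np.where(pred == 1, 1, -1)  # map {0, 1} to {-1, +1}
    score += alpha * signed              # accumulate the weighted vote

final = (score > 0).astype(int)  # positive score -> label 1, otherwise 0

print("scores:", np.round(score, 2).tolist())
print("final: ", final.tolist())
```

For the second sample, two of the three learners vote for label 1, yet the final label is 0 because the dissenting learner has the largest alpha. A plain majority vote would have decided differently.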
This completes one AdaBoost training process.

Next comes my own ten-fold cross-validation. It uses KFold from sklearn to split the data set into ten folds and lets each fold take its turn as the test set, so the same weak learner is trained ten times. The final metrics are the averages over these ten runs. The code is shown below.
    from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
    from sklearn.model_selection import KFold
    from sklearn.preprocessing import StandardScaler
    from sklearn.tree import DecisionTreeClassifier
    from tqdm import tqdm

    # data (a NumPy array with the label in column 8) and epoch are defined elsewhere
    weak_clf = DecisionTreeClassifier(criterion='entropy', max_depth=2)
    # Ten-fold cross-validation
    acc = []
    pre = []
    rec = []
    f1 = []
    Data = data.copy()
    kf = KFold(n_splits=10, shuffle=True, random_state=0)  # 10 folds
    for train_index, test_index in tqdm(kf.split(Data)):  # Divide the data into 10 folds
        train_data = Data[train_index]  # Rows selected for the training set
        test_data = Data[test_index]  # Rows selected for the test set
        x_train = train_data[:, :8]
        y_train = train_data[:, 8]
        x_test = test_data[:, :8]
        y_test = test_data[:, 8]
        scaler = StandardScaler()  # Standardization
        scaler.fit(x_train)  # Fit the scaler on the training set only
        x_train = scaler.transform(x_train)
        x_test = scaler.transform(x_test)  # Reuse the training statistics instead of refitting on the test set
        pred_test = my_adaboost(weak_clf, x_train, x_test, y_train, y_test, epoch)
        # Accuracy, precision, recall and F1 on the test fold
        acc.append(accuracy_score(y_test, pred_test))
        pre.append(precision_score(y_test, pred_test))
        rec.append(recall_score(y_test, pred_test))
        f1.append(f1_score(y_test, pred_test))
    print("My Adaboost outcome in test set with {} epoch:".format(epoch))
    print("ACC:", sum(acc) / 10)
    print("PRE: ", sum(pre) / 10)
    print("REC: ", sum(rec) / 10)
    print("F1: ", sum(f1) / 10)
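To show what a k-fold split does, here is a hand-rolled sketch in plain NumPy. The helper `kfold_indices` is hypothetical and is not sklearn's implementation; it will not reproduce the exact folds that KFold generates, only the same idea: shuffle once, cut into k parts, and let each part serve as the test set exactly once.

```python
import numpy as np


def kfold_indices(n_samples, n_splits, seed=0):
    """Yield (train_idx, test_idx) pairs; a simplified stand-in for KFold."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)       # shuffle the sample indices once
    folds = np.array_split(order, n_splits)  # nearly equal-sized parts
    for k in range(n_splits):
        test_idx = folds[k]
        train_idx = np.concatenate(
            [folds[j] for j in range(n_splits) if j != k]
        )
        yield train_idx, test_idx


seen = []     # every index that was used as a test sample
overlap = 0   # number of indices shared by a train/test pair
n_folds = 0
for train_idx, test_idx in kfold_indices(n_samples=25, n_splits=10):
    overlap += len(np.intersect1d(train_idx, test_idx))
    seen.extend(test_idx.tolist())
    n_folds += 1

print("folds:", n_folds)
print("train/test overlap:", overlap)
print("each sample tested exactly once:", sorted(seen) == list(range(25)))
```

Because every sample lands in a test fold exactly once, averaging the per-fold metrics uses the whole data set for evaluation without ever scoring a model on data it was trained on.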
The overall code is as follows:
    import numpy as np


    def my_adaboost(Weak_clf, X_train, X_test, Y_train, Y_test, Epoch):
        """
        :param Weak_clf: weak classifier whose fit() accepts sample_weight
        :param X_train: training features
        :param X_test: test features
        :param Y_train: training labels (0/1)
        :param Y_test: test labels (not used inside the function)
        :param Epoch: number of boosting rounds
        :return: predicted 0/1 labels for X_test
        """
        n_train, n_test = len(X_train), len(X_test)
        W = np.ones(n_train) / n_train  # Sample weight initialization
        pred_train, pred_test = [np.zeros(n_train), np.zeros(n_test)]
        for i in range(Epoch):
            # Train the classifier with the current sample weights
            Weak_clf.fit(X_train, Y_train, sample_weight=W)
            pred_train_i = Weak_clf.predict(X_train)
            pred_test_i = Weak_clf.predict(X_test)
            # Mark the misclassified training samples (1 = wrong, 0 = correct)
            miss = [int(x) for x in (pred_train_i != Y_train)]
            # Calculate the error rate under the current weights
            miss_w = np.dot(W, miss)
            # Calculate alpha; the 0.01 in the denominator avoids division by zero
            alpha = 0.5 * np.log(float(1 - miss_w) / float(miss_w + 0.01))
            # Weight coefficient: +1 for misclassified samples, -1 for correct ones
            factor = [x if x == 1 else -1 for x in miss]
            # Update the sample weights
            W = np.multiply(W, np.exp([float(x) * alpha for x in factor]))
            W = W / sum(W)  # normalization
            # predict: map the labels to {-1, +1} and accumulate the weighted vote
            pred_train_i = [1 if x == 1 else -1 for x in pred_train_i]
            pred_test_i = [1 if x == 1 else -1 for x in pred_test_i]
            pred_test = pred_test + np.multiply(alpha, pred_test_i)
            pred_train = pred_train + np.multiply(alpha, pred_train_i)
        pred_train = (pred_train > 0) * 1
        pred_test = (pred_test > 0) * 1
        return pred_test
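As a usage sketch, the same loop can be run end to end on toy data without sklearn. Everything here is an assumption made for illustration: `Stump` is a hypothetical one-feature threshold classifier standing in for the decision tree, `adaboost_predict` is a compact rewrite of the loop (it keeps the 0.01 offset from this post), and the ten-point data set is made up so that no single threshold can separate the classes.

```python
import numpy as np


class Stump:
    """Hypothetical weak learner: one threshold on one feature, 0/1 labels."""

    def fit(self, X, y, sample_weight):
        best_err = np.inf
        for j in range(X.shape[1]):
            for thr in np.unique(X[:, j]):
                for sign in (1, -1):
                    pred = self._rule(X[:, j], thr, sign)
                    err = np.dot(sample_weight, (pred != y).astype(float))
                    if err < best_err:  # keep the lowest weighted error
                        best_err = err
                        self.feature, self.thr, self.sign = j, thr, sign
        return self

    @staticmethod
    def _rule(col, thr, sign):
        above = col > thr
        # sign == 1: predict 1 above the threshold; sign == -1: predict 1 at or below it
        return np.where(above if sign == 1 else ~above, 1, 0)

    def predict(self, X):
        return self._rule(X[:, self.feature], self.thr, self.sign)


def adaboost_predict(make_clf, X_train, y_train, X_test, n_rounds):
    """Compact version of the loop described in this post (0/1 labels)."""
    n = len(X_train)
    W = np.ones(n) / n
    score = np.zeros(len(X_test))
    for _ in range(n_rounds):
        clf = make_clf().fit(X_train, y_train, sample_weight=W)
        miss = (clf.predict(X_train) != y_train).astype(int)
        miss_w = np.dot(W, miss)
        alpha = 0.5 * np.log((1 - miss_w) / (miss_w + 0.01))  # offset as in the post
        W = W * np.exp(alpha * np.where(miss == 1, 1.0, -1.0))
        W = W / W.sum()
        score += alpha * np.where(clf.predict(X_test) == 1, 1, -1)
    return (score > 0).astype(int)


# Made-up 1-D data: the positive class sits in two separate intervals.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = np.array([1, 1, 1, 0, 0, 0, 1, 1, 1, 0])

one_round = adaboost_predict(Stump, X, y, X, n_rounds=1)
three_rounds = adaboost_predict(Stump, X, y, X, n_rounds=3)

print("accuracy after 1 round: ", np.mean(one_round == y))
print("accuracy after 3 rounds:", np.mean(three_rounds == y))
```

A single stump misclassifies three of the ten points, while three boosted stumps fit this toy set completely. The sketch evaluates on the training data only to show the mechanism, so it says nothing about generalization.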
