Testing the model using the K-fold cross-validation technique
The K-fold cross-validation technique assesses how well the model will perform on an independent dataset.
To test the model, the dataset is split into k subsets and the Random forest algorithm is run k times. At each iteration, one of the k subsets is retained as the validation set and the remaining k-1 subsets form the training set. A score is computed for each of the k runs, and the scores are then averaged to produce a global score.
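The procedure above can be sketched with scikit-learn, which is one possible implementation (the source does not name a library); the synthetic dataset and all parameter values are assumptions for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data standing in for the real dataset (an assumption).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

model = RandomForestClassifier(random_state=0)

# k = 5: each of the k subsets serves once as the validation set,
# while the remaining k-1 subsets form the training set.
scores = cross_val_score(model, X, y, cv=5)

# The k per-fold scores are averaged into a global score.
print(scores.mean())
```

`cross_val_score` handles the splitting, the k training runs, and the per-fold scoring in a single call; only the averaging is done explicitly here.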
Tuning the Random forest algorithm hyper-parameters using grid search
You can specify values for the two Random forest algorithm hyper-parameters:
The number of decision trees
The maximum depth of a decision tree
To tune the hyper-parameters and improve the quality of the model, grid search builds a model for each combination of the two Random forest algorithm hyper-parameter values within the limits you specified.
The number of trees ranges from 5 to 50 with a step of 5, and the tree depth from 5 to 10 with a step of 1.
In this example, there will be 60 different combinations (10 × 6).
Only the combination of the two hyper-parameter values that yields the best model is retained, as measured by the K-fold cross-validation score.
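The grid search described above can be sketched as follows, again using scikit-learn as an assumed implementation; the synthetic dataset is illustrative, while the two parameter ranges come from the text:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data standing in for the real dataset (an assumption).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# The ranges from the text: 10 tree counts x 6 depths = 60 combinations.
param_grid = {
    "n_estimators": list(range(5, 55, 5)),  # 5 to 50, step 5
    "max_depth": list(range(5, 11)),        # 5 to 10, step 1
}

# Each combination is scored by K-fold cross-validation (cv=5 here);
# only the best-scoring combination is retained.
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)  # the retained hyper-parameter combination
print(search.best_score_)   # its cross-validated score
```

`GridSearchCV` combines both steps of this section: it runs the K-fold procedure once per grid point, so 60 combinations with 5 folds means 300 model fits.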