Tuning hyper-parameters and using K-fold cross-validation to improve the matching model

Testing the model using the K-fold cross-validation technique

The K-fold cross-validation technique assesses how well the model will perform on an independent dataset.

To test the model, the dataset is split into k subsets and the Random forest algorithm is run k times:

  • At each iteration, one of the k subsets is retained as the validation set and the remaining k-1 subsets form the training set.
  • A score is computed for each of the k runs, and the k scores are averaged to produce a global score, as the sketch after this list illustrates.
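As an illustration only, the following sketch runs K-fold cross-validation on a Random forest classifier using scikit-learn. This is an assumption made for demonstration purposes, not the product's actual implementation, and the dataset is a synthetic placeholder standing in for labeled matching pairs.

    # Assumed illustration with scikit-learn, not the product's internal code.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    # Synthetic placeholder dataset standing in for labeled record pairs.
    X, y = make_classification(n_samples=500, random_state=0)

    model = RandomForestClassifier(n_estimators=10, max_depth=5, random_state=0)

    # k=5: the data is split into 5 subsets; each run trains on 4 of them
    # and validates on the remaining one.
    scores = cross_val_score(model, X, y, cv=5)

    # The global score is the average of the 5 per-run scores.
    print(scores.mean())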

Tuning the Random forest algorithm hyper-parameters using grid search

You can specify values for the two Random forest algorithm hyper-parameters:

  • The number of decision trees
  • The maximum depth of a decision tree

To tune the hyper-parameters and improve the quality of the model, grid search builds a model for each combination of the two Random forest hyper-parameter values within the limits you specify.

For example:

  • The number of trees ranges from 5 to 50 with a step of 5.
  • The tree depth ranges from 5 to 10 with a step of 1.

In this example, grid search evaluates 60 different combinations (10 tree counts × 6 depths).

Only the combination of hyper-parameter values that trains the best model is retained, where the quality of each candidate model is measured by its K-fold cross-validation score.
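The sketch below shows how such a grid search could be expressed with scikit-learn's GridSearchCV, using the example ranges above. Again, this is an assumed illustration rather than the product's internal code: each of the 60 combinations is scored with K-fold cross-validation, and only the best-scoring one is kept.

    # Assumed illustration with scikit-learn, not the product's internal code.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV

    # Synthetic placeholder dataset standing in for labeled record pairs.
    X, y = make_classification(n_samples=500, random_state=0)

    # The grid from the example: 10 tree counts x 6 depths = 60 combinations.
    param_grid = {
        "n_estimators": list(range(5, 51, 5)),  # number of decision trees: 5..50, step 5
        "max_depth": list(range(5, 11)),        # maximum tree depth: 5..10, step 1
    }

    # Each combination is scored with 5-fold cross-validation.
    search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
    search.fit(X, y)

    # Only the best combination is retained, along with its cross-validation score.
    print(search.best_params_, search.best_score_)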
