Evaluating and generating a classification model - 7.3

Natural Language Processing

Version: 7.3
Products: Talend Big Data Platform, Talend Data Fabric, Talend Real-Time Big Data Platform
Module: Talend Studio

The tNLPModel component reads training data in CoNLL format to evaluate and generate a classification model.
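CoNLL-style training data generally lists one token per line together with its label, and separates sentences with a blank line. The exact column layout that tNLPModel expects is not described on this page, so the minimal sketch below only illustrates the general structure. It assumes a hypothetical tab-separated layout with the token in the first column and the label in the last column; `read_conll` is an illustrative helper, not part of Talend.

```python
from typing import List, Tuple

# One sentence is a list of (token, label) pairs.
Sentence = List[Tuple[str, str]]


def read_conll(text: str) -> List[Sentence]:
    """Parse CoNLL-style text into sentences of (token, label) pairs.

    Assumed layout (illustrative only): each non-blank line is
    "token<TAB>...<TAB>label", and a blank line ends the current sentence.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            # Blank line: close the current sentence, if there is one.
            if current:
                sentences.append(current)
                current = []
            continue
        columns = line.rstrip().split("\t")
        if len(columns) < 2:
            raise ValueError(f"line {line_number}: expected token<TAB>label")
        # Token in the first column, label in the last column (assumption).
        current.append((columns[0], columns[-1]))
    if current:
        # The file may not end with a blank line.
        sentences.append(current)
    return sentences
```

For example, `read_conll("John\tPERSON\nParis\tLOCATION\n\nHello\tO\n")` returns two sentences: `[("John", "PERSON"), ("Paris", "LOCATION")]` and `[("Hello", "O")]`.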

Procedure

  1. Double-click the tNLPModel component to open its Basic settings view and define its properties.
    1. Click the [+] button under the Feature template table to add rows to the table.
    2. Click in the Features column to select the features to be generated.
    3. For each feature, specify the relative position.

      For example, -2,-1,0,1,2 means that the current token, the two preceding context tokens, and the two following context tokens are used as features.

    4. From the NLP Library list, select the same library you used for preprocessing the training text data.
  2. To evaluate the model, select the Run cross validation evaluation check box.
  3. Select the Save the model on file system and the Store model in a single file check boxes to save the model locally in the folder specified in the Folder field.
  4. Optional: To display the best weighted F1-score in the Run view each time the model improves, change the logging level of the Job execution:
    1. In the Run view, click the Advanced settings tab.
    2. Select the log4jLevel check box, and select Info from the list.
  5. Press F6 to save and execute the Job.
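The relative positions in the Feature template table define a context window around each token. The minimal sketch below illustrates that idea only; it is not how tNLPModel builds features internally. The helper name `window_features`, the feature keys such as `token[-1]`, and the `<BOS>`/`<EOS>` padding markers are all hypothetical.

```python
from typing import Dict, Sequence


def window_features(tokens: Sequence[str], index: int,
                    positions: Sequence[int]) -> Dict[str, str]:
    """Build context features for tokens[index] from relative positions.

    Positions such as (-2, -1, 0, 1, 2) select the current token plus the
    two preceding and the two following tokens. Positions that fall outside
    the sentence get hypothetical padding markers.
    """
    features: Dict[str, str] = {}
    for offset in positions:
        j = index + offset
        if j < 0:
            value = "<BOS>"  # hypothetical begin-of-sentence marker
        elif j >= len(tokens):
            value = "<EOS>"  # hypothetical end-of-sentence marker
        else:
            value = tokens[j]
        features[f"token[{offset}]"] = value
    return features
```

For the sentence `["John", "lives", "in", "Paris"]`, the current token "lives" (index 1), and positions -2,-1,0,1,2, this yields `<BOS>`, `John`, `lives`, `in`, and `Paris` for offsets -2 through 2.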

Results

If you set log4jLevel to Info, the best weighted F1-score is printed to the Run view console each time the model improves.

The following items are also output to the console of the Run view:

For each class:
  The class name
  True Positive: the number of elements that were correctly predicted as belonging to this class.
  Predicted True: the number of elements that were predicted as belonging to this class.
  Labeled True: the number of elements that actually belong to this class.
  Precision score: True Positive divided by Predicted True, from 0 to 1. It indicates how many of the elements assigned to this class actually belong to it.
  Recall score: True Positive divided by Labeled True, from 0 to 1. It indicates how many of the elements that belong to this class were found.
  F1-score: the harmonic mean of the Precision score and the Recall score.
For the best model:
  The global weighted F1-score
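The per-class scores can be reproduced from the three counts: precision is True Positive / Predicted True, recall is True Positive / Labeled True, and F1 is their harmonic mean, 2PR / (P + R). This page does not state how the global weighted F1-score is weighted; the sketch below assumes the common convention of weighting each class's F1-score by its Labeled True count. `ClassCounts`, `class_scores`, and `weighted_f1` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class ClassCounts:
    true_positive: int   # predicted as this class and actually in it
    predicted_true: int  # predicted as this class
    labeled_true: int    # actually in this class


def class_scores(c: ClassCounts) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for one class; 0.0 when undefined."""
    precision = c.true_positive / c.predicted_true if c.predicted_true else 0.0
    recall = c.true_positive / c.labeled_true if c.labeled_true else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


def weighted_f1(per_class: Dict[str, ClassCounts]) -> float:
    """Average per-class F1, weighted by Labeled True (assumed weighting)."""
    total = sum(c.labeled_true for c in per_class.values())
    if total == 0:
        return 0.0
    return sum(class_scores(c)[2] * c.labeled_true
               for c in per_class.values()) / total
```

For example, with PERSON counts (8, 10, 10) and LOCATION counts (3, 4, 6), PERSON scores an F1 of 0.8 and LOCATION scores precision 0.75, recall 0.5, and F1 0.6. The weighted F1-score is then (0.8 × 10 + 0.6 × 6) / 16, or approximately 0.725.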

The model file is stored in the specified folder. You can now use the generated model with the tNLPPredict component to predict named entities and label text data automatically.