Evaluating and generating a classification model - 7.0

Natural Language Processing

author
Talend Documentation Team
EnrichVersion
7.0
EnrichProdName
Talend Big Data Platform
Talend Data Fabric
Talend Real-Time Big Data Platform
EnrichPlatform
Talend Studio
The tNLPModel component reads training data in CoNLL format to evaluate and generate a classification model.
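
As a rough illustration of what CoNLL-style training data looks like, the following Python sketch parses a small sample. It assumes a generic layout of one token per line, whitespace-separated columns with the token first and the label last, and a blank line between sentences. The sample text, the label names and the parse_conll helper are hypothetical and are not part of Talend Studio; the exact columns produced by your preprocessing Job may differ.

```python
from typing import List, Tuple

# A sentence is a list of (token, label) pairs.
Sentence = List[Tuple[str, str]]


def parse_conll(text: str) -> List[Sentence]:
    """Parse CoNLL-style text: one token per line, blank line between sentences."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        columns = line.split()
        # Assumption: the first column is the token, the last one is the label.
        current.append((columns[0], columns[-1]))
    if current:
        sentences.append(current)
    return sentences


# Hypothetical sample data with illustrative labels.
sample = """Alice\tPER
visited\tO
Paris\tLOC

Bob\tPER
slept\tO
"""

for sentence in parse_conll(sample):
    print(sentence)
```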

Procedure

  1. Double-click the tNLPModel component to open its Basic settings view and define its properties.
    1. Click the [+] button under the Feature template table to add rows to the table.
    2. Click in the Features column to select the features to be generated.
    3. For each feature, specify the relative position.

      For example, -2,-1,0,1,2 means that the current token, the two preceding tokens and the two following tokens are used as features.

    4. From the NLP Library list, select the same library you used for preprocessing the training text data.
  2. To evaluate the model, select the Run cross validation evaluation check box and enter 2 in the Fold field.

    This means the training data is partitioned into two folds. In each round, one fold is used as the training data set and the other as the test data set, so the validation process is repeated twice and each fold is used once for testing.

  3. Press F6 to save and execute the Job.
    The results from the K-fold cross-validation process are displayed on the Run view:
    • Precision is the ratio of correctly predicted named entities to the total number of predicted named entities.
    • Recall is the ratio of correctly predicted named entities to the total number of named entities actually present in the test data.
    • F1 score is the harmonic mean of precision and recall.
  4. Clear the Run cross validation evaluation check box.
  5. Select the Save the model on file system and the Store model in a single file check boxes to save the model locally in the folder specified in the Folder field.
  6. Press F6 to save and execute the Job.
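
The relative positions entered in the Feature template table in step 1 can be pictured with a short Python sketch. It only illustrates the idea of a context window around the current token; the window_features helper, the feature names and the padding value are hypothetical and do not reflect how tNLPModel builds features internally.

```python
from typing import Dict, List, Sequence


def window_features(
    tokens: Sequence[str],
    index: int,
    positions: Sequence[int],
    pad: str = "<PAD>",
) -> Dict[str, str]:
    """Collect the tokens found at the given relative positions around tokens[index]."""
    features: Dict[str, str] = {}
    for offset in positions:
        j = index + offset
        # Positions that fall outside the sentence get a padding value (an assumption).
        features[f"token[{offset}]"] = tokens[j] if 0 <= j < len(tokens) else pad
    return features


tokens: List[str] = ["Alice", "visited", "Paris", "last", "week"]

# Relative positions -2,-1,0,1,2 around the token "Paris" (index 2).
print(window_features(tokens, 2, [-2, -1, 0, 1, 2]))
```

With these positions, the current token contributes one feature and each of its two left and two right neighbors contributes another, which is why wider windows give the model more context at the cost of more features.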

Results

The model file is stored in the specified folder. You can now use the generated model with the tNLPPredict component to predict named entities and label text data automatically.
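
To make the cross-validation metrics from step 3 concrete, here is a minimal Python sketch under stated assumptions. Entities are represented as hypothetical (sentence, start, end, type) tuples, and folds are assigned round-robin; how tNLPModel actually partitions the data and matches entities is not documented here, so treat this as an illustration of the definitions rather than the component's implementation.

```python
from typing import Iterator, List, Set, Tuple

# Hypothetical entity representation: (sentence id, start token, end token, type).
Entity = Tuple[int, int, int, str]


def k_fold_splits(n_items: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of the k rounds."""
    # Round-robin fold assignment (an assumption, chosen for simplicity).
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def precision_recall_f1(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 from gold and predicted entity sets."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# With Fold set to 2, each half of the data is used once for testing.
for train, test in k_fold_splits(6, 2):
    print("train:", train, "test:", test)

# Made-up example: 4 gold entities, 3 predictions, 2 of them correct.
gold = {(0, 0, 1, "PER"), (0, 2, 3, "LOC"), (1, 0, 1, "PER"), (1, 4, 5, "ORG")}
predicted = {(0, 0, 1, "PER"), (0, 2, 3, "LOC"), (1, 2, 3, "LOC")}
print(precision_recall_f1(gold, predicted))
```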