Extracting named entities from text data - 7.0

Natural Language Processing

author
Talend Documentation Team
EnrichVersion
7.0
EnrichProdName
Talend Big Data Platform
Talend Data Fabric
Talend Real-Time Big Data Platform
EnrichPlatform
Talend Studio
In this Job, the tNLPPredict component predicts named entities and automatically labels text data, using a classification model generated by the tNLPModel component.

Procedure

  1. Double-click the tNLPPredict component to open its Basic settings view and define its properties.
    1. Click Sync columns to retrieve the schema from the previous component connected in the Job.
    2. From the Original text column list, select the column that holds the text to be labeled, which is text in this example.
    3. From the Token column list, select the column used for feature construction and prediction, which is tokens in this example.
    4. From the NLP Library list, select the same library you used for generating the model.
    5. If the named entity recognition model is stored in a single file, select the Use the model file check box.
    6. Specify the path to the model in the NLP model path field.
  2. Double-click the tFilterColumns component to open its Basic settings view and define its properties.
    1. Click Sync columns to retrieve the schema from the previous component connected in the Job.
    2. Set the Schema as Built-in and click Edit schema to keep only the columns that hold the original text, the labeled text and the labels.
  3. Double-click the tFileOutputDelimited component to open its Basic settings view and define its properties.
    1. Click Sync columns to retrieve the schema from the previous component connected in the Job.
    2. In the Folder field, specify the path to the folder where you want to store the labeled text and the labels.
    3. Enter "\n" in the Row separator field and ";" in the Field separator field.
  4. Press F6 to save and execute the Job.

Results

The output files contain the original text, the labeled text and the labels. In this example, person names were extracted from the original text, which indicates that the named entity recognition task was performed correctly.
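
To spot-check the results outside Talend Studio, the output files can be read back with a short script. The following Python snippet is a minimal sketch, not part of the Job. It assumes the separators set in the tFileOutputDelimited component (";" between fields, "\n" between rows), no header row, and a hypothetical column order of original text, labeled text and labels. The column names and the sample rows are made up for illustration, so adjust them to the schema you kept in tFilterColumns.

```python
import csv
import io

# Hypothetical column order; match it to the schema kept in tFilterColumns.
COLUMNS = ["text", "labeled_text", "labels"]


def read_labeled_rows(stream, columns=COLUMNS):
    """Parse ';'-separated rows (one row per line) into dictionaries."""
    reader = csv.reader(stream, delimiter=";")
    # Skip empty lines, such as a trailing blank line at the end of a file.
    return [dict(zip(columns, row)) for row in reader if row]


# Made-up sample standing in for one output file; this is not real Job output.
sample = io.StringIO(
    "first text;first labeled text;first labels\n"
    "second text;second labeled text;second labels\n"
)

rows = read_labeled_rows(sample)
for row in rows:
    print(row["text"], "->", row["labels"])
```

To read a real file, pass a handle opened with `open(path, newline="", encoding="utf-8")` instead of the in-memory sample, once for each file written to the output folder. This sketch splits on every ";", so a text value that itself contains ";" would be split into extra fields unless the values are quoted.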