Read the log file to be analyzed through the Pig chain - 7.0

Big Data Job Examples

author: Talend Documentation Team
EnrichVersion: 7.0
EnrichProdName: Talend Big Data; Talend Big Data Platform; Talend Data Fabric; Talend Open Studio for Big Data; Talend Real-Time Big Data Platform
task: Design and Development > Designing Jobs; Design and Development > Designing Jobs > Hadoop distributions; Design and Development > Designing Jobs > Job Frameworks > Standard
EnrichPlatform: Talend Studio

Procedure

  1. Double-click the tPigLoad component to open its Basic settings view.
  2. From the Property Type list, select Repository, and then click the [...] button to open the [Repository Content] dialog box, where you can choose a centralized HDFS connection.
  3. Select the HDFS connection defined for connecting to the HDFS system and click OK.

    The connection details are filled in automatically in the corresponding fields.

  4. Select the generic schema of access_log from the Repository tree view and then drag and drop it onto this component to apply the schema.
  5. From the Load function list, select PigStorage, and in the Input file URI field, enter the file path defined in the previous Job. In this example, the path is /user/hdp/weblog/access_log/out.log.
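
To make the role of the PigStorage load function more concrete, the following Python sketch imitates what loading delimited text with a schema amounts to: each line is split on a field delimiter and the values are paired with the schema's column names. It is an illustration only, not the code the Studio generates. The column names, the semicolon delimiter, and the sample record are assumptions made for this example (in Pig itself, PigStorage uses a tab as its default delimiter), and load_delimited is a hypothetical helper.

```python
from typing import Dict, Iterable, List, Optional


def load_delimited(
    lines: Iterable[str],
    field_names: List[str],
    delimiter: str = "\t",
) -> List[Dict[str, Optional[str]]]:
    """Split each record on `delimiter` and pair values with schema names."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        values: List[Optional[str]] = list(line.split(delimiter))
        # This sketch's choice: pad missing trailing fields with None.
        values += [None] * (len(field_names) - len(values))
        # zip() stops at the shorter sequence, so extra values are ignored.
        records.append(dict(zip(field_names, values)))
    return records


# Hypothetical schema and sample record, for illustration only.
schema = ["host", "timestamp", "request", "status"]
sample = ["127.0.0.1;2012-06-18 10:00:00;GET /index.html;200\n"]

rows = load_delimited(sample, schema, delimiter=";")
print(rows[0]["status"])
```

In the Job itself, the schema comes from the access_log generic schema applied in step 4 and the input comes from the file path entered in step 5, so none of these values are typed by hand.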