Double-click the tPigLoad component to open its Basic settings view.
- To use a centralized HDFS connection, click the Property Type list box, select Repository, and then click the [...] button to open the [Repository Content] dialog box. Select the HDFS connection defined for connecting to the HDFS system and click OK. All the connection details are automatically filled in the respective fields.
- In the Repository tree view, select the access_log generic schema and drag and drop it onto this component to apply the schema.
- From the Load function list, select PigStorage, and in the Input file URI field, enter the file path defined in the previous Job, /user/hdp/weblog/access_log/out.log in this example.
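To show roughly what the PigStorage setting means for the lines in out.log, the following Python sketch approximates how a delimiter-based loader turns each text line into a record that matches the schema. It is not Talend's or Pig's code. The field names in `ACCESS_LOG_FIELDS` and the function `pig_storage_like` are hypothetical stand-ins, and the handling of lines with too few or too many fields is an assumption of the sketch. The one behavior it relies on is that PigStorage splits fields on a tab character unless another delimiter is specified.

```python
from typing import Iterable, Iterator, Optional

# Hypothetical field names standing in for the access_log generic schema;
# the real column list comes from the schema stored in the Repository.
ACCESS_LOG_FIELDS = ["host", "timestamp", "request", "status", "bytes"]


def pig_storage_like(lines: Iterable[str], fields: list,
                     delimiter: str = "\t") -> Iterator[dict]:
    """Approximate how a delimiter-based loader such as PigStorage maps
    text lines onto schema columns (tab is PigStorage's default delimiter).

    Assumption of this sketch: missing trailing fields become None and
    surplus fields are ignored.
    """
    for line in lines:
        values: list = line.rstrip("\n").split(delimiter)
        # Pad short records with None so every schema column gets a value.
        values += [None] * (len(fields) - len(values))
        # zip() stops at the shorter sequence, so extra values are dropped.
        yield dict(zip(fields, values))


if __name__ == "__main__":
    # Illustrative input lines, not taken from the real out.log file.
    sample = [
        "10.0.0.1\t2013-01-01\tGET /index.html\t200\t512\n",
        "10.0.0.2\t2013-01-01\tGET /missing\t404\n",
    ]
    for record in pig_storage_like(sample, ACCESS_LOG_FIELDS):
        print(record)
```

In the second sample line only four values are present, so the sketch leaves the last column as None rather than failing; a different loader or delimiter would be selected in the Load function list if out.log were not tab-separated.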