Loading the traffic data

Pig

author: Talend Documentation Team
EnrichVersion: 6.5
EnrichProdName: Talend Data Fabric, Talend Open Studio for Big Data, Talend Big Data Platform, Talend Big Data, Talend Real-Time Big Data Platform
EnrichPlatform: Talend Studio

Procedure

  1. Double-click the tPigLoad labeled traffic to open its Component view.
  2. Click the [...] button next to Edit schema to open the schema editor.
  3. Click the [+] button three times to add three rows and, in the Column column, rename them date, street and traffic, respectively.
  4. Click OK to validate these changes.
  5. In the Mode area, select the Map/Reduce option, because the Studio needs to connect to a remote Hadoop distribution.
  6. In the Distribution list and the Version field, select the Hadoop distribution to be used. In this example, it is Hortonworks Data Platform V1.0.0.
  7. In the Load function list, select the PigStorage function to read the source data, because the data is a structured file in human-readable UTF-8 format.
  8. In the NameNode URI and the Resource Manager fields, enter the locations of the master node and the Resource Manager of the Hadoop distribution to be used, respectively. If you are using WebHDFS, the master node location should be webhdfs://masternode:portnumber. If this WebHDFS is secured with SSL, use the swebhdfs scheme instead and add a tLibraryLoad component to the Job to load the library required by the secured WebHDFS.
  9. In the Input file URI field, enter the directory where the data about the traffic situation is stored. As explained earlier, the directory in this example is /user/ychen/tpigmap/date&traffic.
  10. In the Field separator field, enter the separator used by the source data. In this example, it is a semicolon (;).
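
To see how the settings above fit together, the following Python sketch assembles a Pig Latin LOAD statement from the same values. It is an illustration only, and the script the Studio actually generates may differ. The helper names (pig_load_statement, namenode_uri), the alias traffic_data, the chararray column types and the port number 50070 are assumptions made for this example. The input path, the separator, the column names and the webhdfs and swebhdfs schemes come from the procedure.

```python
def pig_load_statement(alias, input_uri, separator, columns):
    """Assemble a Pig Latin LOAD statement that uses PigStorage (hypothetical helper)."""

    def quote(value):
        # Escape backslashes and single quotes for a single-quoted Pig literal.
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"

    # Render the schema as name:type pairs, in the order defined in the schema editor.
    schema = ", ".join(f"{name}:{pig_type}" for name, pig_type in columns)
    return (
        f"{alias} = LOAD {quote(input_uri)} "
        f"USING PigStorage({quote(separator)}) AS ({schema});"
    )


def namenode_uri(host, port, webhdfs=False, ssl=False):
    """Build the file system URI; swebhdfs is the SSL-secured WebHDFS scheme (hypothetical helper)."""
    if webhdfs:
        scheme = "swebhdfs" if ssl else "webhdfs"
    else:
        scheme = "hdfs"
    return f"{scheme}://{host}:{port}"


# Column names come from step 3; the chararray types are an assumption.
columns = [("date", "chararray"), ("street", "chararray"), ("traffic", "chararray")]

# Input path from step 9 and separator from step 10; the alias is hypothetical.
statement = pig_load_statement(
    "traffic_data", "/user/ychen/tpigmap/date&traffic", ";", columns
)
print(statement)

# masternode is the placeholder host from step 8; the port is an assumed value.
print(namenode_uri("masternode", 50070, webhdfs=True))
print(namenode_uri("masternode", 50070, webhdfs=True, ssl=True))
```

The field separator is passed to PigStorage as its argument, so a source file that uses a different separator only requires changing that one value.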