Loading the traffic data
- Talend Documentation Team
- Talend Real-Time Big Data Platform
- Talend Open Studio for Big Data
- Talend Big Data
- Talend Data Fabric
- Talend Big Data Platform
- Talend Studio
Double-click the tPigLoad labeled
traffic to open its Component view.
Click the [...] button next to Edit
schema to open the schema editor.
Click the [+] button three times to add three rows and, in the
Column column, rename them date, street
and traffic, respectively.
Click OK to validate these changes.
In the Mode area, select the Map/Reduce option, as we need the Studio to
connect to a remote Hadoop distribution.
In the Distribution list and the
Version field, select the Hadoop
distribution to be used. In this example, it is Hortonworks Data Platform V1.0.0.
In the Load function list, select the
PigStorage function to read the source
data, as the data is a structured file in human-readable UTF-8 format.
In the NameNode URI and the
Resource Manager fields, enter the
locations of the master node and the Resource Manager of the Hadoop distribution to
be used, respectively. If you are using WebHDFS, the location should be
webhdfs://masternode:portnumber; if this WebHDFS is secured
with SSL, the scheme should be swebhdfs and you need to use
a tLibraryLoad in the Job to load the library required by
the secured WebHDFS.
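The scheme rule for WebHDFS locations described above can be sketched in Python. This is only an illustration of how the URI string is formed, not part of the Studio: the function name webhdfs_uri and the port numbers in the usage lines are hypothetical, and the host name masternode is the placeholder used in the text.

```python
def webhdfs_uri(host: str, port: int, ssl: bool = False) -> str:
    """Build a WebHDFS location of the form scheme://host:port.

    The scheme is "webhdfs" for a plain endpoint and "swebhdfs" when
    the endpoint is secured with SSL, as described in the text.
    """
    if not host:
        raise ValueError("host must not be empty")
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    scheme = "swebhdfs" if ssl else "webhdfs"
    return f"{scheme}://{host}:{port}"


# Hypothetical port numbers, used here for illustration only.
plain = webhdfs_uri("masternode", 50070)
secured = webhdfs_uri("masternode", 50470, ssl=True)
print(plain)
print(secured)
```

The resulting string is what would be entered in the NameNode URI field. For the secured variant, the Job still needs the tLibraryLoad component mentioned above.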
In the Input file URI field, enter the
directory where the data about the traffic situation is stored. As explained
earlier, the directory in this example is /user/ychen/tpigmap/date&traffic.
In the Field separator field, enter
the separator used by the source data. In this example, it is a
semicolon (;).
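To make the effect of the schema and the field separator concrete, the following Python sketch splits one line of source data into the three columns defined earlier (date, street, traffic). It is a minimal illustration, not the code the Studio generates: the names TrafficRecord and parse_line and the sample line are hypothetical, and rejecting lines with the wrong number of fields is a choice made here for clarity rather than a description of how PigStorage behaves.

```python
from typing import NamedTuple


class TrafficRecord(NamedTuple):
    """One row, using the column names defined in the schema editor."""
    date: str
    street: str
    traffic: str


def parse_line(line: str, separator: str = ";") -> TrafficRecord:
    """Split a line on the separator and map the parts to the schema."""
    fields = line.rstrip("\r\n").split(separator)
    if len(fields) != len(TrafficRecord._fields):
        raise ValueError(
            f"expected {len(TrafficRecord._fields)} fields, got {len(fields)}"
        )
    return TrafficRecord(*fields)


# Hypothetical sample line; the actual file contents are not shown here.
record = parse_line("2013-01-11;Main Street;high\n")
print(record.date, record.street, record.traffic)
```

If the source file used another delimiter, the same value would be passed as separator here and entered in the Field separator field of the component.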