Configuring tPigLoad
Pig
- author
- Talend Documentation Team
- EnrichVersion
- 6.5
- EnrichProdName
- Talend Real-Time Big Data Platform
- Talend Open Studio for Big Data
- Talend Big Data
- Talend Data Fabric
- Talend Big Data Platform
- task
- Data Quality and Preparation > Third-party systems > Processing components (Integration) > Pig components
- Data Governance > Third-party systems > Processing components (Integration) > Pig components
- Design and Development > Third-party systems > Processing components (Integration) > Pig components
- EnrichPlatform
- Talend Studio
Procedure
- Double-click tPigLoad to open its Component view.
- Click the button next to Edit schema to open the schema editor.
- Click the button twice to add two rows and name them Name and State, respectively.
- Click OK to validate these changes and accept the propagation prompted by the pop-up dialog box.
- In the Mode area, select Map/Reduce because the Hadoop cluster used in this scenario is installed on a remote machine. Once you select this mode, the parameters to be set appear.
- In the Distribution and the Version lists, select the Hadoop distribution to be used.
- In the Load function list, select PigStorage.
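PigStorage is Pig's built-in loader for UTF-8 delimited text; in the Pig Latin that the Job generates, it appears in the USING clause of a LOAD statement. As a minimal, illustrative sketch (the relation name and path below are placeholders, not values from this scenario):

```
-- PigStorage takes the field delimiter as its argument
data = LOAD '/path/to/input' USING PigStorage(';');
```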
- In the NameNode URI field and the Resource Manager field, enter the locations of the NameNode and the ResourceManager to be used for Map/Reduce, respectively. If you are using WebHDFS, the location should be webhdfs://masternode:portnumber; if this WebHDFS is secured with SSL, the scheme should be swebhdfs and you need to use a tLibraryLoad in the Job to load the library required by the secured WebHDFS.
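As a sketch, the NameNode URI typically takes one of the following forms depending on the transport in use; the host name is a placeholder and the ports shown are only common Hadoop defaults, so check your cluster's configuration:

```
hdfs://masternode:8020        # plain HDFS RPC (8020 is a common default)
webhdfs://masternode:50070    # WebHDFS over HTTP
swebhdfs://masternode:50470   # WebHDFS over SSL (needs the extra library loaded via tLibraryLoad)
```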
- In the Input file URI field, enter the location of the data to be read from HDFS. In this example, the location is /user/ychen/raw/NameState.csv.
- In the Field separator field, enter a semicolon (;).
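Taken together, the settings above correspond roughly to the following Pig Latin LOAD statement (the relation name is illustrative; the AS clause reflects the Name and State columns defined in the schema editor):

```
raw = LOAD '/user/ychen/raw/NameState.csv'
      USING PigStorage(';')
      AS (Name:chararray, State:chararray);
```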