After you have created a Hadoop Cluster and a Structure, design the Big Data Batch
Job including the tHDFSConfiguration, tHMapInput, and
tLogRow components.
Procedure
-
Open the Integration perspective and navigate
to .
-
Right-click Big Data Batch and select Create
Big Data Batch Job.
-
Enter the necessary details to create the Job.
-
Drag the Hadoop Cluster metadata you created into the Job Design and select the
tHDFSConfiguration component.
-
Add a tHMapInput and a tLogRow component,
then connect them using a connection.
-
Enter Output when prompted for the output
name.
-
Double-click the tLogRow and define its schema:
-
Click the […] button next to Edit
schema.
-
In the Output (Input) section, click the
+ button to add three new columns and name them
firstName, lastName,
and age.
-
Click the button to copy the columns
to tLogRow_1 (Output).
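The firstName/lastName/age schema defined above corresponds to a plain three-column CSV file. As a minimal sketch, the following snippet generates such a sample input file; the file name customers.csv and the data values are invented for illustration and are not part of the procedure:

```python
import csv

# Hypothetical sample rows matching the firstName/lastName/age schema
# defined in the step above; the names and ages are invented.
rows = [
    {"firstName": "Ada", "lastName": "Lovelace", "age": "36"},
    {"firstName": "Alan", "lastName": "Turing", "age": "41"},
]

with open("customers.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["firstName", "lastName", "age"])
    writer.writeheader()
    writer.writerows(rows)
```

A file like this can serve both as the input file and as the sample file used to test the structure later in the procedure.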
-
Click the tHMapInput and open the Basic
Settings tab.
-
Select the Define a storage configuration
component check box and select the
tHDFSConfiguration component as the chosen
storage.
-
Specify the input file in the Input field.
-
Click the […] button next to
Configure Component and select the structure
you created earlier.
-
Select CSV in the Input
Representation drop-down list.
-
Click Next and add the input file in the
Sample File field, then click
Run to check the number of records found.
-
Click Finish.
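At runtime, the finished Job reads the CSV input through tHMapInput and prints each record via tLogRow. The sketch below approximates that flow locally in plain Python (it is not Talend-generated code, and the column names are taken from the schema defined above):

```python
import csv

def run_job(path):
    """Read a CSV input file and print each record, roughly what the
    tHMapInput -> tLogRow flow does. A local sketch, not Talend code."""
    with open(path, newline="") as f:
        records = list(csv.DictReader(f))
    for rec in records:
        # tLogRow-style pipe-delimited console output
        print("|".join(rec[col] for col in ("firstName", "lastName", "age")))
    # Return the record count, analogous to the "number of records found"
    # check when you click Run on the sample file.
    return len(records)
```

Running the actual Big Data Batch Job executes the equivalent logic on the cluster configured by tHDFSConfiguration.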