Creating the Big Data Batch Job

Create a Job with a tHMapInput and two output components to convert a JSON file to two CSV files.

About this task

This example uses a local file as input, but you can also create an HDFS connection. For more information, see HDFS components.


  1. In the Integration perspective, right-click the Job Designs node and click Create Big Data Batch Job.
  2. Enter a name, purpose and description for your Job, then click Finish.
  3. Add the following components to your design workspace:
    • A tHMapInput
    • A tFileOutputDelimited
  4. Click the tHMapInput and go to the Components tab to configure the component:
    1. If you are working with local files, clear the Define a storage configuration component check box.
    2. In the Input field, enter the path to your input file.


  5. Double-click the tFileOutputDelimited and configure it:
    1. Clear the Define a storage configuration component check box.
    2. Click the ... button next to Edit schema and create two columns named id and title.
    3. Enter the path to the folder where the output files should be created.


  6. Right-click the tFileOutputDelimited component and click Copy, then paste it on your design workspace to create a second one with the same configuration.
  7. Double-click tFileOutputDelimited_2 and change the value of the Folder field.


  8. Link the tHMapInput to the two tFileOutputDelimited components with Row > Main connections named modules and sections. Click Yes when asked if you want to get the schema from the target component.
    Your Job should now show the tHMapInput linked to both tFileOutputDelimited components by the modules and sections connections.
  9. Double-click the tHMapInput and follow the wizard to generate the map.
    1. Select the structure created in Creating the input structure for your Big Data Batch Job and click Next.
    2. Select Start/End with.
      In this example, the following regular expression is automatically added to the Start with field: \{\s*(\'course\'|\"course\").
    3. Optional: Click the ... button to add your sample input file and click Run to check the records found.
      In this case, you should have three records.


    4. Click Finish.


The map is generated. It uses the input structure created previously and an output structure generated from the schema defined in the tFileOutputDelimited components. You can now map the elements.
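
To see what the Start with expression from the wizard matches, the sketch below applies the same pattern to a small sample and counts the record starts. It uses Python's re module rather than the engine used by the wizard, so treat it as an approximation. The sample text and the count_record_starts helper are hypothetical and only illustrate the pattern. The layout of the real input file is not shown on this page.

```python
import re

# Regular expression quoted from the wizard's Start with field.
START_WITH = re.compile(r"""\{\s*(\'course\'|\"course\")""")

# Hypothetical sample input (assumption): three records, each opening
# with a "course" key, in double or single quotes.
SAMPLE = '''
{"course": {"id": 1, "title": "Intro"}}
{ "course": {"id": 2, "title": "Basics"}}
{'course': {'id': 3, 'title': 'Advanced'}}
'''


def count_record_starts(text: str) -> int:
    """Count the positions where the Start with pattern matches."""
    return sum(1 for _ in START_WITH.finditer(text))


record_count = count_record_starts(SAMPLE)
print(record_count)
```

The pattern matches an opening brace, optional whitespace, and then the key course in single or double quotes, so nested objects such as {"id": ...} are not counted as record starts.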
