For more technologies supported by Talend, see Talend components.
The tDataprepRun component allows you to reuse an existing preparation made in Talend Data Preparation, directly in a Big Data Job. In other words, you can operationalize the process of applying a preparation to input data with the same model.
The following scenario creates a simple Job that :
- Reads a small sample of customer data,
- applies an existing preparation on this data,
- shows the result of the execution in the console.
This assumes that a preparation has been created beforehand, on a dataset with the same schema as your input data for the Job. In this case, the existing preparation is called datapreprun_spark. This simple preparation puts the customer last names into upper case and applies a filter to isolate the customers from California, Texas and Florida.
James;Butt;California Daniel;Fox;Connecticut Donna;Coleman;Alabama Thomas;Webb;Illinois William;Wells;Florida Ann;Bradley;California Sean;Wagner;Florida Elizabeth;Hall;Minnesota Kenneth;Jacobs;Florida Kathleen;Crawford;Texas Antonio;Reynolds;California Pamela;Bailey;Texas Patricia;Knight;Texas Todd;Lane;New Jersey Dorothy;Patterson;Virginia
Prerequisite: ensure that the Spark cluster has been properly installed and is running.