Step 1: Job creation, input definition, file reading - 7.0

Data Integration Job Examples

author
Talend Documentation Team
EnrichVersion
7.0
EnrichProdName
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Open Studio for Big Data
Talend Open Studio for Data Integration
Talend Open Studio for ESB
Talend Open Studio for MDM
Talend Real-Time Big Data Platform
task
Design and Development > Designing Jobs
EnrichPlatform
Talend Studio

Procedure

  1. Launch Talend Studio, and create a local project or import the demo project if you are launching Talend Studio for the first time.
  2. To create the Job, right-click Job Designs in the Repository tree view and select Create Job.
  3. In the dialog box that opens, only the first field (Name) is required. Type in California1 and click Finish.

    An empty Job then opens in the main window, and the Palette of technical components (by default, to the right of the Studio) appears. It shows a dozen or so component families, such as Databases, Files, Internet and Data Quality, with hundreds of components already available.

  4. To read the file California_Clients, let's use the tFileInputDelimited component. This component can be found in the File > Input group of the Palette. Click this component, then click on the left side of the design workspace to place it there.
  5. Let's now define the reading properties for this component: file path, column delimiter, encoding and so on. To do so, let's use the Metadata Manager. This tool offers numerous wizards that help us configure parameters and store these properties for one-click reuse in any future Job.
  6. As our input file is a delimited flat file, right-click File Delimited under the Metadata folder in the Repository tree view, then select Create file delimited.

    A wizard dedicated to delimited files opens:

    • At Step 1, only the Name field is required: simply type in California_clients and go to the next Step.

    • At Step 2, select the input file (California_Clients.csv) via the Browse... button. An extract of the file immediately appears in the Preview area at the bottom of the screen, so that you can check its content. Click Next.

    • At Step 3, we define the file parameters: file encoding, line and column delimiters and so on. As our input file is pretty standard, most default values are fine. The first line of our file is a header containing column names. To retrieve these names automatically, click Set heading row as column names, then click Refresh Preview. Click Next to go to the last step.

    • At Step 4, each column of the file is described. The wizard includes algorithms that guess the type and length of each column based on the first data rows of the file. The suggested data description (called schema in Talend Studio) can be modified at any time. In this particular scenario, it can be used as is.

    There you go, the California_clients metadata is complete!

    We can now use it in our input component. Select the tFileInputDelimited component you dropped on the design workspace earlier, and select the Component view at the bottom of the window.

  7. Select the Basic settings vertical tab. In this tab, you'll find all the technical properties the component needs to work. Rather than setting each of these properties by hand, let's use the Metadata entry we just defined.
  8. Select Repository as Property type in the list. A new field, Repository, appears. Click the "..." button next to it and select the relevant Metadata entry from the list: California_clients.

    Notice that all parameters are now filled in automatically.

    At this stage, we will end our flow by simply sending the data read from this input file to the standard output (StdOut).

  9. To do so, add a tLogRow component (from the Logs & Errors group). To link the two components, right-click the input component and select Row > Main. Then click the output component: tLogRow.
  10. This Job is now ready to be executed. To run it, select the Run tab in the bottom panel.
  11. Enable the statistics by selecting the Statistics check box in the Advanced Settings vertical tab of the Run view, then run the Job by clicking Run in the Basic Run tab.

    The content of the input file is then displayed on the console.
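
Although the Job is built graphically, it can help to see its logic as plain code. The following is a minimal Java sketch, not the code Talend Studio generates, of what the two components do together: the input step skips the header row and splits each line on the column delimiter, and the output step prints each row to the standard output. The class and method names, the semicolon delimiter, the pipe-separated output format and the sample rows are all assumptions made for illustration; the actual columns of California_Clients.csv are not shown in this procedure.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class California1Sketch {

    // Input step (the role played by tFileInputDelimited):
    // split delimited text into rows, skipping the header line(s).
    static List<String[]> readRows(String content, String delimiter, int headerRows) {
        List<String[]> rows = new ArrayList<>();
        String[] lines = content.split("\\r?\\n");
        for (int i = headerRows; i < lines.length; i++) {
            if (lines[i].isEmpty()) {
                continue; // ignore blank lines
            }
            // Pattern.quote treats the delimiter literally; -1 keeps trailing empty fields.
            rows.add(lines[i].split(Pattern.quote(delimiter), -1));
        }
        return rows;
    }

    // Output step (the role played by tLogRow):
    // render every row as one console line, fields separated by "|" (assumed format).
    static String render(List<String[]> rows) {
        StringBuilder out = new StringBuilder();
        for (String[] row : rows) {
            out.append(String.join("|", row)).append('\n');
        }
        return out.toString();
    }

    // The whole flow in one call: input step, then output step.
    static String runFlow(String content, String delimiter, int headerRows) {
        return render(readRows(content, delimiter, headerRows));
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample data standing in for California_Clients.csv.
        String content = "ID;Name;City\n1;Ann Lee;Fresno\n2;Bob Ray;Oakland\n";
        if (args.length > 0) {
            // Optionally read a real file whose path is given as the first argument.
            content = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        }
        // Assumed settings: semicolon delimiter, one header row.
        System.out.print(runFlow(content, ";", 1));
    }
}
```

Run without arguments, the sketch prints the two sample rows without the header line. Passing the path to a delimited file as the first argument reads that file instead, which loosely mirrors pointing the component at California_Clients.csv.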