Scenario: Checking the existence of a file in HDFS - 6.3

Talend Open Studio for Big Data Components Reference Guide


In this scenario, the two-component Job checks whether a specific file exists in HDFS and returns a message to indicate the result of the verification.

In real-world practice, you can then process the checked file according to the verification result, using the other HDFS components provided with the Studio.

Launch the Hadoop distribution in which you want to check the existence of a particular file. Then, proceed as follows:

Linking the components

  1. In the Integration perspective of Talend Studio, create an empty Job, named hdfsexist_file for example, from the Job Designs node in the Repository tree view.

    For further information about how to create a Job, see the Talend Studio User Guide.

  2. Drop tHDFSExist and tMsgBox onto the workspace.

  3. Connect them using the Trigger > Run if link.

Configuring the connection to HDFS

  1. Double-click tHDFSExist to open its Component view.

  2. In the Version area, select the Hadoop distribution you are connecting to and its version.

  3. In the Connection area, enter the values of the parameters required to connect to the HDFS.

    In real-world practice, you may use tHDFSConnection to create a connection and reuse it from the current component. For further information, see tHDFSConnection.

  4. In the HDFS Directory field, browse to, or enter the path of, the folder that should contain the file to be checked. In this example, browse to /user/ychen/data/hdfs/out/dest.

  5. In the File name or relative path field, enter the name of the file whose existence you want to check. For example, output.csv.
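
The way the two fields above combine can be sketched in plain Java. This is only an illustration, assuming the component checks the path formed by appending the file name (or relative path) to the HDFS directory; the class HdfsPathSketch and its resolve method are hypothetical and are not part of Talend or Hadoop.

```java
// Hypothetical helper for illustration only; not a Talend or Hadoop API.
class HdfsPathSketch {

    // Joins an HDFS directory and a file name or relative path with exactly one "/".
    static String resolve(String directory, String relativePath) {
        String dir = directory.endsWith("/")
                ? directory.substring(0, directory.length() - 1)
                : directory;
        String rel = relativePath.startsWith("/")
                ? relativePath.substring(1)
                : relativePath;
        return dir + "/" + rel;
    }

    public static void main(String[] args) {
        // Values taken from this scenario.
        System.out.println(resolve("/user/ychen/data/hdfs/out/dest", "output.csv"));
        // prints /user/ychen/data/hdfs/out/dest/output.csv
    }
}
```

With the values of this scenario, the path being checked would therefore be /user/ychen/data/hdfs/out/dest/output.csv.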

Defining the message to be returned

  1. Double-click tMsgBox to open its Component view.

  2. In the Title field, enter the title to be used for the pop-up message box to be created.

  3. In the Buttons list, select OK. This defines the button to be displayed on the message box.

  4. In the Icon list, select Icon information.

  5. In the Message field, enter the message you want to be displayed once the file check is done. In this example, enter "This file does not exist!".

Defining the condition

  1. Click the If link to open its Basic settings view, where you can define the condition for checking the existence of this file.

  2. In the Condition box, press Ctrl+Space to access the variable list and select the global variable EXISTS. Type an exclamation mark before the variable to negate it, so that the link is triggered only when the file does not exist.
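
The negated condition can be sketched in plain Java as follows. This is a minimal simulation, not Studio-generated code: it assumes that the component keeps its default label tHDFSExist_1, so that the variable is stored under the key tHDFSExist_1_EXISTS, and it replaces the Job's global variable map with an ordinary HashMap. The class RunIfSketch and the method shouldShowMessage are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the Run if logic of this Job; for illustration only.
class RunIfSketch {

    // Assumed key: default component label (tHDFSExist_1) plus the variable name EXISTS.
    static final String EXISTS_KEY = "tHDFSExist_1_EXISTS";

    // Mirrors the negated condition: true only when the file was NOT found.
    // Throws a NullPointerException if the key has not been set yet.
    static boolean shouldShowMessage(Map<String, Object> globalMap) {
        return !((Boolean) globalMap.get(EXISTS_KEY));
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        // Simulate the existence check reporting that output.csv is missing.
        globalMap.put(EXISTS_KEY, Boolean.FALSE);

        if (shouldShowMessage(globalMap)) {
            // Stands in for the message box of this scenario.
            System.out.println("This file does not exist!");
        }
    }
}
```

Without the exclamation mark, the link would instead be triggered when the file exists, and the message text would need to change accordingly.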

Executing the Job

  • Press F6 to execute this Job.

Once the Job has run, a message box pops up to indicate that the file output.csv does not exist in the directory you defined earlier.

To verify this result, browse to the specified directory in the HDFS being checked: you can see that this file does not exist there.