Setting up the Job - Cloud - 8.0


Procedure

  1. Open the Basic settings view of the tSambaConnection component by double-clicking the component and do the following.
    1. Enter the IP address of the Samba host in the Host field;
    2. If user authentication is enabled on the Samba host, enter the user name and password in the Username and Password fields; otherwise, leave these two fields empty;
    3. Enter the domain name in the Domain field.
      If the Samba host is not configured with a domain, leave this field empty.
  2. Open the Basic settings view of the tSambaList component by double-clicking the component and do the following.
    1. Select Use an existing connection and select the tSambaConnection component from the Component List drop-down list;
    2. Enter the name of the shared folder set in the Samba host in the Share directory field (SmbShare in this example);
    3. Enter the path to the directory whose files you want to process in the Remote path field (/abc in this example);
    4. Click the Guess schema button and then click OK to accept the schema generated.
      The generated schema contains seven columns, as shown in the following figure. You can edit the schema to suit your needs by removing unwanted columns.
      Note:
      • The tSambaList component does not pass data to any columns other than the seven columns.
      • The FileName_with_Path column is used by the subsequent components in this scenario. Make sure the column is present in the schema.
      • For columns of Date type, Talend Studio hides the time portion (hours, minutes, seconds, and so on) by default. To display it, change the setting in the Date Pattern column (as shown in the Change_Time row in the following figure). To see a list of all the supported date patterns, click the Date Pattern field of a row that is of Date type and press Ctrl + Space.
    5. Select Includes subdirectories if you want to process the CSV files in all the directories under the SmbShare/abc directory;
    6. Add a row in the File mask field by clicking the plus button at the bottom of the field and enter "*.csv" in the row.
  3. Open the Basic settings view of the tLogRow component by double-clicking the component and do the following.
    1. Click the Sync columns button to synchronize the schema with that of the tSambaList component;
    2. Select Table (print values in cells of a table);
    3. Leave the other options as they are.
  4. Open the Basic settings view of the tFlowToIterate component by double-clicking the component and do the following.
    1. Clear the Use the default (key, value) in global variables check box;
    2. Add a row in the Customize table by clicking the plus button at the bottom of the table; enter "CURRENT_FILE_PATH" in the key column and select FileName_with_Path in the value column;

      This row creates a global variable named CURRENT_FILE_PATH, which holds the path to the current file.

    3. Leave the other options as they are.
  5. Open the Basic settings view of the tSambaDelete component by double-clicking the component and do the following.
    1. Select Use an existing connection and select the tSambaConnection component from the Component List drop-down list;
    2. Enter the name of the shared folder set in the Samba host in the Share directory field (SmbShare in this example);
    3. In the Remote path field, enter an expression that retrieves the path to the file to be removed; in this example, the expression is ((String)globalMap.get("CURRENT_FILE_PATH"));
      Note: You can also enter the string by placing the cursor in the Remote path field, pressing Ctrl + Space, and then selecting tFlowToIterate_1.CURRENT_FILE_PATH from the list that appears.
    4. Select the Remove directory check box if you also want to remove the directory specified in the Remote path field;
    5. Leave the other options as they are.
  6. Save the Job.
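
The iterate-and-delete flow configured above can be sketched in plain Java to show how the CURRENT_FILE_PATH global variable moves a path from one component to the next. This is an illustrative sketch only, not the code that Talend Studio generates: the class name SambaIterateSketch and the methods matchesMask and run are hypothetical, no Samba connection is opened, and it assumes the "*.csv" file mask behaves like an ordinary glob applied to the file name.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SambaIterateSketch {

    // Assumption: the file mask is treated as a glob matched against the file name only.
    static boolean matchesMask(String path, String mask) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + mask);
        return matcher.matches(Paths.get(path).getFileName());
    }

    // Returns the paths that would be handed to the delete step, in order.
    static List<String> run(List<String> listedPaths, String mask) {
        // Stand-in for the Job-wide globalMap shared by all components.
        Map<String, Object> globalMap = new HashMap<>();
        List<String> deleted = new ArrayList<>();

        for (String fileNameWithPath : listedPaths) {
            // List step: keep only the files that match the mask.
            if (!matchesMask(fileNameWithPath, mask)) {
                continue;
            }
            // Iterate step: store the FileName_with_Path value under the custom key.
            globalMap.put("CURRENT_FILE_PATH", fileNameWithPath);

            // Delete step: the Remote path expression reads the value back.
            String remotePath = ((String) globalMap.get("CURRENT_FILE_PATH"));
            deleted.add(remotePath); // a real Job would remove the remote file here
        }
        return deleted;
    }

    public static void main(String[] args) {
        List<String> listed = List.of("/abc/a.csv", "/abc/notes.txt", "/abc/sub/b.csv");
        System.out.println(run(listed, "*.csv"));
    }
}
```

The cast to String is needed because globalMap stores values as Object, which is why the Remote path expression in the tSambaDelete component wraps the lookup in ((String) ...).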