Configuring the input data - 7.3

Talend Big Data Getting Started Guide

The tFileInputDelimited components are configured to load data from DBFS into the Job.

Before you begin

You have uploaded the movie data and the director data to DBFS, as described in Uploading files to DBFS (Databricks File System).

Procedure

  1. In the Repository, expand the File delimited node under the Metadata node, then expand the movies file connection node and its child node to display the movies schema metadata node.
  2. Double-click this schema metadata node to open its wizard.
  3. Click the button to export the schema to a local directory.
  4. Double-click the movie tFileInputDelimited component to open its Component view.
  5. Ensure that the Define a storage configuration component check box is cleared. This allows the component to read data directly from the file system of the Spark cluster, which you define later in the Spark configuration tab. In this scenario, this file system is DBFS.
  6. Click Edit schema to open the schema editor, then click the button to import the schema of the movie data that you exported previously from the File delimited metadata in the Repository.
  7. In the Folder/File field, enter the path pointing to the movie data stored in DBFS.
  8. In the Header field, enter 1 without any quotation marks. This allows the component to recognize the first row of the data as the header row.
  9. Double-click the director tFileInputDelimited component to open its Component view.
  10. Ensure that the Define a storage configuration component check box is cleared, for the same reason as explained in the previous steps.
  11. Click the [...] button next to Edit schema to open the schema editor.
  12. Click the [+] button twice to add two rows and in the Column column, rename them to ID and Name, respectively.
  13. Click OK to validate these changes and accept the propagation prompted by the pop-up dialog box.
  14. In the Folder/File field, enter the directory where the director data is stored. As explained in Uploading files to DBFS (Databricks File System), this data has been written to /FileStore/ychen/movie_library/directors.txt.
  15. In the Field separator field, enter a comma (,), as this is the separator used in the director data. A rough Spark equivalent of these input settings is sketched after this procedure.
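
For reference, the sketch below shows roughly equivalent Spark DataFrameReader calls in Java, assuming a Databricks cluster where DBFS is reachable through dbfs:/ paths. The movie file path, its separator, the column types, and the class name are illustrative assumptions; only the directors.txt location and its comma separator come from this scenario, and this is not the code that Talend Studio generates for tFileInputDelimited.

    // Hedged illustration only: approximate Spark equivalents of the settings above.
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.types.DataTypes;
    import org.apache.spark.sql.types.StructType;

    public class ReadMovieLibrary {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("read_movie_library")
                    .getOrCreate();

            // Movie data: Header = 1 means the first row is treated as the header row.
            // The path and separator here are assumptions; in the Job they come from
            // the Folder/File field and the schema metadata imported from the Repository.
            Dataset<Row> movies = spark.read()
                    .option("header", "true")
                    .option("sep", ";") // assumed separator
                    .csv("dbfs:/FileStore/ychen/movie_library/movies.csv"); // assumed path

            // Director data: two columns (ID, Name), comma-separated, no header row,
            // stored at the DBFS location given in step 14.
            StructType directorSchema = new StructType()
                    .add("ID", DataTypes.StringType)
                    .add("Name", DataTypes.StringType);
            Dataset<Row> directors = spark.read()
                    .option("sep", ",") // Field separator: comma
                    .schema(directorSchema)
                    .csv("dbfs:/FileStore/ychen/movie_library/directors.txt");

            movies.show(5);
            directors.show(5);
        }
    }

If the movie file uses a different separator or has no header row, adjust the corresponding options accordingly.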

Results

The input components are now configured to load the movie data and the director data into the Job.