Centralize the schema for the access log file for reuse in Job configurations - 7.0

Big Data Job Examples

Author: Talend Documentation Team
Version: 7.0
Products: Talend Big Data, Talend Big Data Platform, Talend Data Fabric, Talend Open Studio for Big Data, Talend Real-Time Big Data Platform
Task: Design and Development > Designing Jobs; Design and Development > Designing Jobs > Hadoop distributions; Design and Development > Designing Jobs > Job Frameworks > Standard
Platform: Talend Studio

To handle the access log file to be analyzed on the Hadoop system, you need to define an appropriate schema in the relevant components.

To simplify the configuration, before we start configuring the Jobs, we can save the read-only schema of the tApacheLogInput component as a generic schema that can be reused across Jobs.

Procedure

  1. In the Job B_HCatalog_Read, double-click the tApacheLogInput component to open its Basic settings view.
  2. Click the [...] button next to Edit schema to open the [Schema] dialog box.
  3. Click the button to open the [Select folder] dialog box.
  4. In this example we have not created any folder under the Generic schemas node, so simply click OK to close the dialog box and open the generic schema setup wizard.
  5. Give your generic schema a name, access_log in this example, and click Finish to close the wizard and save the schema.
  6. Click OK to close the [Schema] dialog box. The generic schema now appears under the Generic schemas node of the Repository view, ready to be reused wherever your Job configurations need it.
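The benefit of a generic schema is that the column list is declared once and every Job refers to that single definition instead of redeclaring it. The following Python snippet is only a conceptual sketch of that idea, not how Talend Studio stores or applies schemas. The record type `AccessLogRecord`, its field names (modeled on the Apache Common Log Format), and the helper functions `parse_line`, `count_errors`, and `bytes_per_host` are hypothetical and are not necessarily the columns that tApacheLogInput actually exposes.

```python
import re
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in for the shared "access_log" generic schema:
# one definition, reused by every consumer of the access log.
# Field names follow the Apache Common Log Format (an assumption,
# not necessarily the tApacheLogInput column names).
@dataclass(frozen=True)
class AccessLogRecord:
    host: str
    identd: str
    user: str
    date: str
    request: str
    status: int
    size: Optional[int]  # None when the log shows "-" (no bytes sent)


# Common Log Format: host ident authuser [date] "request" status bytes
_CLF = re.compile(r'^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+|-)')


def parse_line(line: str) -> AccessLogRecord:
    """Parse one access log line into the shared record type."""
    m = _CLF.match(line)
    if m is None:
        raise ValueError(f"not a Common Log Format line: {line!r}")
    host, identd, user, date, request, status, size = m.groups()
    return AccessLogRecord(
        host, identd, user, date, request,
        int(status), None if size == "-" else int(size),
    )


# Two hypothetical "Jobs" that both rely on the same schema
# instead of each declaring its own column list.
def count_errors(lines):
    """Count requests that returned an HTTP status of 400 or higher."""
    return sum(1 for line in lines if parse_line(line).status >= 400)


def bytes_per_host(lines):
    """Sum the bytes sent per client host, treating "-" as 0."""
    totals = {}
    for line in lines:
        rec = parse_line(line)
        totals[rec.host] = totals.get(rec.host, 0) + (rec.size or 0)
    return totals
```

If a column had to change, only `AccessLogRecord` and `parse_line` would be edited, and both consumers would pick up the change. This is the same reason the steps above save the schema once in the Repository rather than redefining it in each Job.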