Upload the access log file to HCatalog - 7.0

Big Data Job Examples

author: Talend Documentation Team
EnrichVersion: 7.0
EnrichProdName: Talend Big Data; Talend Big Data Platform; Talend Data Fabric; Talend Open Studio for Big Data; Talend Real-Time Big Data Platform
task: Design and Development > Designing Jobs; Design and Development > Designing Jobs > Hadoop distributions; Design and Development > Designing Jobs > Job Frameworks > Standard
EnrichPlatform: Talend Studio

In this step, we will configure the second Job, B_HCatalog_Load, to upload the access log file to the Hadoop system.

Procedure

  1. Double-click the tApacheLogInput component to open its Basic settings view, and specify the path to the access log file to be uploaded in the File Name field.

    In this example, we store the log file access_log in the directory C:/Talend/BigData.

  2. Double-click the tFilterRow component to open its Basic settings view.
  3. From the Logical operator used to combine conditions list box, select AND.
  4. Click the [+] button to add a line in the Filter configuration table, and set the filter parameters so that records with the code "301" are sent to the Reject flow and all remaining records are passed on to the Filter flow:
    1. In the InputColumn field, select the code column of the schema.
    2. In the Operator field, select Not equal to.
    3. In the Value field, enter 301.
  5. Double-click the tHCatalogOutput component to open its Basic settings view.
  6. From the Property Type list box, select Repository, and then click the [...] button to open the [Repository Content] dialog box and use a centralized HCatalog connection.
  7. Select the HCatalog connection defined for connecting to the HCatalog database and click OK.

    All connection details are filled in automatically in their respective fields.

  8. Click the [...] button to verify that the schema has been properly propagated from the preceding component. If needed, click Sync columns to retrieve the schema.
  9. From the Action list, select Create to create the file, or Overwrite to replace the file if it already exists.
  10. In the Partition field, enter the partition name-value pair between double quotation marks, "ipaddresses='192.168.1.15'" in this example.
  11. In the File location field, enter the path where the data will be saved, /user/hdp/weblog/access_log in this example.
  12. Double-click the tLogRow component to open its Basic settings view, and select the Vertical option to display each row of the output content in a list for better readability.
  13. Upon completion of the component settings, press Ctrl+S to save your Job configurations.
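
To make the filter behavior concrete, the logic configured in steps 3 and 4 can be sketched outside the Studio. The Python snippet below is only an illustration under stated assumptions, not the code Talend generates: the regular expression, the column names other than code, the helper names parse_line and split_rows, and the three sample log lines are all hypothetical. It shows how a single "Not equal to 301" condition, combined with AND, splits records into a Filter flow and a Reject flow.

```python
import re

# Assumed Common Log Format pattern; apart from "code", the column names
# are illustrative and not taken from the tApacheLogInput schema.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<code>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return the log fields as a dict, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def split_rows(rows, conditions):
    """Send rows meeting ALL conditions (logical AND) to the first list,
    and every other row to the second list."""
    accepted, rejected = [], []
    for row in rows:
        if all(condition(row) for condition in conditions):
            accepted.append(row)
        else:
            rejected.append(row)
    return accepted, rejected


# Made-up sample records, for illustration only.
sample_lines = [
    '192.168.1.15 - - [10/Oct/2012:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326',
    '192.168.1.15 - - [10/Oct/2012:13:55:40 -0700] "GET /old HTTP/1.1" 301 185',
    '192.168.1.15 - - [10/Oct/2012:13:56:02 -0700] "GET /missing HTTP/1.1" 404 512',
]

rows = [row for row in map(parse_line, sample_lines) if row is not None]

# One condition, mirroring InputColumn = code, Operator = Not equal to, Value = 301.
conditions = [lambda row: row["code"] != "301"]

filter_flow, reject_flow = split_rows(rows, conditions)
print([row["code"] for row in filter_flow])  # codes kept in the Filter flow
print([row["code"] for row in reject_flow])  # codes sent to the Reject flow
```

With a single condition, choosing AND or OR gives the same result; the operator only matters once more lines are added to the Filter configuration table.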