First steps with Big Data in Talend Studio
In this tutorial, learn how to take your first steps with Big Data in Talend Studio.
This tutorial makes use of a Hadoop cluster. You must have a Hadoop cluster available to you.
Creating a Talend Studio project
Creating a project is the first step to using Talend Studio. Projects allow you to better organize your work.
Procedure
Results
Creating a Job to use a Hadoop cluster connection
Talend Studio projects contain Jobs. A Job is a workflow built from components, each of which performs a specific action.
Before you begin
Procedure
Results
Creating a Hadoop cluster metadata definition
You can create a Hadoop cluster metadata definition so that you can quickly configure components with your Hadoop cluster information. Talend Studio also allows you to import a cluster metadata definition.
Before you begin
- This tutorial makes use of a Hadoop cluster. You must have a Hadoop cluster available to you.
- Select the Integration perspective.
Procedure
Results
Importing a Hadoop cluster metadata definition
You can import your Hadoop cluster configuration to create a Hadoop cluster metadata definition, then use that definition to quickly configure components. Talend Studio also allows you to create a cluster metadata definition from scratch.
Before you begin
- This tutorial makes use of a Hadoop cluster. You must have a Hadoop cluster available to you.
- Select the Integration perspective.
Procedure
Results
Writing and reading data in HDFS
In this tutorial, discover how to write data to HDFS using automatically generated random data. Next, learn how to read data from HDFS, how to sort it and how to display the results in the console.
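To picture the sort-and-display part of this flow outside Talend Studio, here is a minimal plain-Java sketch. It is not the code that Talend Studio generates, and it does not connect to HDFS: the in-memory list stands in for lines already read from a file, and the class name, the semicolon separator, and the column layout are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SortRowsSketch {

    // Returns a sorted copy of the rows, ordered by the text of one column.
    // Assumes every row is a semicolon-delimited line with enough columns.
    static List<String> sortByColumn(List<String> rows, int columnIndex) {
        List<String> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparing((String row) -> row.split(";")[columnIndex]));
        return sorted;
    }

    public static void main(String[] args) {
        // Hypothetical rows standing in for data read back from HDFS.
        List<String> rows = Arrays.asList("3;Chloe;Lyon", "1;Ava;Paris", "2;Ben;Nantes");

        // Sort on the second column (the name) and print each row to the console.
        for (String row : sortByColumn(rows, 1)) {
            System.out.println(row);
        }
    }
}
```

The sketch sorts a copy rather than the input list, so the rows as read remain available for other uses.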
Generating random data
With the help of the tRowGenerator component, Talend Studio can create random data to help you test its features.
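To get a feel for the kind of rows such a Job might produce, here is a minimal plain-Java sketch. It is not the code Talend Studio generates for tRowGenerator: the class name, the three-column layout (id, first name, city), the sample values, and the semicolon separator are all assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class RandomRowSketch {

    // Hypothetical sample values; a real Job would define its own schema.
    static final String[] FIRST_NAMES = {"Ava", "Ben", "Chloe", "Dan"};
    static final String[] CITIES = {"Paris", "Lyon", "Nantes"};

    // Builds `count` semicolon-delimited rows of the form id;firstName;city.
    // A fixed seed makes the output repeatable, which is convenient for tests.
    static List<String> generateRows(int count, long seed) {
        Random random = new Random(seed);
        List<String> rows = new ArrayList<>();
        for (int id = 1; id <= count; id++) {
            String name = FIRST_NAMES[random.nextInt(FIRST_NAMES.length)];
            String city = CITIES[random.nextInt(CITIES.length)];
            rows.add(id + ";" + name + ";" + city);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Print five sample rows to the console.
        for (String row : generateRows(5, 42L)) {
            System.out.println(row);
        }
    }
}
```

Rows in this delimited form are the kind of text data that a later step could write to a file in HDFS.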
About this task
Procedure
Results
What to do next
Writing data to HDFS using metadata
Using the tHDFSOutput component, you can write data to HDFS.
Before you begin
- This tutorial makes use of a Hadoop cluster. You must have a Hadoop cluster available to you.
- You must also have HDFS metadata configured (see Creating a Hadoop cluster metadata definition and Importing a Hadoop cluster metadata definition).
Procedure
Results
Reading data from HDFS using metadata
Using the tHDFSInput component, you can read data from HDFS.
Before you begin
- This tutorial makes use of a Hadoop cluster. You must have a Hadoop cluster available to you.
- You must also have HDFS metadata configured (see Creating a Hadoop cluster metadata definition and Importing a Hadoop cluster metadata definition).
- You must have written data to HDFS (see Writing data to HDFS using metadata).