How to repair the missing winutils.exe error in the Big Data Jobs

author
Talend Documentation Team
EnrichVersion
6.5
EnrichProdName
Talend Open Studio for Big Data
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Real-Time Big Data Platform
task
Design and Development > Designing Jobs > Job Frameworks > Spark Batch
Design and Development > Designing Jobs > Job Frameworks > MapReduce
Design and Development > Designing Jobs > Job Frameworks > Spark Streaming
Design and Development > Designing Jobs > Job Frameworks > Standard
EnrichPlatform
Talend Studio

The missing winutils.exe program in the Big Data Jobs

When you execute a Talend Big Data Job on Windows, the missing winutils.exe error is often returned. Although most of these Jobs still run successfully, it is recommended that you repair this error to prevent any unexpected Job failure related to it.

This is a known issue in many Hadoop-related projects and has been discussed in many online communities, such as Stack Overflow and Cloudera's issue tracking system.

java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries
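The null at the start of the path hints at the cause: the path is built from a Hadoop home directory that was never set. The following Java sketch mimics that lookup, assuming the home directory is read first from the hadoop.home.dir system property and then from the HADOOP_HOME environment variable. The class and method names are hypothetical, and this is a simplified illustration, not the Hadoop source code.

```java
// Simplified illustration only; not the actual Hadoop source code.
class WinutilsLookup {

    // Pick the Hadoop home: the system property wins, then the environment variable.
    static String resolveHome(String systemProperty, String envVariable) {
        return systemProperty != null ? systemProperty : envVariable;
    }

    // Concatenating a null home yields the literal text "null" in the path.
    static String winutilsPath(String home) {
        return home + "\\bin\\winutils.exe";
    }

    public static void main(String[] args) {
        String home = resolveHome(System.getProperty("hadoop.home.dir"),
                                  System.getenv("HADOOP_HOME"));
        System.out.println(winutilsPath(home));
    }
}
```

When neither value is defined, the sketch prints the same null\bin\winutils.exe path that appears in the error message.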

How to reproduce the missing winutils.exe error in the Big Data Jobs

This issue can be reproduced in several ways. The following procedure presents one example.

Procedure

  1. In Talend Studio, create a MapReduce Job, or create a Standard Job that uses Hadoop-enabled components such as Pig, Hive or Sqoop to run MapReduce computations. The Talend MapReduce Job is available only in the subscription-based Big Data products of Talend.
  2. In this Job, select for example Hortonworks Data Platform or Cloudera as the target Hadoop cluster to work with.
  3. Run this Job in Windows.

How to repair the missing winutils.exe error in the Big Data Jobs

You can use one of the following three solutions to repair the missing winutils.exe error in the Big Data Jobs.

Adding a specific JVM argument

  • You have designed your Talend Big Data Job, such as a Standard Job using the Big Data components, a Talend Map/Reduce Job, or a Spark Job.
  1. Download the winutils.exe program from, for example, http://public-repo-1.hortonworks.com/hdp-win-alpha/winutils.exe, and put it in a bin folder on the Windows client machine, such as C:/tmp/winutils/bin.
  2. In your Talend Studio, open the Big Data Job to be run in the Designer and click Run to open its view.
  3. Click the Advanced settings tab to open its view.

    The actual look of this view in your Studio can differ depending on the version of your Studio.

  4. In the JVM settings area, select the Use specific JVM arguments check box to activate the Argument list.
  5. Click the New button to open the [Set the VM Argument] dialog box and add the argument to indicate the directory where the bin folder of the winutils.exe program is located. In this example, the argument is -Dhadoop.home.dir=C:/tmp/winutils.
  6. Click OK to validate this change and then you can see this new argument appear in the Argument list.

Now when you run the Job, the missing winutils.exe issue no longer occurs.
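To confirm that the argument points at the right folder, a small stand-alone check can help. The sketch below is a hypothetical helper, not part of Talend Studio or Hadoop. It assumes only that the value of hadoop.home.dir must be the parent of the bin folder that contains winutils.exe, as in the C:/tmp/winutils example above.

```java
import java.io.File;

// Hypothetical helper; not part of Talend Studio or Hadoop.
class HadoopHomeCheck {

    // Returns true when <home>/bin/winutils.exe exists as a regular file.
    static boolean hasWinutils(String home) {
        if (home == null) {
            return false;
        }
        return new File(new File(home, "bin"), "winutils.exe").isFile();
    }

    public static void main(String[] args) {
        // Reads the same property that the -Dhadoop.home.dir argument sets.
        String home = System.getProperty("hadoop.home.dir");
        System.out.println("hadoop.home.dir = " + home);
        System.out.println("winutils.exe found: " + hasWinutils(home));
    }
}
```

Start it with the same -Dhadoop.home.dir=C:/tmp/winutils argument that you added to the Job. A result of false usually means the argument points at the bin folder itself rather than at its parent, or that the file is missing.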

Note: This change takes effect on a per-Job basis. To make it available to all Jobs, go to Window > Preferences > Talend > Run/Debug and repeat the steps described above to add the -Dhadoop.home.dir=C:/tmp/winutils argument to the Argument list in the Run/Debug window.

Using the tSetEnv component

An alternative solution is to use the tSetEnv component in the Job to be run to define the same -Dhadoop.home.dir=C:/tmp/winutils parameter.

However, this solution applies to Standard Jobs only, because the tSetEnv component is available only in this type of Job.
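In plain Java terms, the goal is to have the hadoop.home.dir system property set before any Hadoop class is loaded. The sketch below illustrates that idea under stated assumptions. It is not the code that tSetEnv generates, the class and method names are hypothetical, and the path is the example directory used earlier.

```java
// Hypothetical illustration; not the code generated by the tSetEnv component.
class HadoopHomeDefault {

    // Sets hadoop.home.dir only when it is not already defined, so an explicit
    // -Dhadoop.home.dir=... argument on the command line still takes precedence.
    static String ensureHadoopHome(String fallback) {
        if (System.getProperty("hadoop.home.dir") == null) {
            System.setProperty("hadoop.home.dir", fallback);
        }
        return System.getProperty("hadoop.home.dir");
    }

    public static void main(String[] args) {
        System.out.println(ensureHadoopHome("C:/tmp/winutils"));
    }
}
```

Setting the property only when it is absent lets a value from the JVM arguments take precedence over the default.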

Leveraging Talend JobServer

A third solution is to leverage Talend JobServer, a subscription-based feature, to execute your Job in Linux.

  1. Install Talend JobServer on the Linux machine you want to use to run the Job.

    For more information, see Installing Jobservers on Talend Help Center.

  2. In Talend Studio, configure the distant run mode. This allows you to select the installed Talend JobServer as the target server for Job execution.

    For more information, see Status settings on Talend Help Center.

  3. In the Run view of the Job to be run, select the Target Exec tab and select the remote server you have configured for Job execution.
  4. Press F6 to run the Job.