The missing winutils.exe program in the Big Data Jobs
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries
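On Windows, Hadoop's Shell utility class builds the path to winutils.exe from the hadoop.home.dir JVM system property, falling back to the HADOOP_HOME environment variable. When neither is set, the home directory resolves to null, which is why the error shows the literal path null\bin\winutils.exe. The following Java sketch illustrates that lookup in simplified form; the property and variable names are real, but the logic is illustrative, not Hadoop's actual code:

```java
// Simplified sketch of the winutils.exe lookup performed by Hadoop on Windows.
public class WinutilsLookup {

    static String resolveWinutilsPath() {
        // Prefer the JVM system property, fall back to the environment variable.
        String home = System.getProperty("hadoop.home.dir",
                System.getenv("HADOOP_HOME"));
        // When both are unset, home is null and the resulting path starts with
        // the literal string "null", as in the error message above.
        return home + "\\bin\\winutils.exe";
    }

    public static void main(String[] args) {
        System.out.println(resolveWinutilsPath());
    }
}
```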
How to reproduce the missing winutils.exe program in the Big Data Jobs
This issue can be reproduced in several ways. The following procedure presents one example.
- In Talend Studio, create a MapReduce Job, or use Hadoop-enabled components such as Pig, Hive, or Sqoop in a Standard Job to run MapReduce computations. The Talend MapReduce Job is available only in the subscription-based Big Data products of Talend.
- In this Job, select the target Hadoop cluster to work with, for example Hortonworks Data Platform or Cloudera.
- Run this Job on Windows.
How to repair the missing winutils.exe error in the Big Data Jobs
Adding a specific JVM argument
- You have designed your Talend Big Data Job: a Standard Job using the Big Data components, a Talend MapReduce Job, or a Spark Job.
- Download the winutils.exe program from, for example, http://public-repo-1.hortonworks.com/hdp-win-alpha/winutils.exe, and put it in a bin folder on the Windows client machine, such as C:/tmp/winutils/bin.
- In your Talend Studio, open the Big Data Job to be run in the Designer and click the Run tab to open its view.
- Click the Advanced settings tab to open its view.
- In the JVM settings area, select the Use specific JVM arguments check box to activate the Argument list.
- Click the New button to open the [Set the VM Argument] dialog box and add an argument indicating the directory that contains the bin folder of the winutils.exe program. In this example, the argument is -Dhadoop.home.dir=C:/tmp/winutils.
- Click OK to validate the change; the new argument then appears in the Argument list.
Now when you run the Job, the missing winutils.exe issue no longer occurs.
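To confirm that the JVM argument was actually picked up, you can print the property at the start of the Job, for example from a tJava component. The check below is a minimal standalone sketch; inside a tJava component you would paste only the body of main:

```java
// Minimal sanity check: verify that -Dhadoop.home.dir reached the JVM.
public class CheckHadoopHome {

    static boolean hadoopHomeConfigured() {
        return System.getProperty("hadoop.home.dir") != null;
    }

    public static void main(String[] args) {
        if (hadoopHomeConfigured()) {
            System.out.println("hadoop.home.dir = "
                    + System.getProperty("hadoop.home.dir"));
        } else {
            System.err.println("hadoop.home.dir is not set; "
                    + "winutils.exe will not be found");
        }
    }
}
```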
Using the tSetEnv component
An alternative solution is to use the tSetEnv component in the Job to be run to set the same hadoop.home.dir property, that is, -Dhadoop.home.dir=C:/tmp/winutils.
However, this solution applies to Standard Jobs only, because the tSetEnv component is available only in this type of Job.
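In effect, tSetEnv sets the property on the running JVM, which is roughly equivalent to the following sketch (C:/tmp/winutils being the example location used above):

```java
// Hedged sketch of what setting hadoop.home.dir via tSetEnv amounts to.
// The property must be set before the first Hadoop component initializes.
public class SetHadoopHome {

    public static void main(String[] args) {
        System.setProperty("hadoop.home.dir", "C:/tmp/winutils");
        // Subsequent Hadoop calls in the same JVM now resolve
        // C:/tmp/winutils\bin\winutils.exe instead of null\bin\winutils.exe.
        System.out.println(System.getProperty("hadoop.home.dir"));
    }
}
```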
Leveraging Talend JobServer
A third solution is to leverage Talend JobServer, a subscription-based feature, to execute your Job on Linux.
- Install Talend JobServer on the machine you want to use to run the Job.
For more information, see Installing JobServers on Talend Help Center.
- In Talend Studio, configure the distant run mode, which lets you select the installed Talend JobServer as the target server for Job execution.
For more information, see Status settings on Talend Help Center.
- In the Run view of the Job to be run, click the Target Exec tab and select the remote server you have configured for Job execution.
- Press F6 to run the Job.