Tag Archives: MapReduce

How to Solve the Hadoop Missing hadoop.dll and winutils.exe File Error

The problem encountered today when running MapReduce locally:

Could not locate executable null\bin\winutils.exe in the hadoop binaries
Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

 

Reason:

  1. Missing winutils.exe file: Could not locate executable null\bin\winutils.exe in the hadoop binaries
  2. Missing hadoop.dll file: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

 

Solution: Download the two missing files (winutils.exe and hadoop.dll) and put them into the bin directory under your Hadoop installation directory.
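
If the error persists when running from an IDE, it can also help to point Hadoop at the installation directory explicitly before the job starts. A minimal sketch, assuming a hypothetical install path D:\hadoop-3.1.1 whose bin folder contains both files:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class LocalDriver {
    public static void main(String[] args) throws Exception {
        // Hypothetical path; set it to your own Hadoop directory so bin\winutils.exe is found.
        System.setProperty("hadoop.home.dir", "D:\\hadoop-3.1.1");
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "local-test");
        // ... configure mapper/reducer/input/output as usual
    }
}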

HIVE Error: Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apac

Cause: the container running on the slave machine tried to use too much memory and was killed by the NodeManager.

Solution: increase the memory limits.

Set the memory configuration for map and reduce tasks in Hadoop's mapred-site.xml as follows (adjust the values according to the memory available on your machines and the needs of the application):

<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1536</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1024M</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>3072</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx2560M</value>
</property>
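
If you prefer not to edit the cluster-wide mapred-site.xml, the same settings can usually be overridden per job through the Configuration object. A minimal sketch, reusing the example values above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class MemoryTunedJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Per-job memory overrides; the values are examples only.
        conf.set("mapreduce.map.memory.mb", "1536");
        conf.set("mapreduce.map.java.opts", "-Xmx1024M");
        conf.set("mapreduce.reduce.memory.mb", "3072");
        conf.set("mapreduce.reduce.java.opts", "-Xmx2560M");
        Job job = Job.getInstance(conf, "memory-tuned-job");
        // ... configure mapper/reducer/input/output as usual
    }
}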

[Solved] Hadoop Error: Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster

Problem Description:

When testing YARN, running the WordCount test case fails with the following message:

Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster

Please check whether your etc/hadoop/mapred-site.xml contains the below configuration:
<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>
<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>
<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>

For more detailed output, check the application tracking page: http://hadoop103:8088/cluster/app/application_1638539388325_0001 Then click on links to logs of each attempt.
. Failing the application.

Cause analysis

The MRAppMaster main class cannot be found on the classpath.

Solution:

Follow the prompt and add the classpath configuration.

In yarn-site.xml and mapred-site.xml, add the following:

<property>
	<name>yarn.application.classpath</name>
	<value>
		${HADOOP_HOME}/etc/*,
		${HADOOP_HOME}/etc/hadoop/*,
		${HADOOP_HOME}/lib/*,
		${HADOOP_HOME}/share/hadoop/common/*,
		${HADOOP_HOME}/share/hadoop/common/lib/*,
		${HADOOP_HOME}/share/hadoop/mapreduce/*,
		${HADOOP_HOME}/share/hadoop/mapreduce/lib-examples/*,
		${HADOOP_HOME}/share/hadoop/hdfs/*,
		${HADOOP_HOME}/share/hadoop/hdfs/lib/*,
		${HADOOP_HOME}/share/hadoop/yarn/*,
		${HADOOP_HOME}/share/hadoop/yarn/lib/*,
	</value>
</property>

Because ${HADOOP_HOME} is used, you also need to let containers inherit environment variables. Add the whitelist below to yarn-site.xml; HADOOP_HOME is the one we need here, and you can include the others as needed. I have listed some commonly used ones:

<!-- Inheritance of environment variables -->
<property>
  <name>yarn.nodemanager.env-whitelist</name>
  <value>JAVA_HOME,HADOOP_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>

If the YARN services run on multiple servers, remember to apply this configuration on each server.

[Solved] MapReduce Class Conversion Error: java.lang.ClassCastException

An error was reported while writing the MapReduce program:
java.lang.ClassCastException: class date2021_11_27_5.Commodity
    at java.lang.Class.asSubclass(Unknown Source)
    at org.apache.hadoop.mapred.JobConf.getOutputKeyComparator(JobConf.java:887)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:1004)
    at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:402)
    at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:81)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:698)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
2021-11-29 10:00:26,301 INFO [org.apache.hadoop.mapred.LocalJobRunner] - map task executor complete.
  2021-11-29 10:00:26,302 WARN [org.apache.hadoop.mapred.LocalJobRunner] - job_local49120036_0001
  java.lang.Exception: java.io.IOException: Initialization of all the collectors failed. Error in last collector was :class date2021_11_27_5.Commodity
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.io.IOException: Initialization of all the collectors failed. Error in last collector was :class date2021_11_27_5.Commodity
    at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:415)
    at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:81)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:698)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.ClassCastException: class date2021_11_27_5.Commodity
    at java.lang.Class.asSubclass(Unknown Source)
    at org.apache.hadoop.mapred.JobConf.getOutputKeyComparator(JobConf.java:887)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:1004)
    at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:402)
    ... 10 more

 

How to Solve:
This happens because the custom entity class does not implement the WritableComparable<Commodity> interface, or does not override the compareTo method, so the missing serialization/comparison support causes a class cast exception.
Another possibility is that the class implements Writable and Comparable<Commodity> separately instead of implementing WritableComparable<Commodity>. Implementing the two separate interfaces works fine as long as the custom entity class is only used as a value, but it produces the error above as soon as the class is used as a key.
The fix is to implement the WritableComparable<Commodity> interface (and import it from the correct package).
Hadoop's rule is: if the class implements Writable and Comparable<Commodity>, objects of the custom entity class can only be used as values; if it implements WritableComparable<Commodity>, they can be used as both keys and values.
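
A minimal sketch of such an entity class; the fields (an id and a price) are made up for illustration, since the original Commodity class is not shown:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

// Hypothetical Commodity bean that can be used as a MapReduce key.
public class Commodity implements WritableComparable<Commodity> {
    private String id = "";
    private double price;

    @Override
    public void write(DataOutput out) throws IOException {
        // Serialization: write the fields in a fixed order.
        out.writeUTF(id);
        out.writeDouble(price);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Deserialization: read the fields back in the same order.
        id = in.readUTF();
        price = in.readDouble();
    }

    @Override
    public int compareTo(Commodity other) {
        // Ordering used when keys are sorted during the shuffle.
        return Double.compare(this.price, other.price);
    }
}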

Hadoop Error: Cannot Access scala.Serializable, and a Python MapReduce Error

This records problems encountered while doing a school Hadoop assignment. The assignment is fairly basic: it calls Hadoop through a makefile to execute a MapReduce program written in advance.

Error 1

The Hadoop WordCount code reports the following error:

java: cannot access scala.Serializable class file for scala.Serializable not found

Solution:
Based on a Q&A on Stack Overflow, I guessed that the Scala version was incompatible with the Hadoop version; rolling back to 2.7 solved the problem.

Error 2

Attempting to run Python on Hadoop fails, but the error output is not very detailed (the screenshot is not reproduced here).

Solution:
Add the following at the beginning of the source code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

(Encoding-format problems like this are genuinely hard to debug.)

MR Local Run Error: NativeIO$Windows.access0


1. Problem

When running an MR task in a Windows 10 environment, the error is as follows:

D:\Java\jdk1.8.0_201\bin\java.exe "-javaagent:D:\Program Files\JetBrains\IntelliJ IDEA 2020.1\lib\idea_rt.jar=64181:D:\Program Files\JetBrains\IntelliJ IDEA 2020.1\bin" -Dfile.encoding=UTF-8 -classpath D:\Java\jdk1.8.0_201\jre\lib\charsets.jar;D:\Java\jdk1.8.0_201\jre\lib\deploy.jar;D:\Java\jdk1.8.0_201\jre\lib\ext\access-bridge-64.jar;D:\Java\jdk1.8.0_201\jre\lib\ext\cldrdata.jar;D:\Java\jdk1.8.0_201\jre\lib\ext\dnsns.jar;D:\Java\jdk1.8.0_201\jre\lib\ext\jaccess.jar;D:\Java\jdk1.8.0_201\jre\lib\ext\jfxrt.jar;D:\Java\jdk1.8.0_201\jre\lib\ext\localedata.jar;D:\Java\jdk1.8.0_201\jre\lib\ext\nashorn.jar;D:\Java\jdk1.8.0_201\jre\lib\ext\sunec.jar;D:\Java\jdk1.8.0_201\jre\lib\ext\sunjce_provider.jar;D:\Java\jdk1.8.0_201\jre\lib\ext\sunmscapi.jar;D:\Java\jdk1.8.0_201\jre\lib\ext\sunpkcs11.jar;D:\Java\jdk1.8.0_201\jre\lib\ext\zipfs.jar;D:\Java\jdk1.8.0_201\jre\lib\javaws.jar;D:\Java\jdk1.8.0_201\jre\lib\jce.jar;D:\Java\jdk1.8.0_201\jre\lib\jfr.jar;D:\Java\jdk1.8.0_201\jre\lib\jfxswt.jar;D:\Java\jdk1.8.0_201\jre\lib\jsse.jar;D:\Java\jdk1.8.0_201\jre\lib\management-agent.jar;D:\Java\jdk1.8.0_201\jre\lib\plugin.jar;D:\Java\jdk1.8.0_201\jre\lib\resources.jar;D:\Java\jdk1.8.0_201\jre\lib\rt.jar;D:\code_java\BigDataLearningDemo\MapReduceDemo\target\classes;D:\MavenRepos\junit\junit\4.13\junit-4.13.jar;D:\MavenRepos\org\hamcrest\hamcrest-core\1.3\hamcrest-core-1.3.jar;D:\MavenRepos\org\slf4j\slf4j-log4j12\1.7.30\slf4j-log4j12-1.7.30.jar;D:\MavenRepos\org\slf4j\slf4j-api\1.7.30\slf4j-api-1.7.30.jar;D:\MavenRepos\log4j\log4j\1.2.17\log4j-1.2.17.jar;D:\MavenRepos\org\apache\hadoop\hadoop-client\3.1.1\hadoop-client-3.1.1.jar;D:\MavenRepos\org\apache\hadoop\hadoop-common\3.1.1\hadoop-common-3.1.1.jar;D:\MavenRepos\com\google\guava\guava\11.0.2\guava-11.0.2.jar;D:\MavenRepos\commons-cli\commons-cli\1.2\commons-cli-1.2.jar;D:\MavenRepos\org\apache\commons\commons-math3\3.1.1\commons-math3-3.1.1.jar;D:\MavenRepos\org\apache\httpcomponents\httpclient\4.5.2\httpclient-4.5.2.jar;D:\MavenRepos\org\apache\httpcomponents\httpcore\4.4.4\httpcore-4.4.4.jar;D:\MavenRepos\commons-codec\commons-codec\1.11\commons-codec-1.11.jar;D:\MavenRepos\commons-io\commons-io\2.5\commons-io-2.5.jar;D:\MavenRepos\commons-net\commons-net\3.6\commons-net-3.6.jar;D:\MavenRepos\commons-collections\commons-collections\3.2.2\commons-collections-3.2.2.jar;D:\MavenRepos\org\eclipse\jetty\jetty-servlet\9.3.19.v20170502\jetty-servlet-9.3.19.v20170502.jar;D:\MavenRepos\org\eclipse\jetty\jetty-security\9.3.19.v20170502\jetty-security-9.3.19.v20170502.jar;D:\MavenRepos\org\eclipse\jetty\jetty-webapp\9.3.19.v20170502\jetty-webapp-9.3.19.v20170502.jar;D:\MavenRepos\org\eclipse\jetty\jetty-xml\9.3.19.v20170502\jetty-xml-9.3.19.v20170502.jar;D:\MavenRepos\javax\servlet\jsp\jsp-api\2.1\jsp-api-2.1.jar;D:\MavenRepos\com\sun\jersey\jersey-servlet\1.19\jersey-servlet-1.19.jar;D:\MavenRepos\commons-logging\commons-logging\1.1.3\commons-logging-1.1.3.jar;D:\MavenRepos\commons-lang\commons-lang\2.6\commons-lang-2.6.jar;D:\MavenRepos\commons-beanutils\commons-beanutils\1.9.3\commons-beanutils-1.9.3.jar;D:\MavenRepos\org\apache\commons\commons-configuration2\2.1.1\commons-configuration2-2.1.1.jar;D:\MavenRepos\org\apache\commons\commons-lang3\3.4\commons-lang3-3.4.jar;D:\MavenRepos\org\apache\avro\avro\1.7.7\avro-1.7.7.jar;D:\MavenRepos\org\codehaus\jackson\jackson-core-asl\1.9.13\jackson-core-asl-1.9.13.jar;D:\MavenRepos\org\codehaus\jackson\jackson-mapper-asl\1.9.13\jackson-mapper-asl-1.9.13.jar;D:\MavenRepos\com\thoughtworks\paranamer\paranamer\2.3\paranamer-
2.3.jar;D:\MavenRepos\org\xerial\snappy\snappy-java\1.0.5\snappy-java-1.0.5.jar;D:\MavenRepos\com\google\re2j\re2j\1.1\re2j-1.1.jar;D:\MavenRepos\com\google\protobuf\protobuf-java\2.5.0\protobuf-java-2.5.0.jar;D:\MavenRepos\com\google\code\gson\gson\2.2.4\gson-2.2.4.jar;D:\MavenRepos\org\apache\hadoop\hadoop-auth\3.1.1\hadoop-auth-3.1.1.jar;D:\MavenRepos\com\nimbusds\nimbus-jose-jwt\4.41.1\nimbus-jose-jwt-4.41.1.jar;D:\MavenRepos\com\github\stephenc\jcip\jcip-annotations\1.0-1\jcip-annotations-1.0-1.jar;D:\MavenRepos\net\minidev\json-smart\2.3\json-smart-2.3.jar;D:\MavenRepos\net\minidev\accessors-smart\1.2\accessors-smart-1.2.jar;D:\MavenRepos\org\ow2\asm\asm\5.0.4\asm-5.0.4.jar;D:\MavenRepos\org\apache\curator\curator-framework\2.12.0\curator-framework-2.12.0.jar;D:\MavenRepos\org\apache\curator\curator-client\2.12.0\curator-client-2.12.0.jar;D:\MavenRepos\org\apache\curator\curator-recipes\2.12.0\curator-recipes-2.12.0.jar;D:\MavenRepos\com\google\code\findbugs\jsr305\3.0.0\jsr305-3.0.0.jar;D:\MavenRepos\org\apache\htrace\htrace-core4\4.1.0-incubating\htrace-core4-4.1.0-incubating.jar;D:\MavenRepos\org\apache\commons\commons-compress\1.4.1\commons-compress-1.4.1.jar;D:\MavenRepos\org\tukaani\xz\1.0\xz-1.0.jar;D:\MavenRepos\org\apache\kerby\kerb-simplekdc\1.0.1\kerb-simplekdc-1.0.1.jar;D:\MavenRepos\org\apache\kerby\kerb-client\1.0.1\kerb-client-1.0.1.jar;D:\MavenRepos\org\apache\kerby\kerby-config\1.0.1\kerby-config-1.0.1.jar;D:\MavenRepos\org\apache\kerby\kerb-core\1.0.1\kerb-core-1.0.1.jar;D:\MavenRepos\org\apache\kerby\kerby-pkix\1.0.1\kerby-pkix-1.0.1.jar;D:\MavenRepos\org\apache\kerby\kerby-asn1\1.0.1\kerby-asn1-1.0.1.jar;D:\MavenRepos\org\apache\kerby\kerby-util\1.0.1\kerby-util-1.0.1.jar;D:\MavenRepos\org\apache\kerby\kerb-common\1.0.1\kerb-common-1.0.1.jar;D:\MavenRepos\org\apache\kerby\kerb-crypto\1.0.1\kerb-crypto-1.0.1.jar;D:\MavenRepos\org\apache\kerby\kerb-util\1.0.1\kerb-util-1.0.1.jar;D:\MavenRepos\org\apache\kerby\token-provider\1.0.1\token-provider-1.0.1.jar;D:\MavenRepos\org\apache\kerby\kerb-admin\1.0.1\kerb-admin-1.0.1.jar;D:\MavenRepos\org\apache\kerby\kerb-server\1.0.1\kerb-server-1.0.1.jar;D:\MavenRepos\org\apache\kerby\kerb-identity\1.0.1\kerb-identity-1.0.1.jar;D:\MavenRepos\org\apache\kerby\kerby-xdr\1.0.1\kerby-xdr-1.0.1.jar;D:\MavenRepos\com\fasterxml\jackson\core\jackson-databind\2.7.8\jackson-databind-2.7.8.jar;D:\MavenRepos\com\fasterxml\jackson\core\jackson-core\2.7.8\jackson-core-2.7.8.jar;D:\MavenRepos\org\codehaus\woodstox\stax2-api\3.1.4\stax2-api-3.1.4.jar;D:\MavenRepos\com\fasterxml\woodstox\woodstox-core\5.0.3\woodstox-core-5.0.3.jar;D:\MavenRepos\org\apache\hadoop\hadoop-hdfs-client\3.1.1\hadoop-hdfs-client-3.1.1.jar;D:\MavenRepos\com\squareup\okhttp\okhttp\2.7.5\okhttp-2.7.5.jar;D:\MavenRepos\com\squareup\okio\okio\1.6.0\okio-1.6.0.jar;D:\MavenRepos\com\fasterxml\jackson\core\jackson-annotations\2.7.8\jackson-annotations-2.7.8.jar;D:\MavenRepos\org\apache\hadoop\hadoop-yarn-api\3.1.1\hadoop-yarn-api-3.1.1.jar;D:\MavenRepos\javax\xml\bind\jaxb-api\2.2.11\jaxb-api-2.2.11.jar;D:\MavenRepos\org\apache\hadoop\hadoop-yarn-client\3.1.1\hadoop-yarn-client-3.1.1.jar;D:\MavenRepos\org\apache\hadoop\hadoop-mapreduce-client-core\3.1.1\hadoop-mapreduce-client-core-3.1.1.jar;D:\MavenRepos\org\apache\hadoop\hadoop-yarn-common\3.1.1\hadoop-yarn-common-3.1.1.jar;D:\MavenRepos\javax\servlet\javax.servlet-api\3.1.0\javax.servlet-api-3.1.0.jar;D:\MavenRepos\org\eclipse\jetty\jetty-util\9.3.19.v20170502\jetty-util-9.3.19.v20170502.jar;D:\MavenRepos\com\sun\jersey\jers
ey-core\1.19\jersey-core-1.19.jar;D:\MavenRepos\javax\ws\rs\jsr311-api\1.1.1\jsr311-api-1.1.1.jar;D:\MavenRepos\com\sun\jersey\jersey-client\1.19\jersey-client-1.19.jar;D:\MavenRepos\com\fasterxml\jackson\module\jackson-module-jaxb-annotations\2.7.8\jackson-module-jaxb-annotations-2.7.8.jar;D:\MavenRepos\com\fasterxml\jackson\jaxrs\jackson-jaxrs-json-provider\2.7.8\jackson-jaxrs-json-provider-2.7.8.jar;D:\MavenRepos\com\fasterxml\jackson\jaxrs\jackson-jaxrs-base\2.7.8\jackson-jaxrs-base-2.7.8.jar;D:\MavenRepos\org\apache\hadoop\hadoop-mapreduce-client-jobclient\3.1.1\hadoop-mapreduce-client-jobclient-3.1.1.jar;D:\MavenRepos\org\apache\hadoop\hadoop-mapreduce-client-common\3.1.1\hadoop-mapreduce-client-common-3.1.1.jar;D:\MavenRepos\org\apache\hadoop\hadoop-annotations\3.1.1\hadoop-annotations-3.1.1.jar com.learning.nokerberos.mapreduce2.sgg9reduceJoin.TableDriver
2021-08-20 08:53:01,390 WARN [org.apache.hadoop.util.NativeCodeLoader] - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2021-08-20 08:53:01,852 INFO [org.apache.commons.beanutils.FluentPropertyBeanIntrospector] - Error when creating PropertyDescriptor for public final void org.apache.commons.configuration2.AbstractConfiguration.setProperty(java.lang.String,java.lang.Object)! Ignoring this property.
2021-08-20 08:53:01,864 WARN [org.apache.hadoop.metrics2.impl.MetricsConfig] - Cannot locate configuration: tried hadoop-metrics2-jobtracker.properties,hadoop-metrics2.properties
2021-08-20 08:53:01,912 INFO [org.apache.hadoop.metrics2.impl.MetricsSystemImpl] - Scheduled Metric snapshot period at 10 second(s).
2021-08-20 08:53:01,912 INFO [org.apache.hadoop.metrics2.impl.MetricsSystemImpl] - JobTracker metrics system started
2021-08-20 08:53:02,382 WARN [org.apache.hadoop.mapreduce.JobResourceUploader] - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2021-08-20 08:53:02,438 WARN [org.apache.hadoop.mapreduce.JobResourceUploader] - No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
2021-08-20 08:53:02,457 INFO [org.apache.hadoop.mapreduce.JobSubmitter] - Cleaning up the staging area file:/tmp/hadoop/mapred/staging/264741960596537/.staging/job_local1960596537_0001
Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
	at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
	at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:640)
	at org.apache.hadoop.fs.FileUtil.canRead(FileUtil.java:1223)
	at org.apache.hadoop.fs.FileUtil.list(FileUtil.java:1428)
	at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:468)
	at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1919)
	at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1961)
	at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:678)
	at org.apache.hadoop.fs.Globber.listStatus(Globber.java:77)
	at org.apache.hadoop.fs.Globber.doGlob(Globber.java:235)
	at org.apache.hadoop.fs.Globber.glob(Globber.java:149)
	at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:2085)
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:303)
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:274)
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:396)
	at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:313)
	at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:330)
	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:203)
	at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570)
	at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1567)
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1588)
	at com.learning.nokerberos.mapreduce2.sgg9reduceJoin.TableDriver.main(TableDriver.java:37)

Process finished with exit code 1

2. Solution

Copy the hadoop.dll file from the D:\hadoop-3.1.1\bin directory to the C:\Windows\System32 directory, then rerun the MR task; it completes successfully.
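
As an optional sanity check (not part of the original fix), you can ask Hadoop whether the native library is now loadable:

import org.apache.hadoop.util.NativeCodeLoader;

public class NativeCheck {
    public static void main(String[] args) {
        // Prints true once the native-hadoop library (hadoop.dll on Windows) can be loaded.
        System.out.println("native-hadoop loaded: " + NativeCodeLoader.isNativeCodeLoaded());
    }
}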

Some problems in the development of HBase MapReduce

I have recently been working on a course project. The main workflow is to collect data from CSV files, store it in HBase, and then use MapReduce to perform statistical analysis on the data. Along the way I ran into some problems that were eventually solved through various searches; they and their solutions are recorded here.

1. HBase HMaster automatically shutting down

Enter the ZooKeeper CLI, delete the HBase data (use with caution), and restart HBase:

./zkCli.sh
rmr /hbase
stop-hbase.sh 
start-hbase.sh 

2. Dealing with multi-module dependencies when packaging with Maven

The project structure is shown in the figure below (not reproduced here).

The ETL and statistics modules both depend on the common module. When they are packaged separately, Maven reports that the common dependency cannot be found and packaging fails.

Solution steps:

1. Run Maven package and Maven install for the common module; I do this directly from the Maven panel on the right side of IDEA.

2. Run Maven clean and Maven install on the outermost root project.

After these two steps, the problem is solved.

3. Chinese text stored in HBase turns into a form similar to “\xE5\x8F\x91\xE6\x98\x8E”

This is the classic Chinese encoding problem; it can be solved by calling the following method on the value before using it.

public static String decodeUTF8Str(String xStr) throws UnsupportedEncodingException {
    return URLDecoder.decode(xStr.replaceAll("\\\\x", "%"), "utf-8");
}
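
For example, a small usage sketch (the escaped value below is just an illustration of what comes back from HBase):

import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class DecodeDemo {
    public static String decodeUTF8Str(String xStr) throws UnsupportedEncodingException {
        return URLDecoder.decode(xStr.replaceAll("\\\\x", "%"), "utf-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // A value read back from HBase, stored as escaped UTF-8 bytes.
        String raw = "\\xE5\\x8F\\x91\\xE6\\x98\\x8E";
        System.out.println(decodeUTF8Str(raw)); // prints the decoded Chinese text
    }
}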

4. Error when submitting a MapReduce job

I wrote the code locally, packaged it into a jar, and ran it on the server. The error is as follows:

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.mapreduce.Job.getArchiveSharedCacheUploadPolicies(Lorg/apache/hadoop/conf/Configuration;)Ljava/util/Map;
    at org.apache.hadoop.mapreduce.v2.util.MRApps.setupDistributedCache(MRApps.java:491)
    at org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:92)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:172)
    at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:788)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:240)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
    at MapReduce.main(MapReduce.java:49)

Solution: add the following dependencies:

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>3.1.3</version>
</dependency>

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-common</artifactId>
    <version>3.1.3</version>
</dependency>

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>3.1.3</version>
</dependency>

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
    <version>3.1.3</version>
    <scope>provided</scope>
</dependency>

Among them, hadoop-mapreduce-client-core.jar supports running on a cluster, while hadoop-mapreduce-client-common.jar supports running locally.

After solving the above problems, my code can run smoothly on the server.

Finally, it should be noted that the output path of MapReduce cannot already exist, otherwise an error will be reported.
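
One common way to handle this (a sketch, not something from the original post) is to delete the output directory in the driver before submitting the job:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OutputPathCleaner {
    // Removes the MapReduce output directory if it already exists,
    // so the job does not fail with "output directory already exists".
    public static void deleteIfExists(Configuration conf, String output) throws Exception {
        Path outputPath = new Path(output);
        FileSystem fs = outputPath.getFileSystem(conf);
        if (fs.exists(outputPath)) {
            fs.delete(outputPath, true); // true = recursive delete
        }
    }
}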

I hope this article can help you with similar problems.

[Solved] Caused by: java.sql.SQLException: Access denied for user ‘root‘@‘hadoop102‘ (using password: YES)

When initializing the Hive metastore database with schematool -initSchema -dbType mysql -verbose, the following error occurs:

org.apache.hadoop.hive.metastore.HiveMetaException: Failed to get schema version.
Underlying cause: java.sql.SQLException: Access denied for user 'root'@'hadoop102' (using password: YES)
SQL Error code: 1045


Cause analysis:

“Access denied” means Hive's connection to the MySQL database was refused: there is a problem with the account permissions or the password configured for the MySQL connection.

When configuring the metastore to use MySQL, the settings go in hive-site.xml. It turned out that the MySQL connection password (shown in the original screenshot, not reproduced here) had not been updated when the configuration was pasted in.

After correcting it, initialize the Hive metadata again by executing the following command:

 schematool -initSchema -dbType mysql -verbose

Initialization complete.

This particular error was caused by my own carelessness and may not be universal; it is just a reminder to check these two settings!

[Solved] Hadoop Error: java.lang.NoSuchMethodError

Recording a Hadoop error:

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.mapreduce.Job.getArchiveSharedCacheUploadPolicies(Lorg/apache/hadoop/conf/Configuration;)Ljava/util/Map;
	at org.apache.hadoop.mapreduce.v2.util.MRApps.setupDistributedCache(MRApps.java:491)
	at org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:93)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:172)
	at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:794)
	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:240)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
	at hadoop.mapjoin.MapJoinDriver.main(MapJoinDriver.java:59)

At this point, the dependencies introduced in pom.xml are:

<dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-web</artifactId>
    <version>4.3.16.RELEASE</version>
</dependency>
<!-- Dependencies used by HBase -->
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>2.0.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-server</artifactId>
    <version>2.0.0</version>
</dependency>

<!-- Hadoop dependencies -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>3.2.2</version>
</dependency>
<dependency>
    <groupId>org.projectlombok</groupId>
    <artifactId>lombok</artifactId>
    <version>1.16.18</version>
</dependency>

<dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.12</version>
</dependency>
<!-- Logging -->
<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-log4j12</artifactId>
    <version>1.7.30</version>
</dependency>

However, after the HBase dependencies are removed, WordCount runs normally, so it looks like a compatibility problem between the two versions.
The recommended version compatibility matrix can be checked on the official website: http://hbase.apache.org/book.html#java

Common errors and solutions in MapReduce stage

1) Importing the wrong package is a common mistake, especially for Text and CombineTextInputFormat.
2) The first input parameter of a Mapper must be LongWritable or NullWritable, not IntWritable; otherwise a type conversion exception is reported.
3) java.lang.Exception: java.io.IOException: Illegal partition for 13926435656 (4) means that the number of partitions does not match the number of ReduceTasks, so adjust the number of ReduceTasks.
4) If the number of partitions is not 1 but there is only 1 ReduceTask, is the partitioning step executed? The answer is no. In the MapTask source code, partitioning only runs if the number of reducers is greater than 1; if it is not greater than 1, it is definitely not executed.
5) Importing a jar compiled in a Windows environment into a Linux environment and running
hadoop jar wc.jar com.atguigu.mapreduce.wordcount.WordCountDriver /User/atguigu/ /user/atguigu/output
reports the following error:
Exception in thread "main" java.lang.UnsupportedClassVersionError: com/atguigu/mapreduce/wordcount/WordCountDriver : Unsupported major.minor version 52.0
The reason is that JDK 1.7 is used in the Windows environment while JDK 1.8 is used in the Linux environment. Solution: use a unified JDK version.
6) When caching the small file pd.txt, the error says pd.txt cannot be found.
Reason: this is mostly a path-writing mistake. Also check whether the file is actually named pd.txt.txt. On some machines a relative path cannot locate pd.txt; changing it to an absolute path fixes it.
7) A type conversion exception is reported.
It is usually a mistake when setting the map output types and final output types in the driver. If the key output by the map is not sortable, a type conversion exception is also reported.
8) Running wc.jar on a cluster fails with an error that the input file cannot be found.
Reason: the input file of the WordCount case cannot be placed in the root directory of the HDFS cluster.
9) The following related exception appears:
Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:609)
at org.apache.hadoop.fs.FileUtil.canRead(FileUtil.java:977)
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:356)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:371)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:364)
Solution 1: copy the hadoop.dll file to the Windows directory C:\Windows\System32. Some students also need to modify the Hadoop source code.
Solution 2: create a package with the same name and copy NativeIO.java into it.
10) When customizing an output format, note that the close method in the RecordWriter must close the stream resources; otherwise the data in the output files will be empty.

@Override
public void close(TaskAttemptContext context) throws IOException, InterruptedException {
    if (atguigufos != null) {
        atguigufos.close();
    }
    if (otherfos != null) {
        otherfos.close();
    }
}
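
For context, here is a minimal sketch of such a custom RecordWriter; the field names (atguigufos, otherfos) follow the snippet above, and the output paths are illustrative:

import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

// Illustrative RecordWriter that routes records to two files and closes both streams in close().
public class LogRecordWriter extends RecordWriter<Text, NullWritable> {
    private FSDataOutputStream atguigufos;
    private FSDataOutputStream otherfos;

    public LogRecordWriter(TaskAttemptContext job) throws IOException {
        FileSystem fs = FileSystem.get(job.getConfiguration());
        atguigufos = fs.create(new Path("/output/atguigu.log")); // example output paths
        otherfos = fs.create(new Path("/output/other.log"));
    }

    @Override
    public void write(Text key, NullWritable value) throws IOException {
        String line = key.toString() + "\n";
        if (line.contains("atguigu")) {
            atguigufos.writeBytes(line);
        } else {
            otherfos.writeBytes(line);
        }
    }

    @Override
    public void close(TaskAttemptContext context) throws IOException, InterruptedException {
        // Closing the streams is what flushes the buffered data into the output files.
        if (atguigufos != null) {
            atguigufos.close();
        }
        if (otherfos != null) {
            otherfos.close();
        }
    }
}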

java.lang.IllegalArgumentException: URI scheme is not “file” Error Resolution

The error reported is: java.lang.IllegalArgumentException: URI scheme is not “file”

The code in the Map-side setup method is as follows:

        URI[] uris = context.getCacheFiles();
        File file = new File(uris[0]);
        BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(file)));

The code that caches the file in the Driver phase:

job.addCacheFile(new URI("/MR/job/input/com.txt"));

After thinking about it and reading some forums, I realized what is going on: new File(uri) only works when the URI uses the file:// protocol. The cached file here is not addressed with a file:// URL (it lives on the cluster file system), so it cannot be opened through File and has to be read through a stream instead. The setup-phase code was therefore changed to the following:

        URI[] cacheFiles = context.getCacheFiles();
        FileSystem fileSystem = FileSystem.get(cacheFiles[0], context.getConfiguration());
        FSDataInputStream inputStream = fileSystem.open(new Path(cacheFiles[0]));
        BufferedReader br = new BufferedReader(new InputStreamReader(inputStream));

After this change, it works correctly.
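
Putting it together, a sketch of what the full setup method can look like inside the mapper (the class and variable names are illustrative):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Illustrative mapper that loads the cached file in setup() through FileSystem
// instead of java.io.File, so it also works when the cache file is on HDFS.
public class CacheJoinMapper extends Mapper<LongWritable, Text, Text, NullWritable> {

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        URI[] cacheFiles = context.getCacheFiles();
        FileSystem fileSystem = FileSystem.get(cacheFiles[0], context.getConfiguration());
        try (FSDataInputStream inputStream = fileSystem.open(new Path(cacheFiles[0]));
             BufferedReader br = new BufferedReader(new InputStreamReader(inputStream, "UTF-8"))) {
            String line;
            while ((line = br.readLine()) != null) {
                // Parse each cached line here, e.g. split it and keep it in an in-memory map.
            }
        }
    }
}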