Tag Archives: Hadoop

[Solved] MapReduce Class Cast Error: java.lang.ClassCastException

An error was reported while writing a MapReduce program.
java.lang.ClassCastException: class date2021_11_27_5.Commodity
    at java.lang.Class.asSubclass(Unknown Source)
    at org.apache.hadoop.mapred.JobConf.getOutputKeyComparator(JobConf.java:887)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:1004)
    at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:402)
    at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:81)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:698)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
2021-11-29 10:00:26,301 INFO [org.apache.hadoop.mapred.LocalJobRunner] - map task executor complete.
  2021-11-29 10:00:26,302 WARN [org.apache.hadoop.mapred.LocalJobRunner] - job_local49120036_0001
  java.lang.Exception: java.io.IOException: Initialization of all the collectors failed. Error in last collector was :class date2021_11_27_5.Commodity
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.io.IOException: Initialization of all the collectors failed. Error in last collector was :class date2021_11_27_5.Commodity
    at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:415)
    at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:81)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:698)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.ClassCastException: class date2021_11_27_5.Commodity
    at java.lang.Class.asSubclass(Unknown Source)
    at org.apache.hadoop.mapred.JobConf.getOutputKeyComparator(JobConf.java:887)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:1004)
    at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:402)
    ... 10 more

 

How to Solve:
This happens when the custom entity class does not implement the WritableComparable<Commodity> interface or does not override the compareTo method, so the map output key cannot be sorted and a ClassCastException is thrown.
There is another possibility: the class implements the separate Writable and Comparable<Commodity> interfaces instead of the single WritableComparable<Commodity> interface that the MapReduce framework expects for keys. With the two separate interfaces, a custom entity class works fine as a value, but using it as a key produces the error above. Re-implement the WritableComparable<Commodity> interface (and import the correct package) to fix it.
Hadoop's rule is: if the entity class implements only Writable and Comparable<Commodity>, its data can be used only as a value; if it implements WritableComparable<Commodity>, its data can be used as both key and value.
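For reference, here is a minimal sketch of what such an entity class can look like (the fields are made up for illustration; the post does not show the actual Commodity class):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

// Minimal sketch: a custom key type must implement WritableComparable,
// not just Writable plus Comparable, so Hadoop can both serialize and sort it.
public class Commodity implements WritableComparable<Commodity> {

    private String name;   // illustrative field
    private double price;  // illustrative field

    public Commodity() { } // Hadoop needs a no-arg constructor for deserialization

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(name);
        out.writeDouble(price);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        name = in.readUTF();
        price = in.readDouble();
    }

    @Override
    public int compareTo(Commodity other) {
        // Sort order used when this class is the map output key
        return Double.compare(this.price, other.price);
    }
}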

Windows CMD (Black Window) Start Hadoop Error [How to Solve]

Windows cmd (black window) start Hadoop error solutions (continuously updated)
Contents: cmd command start process error, cmd startup IO error

cmd command start process error

Failed to setup local dir /tmp/hadoop-GK/nm-local-dir, which was marked as good. org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Permissions incorrectly set for dir /tmp/hadoop-GK/nm-local-dir/nmPrivate, should be rwx------, actual value = rwxrwx-

Solution: Run cmd as administrator

cmd startup IO error

IOException: Incompatible clusterIDs in D:\hadoop\3.0.3\data\dfs\datanode: namenode clusterID = CID-45d4d17f-96fd-4644-b0ee-7835ef5bc790; datanode clusterID = CID-01f27c2a-6229-4a10-b098-89e89d4c62e4

Solution: Delete the data directory in hadoop and restart

Hadoop Startup Error: sbin/start-dfs.sh [How to Solve]

Hadoop command sbin/start-dfs.sh startup error:
Error: Cannot find configuration directory: /etc/hadoop
JAVA_HOME is not set and could not be found
Solution:
Configure the hadoop-env.sh file in hadoop-2.7.1/etc/hadoop and write in your own JDK and Hadoop paths:
export JAVA_HOME=/usr/jdk1.8.0_221
export HADOOP_CONF_DIR=/usr/hadoop-2.7.1/etc/hadoop/

Datanode startup failed with an error: incompatible clusterids

Contents: Information, Error report summary, Problem description, Cause of problem, Analysis steps, Solution, References


Information

Environment: Hadoop 3.3.1; System: CentOS 7.4; Java: Java SE 1.8.0_301

Error report summary

java.io.IOException: Incompatible clusterIDs in /opt/module/hadoop-3.3.1/data/dfs/data: namenode clusterID = CID-aa23cfe4-9ad3-4c06-87fc-e862c8f3a722; datanode clusterID = CID-55fa9a51-7777-4ff4-87d6-4df7cf2cb8b9

Problem description

An error is reported when the datanode is started. The error recorded in the log /opt/module/hadoop-3.3.1/logs/hadoop-bordy-datanode-hadoop102.log is as follows:

2021-11-29 21:58:51,350 INFO org.apache.hadoop.hdfs.server.common.Storage: Using 1 threads to upgrade data directories (dfs.datanode.parallel.volumes.load.threads.num=1, dataDirs=1)
2021-11-29 21:58:51,354 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /opt/module/hadoop-3.3.1/data/dfs/data/in_use.lock acquired by nodename 13694@hadoop102
2021-11-29 21:58:51,356 WARN org.apache.hadoop.hdfs.server.common.Storage: Failed to add storage directory [DISK]file:/opt/module/hadoop-3.3.1/data/dfs/data
java.io.IOException: Incompatible clusterIDs in /opt/module/hadoop-3.3.1/data/dfs/data: namenode clusterID = CID-aa23cfe4-9ad3-4c06-87fc-e862c8f3a722; datanode clusterID = CID-55fa9a51-7777-4ff4-87d6-4df7cf2cb8b9
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:746)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadStorageDirectory(DataStorage.java:296)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadDataStorage(DataStorage.java:409)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:389)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:561)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1753)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1689)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:394)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:295)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:854)
        at java.lang.Thread.run(Thread.java:748)
2021-11-29 21:58:51,358 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool <registering> (Datanode Uuid a4eeff59-0192-4402-8278-4743158fa405) service to hadoop101/192.168.2.101:8020. Exiting.
java.io.IOException: All specified directories have failed to load.
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:562)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1753)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1689)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:394)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:295)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:854)
        at java.lang.Thread.run(Thread.java:748)
2021-11-29 21:58:51,359 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool <registering> (Datanode Uuid a4eeff59-0192-4402-8278-4743158fa405) service to hadoop101/192.168.2.101:8020
2021-11-29 21:58:51,363 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool <registering> (Datanode Uuid a4eeff59-0192-4402-8278-4743158fa405)
2021-11-29 21:58:53,364 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
2021-11-29 21:58:53,424 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at hadoop102/192.168.2.102
************************************************************/

Cause of problem

Hadoop's upgrade mechanism requires each datanode to store a permanent clusterID in its VERSION file. When a datanode starts, it checks this value against the clusterID in the namenode's VERSION file; if the two do not match, the "Incompatible clusterIDs" exception is raised. See the official issue HDFS-107.

Analysis steps

1. View the clusterID in the VERSION file under the datanode directory /opt/module/hadoop-3.3.1/data/dfs/data/current.
2. View the clusterID in the VERSION file under the namenode directory /opt/module/hadoop-3.3.1/data/dfs/name/current.
Comparing the two files shows that the clusterIDs do not match (the two values can be checked as in the sketch below). In the HDFS architecture each datanode must communicate with the namenode, and the clusterID is the unique ID of the namenode, i.e. of the cluster.
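As an illustration of this comparison, here is a small sketch that prints the two clusterIDs side by side. The paths are the ones used in this post, and it relies on the fact that the VERSION file is a plain key=value properties file:

import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

// Sketch: print the clusterID stored in the namenode's and the datanode's VERSION
// files so the two values can be compared. Adjust the paths to your own
// dfs.namenode.name.dir and dfs.datanode.data.dir settings.
public class ClusterIdCheck {
    public static void main(String[] args) throws IOException {
        String[] versionFiles = {
            "/opt/module/hadoop-3.3.1/data/dfs/name/current/VERSION", // namenode
            "/opt/module/hadoop-3.3.1/data/dfs/data/current/VERSION"  // datanode
        };
        for (String path : versionFiles) {
            Properties props = new Properties();
            try (FileInputStream in = new FileInputStream(path)) {
                props.load(in);
            }
            System.out.println(path + " -> clusterID=" + props.getProperty("clusterID"));
        }
    }
}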

Solution

Modify the clusterID value in the failed datanode's VERSION file to match the namenode's clusterID, then restart the datanode.

References

Hadoop failed to start datanode: there is a problem with clusterID – Wang Shen – cnblogs.com

Spring integrated HBase error [How to Solve]

Problem 1
ClassNotFoundException:org/springframework/data/hadoop/configuration/ConfigurationFactoryBean
Solution
Replace the jar package with spring-data-hadoop-1.0.0.RELEASE version
Problem 2
ClassNotFoundException:org/apache/hadoop/conf/Configuration
Solution
Introduce hadoop-client-3.1.3.jar and hadoop-common-3.1.3.jar
Problem 3
java.lang.NoClassDefFoundError: org/apache/commons/configuration2/Configuration
Solution
Introduce commons-configuration2-2.3.jar
Problem 4
java.lang.NoClassDefFoundError: org/apache/hadoop/util/PlatformName
Solution
Introduce hadoop-auth-3.1.3.jar
Problem 5
java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/JobConf
Solution
Introduce hadoop-mapreduce-client-common-3.1.3.jar, hadoop-mapreduce-client-core-3.1.3.jar and
hadoop-mapreduce-client-jobclient-3.1.3.jar
Problem 6
java.lang.NoClassDefFoundError: com/ctc/wstx/io/SystemId
Solution
Introduce woodstox-core-5.0.3.jar
Problem 7
java.lang.NoClassDefFoundError: com/google/common/collect/Interners
Solution
Introduce guava-30.1.1-jre.jar
Problem 8
java.lang.NoSuchMethodError: com.google.common.collect.MapMaker.keyEquivalence(Lcom/google/common/base/Equivalence;)Lcom/google/common/collect/MapMaker
Solution
Remove the google-collect-1.0.jar package; it conflicts with guava.
Problem 9
java.lang.NoClassDefFoundError: com/fasterxml/jackson/core/JsonGenerator
Solution
Introduce jackson-annotations-2.12.4.jar, jackson-core-2.12.4.jar and jackson-databind-2.12.4.jar
Problem 10
java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
Solution
Introduce hbase-common-2.2.4.jar
Problem 11
java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/client/HTableInterface
Solution
After searching for a long time, I found this written in the configuration file:
<bean id="htemplate" class="org.springframework.data.hadoop.hbase.HbaseTemplate">
<property name="configuration" ref="hbaseConfiguration">
</property>
</bean>
Comment it out.
Summary
Most of these problems are caused by missing jar packages; integrating Spring with HBase requires 15 packages in total.
Among them:
spring-data-hadoop-1.0.0.RELEASE.jar
hadoop-client-3.1.3.jar
hadoop-common-3.1.3.jar
hadoop-auth-3.1.3.jar
hadoop-mapreduce-client-common-3.1.3.jar
hadoop-mapreduce-client-core-3.1.3.jar
hadoop-mapreduce-client-jobclient-3.1.3.jar
commons-configuration2-2.3.jar
guava-30.1.1-jre.jar
jackson-annotations-2.12.4.jar
jackson-core-2.12.4.jar
jackson-databind-2.12.4.jar
These packages are also required when integrating with HDFS.

Hadoop ERROR: Attempting to operate on hdfs namenode as root ERROR: but there is no HDFS_NAMENODE_USER defined

In the sbin directory under the Hadoop installation directory, modify start-dfs.sh, stop-dfs.sh, start-yarn.sh and stop-yarn.sh respectively. The content below should be added in each file right after the #!/usr/bin/env bash line.

Content to add to start-dfs.sh and stop-dfs.sh (in Hadoop 3.2 and above, HADOOP_SECURE_DN_USER is replaced by HDFS_DATANODE_SECURE_USER):

HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root

Content to add to start-yarn.sh and stop-yarn.sh:

YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root

[Solved] Beeline Error: Error: Could not open client transport with JDBC Failed to Connection

 

Error: Could not open client transport with JDBC Uri: jdbc:hive2://node01:10000: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)


Solution

When connecting through the beeline command, the client connection fails:

[root@node03 ~]# beeline -u jdbc:hive2://node01:10000 -n root

Checking port 10000 shows that nothing is listening on it:

[root@node01 ~]# netstat -anp | grep 10000

HiveServer2 takes time to start, so you need to wait a while; it is not ready until it has printed four Hive session IDs (mine only started successfully after the fourth).

Then I realized why the teacher mentioned having to wait a while before connecting with beeline.

Once the session IDs appear, the start is successful, so don't panic when you see the connection error; just wait and retry.

Error in configuring Hadoop 3.1.3: attempting to operate on yarn nodemanager as root

This error may occur when starting the HDFS and YARN services with the scripts, and may also occur when stopping them.

Solution

Add the following parameters to the top of start-dfs.sh and stop-dfs.sh (in the sbin directory of the Hadoop installation):

HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root

Add the following parameters to the top of start-yarn.sh and stop-yarn.sh (in the sbin directory of the Hadoop installation):

YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root

This successfully solves the problem.

[Solved] Sqoop Mysqltohive error: Error: java.lang.RuntimeException: java.lang.RuntimeException…

Problem Description
Execute the statement: bin/sqoop import --connect jdbc:mysql://localhost:3306/gdcmxy --username root --password root --table 2019bigdata --fields-terminated-by '\t' --delete-target-dir --num-mappers 1 --hive-import --hive-database gdcmxy --hive-table 2019bigdata

Error: java.lang.RuntimeException: java.lang.RuntimeException: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure

After seeing the error, I initially judged that the main problem was the database connection. At first I thought the firewall had not been turned off, but since the firewall had definitely been disabled when setting up Hadoop earlier, it should have nothing to do with the firewall.

Cluster environment:

Solution:

Change localhost in the statement to the IP address of the machine: bin/sqoop import --connect jdbc:mysql://192.168.112.81:3306/gdcmxy --username root --password root --table 2019bigdata --fields-terminated-by '\t' --delete-target-dir --num-mappers 1 --hive-import --hive-database gdcmxy --hive-table 2019bigdata

Successfully imported.

Namenode startup error: OutOfMemoryError: Java heap space

1. Finding the problem

Phenomenon: after restarting the Hadoop cluster, the namenode reports an error and cannot start.

Error reported:

2. Analyzing the problem

As soon as you see "OutOfMemoryError: Java heap space" in the error report, it points to a problem with the JVM heap parameters. Check the hadoop-env.sh configuration file; its settings are as follows:

export HADOOP_NAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS" 
export HADOOP_DATANODE_OPTS="-Dhadoop.security.logger=ERROR,RFAS $HADOOP_DATANODE_OPTS"

It can be seen from the above that no heap memory size is set in these parameters.

The default heap memory size of the HDFS roles (namenode, secondarynamenode, datanode) is 1000 MB.

3. Solving the problem

Change the parameters to the following, start the cluster again, and the startup succeeds.

export HADOOP_NAMENODE_OPTS="-Xms4096m -Xmx4096m -Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Xms4096m -Xmx4096m -Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_SECONDARYNAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Xms2048M -Xmx2048M -Dhadoop.security.logger=ERROR,RFAS $HADOOP_DATANODE_OPTS"

Parameter description:

-Xmx4096m: maximum heap memory available

-Xms4096m: initial heap memory

Reference: HDFS memory configuration (cnblogs.com)

Sqoop connection gbase data error [How to Solve]

1. When sqoop accesses the GBase database from the command line, the connection reports an error.

Just add the --driver parameter to the command:

sqoop list-tables --connect jdbc:gbase://10.100.111.48:8010/dm --driver com.gbase.jdbc.Driver --username gbase --password gbase2010531
It runs successfully!

[Solved] Hive 2.3.9 Error: Error: Unrecognized column type: UNIONTYPE (state=,code=0)

After importing uniontype data into the test_serializer table in Hive from a CSV file, an error occurred when running select * from test_serializer:

Error: Unrecognized column type: UNIONTYPE (state=,code=0)

Hive version: 2.3.9

After investigation, this is a Hive JDBC bug that was fixed in version 3.0.0: HIVE-17259.

I wanted to fix the bug without upgrading, by applying the 3.0.0 fix to the Hive JDBC 2.3.9 source code, that is, by making the following modifications:

1. Find the corresponding version of the Hive JDBC source code on the Apache website and download it:

Apache Downloads

2. Find JdbcColumn.java and modify it

Add two lines of code as follows:

    } else if ("uniontype".equalsIgnoreCase(type)) {
      return Type.UNION_TYPE;
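
For orientation, here is a self-contained sketch of the kind of string-to-type mapping those two lines extend. It is an illustration only, not the actual JdbcColumn.java source: the class name, the cut-down enum and the surrounding branches are stand-ins, so check the real method and type names against the downloaded 2.3.9 code.

import java.sql.SQLException;

// Illustrative stand-in for the type-name mapping inside org.apache.hive.jdbc.JdbcColumn.
public class JdbcColumnSketch {

    // Cut-down stand-in for Hive's column Type enum.
    enum Type { STRING_TYPE, INT_TYPE, UNION_TYPE }

    static Type typeStringToHiveType(String type) throws SQLException {
        if ("string".equalsIgnoreCase(type)) {
            return Type.STRING_TYPE;
        } else if ("int".equalsIgnoreCase(type)) {
            return Type.INT_TYPE;
        } else if ("uniontype".equalsIgnoreCase(type)) { // the two added lines
            return Type.UNION_TYPE;
        }
        throw new SQLException("Unrecognized column type: " + type);
    }

    public static void main(String[] args) throws SQLException {
        // Before the patch, "uniontype" would fall through to the exception above,
        // which matches the "Unrecognized column type: UNIONTYPE" error Beeline shows.
        System.out.println(typeStringToHiveType("uniontype")); // prints UNION_TYPE
    }
}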

3. Package new JDBC jar files and copy them to the Hive server

Navigate to the jdbc directory in cmd and run the mvn package command to build.

After packaging, find hive-jdbc-2.3.9-standalone.jar and hive-jdbc-2.3.9.jar in the target directory under the jdbc directory, copy them to the $HIVE_HOME/jdbc and $HIVE_HOME/lib directories respectively, and restart hiveserver2:

Execution after the fix: