Tag Archives: Hadoop

[Solved] MapReduce Class Cast Error: java.lang.ClassCastException

An error was reported while writing a MapReduce program.
java.lang.ClassCastException: class date2021_11_27_5.Commodity
    at java.lang.Class.asSubclass(Unknown Source)
    at org.apache.hadoop.mapred.JobConf.getOutputKeyComparator(JobConf.java:887)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:1004)
    at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:402)
    at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:81)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:698)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
2021-11-29 10:00:26,301 INFO [org.apache.hadoop.mapred.LocalJobRunner] - map task executor complete.
  2021-11-29 10:00:26,302 WARN [org.apache.hadoop.mapred.LocalJobRunner] - job_local49120036_0001
  java.lang.Exception: java.io.IOException: Initialization of all the collectors failed. Error in last collector was :class date2021_11_27_5.Commodity
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.io.IOException: Initialization of all the collectors failed. Error in last collector was :class date2021_11_27_5.Commodity
    at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:415)
    at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:81)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:698)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.ClassCastException: class date2021_11_27_5.Commodity
    at java.lang.Class.asSubclass(Unknown Source)
    at org.apache.hadoop.mapred.JobConf.getOutputKeyComparator(JobConf.java:887)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:1004)
    at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:402)
    ... 10 more

 

How to Solve:
This happens when the custom entity class does not implement the WritableComparable<Commodity> interface or does not override the compareTo method, so the map output key cannot be sorted and a ClassCastException is thrown.
There is another possibility: the class implements the separate Writable and Comparable<Commodity> interfaces instead of the single WritableComparable<Commodity> interface that the MapReduce framework expects for keys. With the two separate interfaces, a custom entity class works fine as a value, but using it as a key produces the error above. Re-implement the WritableComparable<Commodity> interface (and import the correct package) to fix it.
Hadoop's rule is: if the entity class implements only Writable and Comparable<Commodity>, its data can be used only as a value; if it implements WritableComparable<Commodity>, its data can be used as both key and value.
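For reference, here is a minimal sketch of what such an entity class can look like (the fields are made up for illustration; the post does not show the actual Commodity class):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

// Minimal sketch: a custom key type must implement WritableComparable,
// not just Writable plus Comparable, so Hadoop can both serialize and sort it.
public class Commodity implements WritableComparable<Commodity> {

    private String name;   // illustrative field
    private double price;  // illustrative field

    public Commodity() { } // Hadoop needs a no-arg constructor for deserialization

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(name);
        out.writeDouble(price);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        name = in.readUTF();
        price = in.readDouble();
    }

    @Override
    public int compareTo(Commodity other) {
        // Sort order used when this class is the map output key
        return Double.compare(this.price, other.price);
    }
}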

Windows CMD (Black Window) Start Hadoop Error [How to Solve]

Windows cmd (black window) start Hadoop error solutions (continuously updated)
Contents: cmd command start process error, cmd startup IO error

cmd command start process error

Failed to setup local dir /tmp/hadoop-GK/nm-local-dir, which was marked as good. org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Permissions incorrectly set for dir /tmp/hadoop-GK/nm-local-dir/nmPrivate, should be rwx------, actual value = rwxrwx-

Solution: Run cmd as administrator

cmd startup IO error

IOException: Incompatible clusterIDs in D:\hadoop\3.0.3\data\dfs\datanode: namenode clusterID = CID-45d4d17f-96fd-4644-b0ee-7835ef5bc790; datanode clusterID = CID-01f27c2a-6229-4a10-b098-89e89d4c62e4

Solution: Delete the data directory in hadoop and restart

Hadoop Startup Error: sbin/start-dfs.sh [How to Solve]

Hadoop command sbin/start-dfs.sh startup error:
Error: Cannot find configuration directory: /etc/hadoop
JAVA_HOME is not set and could not be found
Solution:
Configure the hadoop-env.sh file in hadoop-2.7.1/etc/hadoop and write in your own JDK and Hadoop paths:
export JAVA_HOME=/usr/jdk1.8.0_221
export HADOOP_CONF_DIR=/usr/hadoop-2.7.1/etc/hadoop/

Datanode startup failed with an error: incompatible clusterids

Contents: Information, Error report summary, Problem description, Cause of problem, Analysis steps, Solution, References


Information

Environment: Hadoop 3.3.1; System: CentOS 7.4; Java: Java SE 1.8.0_301

Error report summary

java.io.IOException: Incompatible clusterIDs in /opt/module/hadoop-3.3.1/data/dfs/data: namenode clusterID = CID-aa23cfe4-9ad3-4c06-87fc-e862c8f3a722; datanode clusterID = CID-55fa9a51-7777-4ff4-87d6-4df7cf2cb8b9

Problem description

An error is reported when the datanode is started. The error recorded in the log /opt/module/hadoop-3.3.1/logs/hadoop-bordy-datanode-hadoop102.log is as follows:

2021-11-29 21:58:51,350 INFO org.apache.hadoop.hdfs.server.common.Storage: Using 1 threads to upgrade data directories (dfs.datanode.parallel.volumes.load.threads.num=1, dataDirs=1)
2021-11-29 21:58:51,354 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /opt/module/hadoop-3.3.1/data/dfs/data/in_use.lock acquired by nodename 13694@hadoop102
2021-11-29 21:58:51,356 WARN org.apache.hadoop.hdfs.server.common.Storage: Failed to add storage directory [DISK]file:/opt/module/hadoop-3.3.1/data/dfs/data
java.io.IOException: Incompatible clusterIDs in /opt/module/hadoop-3.3.1/data/dfs/data: namenode clusterID = CID-aa23cfe4-9ad3-4c06-87fc-e862c8f3a722; datanode clusterID = CID-55fa9a51-7777-4ff4-87d6-4df7cf2cb8b9
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:746)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadStorageDirectory(DataStorage.java:296)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadDataStorage(DataStorage.java:409)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:389)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:561)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1753)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1689)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:394)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:295)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:854)
        at java.lang.Thread.run(Thread.java:748)
2021-11-29 21:58:51,358 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool <registering> (Datanode Uuid a4eeff59-0192-4402-8278-4743158fa405) service to hadoop101/192.168.2.101:8020. Exiting.
java.io.IOException: All specified directories have failed to load.
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:562)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1753)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1689)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:394)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:295)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:854)
        at java.lang.Thread.run(Thread.java:748)
2021-11-29 21:58:51,359 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool <registering> (Datanode Uuid a4eeff59-0192-4402-8278-4743158fa405) service to hadoop101/192.168.2.101:8020
2021-11-29 21:58:51,363 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool <registering> (Datanode Uuid a4eeff59-0192-4402-8278-4743158fa405)
2021-11-29 21:58:53,364 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
2021-11-29 21:58:53,424 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at hadoop102/192.168.2.102
************************************************************/

Cause of problem

Hadoop's upgrade mechanism requires each datanode to store a permanent clusterID in its VERSION file. When a datanode starts, it checks this value against the clusterID in the namenode's VERSION file; if the two do not match, the "Incompatible clusterIDs" exception is raised. See the official issue HDFS-107.

Analysis steps

1. View the clusterID in the VERSION file under the datanode directory /opt/module/hadoop-3.3.1/data/dfs/data/current.
2. View the clusterID in the VERSION file under the namenode directory /opt/module/hadoop-3.3.1/data/dfs/name/current.
Comparing the two files shows that the clusterIDs do not match (the two values can be checked as in the sketch below). In the HDFS architecture each datanode must communicate with the namenode, and the clusterID is the unique ID of the namenode, i.e. of the cluster.
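As an illustration of this comparison, here is a small sketch that prints the two clusterIDs side by side. The paths are the ones used in this post, and it relies on the fact that the VERSION file is a plain key=value properties file:

import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

// Sketch: print the clusterID stored in the namenode's and the datanode's VERSION
// files so the two values can be compared. Adjust the paths to your own
// dfs.namenode.name.dir and dfs.datanode.data.dir settings.
public class ClusterIdCheck {
    public static void main(String[] args) throws IOException {
        String[] versionFiles = {
            "/opt/module/hadoop-3.3.1/data/dfs/name/current/VERSION", // namenode
            "/opt/module/hadoop-3.3.1/data/dfs/data/current/VERSION"  // datanode
        };
        for (String path : versionFiles) {
            Properties props = new Properties();
            try (FileInputStream in = new FileInputStream(path)) {
                props.load(in);
            }
            System.out.println(path + " -> clusterID=" + props.getProperty("clusterID"));
        }
    }
}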

Solution

Modify the clusterID value in the failed datanode's VERSION file to match the namenode's clusterID, then restart the datanode.

References

Hadoop failed to start datanode: there is a problem with clusterID – Wang Shen – cnblogs.com

Spring integrated HBase error [How to Solve]

Problem 1
ClassNotFoundException:org/springframework/data/hadoop/configuration/ConfigurationFactoryBean
Solution
Replace the jar package with spring-data-hadoop-1.0.0.RELEASE version
Problem 2
ClassNotFoundException:org/apache/hadoop/conf/Configuration
Solution
Introduce hadoop-client-3.1.3.jar and hadoop-common-3.1.3.jar
Problem 3
java.lang.NoClassDefFoundError: org/apache/commons/configuration2/Configuration
Solution
Introduce commons-configuration2-2.3.jar
Problem 4
java.lang.NoClassDefFoundError: org/apache/hadoop/util/PlatformName
Solution
Introduce hadoop-auth-3.1.3.jar
Problem 5
java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/JobConf
Solution
Introduce hadoop-mapreduce-client-common-3.1.3.jar, hadoop-mapreduce-client-core-3.1.3.jar and
hadoop-mapreduce-client-jobclient-3.1.3.jar
Problem 6
java.lang.NoClassDefFoundError: com/ctc/wstx/io/SystemId
Solution
Introduce woodstox-core-5.0.3.jar
Problem 7
java.lang.NoClassDefFoundError: com/google/common/collect/Interners
Solution
Introduce guava-30.1.1-jre.jar
Problem 8
java.lang.NoSuchMethodError: com.google.common.collect.MapMaker.keyEquivalence(Lcom/google/common/base/Equivalence;)Lcom/google/common/collect/MapMaker
Solution
Remove the google-collect-1.0.jar package; it conflicts with guava.
Problem 9
java.lang.NoClassDefFoundError: com/fasterxml/jackson/core/JsonGenerator
Solution
Introduce jackson-annotations-2.12.4.jar, jackson-core-2.12.4.jar and jackson-databind-2.12.4.jar
Problem 10
java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
Solution
Introduce hbase-common-2.2.4.jar
Problem 11
java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/client/HTableInterface
Solution
After searching for a long time, I found this written in the configuration file:
<bean id="htemplate" class="org.springframework.data.hadoop.hbase.HbaseTemplate">
<property name="configuration" ref="hbaseConfiguration">
</property>
</bean>
Comment it out.
Summary
Most of these problems are caused by missing jar packages; integrating Spring with HBase requires 15 packages in total.
Among them:
spring-data-hadoop-1.0.0.RELEASE.jar
hadoop-client-3.1.3.jar
hadoop-common-3.1.3.jar
hadoop-auth-3.1.3.jar
hadoop-mapreduce-client-common-3.1.3.jar
hadoop-mapreduce-client-core-3.1.3.jar
hadoop-mapreduce-client-jobclient-3.1.3.jar
commons-configuration2-2.3.jar
guava-30.1.1-jre.jar
jackson-annotations-2.12.4.jar
jackson-core-2.12.4.jar
jackson-databind-2.12.4.jar
These packages are also required when integrating with HDFS.

Hadoop ERROR: Attempting to operate on hdfs namenode as root ERROR: but there is no HDFS_NAMENODE_USER defined

In the sbin directory under the Hadoop installation directory, modify start-dfs.sh, stop-dfs.sh, start-yarn.sh and stop-yarn.sh respectively. The content below should be added in each file right after the #!/usr/bin/env bash line.

Content to add to start-dfs.sh and stop-dfs.sh (in Hadoop 3.2 and above, HADOOP_SECURE_DN_USER is replaced by HDFS_DATANODE_SECURE_USER):

HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root

Content to add to start-yarn.sh and stop-yarn.sh:

YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root

[Solved] Beeline Error: Error: Could not open client transport with JDBC Failed to Connection

 

Error: Could not open client transport with JDBC Uri: jdbc:hive2://node01:10000: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)


Solution

When connecting through the beeline command, the client connection fails:

[root@node03 ~]# beeline -u jdbc:hive2://node01:10000 -n root

Checking port 10000 shows that nothing is listening on it:

[root@node01 ~]# netstat -anp | grep 10000

HiveServer2 takes time to start, so you need to wait a while; it is not ready until it has printed four Hive session IDs (mine only started successfully after the fourth).

Then I realized why the teacher mentioned having to wait a while before connecting with beeline.

Once the session IDs appear, the start is successful, so don't panic when you see the connection error; just wait and retry.

Error in configuring Hadoop 3.1.3: attempting to operate on yarn nodemanager as root

This error may occur when starting the HDFS and YARN services with the scripts, and may also occur when stopping them.

Solution

Add the following parameters to the top of start-dfs.sh and stop-dfs.sh (in the sbin directory of the Hadoop installation):

HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root

Add the following parameters to the top of start-yarn.sh and stop-yarn.sh (in the sbin directory of the Hadoop installation):

YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root

This successfully solves the problem.

[Solved] Sqoop Mysqltohive error: Error: java.lang.RuntimeException: java.lang.RuntimeException…

Problem Description
Execute the statement: bin/sqoop import --connect jdbc:mysql://localhost:3306/gdcmxy --username root --password root --table 2019bigdata --fields-terminated-by '\t' --delete-target-dir --num-mappers 1 --hive-import --hive-database gdcmxy --hive-table 2019bigdata

Error: java.lang.RuntimeException: java.lang.RuntimeException: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure

After seeing the error, I initially judged that the main problem was the database connection. At first I thought the firewall had not been turned off, but since the firewall had definitely been disabled when setting up Hadoop earlier, it should have nothing to do with the firewall.

Cluster environment:

Solution:

Change localhost in the statement to the IP address of the machine: bin/sqoop import --connect jdbc:mysql://192.168.112.81:3306/gdcmxy --username root --password root --table 2019bigdata --fields-terminated-by '\t' --delete-target-dir --num-mappers 1 --hive-import --hive-database gdcmxy --hive-table 2019bigdata

Successfully imported.

Namenode startup error: OutOfMemoryError: Java heap space

1. Finding the problem

Phenomenon: after restarting the Hadoop cluster, the namenode reports an error and cannot start.

Error reported:

2. Analyzing the problem

As soon as you see "OutOfMemoryError: Java heap space" in the error report, it points to a problem with the JVM heap parameters. Check the hadoop-env.sh configuration file; its settings are as follows:

export HADOOP_NAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS" 
export HADOOP_DATANODE_OPTS="-Dhadoop.security.logger=ERROR,RFAS $HADOOP_DATANODE_OPTS"

It can be seen from the above that no heap memory size is set in these parameters.

The default heap memory size of the HDFS roles (namenode, secondarynamenode, datanode) is 1000 MB.

3. Solving the problem

Change the parameters to the following, start the cluster again, and the startup succeeds.

export HADOOP_NAMENODE_OPTS="-Xms4096m -Xmx4096m -Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Xms4096m -Xmx4096m -Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_SECONDARYNAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Xms2048M -Xmx2048M -Dhadoop.security.logger=ERROR,RFAS $HADOOP_DATANODE_OPTS"

Parameter description:

-Xmx4096m: maximum heap memory available

-Xms4096m: initial heap memory

Reference: HDFS memory configuration (cnblogs.com)

Sqoop connection gbase data error [How to Solve]

1. When sqoop accesses the GBase database from the command line, the connection reports an error.

Just add the --driver parameter to the command:

sqoop list-tables --connect jdbc:gbase://10.100.111.48:8010/dm --driver com.gbase.jdbc.Driver --username gbase --password gbase2010531
It runs successfully!

[Solved] Hive 2.3.9 Error: Error: Unrecognized column type: UNIONTYPE (state=,code=0)

After importing uniontype data into the test_serializer table in Hive from a CSV file, an error occurred when running select * from test_serializer:

Error: Unrecognized column type: UNIONTYPE (state=,code=0)

Hive version: 2.3.9

After investigation, this is a Hive JDBC bug that was fixed in version 3.0.0: HIVE-17259.

I wanted to fix the bug without upgrading, by applying the 3.0.0 fix to the Hive JDBC 2.3.9 source code, that is, by making the following modifications:

1. Find the corresponding version of the Hive JDBC source code on the Apache website and download it:

Apache Downloads

2. Find JdbcColumn.java and modify it

Add two lines of code as follows:

    } else if ("uniontype".equalsIgnoreCase(type)) {
      return Type.UNION_TYPE;
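
For orientation, here is a self-contained sketch of the kind of string-to-type mapping those two lines extend. It is an illustration only, not the actual JdbcColumn.java source: the class name, the cut-down enum and the surrounding branches are stand-ins, so check the real method and type names against the downloaded 2.3.9 code.

import java.sql.SQLException;

// Illustrative stand-in for the type-name mapping inside org.apache.hive.jdbc.JdbcColumn.
public class JdbcColumnSketch {

    // Cut-down stand-in for Hive's column Type enum.
    enum Type { STRING_TYPE, INT_TYPE, UNION_TYPE }

    static Type typeStringToHiveType(String type) throws SQLException {
        if ("string".equalsIgnoreCase(type)) {
            return Type.STRING_TYPE;
        } else if ("int".equalsIgnoreCase(type)) {
            return Type.INT_TYPE;
        } else if ("uniontype".equalsIgnoreCase(type)) { // the two added lines
            return Type.UNION_TYPE;
        }
        throw new SQLException("Unrecognized column type: " + type);
    }

    public static void main(String[] args) throws SQLException {
        // Before the patch, "uniontype" would fall through to the exception above,
        // which matches the "Unrecognized column type: UNIONTYPE" error Beeline shows.
        System.out.println(typeStringToHiveType("uniontype")); // prints UNION_TYPE
    }
}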

3. Package new JDBC jar files and copy them to the Hive server

Navigate to the jdbc directory in cmd and run the mvn package command to build.

After packaging, find hive-jdbc-2.3.9-standalone.jar and hive-jdbc-2.3.9.jar in the target directory under the jdbc directory, copy them to the $HIVE_HOME/jdbc and $HIVE_HOME/lib directories respectively, and restart hiveserver2:

Execution after the fix: